Commit Graph

51 Commits

Author SHA1 Message Date
Yuan Wang 033abd6f57
Async IO threads (#13665)
## Introduction
Redis introduced IO threads in 6.0, allowing them to handle client
request reading, command parsing and reply writing, thereby improving
performance. The current IO thread implementation has a few drawbacks.
- The main thread is blocked during IO thread read/write operations and
must wait for all IO threads to complete their current tasks before it
can continue execution. In other words, the entire process is
synchronous. This prevents the efficient utilization of multi-core CPUs
for parallel processing.

- Even a moderate increase in the number of clients and requests causes
all IO threads to reach full CPU utilization, due to the busy-wait
mechanism the IO threads use. This makes it challenging to determine
which part of Redis has reached its bottleneck.

- When IO threads are enabled with TLS and io-threads-do-reads, a
disconnection of a connection with pending data may result in it being
assigned to multiple IO threads simultaneously. This can cause race
conditions and trigger assertion failures. Related issue:
https://github.com/redis/redis/issues/12540

Therefore, we designed an asynchronous IO threads solution. The IO
threads adopt an event-driven model, with the main thread dedicated to
command processing while the IO threads handle client read and write
operations in parallel.

## Implementation
### Overall
As before, we did not change the fact that all client commands must be
executed on the main thread, because Redis was originally designed to be
single-threaded, and processing commands in a multi-threaded manner
would inevitably introduce numerous race and synchronization issues. But
now each IO thread has an independent event loop; therefore, IO threads
can use a multiplexing approach to handle client read and write
operations, eliminating the CPU overhead caused by busy-waiting.

The execution process can be briefly described as follows: the main
thread assigns clients to IO threads after accepting connections; IO
threads notify the main thread when clients finish reading and parsing
queries; the main thread then processes the queries from the IO threads
and generates replies; IO threads handle writing replies to clients
after receiving the clients list from the main thread, and then continue
to handle client read and write events.

### Each IO thread has independent event loop
We now assign each IO thread its own event loop. This approach
eliminates the need for the main thread to perform the costly
`epoll_wait` operation for handling connections (except for specific
ones). Instead, the main thread processes requests from the IO threads
and hands them back once completed, fully offloading read and write
events to the IO threads.
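
As a rough illustration, the per-thread loop can be sketched as below using
Redis's `ae` API; the `IOThreadMain` and `io_threads_el` names are only
illustrative and not necessarily the identifiers used in this PR.
```c
/* Illustrative sketch: each IO thread owns its own ae event loop, so read and
 * write events for its clients are multiplexed inside the thread instead of
 * being driven synchronously by the main thread. */
#include <pthread.h>
#include "ae.h"                          /* Redis event loop API */

#define IO_THREADS_MAX 128               /* illustrative upper bound */
static aeEventLoop *io_threads_el[IO_THREADS_MAX];

static void *IOThreadMain(void *arg) {
    long id = (long)arg;
    /* Each thread gets its own loop; client read/write handlers and the
     * notifier fd of its queues are registered on it with aeCreateFileEvent()
     * as clients get assigned to the thread. */
    io_threads_el[id] = aeCreateEventLoop(1024 + 128);
    aeMain(io_threads_el[id]);           /* dispatch events until the loop is stopped */
    aeDeleteEventLoop(io_threads_el[id]);
    return NULL;
}
```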

Additionally, all TLS operations, including handling pending data, have
been moved entirely to the IO threads. This resolves the issue where
io-threads-do-reads could not be used with TLS.

### Event-notified client queue
To facilitate communication between the IO threads and the main thread,
we designed an event-notified client queue. Each IO thread and the main
thread have two such queues to store clients waiting to be processed.
These queues are also integrated with the event loop to enable handling.
We use a pthread mutex to ensure the safety of queue operations, as well
as data visibility and ordering. Lock contention is minimized, as each
IO thread and the main thread operate on independent queues, avoiding
thread suspension due to contention. We also implemented an event
notifier based on `eventfd` or `pipe` to support event-driven handling.
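
A minimal sketch of such a queue is shown below, assuming an opaque `client`
type; the `IOJobQueue` name and its fields are illustrative rather than the
actual definitions in this PR.
```c
/* Illustrative event-notified client queue: a mutex-protected list plus an
 * eventfd whose readable event wakes the receiving thread's event loop. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

typedef struct client client;           /* Redis client struct, opaque here */

typedef struct clientNode {
    client *c;
    struct clientNode *next;
} clientNode;

typedef struct IOJobQueue {
    pthread_mutex_t lock;
    clientNode *head, *tail;            /* clients waiting for the peer thread */
    int notifier_fd;                    /* eventfd registered in the receiver's event loop */
} IOJobQueue;

void jobQueueInit(IOJobQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->notifier_fd = eventfd(0, EFD_NONBLOCK);
}

void jobQueuePush(IOJobQueue *q, client *c) {
    clientNode *n = malloc(sizeof(*n));
    n->c = c;
    n->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_mutex_unlock(&q->lock);
    uint64_t one = 1;
    write(q->notifier_fd, &one, sizeof(one));   /* wake the peer's event loop */
}
```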

### Thread safety
Since the main thread and IO threads can execute in parallel, we must
handle data race issues carefully.

**client->flags**
The primary tasks of IO threads are reading and writing, i.e.
`readQueryFromClient` and `writeToClient`. However, IO threads and the
main thread may concurrently modify or access `client->flags`, leading
to potential race conditions. To address this, we introduced an io-flags
variable to record operations performed by IO threads, thereby avoiding
race conditions on `client->flags`.

**Pause IO thread**
In the main thread, we may want to operate on data owned by IO threads,
for example to uninstall an event handler, access or modify the
query/output buffer, or resize the event loop, and we need a clean and
safe context to do that. We pause the IO thread in `IOThreadBeforeSleep`,
do the work, and then resume it. To avoid the threads being suspended by
the scheduler, we use busy waiting to confirm the target status, and we
use atomic variables to ensure memory visibility and ordering. We
introduce the following functions to pause/resume IO threads:
```
pauseIOThread, resumeIOThread
pauseAllIOThreads, resumeAllIOThreads
pauseIOThreadsRange, resumeIOThreadsRange
```
Testing has shown that `pauseIOThread` is highly efficient, allowing the
main thread to execute nearly 200,000 operations per second during
stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can
handle up to nearly 56,000 operations per second. But operations
performed between pausing and resuming IO threads must be quick;
otherwise, they could cause the IO threads to reach full CPU
utilization.
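
The handshake behind `pauseIOThread`/`resumeIOThread` can be sketched roughly
as below; the state enum, struct and the `checkPauseRequest` helper are
assumptions for illustration only.
```c
/* Illustrative pause/resume handshake: the main thread requests a pause, the
 * IO thread acknowledges it in its beforeSleep hook, and both sides busy-wait
 * on atomics instead of blocking, so neither thread is suspended. */
#include <stdatomic.h>
#include <sched.h>

typedef enum { IO_THREAD_RUNNING, IO_THREAD_PAUSE_REQUESTED, IO_THREAD_PAUSED } ioThreadState;

typedef struct IOThread {
    _Atomic ioThreadState state;
} IOThread;

/* Called from the main thread. Returns once the IO thread is parked. */
void pauseIOThread(IOThread *t) {
    atomic_store_explicit(&t->state, IO_THREAD_PAUSE_REQUESTED, memory_order_release);
    while (atomic_load_explicit(&t->state, memory_order_acquire) != IO_THREAD_PAUSED)
        sched_yield();   /* busy wait for the acknowledgement */
}

void resumeIOThread(IOThread *t) {
    atomic_store_explicit(&t->state, IO_THREAD_RUNNING, memory_order_release);
}

/* Called from IOThreadBeforeSleep() in the IO thread. */
void checkPauseRequest(IOThread *t) {
    if (atomic_load_explicit(&t->state, memory_order_acquire) != IO_THREAD_PAUSE_REQUESTED) return;
    atomic_store_explicit(&t->state, IO_THREAD_PAUSED, memory_order_release);
    while (atomic_load_explicit(&t->state, memory_order_acquire) == IO_THREAD_PAUSED)
        sched_yield();   /* park until the main thread resumes us */
}
```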

**freeClient and freeClientAsync**
The main thread may need to terminate a client currently running on an
IO thread, for example, due to ACL rule changes, reaching the output
buffer limit, or evicting a client. In such cases, we need to pause the
IO thread to safely operate on the client.

**maxclients and maxmemory-clients updating**
When adjusting `maxclients`, we need to resize the event loop for all IO
threads. Similarly, when modifying `maxmemory-clients`, we need to
traverse all clients to calculate their memory usage. To ensure safe
operations, we pause all IO threads during these adjustments.

**Client info reading**
The main thread may need to read a client’s fields to generate a
descriptive string, such as for the `CLIENT LIST` command or logging
purposes. In such cases, we need to pause the IO thread handling that
client. If information for all clients needs to be displayed, all IO
threads must be paused.

**Tracking redirect**
Redis supports the tracking feature and can even send invalidation
messages to a connection with a specified ID. But the target client may
be running on an IO thread; directly manipulating the client’s output
buffer is not thread-safe, and the IO thread may not be aware that the
client requires a response. In such cases, we pause the IO thread
handling the client, modify the output buffer, and install a write event
handler to ensure proper handling.

**clientsCron**
In the `clientsCron` function, the main thread needs to traverse all
clients to perform operations such as timeout checks, verifying whether
they have reached the soft output buffer limit, resizing the
output/query buffer, or updating memory usage. To safely operate on a
client, the IO thread handling that client must be paused.
If we were to pause the IO thread for each client individually, the
efficiency would be very low. Conversely, pausing all IO threads
simultaneously would be costly, especially when there are many IO
threads, as clientsCron is invoked relatively frequently.
To address this, we adopted a batched approach for pausing IO threads.
At most, 8 IO threads are paused at a time. The operations mentioned
above are only performed on clients running in the paused IO threads,
significantly reducing overhead while maintaining safety.
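
A rough sketch of the batched pausing, assuming the range variants take
inclusive start/end thread indexes (the constant, rotation logic and helper
name below are illustrative):
```c
/* Illustrative batched pausing for clientsCron: pause at most 8 IO threads per
 * cron cycle, handle only the clients owned by those threads, then resume. */
#define CLIENTS_CRON_PAUSE_BATCH 8

void pauseIOThreadsRange(int start, int end);    /* assumed inclusive range */
void resumeIOThreadsRange(int start, int end);

void clientsCronPauseBatch(int io_threads_num, int *next_thread) {
    int start = *next_thread;
    int count = io_threads_num - start;
    if (count > CLIENTS_CRON_PAUSE_BATCH) count = CLIENTS_CRON_PAUSE_BATCH;

    pauseIOThreadsRange(start, start + count - 1);
    /* ... timeout checks, output/query buffer resizing and memory-usage updates
     * happen here, but only for clients bound to threads [start, start+count-1] ... */
    resumeIOThreadsRange(start, start + count - 1);

    *next_thread = (start + count) % io_threads_num;   /* next batch on the next cycle */
}
```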

### Observability
In the current design, the main thread always assigns clients to the IO
thread with the fewest clients. To clearly observe the number of clients
handled by each IO thread, we added a new section to the INFO output: the
`INFO THREADS` section shows the client count for each IO thread.
```
# Threads
io_thread_0:clients=0
io_thread_1:clients=2
io_thread_2:clients=2
```

Additionally, in the `CLIENT LIST` output, we also added a field to
indicate the thread to which each client is assigned.

`id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name=
lib-ver= io-thread=1`

## Trade-off
### Special Clients
For certain special types of clients, keeping them running on IO threads
would result in severe race issues that are difficult to resolve.
Therefore, we chose not to offload these clients to the IO threads.

For replica, monitor, subscribe, and tracking clients, the main thread
may directly write a reply to them when conditions are met. The race
issues are difficult to resolve, so we keep these clients processed in
the main thread. This includes Lua debug clients as well, since we may
operate on the connection directly.

For blocking clients, after the IO thread reads and parses a command and
hands it over to the main thread, if the client is identified as a
blocking type, it remains in the main thread. Once the blocking
operation completes and the reply is generated, the client is
transferred back to the IO thread to send the reply and wait for event
triggers.

### Clients Eviction
To support client eviction, it is necessary to update each client’s
memory usage promptly during operations such as read, write, or command
execution. However, when a client operates on an IO thread, it is not
feasible to update the memory usage immediately due to the risk of data
races. As a result, memory usage can only be updated either in the main
thread while processing commands or periodically in `clientsCron`.
The downside of this approach is that updates might experience a delay
of up to one second, which could impact the precision of memory
management for eviction.

To avoid incorrectly evicting clients, we adopted a best-effort
compensation solution: when we decide to evict a client, we update its
memory usage again before evicting it; if the memory used by the client
has not decreased, or its memory usage bucket has not changed, we evict
it; otherwise, we do not.

However, we have not completely solved this problem. Due to the delay in
memory usage updates, it may lead us to make incorrect decisions about
the need to evict clients.

### Defragment
In the majority of cases we do NOT use the data from argv directly in
the db.
1. key names
We store a copy that we allocate in the main thread, see `sdsdup()` in
`dbAdd()`.
2. hash key and value
We store key as hfield and store value as sds, see `hfieldNew()` and
`sdsdup()` in `hashTypeSet()`.
3. other datatypes
   They don't even use SDS, so there are no reference issues.

But in some cases the data from a client's argv may be retained by the
main thread.
As a result, during fragmentation cleanup, we need to move allocations
from the IO thread’s arena to the main thread’s arena. We always
allocate new memory in the main thread’s arena, but the memory released
by IO threads may not yet have been reclaimed. This ultimately causes
the fragmentation rate to be higher compared to creating and allocating
entirely within a single thread.
The following cases will lead to memory allocated by an IO thread being
retained by the main thread.
1. string-related commands: `append`, `getset`, `mset` and `set`.
If `tryObjectEncoding()` does not change argv, we keep it directly in
the main thread; see the code in `tryObjectEncoding()` (specifically
`trimStringObjectIfNeeded()`).
2. blocking-related commands:
    the key names will be kept in `c->db->blocking_keys`.
3. watch command:
    the key names will be kept in `c->db->watched_keys`.
4. [s]subscribe command:
    channel names will be kept in `serverPubSubChannels`.
5. script load command:
    the script will be kept in `server.lua_scripts`.
6. some module APIs: `RM_RetainString`, `RM_HoldString`

Those issues will be handled in other PRs.

## Testing
### Functional Testing
The commit, with IO threads enabled, has passed all TCL tests, but we
made some changes:
**Client query buffer**: In the original code, when using a reusable
query buffer, ownership of the query buffer would be released after the
command was processed. However, with IO threads enabled, the client
transitions from an IO thread to the main thread for processing. This
causes the ownership release to occur earlier than the command
execution. As a result, when IO threads are enabled, the client's
information will never indicate that a shared query buffer is in use.
Therefore, we skip the corresponding query buffer tests in this case.
**Defragment**: Added a new defragmentation test to verify the effect of
IO threads on defragmentation.
**Command delay**: For deferred clients in TCL tests, due to clients
being assigned to different threads for execution, delays may occur. To
address this, we introduced conditional waiting: the process proceeds to
the next step only when the `client list` contains the corresponding
commands.

### Sanitizer Testing
The commit passed all TCL tests and reported no errors when compiled
with the `-fsanitize=thread` and `-fsanitize=address` options enabled.
But we made the following modification: we suppressed the sanitizer
warnings for clients with watched keys when updating `client->flags`,
since IO threads read `client->flags` but never modify it or read the
`CLIENT_DIRTY_CAS` bit, and only the main thread modifies this bit, so
there is no actual data race.

## Others
### IO thread number
In the new multi-threaded design, the main thread is primarily focused
on command processing to improve performance. Typically, the main thread
does not handle regular client I/O operations but is responsible for
clients such as replication and tracking clients. To avoid breaking
changes, we still consider the main thread as the first IO thread.

When the io-threads configuration is set to a low value (e.g., 2),
performance does not show a significant improvement compared to a
single-threaded setup for simple commands (such as SET or GET), as the
main thread does not consume much CPU for these simple operations. This
results in underutilized multi-core capacity. However, for more complex
commands, having a low number of IO threads may still be beneficial.
Therefore, it’s important to tune `io-threads` based on your own
performance tests.

Additionally, you can clearly monitor the CPU utilization of the main
thread and IO threads using `top -H -p $redis_pid`. This allows you to
easily identify where the bottleneck is. If the IO threads are the
bottleneck, increasing `io-threads` will improve performance. If the
main thread is the bottleneck, the overall performance can only be
scaled by increasing the number of shards or replicas.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
2024-12-22 19:30:37 +08:00
Moti Cohen 2ec78d262d
Add KEYSIZES section to INFO (#13592)
This PR adds a new section to the `INFO` command output, called
`keysizes`. This section provides detailed statistics on the
distribution of key sizes for each data type (strings, lists, sets,
hashes and zsets) within the dataset. The distribution is tracked using
a base-2 logarithmic histogram.

# Motivation
Currently, Redis lacks a built-in feature to track key sizes and item
sizes per data type at a granular level. Understanding the distribution
of key sizes is critical for monitoring memory usage and optimizing
performance, particularly in large datasets. This enhancement will allow
users to inspect the size distribution of keys directly from the `INFO`
command, assisting with performance analysis and capacity planning.

# Changes
New Section in `INFO` Command: A new section called `keysizes` has been
added to the `INFO` command output. This section reports a per-database,
per-type histogram of key sizes. It provides insights into how many keys
fall into specific size ranges (represented in powers of 2).

**Example output:**
```
127.0.0.1:6379> INFO keysizes
# Keysizes
db0_distrib_strings_sizes:1=19,2=655,512=100899,1K=31,2K=29,4K=23,8K=16,16K=3,32K=2
db0_distrib_lists_items:1=5784492,32=3558,64=1047,128=676,256=533,512=218,4K=1,8K=42
db0_distrib_sets_items:1=735564=50612,8=21462,64=1365,128=974,2K=292,4K=154,8K=89,
db0_distrib_hashes_items:2=1,4=544,32=141169,64=207329,128=4349,256=136226,1K=1
```
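
As a rough illustration, a base-2 logarithmic histogram bucket can be derived
from a key's size as sketched below; the helper names and the histogram array
are assumptions, not the PR's actual code.
```c
#include <stdint.h>

/* Illustrative: bucket i counts sizes in [2^i, 2^(i+1)), matching the 1, 2, 4,
 * ... 1K, 2K ... labels shown in the INFO keysizes output above. */
static int keysizeBucket(uint64_t size) {
    int bucket = 0;
    while (size >>= 1) bucket++;   /* floor(log2(size)) for size >= 1 */
    return bucket;
}

/* Per-type histogram update, e.g. when a list gains or loses items. */
void updateKeysizeHistogram(uint64_t *histogram, uint64_t old_size, uint64_t new_size) {
    if (old_size > 0) histogram[keysizeBucket(old_size)]--;
    if (new_size > 0) histogram[keysizeBucket(new_size)]++;
}
```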
## Future Use Cases:
The key size distribution is collected per slot as well, laying the
groundwork for future enhancements related to Redis Cluster.
2024-10-29 13:07:26 +02:00
debing.sun d39548c854
Avoid starting defrag after config resetstat for defrag test (#13399)
If `config resetstat` is executed and a defrag is started after it, the
`total_active_defrag_time` will not be 0.
When we start the defrag again, we will skip the following steps:
1. waiting for the defrag to start (while total_active_defrag_time is
equal to 0)
2. waiting for the defrag to complete (until active_defrag_running is
equal to 0)
which results in the test failing.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-07-12 10:07:09 +08:00
debing.sun 7b9e960690
Hash Field Expiration (#13303)
## Background

This PR introduces support for field-level expiration in Redis hashes. Previously, Redis supported expiration only at the key level, but this enhancement allows setting expiration times for individual fields within a hash.

## New commands
* HEXPIRE
* HEXPIREAT
* HEXPIRETIME
* HPERSIST
* HPEXPIRE
* HPEXPIREAT
* HPEXPIRETIME
* HPTTL
* HTTL

## Short example
from @moticless
```sh
127.0.0.1:6379>  hset myhash f1 v1 f2 v2 f3 v3                                                   
(integer) 3
127.0.0.1:6379>  hpexpire myhash 10000 NX fields 2 f2 f3                                         
1) (integer) 1
2) (integer) 1
127.0.0.1:6379>  hpttl myhash fields 3 f1 f2 f3                                                                                                                                                                         
1) (integer) -1
2) (integer) 9997
3) (integer) 9997
127.0.0.1:6379>  hgetall myhash  
1) "f3"
2) "v3"
3) "f2"
4) "v2"
5) "f1"
6) "v1"

... after 10 seconds ...

127.0.0.1:6379>  hgetall myhash  
1) "f1"
2) "v1"
127.0.0.1:6379>
```

## Expiration strategy
1. Active expiration
    Redis periodically performs active expiration and deletion of hash keys that contain expired fields, with a maximum attempt limit.
2. Lazy expiration
    When a client touches fields within a hash, Redis checks whether the fields are expired. If a field is expired, it will be deleted. However, we do not delete expired fields during a traversal; we implicitly skip over them (see the sketch below).
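
A minimal sketch of the lazy-expiration check on field access is shown below;
all helper names are illustrative, as the real code operates on the hash
object's `hfield`/`ebuckets` internals.
```c
/* Illustrative lazy expiration: when a field is read, drop it if its TTL has
 * already passed and report it as missing instead of returning a stale value.
 * All helpers below are assumed, not the actual HFE internals. */
typedef struct robj robj;
typedef char *sds;

long long hashFieldGetExpire(robj *hash, sds field);      /* assumed: unix ms, -1 = no TTL */
void hashFieldDelete(robj *hash, sds field);               /* assumed */
int hashFieldGetValue(robj *hash, sds field, sds *value);  /* assumed */
long long mstime(void);                                    /* current time in ms */

int hashGetFieldLazyExpire(robj *hash, sds field, sds *value) {
    long long expire_at = hashFieldGetExpire(hash, field);
    if (expire_at != -1 && expire_at <= mstime()) {
        hashFieldDelete(hash, field);    /* expired: delete on access */
        return 0;                        /* behave as if the field is missing */
    }
    return hashFieldGetValue(hash, field, value);
}
```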

## RDB changes
Add two new RDB types, `RDB_TYPE_HASH_METADATA` and `RDB_TYPE_HASH_LISTPACK_EX`.

## Notification
1. Add `hpersist` notification for the `HPERSIST` command.
2. Add `hexpire` notification for the `HEXPIRE`, `HEXPIREAT`, `HPEXPIRE` and `HPEXPIREAT` commands.

## Internal
1. Add a new data structure `ebuckets`, which is used to store TTLs and keys, enabling quick retrieval of keys based on TTL.
2. Add a new data structure `mstr`, similar to sds, which is used to store a string with a TTL.

This work was done by @moticless, @tezc, @ronen-kalish, @sundb; I just released it.
2024-05-30 15:26:19 +08:00
Moti Cohen 71676513dd
Fix commands H*EXPIRE* and H*TTL to include `FIELDS` constant (#13270)
The same goes for: HPEXPIRE, HEXPIREAT, HPEXPIREAT, HEXPIRETIME,
HPEXPIRETIME, HPTTL, HTTL, HPERSIST
2024-05-16 19:35:58 +03:00
debing.sun 80be2cc291
Add defragment support for HFE (#13229)
## Background
1. All hash objects that contain HFE are referenced by db->hexpires.
2. All fields in a dict hash object with HFE are referenced by an
ebucket.

So when we defrag the hash object or the field in a dict with HFE, we
also need to update the references in them.

## Interface
1. Add a new interface `ebDefragItem`, which can accept a defrag
callback to defrag items in ebuckets, and simultaneously update their
references in the ebucket.

## Mainly changes
1. The key type of the dict of a hash object is no longer sds, so we add
a new `activeDefragHfieldDict()` to defrag the dict instead of
`activeDefragSdsDict()`.
2. When we defrag the dict of a hash object by using `dictScanDefrag()`,
we always set the defrag callback `defragKey` of `dictDefragFunctions`
to NULL, because we can't reallocate a field without updating its
reference in ebuckets.
Instead, we will defrag the field of the dict and update its reference
in the `dictScanDefrag` callback of dictScanFunction().
3. When we defrag the hash robj with HFE, we will use `ebDefragItem` to
defrag the robj and update the reference in db->hexpires.

## TODO:
Defrag the ebucket structure incrementally, which will be handled in a
future PR.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Moti Cohen <moti.cohen@redis.com>
2024-05-14 17:32:33 +08:00
Binbin 804110a487
Allocate Lua VM code with jemalloc instead of libc, and count it used memory (#13133)
## Background
1. Currently, Lua memory control does not pass through Redis's zmalloc.c.
Redis maxmemory cannot limit memory problems caused by users abusing Lua,
since this Lua VM memory is not part of used_memory.

2. Since jemalloc is much better (fragmentation and speed), and we also
know and trust it, we are going to use jemalloc instead of libc to
allocate the Lua VM code and count its used memory.

## Process:
In this PR, we will use jemalloc in Lua (see the sketch after this list).
1. Create an arena for all Lua VMs (script and function), which is
shared, in order to avoid blocking the defragger.
2. Create a bound tcache for the Lua VM, since the Lua VM and the main
thread are by default in the same tcache, and if there is no isolated
tcache, Lua may request memory from the tcache which has just been freed
by the main thread, and vice versa.
On the other hand, since the Lua VM might be released in a bio thread,
but the tcache is not thread-safe, we need to recreate the tcache every
time we recreate the Lua VM.
3. Remove Lua memory statistics from memory fragmentation statistics to
avoid the effects of Lua memory fragmentation.
## Other
Add the following new fields to `INFO DEBUG` (we may promote them to
INFO MEMORY some day)
1. allocator_allocated_lua: total number of bytes allocated in the lua arena
2. allocator_active_lua: total number of bytes in active pages allocated
in the lua arena
3. allocator_resident_lua: maximum number of bytes in physically
resident data pages mapped in the lua arena
4. allocator_frag_bytes_lua: fragment bytes in the lua arena

This is oranagra's idea, and I got some help from sundb.

This solves the third point in #13102.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2024-04-16 12:43:33 +03:00
debing.sun ad12730333
Implement defragmentation for pubsub kvstore (#13058)
After #13013

### This PR makes an effort to defrag the pubsub kvstore in the following
ways:

1. Until now, server.pubsub(shard)_channels only shared the channel name
obj with the first subscribed client; now change it so that the clients
and the pubsub kvstore share the channel name robj.
This saves a lot of memory when there are many subscribers to the
same channel.
It also means that we only need to defrag the channel name robj in the
pubsub kvstore, and then update
all client references for the current channel, avoiding the need to
iterate through all the clients to do the same thing.

2. Refactor the code to defragment pubsub(shard) in the same way as the
defragmentation of keys and expires, with the exception that we only
defragment pubsub (without shard) when the slot is zero.


### Other
Fix an oversight in #11695: if defragmentation doesn't reach the end
time, we should wait for the current db's keys and expires, pubsub and
pubsubshard to finish before leaving; previously it was possible to exit
early once just the keys were defragmented.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-03-04 16:56:50 +02:00
Binbin 13bd3643c2
Re-compute active_defrag_running after adjusting defrag configurations (#13020)
Currently, once active defrag starts, we can not adjust
active_defrag_running downwards. This is because active_defrag_running
is dynamically computed based on the fragmentation; we think we should
not lower the effort when the fragmentation drops.

However, we need to note that active_defrag_running is also dynamically
computed based on configurations. In this case, we are not respecting
cycle-min or cycle-max. Some people may realize halfway through that
defrag consumes a lot and want to adjust it.

Previously we could only turn off activedefrag and then turn it on again
to adjust active_defrag_running downwards. So in this PR, when an active
defrag configuration change is made, we will re-compute it.

These configuration items are:
- active-defrag-cycle-min
- active-defrag-cycle-max
- active-defrag-threshold-upper
2024-02-06 13:39:07 +02:00
Roshan Khatri 15a048d4f0
re-enable defrag tests in cluster mode. (#12710)
Reverts the skipping of defrag tests in cluster mode (done in #12672);
instead it skips only some of the defrag tests when running in cluster mode.
The tests now run well after investigating and making the changes in #12674 and #12694.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-02 13:55:48 +02:00
Harkrishn Patro 26eb4ce397
Fix defrag test (#12674)
Fixing issues that started after #11695, when the defrag tests began being executed in cluster mode too.
For some reason, it looks like the defragmentation is over too quickly, before the test is able to
detect that it's running.
So now, instead of waiting to see that it's active, we wait to see that it did some work:
```
[err]: Active defrag big list: cluster in tests/unit/memefficiency.tcl
defrag not started.
[err]: Active defrag big keys: cluster in tests/unit/memefficiency.tcl
defrag didn't stop.
```
2023-10-22 11:56:45 +03:00
Harkrishn Patro becd50d0da
Disable flaky defrag tests affecting daily run (#12672)
Temporarily disabling a few of the defrag tests in cluster mode to make the daily run stable:

Active defrag eval scripts
Active defrag big keys
Active defrag big list
Active defrag edge case
2023-10-19 21:12:58 +03:00
Vitaly 0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot. The main idea is splitting the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot-specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be negligible. The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time, in order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also instead of rehashing a single dictionary, cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this, we introduced a binary index tree (see the sketch after this list). With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using same binary index that is used for random key selection, this index allows us to find a slot for a specific key index. For example if there are 10 keys in the slot 0, then we can quickly find a slot that contains 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to not only save the position within the dictionary but also the slot id. In this change we append the slot id into the LSB of the cursor so it can be passed around between client and server. This has an interesting side effect: now you'll be able to start scanning a specific slot by simply providing the slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth noting that the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross slot scripts and modules). We don't want to compute the checksum multiple times, hence we are relying on the cached slot id in the client during command execution. All operations that access random keys should either pass in the known slot or recompute the slot.
* Slot info in RDB - in order to resize individual dictionaries correctly, while loading RDB, it's not enough to know total number of keys (of course we could approximate number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into RDB that contains number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places; in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep the DBSIZE operation O(1). This is also kept for O(1) expires computation as well.
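
As a rough illustration of the binary index tree (Fenwick tree) idea, the
sketch below maintains per-slot key counts and maps a random key index to its
slot; this is a hypothetical example, not the PR's actual code.
```c
#define SLOT_COUNT 16384

/* tree[i] holds partial sums of per-slot key counts (1-based Fenwick tree). */
static long long tree[SLOT_COUNT + 1];

/* Called whenever a slot gains or loses keys. */
void slotKeyCountAdd(int slot, long long delta) {
    for (int i = slot + 1; i <= SLOT_COUNT; i += i & -i) tree[i] += delta;
}

/* Find the slot containing the key with global index `target` (0-based) by
 * descending the implicit tree: O(log(slot count)). */
int slotForKeyIndex(long long target) {
    int pos = 0;
    for (int step = 1 << 14; step > 0; step >>= 1) {      /* 2^14 = 16384 */
        if (pos + step <= SLOT_COUNT && tree[pos + step] <= target) {
            pos += step;
            target -= tree[pos];
        }
    }
    return pos;   /* slot id in [0, SLOT_COUNT) */
}
```
Picking `target = rand() % total_keys` and then calling
`slotForKeyIndex(target)` gives every key an equal chance of having its slot
selected, regardless of how unevenly the keys are spread across slots.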

## Performance
This change improves SET performance in cluster mode by ~5%, most of the gains come from us not having to maintain linked lists for keys in slot, non-cluster mode has same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Binbin 96e9dec419
Bump codespell from 2.2.4 to 2.2.5 (#12557)
and adjustments.
2023-09-08 16:10:17 +03:00
Binbin bfe50a30ed
Increase the threshold of the AOF loading defrag test (#11871)
This test is very sensitive and fragile. It often fails in the Daily run;
in most cases, it failed in test-ubuntu-32bit (the AOF loading one),
with the value in the range (31, 40):
```
[err]: Active defrag in tests/unit/memefficiency.tcl
Expected 38 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test)
```

The AOF loading part isn't tightly fixed to the cron hz. It calls
processEventsWhileBlocked once in every 1024 command calls.
```
        /* Serve the clients from time to time */
        if (!(loops++ % 1024)) {
            off_t progress_delta = ftello(fp) - last_progress_report_size;
            loadingIncrProgress(progress_delta);
            last_progress_report_size += progress_delta;
            processEventsWhileBlocked();
            processModuleLoadingProgressEvent(1);
        }
```

In this case, we can either decrease the 1024 or increase the
threshold of just the AOF part of that test. Considering the test
machines are sometimes slow, and all sorts of quirks could happen
(which do not indicate a bug), and we've already set it to 30, we suppose
we can set it a little bit higher, to 40. We can do this instead of
adding another testing config (we can add it when we really need it).

Fixes #11868
2023-03-04 12:54:36 +02:00
Oran Agra 528bb11d7a
Solve issues with active defrag test failing on fast machines (#11598)
We do defrag during AOF loading, but aim to detect fragmentation only
once a second, so this test aims to slow down the AOF loading and mimic
loading of a large file.
On fast machines, the sleep plus the actual work we did was insufficient,
so we're making it sleep longer so the test won't fail.

The error we used to get is this one:
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
2022-12-09 13:33:38 +02:00
蔡相跃 24da71e507
Fix typo "the the" (#10399) 2022-03-09 13:55:17 +02:00
yoav-steinberg b59bb9b476
Fix script active defrag test (#10318)
This includes two fixes:
* We forgot to count non-key reallocs in defragmentation stats.
* Fix the script defrag tests to make dict entries less significant in fragmentation by making the scripts larger.
This assures active defrag will complete and reach the desired results.
Some inherent fragmentation might exist in dict entries, which we need to ignore.
This led to occasional CI failures.
2022-02-21 09:37:25 +02:00
yoav-steinberg 2eb9b19612
Fix Eval scripts defrag (broken 7.0 in RC1) (#10271)
Remove the scripts defragger since it was broken since #10126 (released in 7.0 RC1);
it would crash the server if the defragger started in a server that contains eval scripts.

In #10126 the global `lua_script` dict became a dict to a custom `luaScript` struct with an internal `robj`
in it instead of a generic `sds` -> `robj` dict. This means we need custom code to defrag it and since scripts
should never really cause much fragmentation it makes more sense to simply remove the defrag code for scripts.
2022-02-11 21:58:05 +02:00
Oran Agra 6add1b7217
Add external test that runs without debug command (#9964)
- add needs:debug flag for some tests
- disable "save" in external tests (speedup?)
- use debug_digest proc instead of debug command directly so it can be skipped
- use OBJECT ENCODING instead of DEBUG OBJECT to get encoding
- add a proc for OBJECT REFCOUNT so it can be skipped
- move a bunch of tests in latency_monitor tests to happen later so that latency monitor has some values in it
- add missing close_replication_stream calls
- make sure to close the temp client if DEBUG LOG fails
2021-12-19 17:41:51 +02:00
Oran Agra d4e7ffb38c
Improve active defrag in jemalloc 5.2 (#9778)
Background:
Following the upgrade to jemalloc 5.2, there was a test that used to be flaky and
started failing consistently (on 32bit), so we disabled it (see #9645).

This is a test that I introduced in #7289 when I attempted to solve a rare stagnation
problem, and it later turned out I failed to solve it; what's more, I added a test that
caused it to be not so rare, and as I mentioned, in jemalloc 5.2 it became consistent on 32bit.

Stagnation can happen when all the slabs of the bin are equally utilized, so the decision
to move an allocation from a relatively empty slab to a relatively full one, will never
happen, and in that test all the slabs are at 50% utilization, so the defragger could just
keep scanning the keyspace and not move anything.

What this PR changes:
* First, finally in jemalloc 5.2 we have the count of non-full slabs, so when we compare
  the utilization of the current slab, we can compare it to the average utilization of the non-full
  slabs in our bin, instead of the total average of our bin. this takes the full slabs out of the game,
  since they're not candidates for migration (neither source nor target).
* Secondly, We add some 12% (100/8) to the decision to defrag an allocation, this is the part
  that aims to avoid stagnation, and it's especially important since the above mentioned change
  can get us closer to stagnation.
* Thirdly, since jemalloc 5.2 adds sharded bins, we take into account all shards (something
  that's missing from the original PR that merged it), this isn't expected to make any difference
  since anyway there should be just one shard.

How this was benchmarked.
What I did was run the memefficiency test unit with `--verbose` and compare the defragger hits
and misses the tests reported.
At first, when I took into consideration only the non-full slabs, it got a lot worse (I got into
stagnation, or just got a lot of misses and a lot of hits), but when I added the 10% I got back
to results that were slightly better than the ones of the jemalloc 5.1 branch, i.e. full defragmentation
was achieved with fewer hits (relocations) and fewer misses (keyspace scans).
2021-11-21 13:35:39 +02:00
menwen d5ca72e38b
fix defrag test looking at the wrong latency metric (#9723)
the latency event was renamed in #7726, and the outcome was that the test was
ineffective (unable to measure the max latency, always seeing 0)
2021-11-02 15:52:56 +02:00
yoav-steinberg 81095b1bd9
Skip Active-defrag edge case test until we fix it. (#9645)
Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (#9623).
2021-10-18 13:28:52 +03:00
Oran Agra 1e7ad894d2
Tune timeout of active defrag test (#9426)
Failed on Raspberry Pi 3b where that single test took about 170 seconds
2021-08-30 12:39:09 +03:00
Binbin 0bfccc55e2
Fixed some typos, add a spell check ci and others minor fix (#8890)
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.

Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.

Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
    - In README: about the rdb, we used to said create a `thread`, change to `process`
    - dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
    - notifyKeyspaceEvent fucntion comment (add type arg)
    - Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
2021-06-10 15:39:33 +03:00
Yossi Gottlieb 8a86bca5ed
Improve test suite to handle external servers better. (#9033)
This commit revives and improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.

This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running specific tests.
Attempting to run larger chunks of the test suite experienced many issues:

* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external-server-compatible and other
tests.
* Cluster mode is not supported (resulting in `CROSSSLOT` errors).

This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.

The tests directory now contains a `README.md` file that describes how this
works.

This commit also includes additional cleanups and fixes:

* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
  global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
  (a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
  future we should have dedicated teardown code that executes even when tests
  fail).
* Fixed some tests that were flaky running on external servers.
2021-06-09 15:13:24 +03:00
Oran Agra 5843a45d01
Skip defrag tests on systems with bigger page sizes (#8294)
The defragger works well on these systems, but the tests and their
thresholds are not adjusted for these big pages, so the defragger isn't
able to get down the fragmentation to the levels the test expects and it
fails on "defrag didn't stop".

Randomly choosing 8k as the threshold for skipping.

Fixes #8265 (which had 65k pages)
2021-01-08 10:03:21 +02:00
Oran Agra 7d9b09adaa
Tests: fix new defrag test to be skipped when not supported (#8185)
Additionally, the older defrag tests are using an obsolete way to check
if the defragger is supported (the error no longer contains "DISABLED").
This doesn't usually make a difference since these tests are completely
skipped if the allocator is not jemalloc, but it would fail if the
allocator is a jemalloc that doesn't support defrag.
2020-12-14 11:13:46 +02:00
Yossi Gottlieb 2faa0f19eb Fix test failure on slower systems.
With save not disabled, slower systems began a background save that did not
complete in time, resulting in SAVE failing with "ERR Background save
already in progress".
2020-11-04 21:43:55 +02:00
Yossi Gottlieb 843a13e88f
Add a --no-latency tests flag. (#7939)
Useful for running tests on systems which may be way slower than usual.
2020-10-22 11:10:53 +03:00
Oran Agra 9ef8d2f671
Run active defrag while blocked / loading (#7726)
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows how much time to run by looking at server.hz),
whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.

Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
  of a microsecond sleep
2020-09-03 08:47:29 +03:00
Oran Agra 88d71f4793 fix a rare active defrag edge case bug leading to stagnation
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation). This happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made has a lower utilization.

This commit fixes it by removing the current slab from the overall
average utilization of the bin, and also eliminates any precision loss in
the utilization calculation and moves the decision about the defrag to
reside inside jemalloc.

It also adds a test that consistently reproduces this issue.
2020-05-20 16:04:42 +03:00
Oran Agra b9fa42a197 testsuite run the defrag latency test solo
This test is time sensitive and it sometimes fails to pass below the
latency threshold, even on strong machines.

This test was the reason we were running just 2 parallel tests in the
GitHub Actions CI; reverting this.
2020-04-16 18:09:22 +03:00
Oran Agra 2f1a1c3835 fix github actions failing latency test for active defrag - part 2
It seems that running two clients at a time is ok too; this reduces action
time from 20 minutes to 10. We'll use this for now, and if one day it
won't be enough we'll have to run just the sensitive tests one by one,
separately from the others.

This commit also fixes an issue with the defrag test that appears to be
very rare.
2020-02-27 08:34:53 +02:00
Oran Agra 537893420b fix github actions failing latency test for active defrag
It seems that GitHub Actions runners are slow, so we use just one client to
reduce false positives.

Also adding verbose output, testing only on the latest Ubuntu, and building
on an older one.

When doing that, I can reduce the test threshold back to something saner.
2020-02-25 17:53:23 +02:00
Oran Agra 62adabd0e0 Fix latency sensitivity of new defrag test
I saw that the new defrag test for lists was failing in CI recently, so I
changed its threshold from 12 to 60.

Besides that, I added / improved the latency test for the other two defrag
tests (adding a sensitive latency check and digest / save checks),

and fixed bad usage of debug populate (it can't override existing keys);
this was the original intention, which creates higher fragmentation.
2020-02-23 13:05:52 +02:00
Oran Agra 485425cec7 Defrag big lists in portions to avoid latency and freeze
When active defrag kicks in and finds a big list, it will create a bookmark to
a node so that it is able to resume iteration from that node later.

The quicklist manages that bookmark, and updates it in case that node is deleted.

This will increase memory usage only on lists of over 1000 (see
active-defrag-max-scan-fields) quicklist nodes (1000 ziplists, not 1000 items)
by 16 bytes.

In 32 bit build, this change reduces the maximum effective config of
list-compress-depth and list-max-ziplist-size (from 32767 to 8191)
2020-02-18 17:22:32 +02:00
Oran Agra d0850369c4 fix small test suite race conditions 2018-11-12 10:26:10 +02:00
Oran Agra c8452ab005 Fix unstable tests on slow machines.
A few tests had borderline thresholds that were adjusted.

The slave buffers test had two issues, preventing the slave buffer from growing:
1) the slave didn't necessarily go to sleep on time, or woke up too early,
   now using SIGSTOP to make sure it goes to sleep exactly when we want.
2) the master disconnected the slave on timeout
2018-08-21 11:46:07 +03:00
Oran Agra f89c93c8ad make active defrag test more stable
On slower machines, the active defrag test tended to fail.
Although the fragmentation ratio was below the threshold, the defragger was
still in the middle of a scan cycle.

This commit changes:
- the defragger uses the current fragmentation state, rather than the cached one
  that is updated by server cron every 100ms. This actually fixes a bug of
  starting one excess scan cycle
- the test lets the defragger use more CPU cycles, in the hope that the defrag
  will be faster, but also gives it more time before we give up.
2018-07-18 10:16:33 +03:00
Oran Agra de495ee7ab minor fix in creating a stream NACK for rdb and defrag tests 2018-06-27 15:34:17 +03:00
Oran Agra 5616d4c603 add active defrag support for streams 2018-06-27 15:00:41 +03:00
antirez 98d5d3f118 Make active defragmentation tests optional.
They failed when active defrag could not be activated because the
Jemalloc version does not include the additional APIs.
2018-05-24 18:04:21 +02:00
Oran Agra ad133e1023 Active defrag fixes for 32bit builds
problems fixed:
* failing to read fragmentation information from jemalloc
* overflow in jemalloc fragmentation hint to the defragger
* test suite not triggering eviction after population
2018-05-17 09:52:00 +03:00
Oran Agra 806736cdf9 Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
Other fixes / improvements:
- Lua script memory isn't taken from zmalloc (it's taken from libc malloc),
  so it can cause a high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

Other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve the defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real frag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for specific things
- add these to the MEMORY DOCTOR command
- deduct Lua memory from the RSS in case of a non-jemalloc allocator (one for which we don't have "allocator active/used")
- reduce the sampling rate of the RSS and allocator info
2018-03-12 15:08:52 +02:00
antirez c861e1e1ee Defrag: test currently disabled, too many false positives.
Related to #3786.
2017-04-22 15:59:57 +02:00
antirez a17390853d Defrag: fix test false positive.
Apparently 1.4 is too low compared to what you get in certain setups
(including mine). I raised it to 1.55, which hopefully is still enough to
test that the fragmentation went down from 1.7 without incurring
issues; however the test setup may still be fragile, so at times this
may lead to false positives again. It's hard to test for these things
in a deterministic way.

Related to #3786.
2017-04-22 13:21:41 +02:00
oranagra 0fb5c4ebd8 add test for active defrag 2017-04-22 13:17:09 +02:00
antirez 5e3dcc522b Faster memory efficiency test.
This test on Linux was extremely slow, since in Tcl we can't easily enable
tcp-nodelay, so the busy loop used to take *a lot* of time with bigger
writes. Fixed using pipelining.
2015-02-10 14:47:45 +01:00
antirez fcebd9b0f9 Fix false positive in memory efficiency test.
Fixes issue #1298.
2013-11-25 10:21:46 +01:00