Commit Graph

380 Commits

Author SHA1 Message Date
Meir Shpilraien (Spielrein) 885f6b5ceb
Redis Function Libraries (#10004)
# Redis Function Libraries

This PR implements Redis Function Libraries as described in: https://github.com/redis/redis/issues/9906.

The purpose of libraries is to provide better code sharing between functions by allowing multiple
functions to be created in a single command. Functions that were created together can safely share
code with each other without worrying about compatibility issues and versioning.

Creating a new library is done using the `FUNCTION LOAD` command (the full API is described below).

This PR introduces a new struct called libraryInfo, which holds information about a library:
* name - name of the library
* engine - engine used to create the library
* code - library code
* description - library description
* functions - the functions exposed by the library

When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo.
Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo.
As a result, the engine will create one or more functions by calling 'libraryCreateFunction'.
The new function will be added to the newly created libraryInfo. So far everything happens
locally on the libraryInfo, so it is easy to abort the operation (in case of an error) by simply
freeing the libraryInfo. After the libraryInfo is fully constructed we start the joining phase, in
which we join the new library to the other libraries that currently exist on Redis.
The joining phase makes sure there is no function collision and adds the library to the
librariesCtx (renamed from functionCtx). The librariesCtx is used all around the code in the exact
same way functionCtx was used (with respect to RDB loading, replication, ...).
The only difference is that apart from the function dictionary (which maps function name to functionInfo
object), the librariesCtx also contains a libraries dictionary that maps library name to libraryInfo object.

## New API
### FUNCTION LOAD
`FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>`
Create a new library with the given parameters:
* ENGINE - Engine name to use to create the library.
* LIBRARY NAME - The new library name.
* REPLACE - If the library already exists, replace it.
* DESCRIPTION - Library description.
* CODE - Library code.

Return "OK" on success, or error on the following cases:
* Library name already taken and REPLACE was not used
* Name collision with another existing library (even if replace was uses)
* Library registration failed by the engine (usually compilation error)
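
For illustration only, a hypothetical redis-cli session assuming the syntax above (library name, code and replies are made up):
```
redis> FUNCTION LOAD LUA mylib DESCRIPTION "example library" "redis.register_function('myfunc', function(keys, args) return 1 end)"
OK
redis> FCALL myfunc 0
(integer) 1
```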

## Changed API
### FUNCTION LIST
`FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]`
The command was modified to also allow getting library code (so the `FUNCTION INFO` command is no longer
needed and was removed). In addition, the command gets an optional argument, `LIBRARYNAME`, which allows you to
only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries.
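
A hypothetical invocation matching the description above (the library name pattern is illustrative):
```
redis> FUNCTION LIST LIBRARYNAME mylib* WITHCODE
```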

### INFO MEMORY
Added number of libraries to `INFO MEMORY`

### Commands flags
The `DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands
as commands that add new data to the dataset (functions are data), so we want to disallow
running them on OOM.

## Removed API
* FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906
* FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899

## Lua engine changes
When the Lua engine gets the code given on the `FUNCTION LOAD` command, it immediately runs it; we call
this run the loading run. The loading run is not a usual script run: it is not possible to invoke any
Redis command from within it.
Instead, a new API is provided by the `library` object. The new APIs:
* `redis.log` - behaves the same as the regular `redis.log`
* `redis.register_function` - registers a new function to the library

The purpose of the loading run is to register functions using the new `redis.register_function` API.
Any attempt to use any other API will result in an error. In addition, the loading run has a time
limit of 500ms; an error is raised on timeout and the entire operation is aborted.

### `redis.register_function`
`redis.register_function(<function_name>, <callback>, [<description>])`
This new API allows users to register a new function that will be linked to the newly created library.
This API can only be called during the load run (see definition above). Any attempt to use it outside
of the load run will result in an error.
The parameters passed to the API are:
* function_name - Function name (must be a Lua string)
* callback - Lua function object that will be called when the function is invoked using FCALL/FCALL_RO
* description - Function description, optional (must be a Lua string).

### Example
The following example creates a library called `lib` with 2 functions, `f1` and `f2`, which return 1 and 2 respectively:
```
local function f1(keys, args)
    return 1
end

local function f2(keys, args)
    return 2
end

redis.register_function('f1', f1)
redis.register_function('f2', f2)
```

Notice: Unlike `eval`, functions inside a library get KEYS and ARGV as arguments to the
functions and not as globals.

### Technical Details

During the load run we only want the user to be able to call a whitelist of APIs. This way, in
the future, if new APIs are added, they will not be available to the load run
unless specifically added to this whitelist. We put the whitelist on the `library` object and
make sure the `library` object is only available to the load run by using the [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set
the `globals` of a function (and all the functions it creates). Before starting the load run we
create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure
to set global protection on this table, just like the general global protection that already exists
today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv)
to set `g` as the global table of the load run. After the load run finishes we update the
metatable of `g` and set its `__index` and `__newindex` functions to `_G` (Lua's default globals);
we also pop out the `library` object as we do not need it anymore.
This way, any function that was created during the load run (and will be invoked using `fcall`) will
see the default globals as it expects to see them and will not have the `library` API anymore.
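
A minimal Lua 5.1 sketch of this idea (illustrative only, not the actual Redis C implementation; `user_code`, `register_function` and `log` are hypothetical placeholders for what the C side wires up):
```
-- Fresh globals table that exposes only the whitelisted load-run API.
local g = { library = { register_function = register_function, log = log } }

-- The load run: compile the library code and run it with `g` as its environment.
local load_run = assert(loadstring(user_code))
setfenv(load_run, g)   -- the load run only sees the whitelisted API
load_run()             -- registers the library's functions

-- After the load run: fall back to the default globals, so functions created
-- during loading (and later invoked via fcall) see the regular environment,
-- while the load-time-only `library` object is no longer reachable.
setmetatable(g, { __index = _G, __newindex = _G })
g.library = nil
```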

An important outcome of this new approach is that now we can achieve a distinct global table
for each library (it is not yet like that, but it is very easy to achieve now). In the future we can
decide to remove global protection because globals of different libraries will not collide, or we
can choose to give a different API to different libraries based on some configuration or input.

Notice that this technique was meant to prevent errors and was not meant to prevent a malicious
user from exploiting it. For example, the load run can still save the `library` object in some local
variable and then use it in an `fcall` context. To prevent such malicious use, the C code also makes
sure it is running in the right context and raises an error if not.
2022-01-06 13:39:38 +02:00
sundb 4d3c4cfac7
Show the elapsed time of single test and speed up some tests (#10058)
Following #10038.

This PR introduces two changes.
1. Show the elapsed time of a single test in the test output, in order to have a more
detailed understanding of the changes in test run time.

2. Speed up two tests related to the `key-load-delay` configuration.
Other tests do not seem to be affected by #10003.
2022-01-05 13:49:01 +02:00
chenyang8094 87789fae0b
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.

The main issues with the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)

The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files. They are classified into two types: one is the `BASE` type,
  which represents the full amount of data (may be in AOF or RDB format) after each AOFRW; there is only
  one `BASE` file at most. The second is the `INCR` type, of which there may be more than one. They represent the
  incremental commands since the last AOFRW.
3. Use an AOF manifest file to record and manage the AOF files mentioned above.
4. The original `appendfilename` configuration will be the base part of the new file names, for example:
  `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` (see the illustrative layout after this list).
5. Add manifest-related TCL tests, and modify some existing tests that depend on `appendfilename`.
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we automatically delete HISTORY type AOFs.
  It also gives users the opportunity to preserve the history AOFs; for testing use only for now.
8. Add an AOFRW limiting measure. When the number of AOFRW failures reaches the threshold (3 times for now),
  we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
  delayed by 2 minutes; the next by 4, 8, 16, up to a maximum delay of 60 minutes (1 hour). During the limit
  period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrading (loading) data from older Redis versions.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
  manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
  `aof-load-truncated` is enabled.
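
An illustrative layout of the new directory (hedged; the exact names depend on `appendfilename` and `appenddirname`):
```
appendonlydir/
  appendonly.aof.1.base.rdb    # BASE: full snapshot produced by the last AOFRW
  appendonly.aof.2.incr.aof    # INCR: commands written since that AOFRW
  appendonly.aof.manifest      # manifest tracking the files above
```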

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 19:14:13 +02:00
Binbin b8ba942ac2
Add DUMP RESTORE tests for redis-cli -x and -X options (#10041)
This commit adds DUMP RESTORE tests for the -x and -X options.
I wanted to add them in #9980, which introduced the -X option, but
back then it failed due to some errors (related to the redis-cli call).
2022-01-02 13:58:22 +02:00
Viktor Söderqvist 45a155bd0f
Wait for replicas when shutting down (#9872)
To avoid data loss, this commit adds a grace period for lagging replicas to
catch up with the replication offset.

Done:

* Wait for replicas when shutdown is triggered by SIGTERM and SIGINT.

* Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new
  blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients
  to call SHUTDOWN in parallel.
  Note that they don't expect a response unless an error happens and shutdown is aborted.

* Log warning for each replica lagging behind when finishing shutdown.

* CLIENT_PAUSE_WRITE while waiting for replicas.

* Configurable grace period 'shutdown-timeout' in seconds (default 10); see the illustrative session after this list.

* New flags for the SHUTDOWN command:

    - NOW disables the grace period for lagging replicas.

    - FORCE ignores errors writing the RDB or AOF files which would normally
      prevent a shutdown.

    - ABORT cancels ongoing shutdown. Can't be combined with other flags.

* New field in the output of the INFO command: 'shutdown_in_milliseconds'. The
  value is the remaining maximum time to wait for lagging replicas before
  finishing the shutdown. This field is present in the Server section **only**
  during shutdown.
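
An illustrative session combining the new config and flags (hedged; values and replies are made up):
```
127.0.0.1:6379> CONFIG SET shutdown-timeout 30
OK
127.0.0.1:6379> SHUTDOWN
(the client blocks while the server waits, up to 30 seconds, for lagging replicas)

From a second client, while the shutdown is waiting:
127.0.0.1:6379> SHUTDOWN ABORT
```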

Not directly related:

* When shutting down, if there is an AOF saving child, it is killed **even** if AOF
  is disabled. This can happen if BGREWRITEAOF is used when AOF is off.

* Client pause now has end time and type (WRITE or ALL) per purpose. The
  different pause purposes are *CLIENT PAUSE command*, *failover* and
  *shutdown*. If clients are unpaused for one purpose, it doesn't affect client
  pause for other purposes. For example, the CLIENT UNPAUSE command doesn't
  affect client pause initiated by the failover or shutdown procedures. A completed
  failover or a failed shutdown doesn't unpause clients paused by the CLIENT
  PAUSE command.

Notes:

* DEBUG RESTART doesn't wait for replicas.

* We already have a warning logged when a replica disconnects. This means that
  if any replica connection is lost during the shutdown, it is either logged as
  disconnected or as lagging at the time of exit.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:50:15 +02:00
yoav-steinberg 1bf6d6f11e
Generate RDB with Functions only via redis-cli --functions-rdb (#9968)
This is needed in order to ease the deployment of functions for ephemeral cases, where a user
needs to spin up a server with functions pre-loaded.

#### Details:

* Added `--functions-rdb` option to _redis-cli_ (usage sketch after this list).
* Functions-only RDB via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space
  separated inclusion filter for the RDB. In the future it can be `REPLCONF rdb-filter-only
  "functions db:3 key-pattern:user*"`, and a complementing `rdb-filter-exclude` `REPLCONF`
  can also be added.
* Handle "slave requirements" specification to RDB saving code so we can use the same RDB
  when different slaves express the same requirements (like functions-only) and not share the
  RDB when their requirements differ. This is currently just a flags `int`, but can be extended to
  a more complex structure with various filter fields.
* Make sure to support filters only in diskless replication mode (so as not to override the persistence file);
  we do that by forcing diskless replication (even if disabled by config).
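
A hedged usage sketch (the output file path is illustrative):
```
redis-cli --functions-rdb /tmp/functions.rdb
```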

other changes:
* some refactoring in rdb.c (extract portion of a big function to a sub-function)
* rdb_key_save_delay used in AOFRW too
* sendChildInfo takes the number of updated keys (incremental, rather than absolute)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:39:01 +02:00
sundb 888e92eb57
Fix a valgrind test failure due to slow shutdown (#10038)
This PR mainly solves the problem that the redis process cannot exit normally, due to changes in #10003.
When a test uses the `key-load-delay` config to delay loading but does not reset it at the end of the
test, the server will wait for the loading to reach the event loop (once every 2MB) before actually shutting down.
2022-01-01 17:45:13 +02:00
Binbin 4836ae32c7
redis-cli: Add -X option and extend --cluster call take arg from stdin (#9980)
There are two changes in this commit:

1. Add -X option to redis-cli.
Currently `-x` can only be used to provide the last argument,
so you can do `redis-cli dump keyname > key.dump`,
and then do `redis-cli -x restore keyname 0 < key.dump`.

But what if you want to add the REPLACE argument (which comes last)?
Oran suggested adding such usage:
`redis-cli -X <tag> restore keyname <tag> replace < key.dump`

i.e. you're able to provide a string in the arguments that will be
substituted with the content from stdin.

Note that the tag name should not conflict with other non-replaced args.
Also, the -x and -X options are mutually exclusive.

Some usages:
```
[root]# echo mypasswd | src/redis-cli -X passwd_tag mset username myname password passwd_tag
OK
[root]# echo username > username.txt
[root]# head -c -1 username.txt | src/redis-cli -X name_tag mget name_tag password
1) "myname"
2) "mypasswd\n"
```

2. Handle the combination of both `-x` and `--cluster`, or `-X` and `--cluster`.
Extend the broadcast option to receive the last arg or the <tag> arg from stdin.

Now we can use `redis-cli -x --cluster call <host>:<port> cmd`,
or `redis-cli -X <tag> --cluster call <host>:<port> cmd <tag>`.
(support part of #9899)
2021-12-30 12:10:04 +02:00
Binbin e84ccc3f56
Sanitize dump payload: fix crash when zset has a NAN score (#10002)
`zslInsert` with a NAN score will crash the server.
This one was found by the `corrupt-dump-fuzzer`.
2021-12-26 11:40:11 +02:00
Oran Agra b7567394e1
resolve replication test timing sensitivity - 2nd attempt (#9988)
The issue started failing after #9878 was merged (it made an existing test more sensitive).
It looks like #9982 didn't help; I tested this one and it seems to work better.

this commit does two things:
1. reduce the extra delay i added earlier and instead add more keys; the effect on the duration
   of replication is the same, but the intervals in which the server is responsive to the tcl client are higher.
2. improve the test infra to print context when assert_error fails.
2021-12-22 23:37:12 +02:00
Oran Agra e33e0295bb
resolve replication test timing sensitivity (#9982)
The issue started failing after #9878 was merged (it made an existing test more sensitive).
2021-12-22 16:05:53 +02:00
Oran Agra 41e6e05dee
Allow most CONFIG SET during loading, block some commands in async-loading (#9878)
## background
Till now CONFIG SET was blocked during loading.
(In the not so distant past, GET was disallowed too)

We recently (not released yet) added an async-loading mode, see #9323,
and during that time it'll serve CONFIG SET and any other command.
And now we realized (#9770) that some configs and commands are dangerous
during async-loading.

## changes
* Allow most CONFIG SET during loading (both on async-loading and normal loading)
* Allow CONFIG REWRITE and CONFIG RESETSTAT during loading
* Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`)
* Block a few commands during loading (list below)

## the blocked commands:
* SAVE - obviously we don't wanna start a foreground save during loading 8-)
* BGSAVE - we don't mind to schedule one, but we don't wanna fork now
* BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now
* MODULE - we obviously don't wanna unload a module during replication / rdb loading
  (MODULE HELP and MODULE LIST are not blocked)
* SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync
  requests now.
* REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let
  the user abort it, but they couldn't do that so far, and i don't wanna take any risk of bugs due to odd state.
* CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT,
  GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo

## other fixes
* processEventsWhileBlocked had an issue when being nested; this could happen with a busy script
  during async loading (new), but also with a busy script during AOF loading (old). This led to a crash in
  the scenario described in #6988.
2021-12-22 14:11:16 +02:00
zhugezy 1b0968df46
Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812)
# Background

The main goal of this PR is to remove the logic related to Lua script verbatim replication,
keeping only the effects replication logic, which has been the default since Redis 5.0.
As a result, Lua in Redis 7.0 acts the same as Redis 6.0 with the default
configuration, from the users' point of view.

There are lots of reasons to remove verbatim replication.
Antirez has listed some of the benefits in Issue #5292:

>1. No longer need to explain to users side effects into scripts.
    They can do whatever they want.
>2. No need for a cache about scripts that we sent or not to the slaves.
>3. No need to sort the output of certain commands inside scripts
    (SMEMBERS and others): this both simplifies and gains speed.
>4. No need to store scripts inside the RDB file in order to startup correctly.
>5. No problems about evicting keys during the script execution.

Looking back at Redis 5.0, antirez and the core team decided to set the config
`lua-replicate-commands yes` by default instead of removing verbatim replication
directly, in case some bad situations happened. Three years later, ahead of Redis 7.0,
it's time to remove it formally.

# Changes

- configuration for lua-replicate-commands removed
  - created config file stub for backward compatibility
- Replication script cache removed
  - this is useless under script effects replication
  - relevant statistics also removed
- script persistence in RDB files is also removed
- Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed
- Deterministic execution logic in scripts removed (i.e. no longer refusing to run write commands
  after random ones, and no longer sorting the output of commands with random order)
  - the flags indicating which commands have non-deterministic results are kept as hints to clients.
- `redis.replicate_commands()` & `redis.set_repl()` changed (see the short sketch after this list)
  - now `redis.replicate_commands()` does nothing and returns 1
  - ...and `redis.set_repl()` can now be issued before `redis.replicate_commands()`
- Relevant TCL cases adjusted
- DEBUG lua-always-replicate-commands removed
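
A short hedged Lua sketch of the behavior change described above (illustrative only):
```
-- Ordering no longer matters: set_repl() may be called without a prior
-- replicate_commands() call.
redis.set_repl(redis.REPL_NONE)

-- Kept only for backward compatibility; now a no-op that returns 1.
redis.replicate_commands()
```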

# Other changes
- Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-12-21 08:32:42 +02:00
Oran Agra 6add1b7217
Add external test that runs without debug command (#9964)
- add needs:debug flag for some tests
- disable "save" in external tests (speedup?)
- use debug_digest proc instead of debug command directly so it can be skipped
- use OBJECT ENCODING instead of DEBUG OBJECT to get encoding
- add a proc for OBJECT REFCOUNT so it can be skipped
- move a bunch of tests in latency_monitor tests to happen later so that latency monitor has some values in it
- add missing close_replication_stream calls
- make sure to close the temp client if DEBUG LOG fails
2021-12-19 17:41:51 +02:00
sundb 7f0fae947a
Sanitize dump payload: fix crash when stream has duplicate consumers (#9918)
When rdb creates a consumer without determining whether it exists in advance,
it may return NULL and crash if it encounters corrupt data with duplicate consumers.
2021-12-08 18:11:57 +02:00
Binbin b947049f85
Fix timing issue in logging.tcl with FreeBSD (#9910)
A test failure was reported in Daily CI.
`Crash report generated on SIGABRT` with FreeBSD.

```
*** [err]: Crash report generated on SIGABRT in tests/integration/logging.tcl
Expected [string match *crashed by signal* ### Starting...(logs) in tests/integration/logging.tcl]
```

It looks like `tail -1000` was executed too early, before it
printed out all the crash logs. We can give it a few more
chances by using `wait_for_log_messages`.

Other changes:
1. In `Server is able to generate a stack trace on selected systems`,
use `wait_for_log_messages` to reduce the lines of code. And if it
fails, there are more detailed logs that can be printed.

2. In `Crash report generated on DEBUG SEGFAULT`, we also use
`wait_for_log_messages` to avoid possible timing issues.
2021-12-07 12:02:58 +02:00
sundb 1808618f5d
Sanitize dump payload: fix invalid listpack entry starting with EOF (#9889)
When an invalid listpack entry starts with EOF, we will skip it when we verify it in the loop.
2021-12-04 16:43:08 +02:00
meir@redislabs.com cbd463175f Redis Functions - Added redis function unit and Lua engine
Redis function unit is located inside functions.c
and contains Redis Function implementation:
1. FUNCTION commands:
  * FUNCTION CREATE
  * FCALL
  * FCALL_RO
  * FUNCTION DELETE
  * FUNCTION KILL
  * FUNCTION INFO
2. Register engine

In addition, this commit introduces the first engine
that uses the Redis Function capabilities, the
Lua engine.
2021-12-02 19:35:52 +02:00
Viktor Söderqvist acf3495eb8
Sort out the mess around writable replicas and lookupKeyRead/Write (#9572)
Writable replicas now no longer use the values of expired keys. Expired keys are
deleted when lookupKeyWrite() is used, even on a writable replica. Previously,
writable replicas could use the value of an expired key in write commands such
as INCR, SUNIONSTORE, etc..

This commit also sorts out the mess around the functions lookupKeyRead() and
lookupKeyWrite() so they now indicate what we intend to do with the key and
are not affected by the command calling them.

Multi-key commands like SUNIONSTORE, ZUNIONSTORE, COPY and SORT with the
store option now use lookupKeyRead() for the keys they're reading from (which will
not allow reading from logically expired keys).

This commit also fixes a bug where PFCOUNT could return a value of an
expired key.

Test modules commands have their readonly and write flags updated to correctly
reflect their lookups for reading or writing. Modules are not required to
correctly reflect this in their command flags, but this change is made for
consistency since the tests serve as usage examples.

Fixes #6842. Fixes #7475.
2021-11-28 11:26:28 +02:00
sundb 4512905961
Replace ziplist with listpack in quicklist (#9740)
Part three of implementing #8702, following #8887 and #9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists at RDB loading time, an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks. (same as `ziplistMerge`)
2. Add `lpRepr` to print info about a listpack, which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED` (following #9357).
    It represents that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove the `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worst lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Call `lpShrinkToFit` after converting a ziplist to a listpack, which was missed in #9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-24 13:34:13 +02:00
Binbin fb4f7be22c
Wait for `async_loading` to stop in `short read` test (#9841)
In #9323, when `repl-diskless-load` is enabled and set to `swapdb`,
if the master replication ID hasn't changed, we can load data-set
asynchronously, and serving read commands during the full resync.

In the `diskless loading short read` test, after a successful loading,
we wait for the loading to stop and continue the for loop.

After the introduction of `async_loading`, we also need to check it.
Otherwise the next loop will start too soon and may trigger a timing issue.
2021-11-24 12:46:43 +02:00
Oran Agra a3a014294f
fix invalid read on corrupt ziplist (#9831)
If the last bytes in ziplist are corrupt and we decode from tail to head,
we may reach slightly outside the ziplist.
2021-11-23 14:56:52 +02:00
Oran Agra f07dedf73f
Fix invalid access in lpFind on corrupted listpack (#9819)
Issue found by corrupt-dump-fuzzer test with ASAN.
The problem was that lpSkip and lpGetWithSize could read the next listpack entry without validating that it's in range.
Similarly, even the memcmp in lpFind could do that and possibly crash with a segfault; now they'll fail on an assert first.

The naive fix of using lpAssertValidEntry every time, resulted in 30% degradation in the lpFind benchmark of the unit test.
The final fix with the condition at the bottom has no performance implications.
2021-11-22 15:30:00 +02:00
Oran Agra f00a8ad93c
fix string escaping in corrupt-dump test to support TCL8.5 (#9824)
TCL8.5 can't handle cases where part of the string is escaped and part of it isn't,
if there's a single char that needs escaping, we need to escape the whole string.
2021-11-22 12:30:06 +02:00
Oran Agra 183b90a625
Fix false positive leak reported by GCC ASAN (#9816)
Leak found by the corrupt-dump-fuzzer when using GCC ASAN, which seems
to falsely report leaks on pointers kept only on the stack when calling exit.
Instead we now use _exit on panic / assert to skip these leak checks.

Additionally, check for sanitizer warnings in the corrupt-dump-fuzzer between iterations,
so that when something is found we know which test to relate it to (and it prints a reproduction command list).
2021-11-21 18:47:10 +02:00
Oran Agra 1417648469
Prevent LCS from allocating temp memory over proto-max-bulk-len (#9817)
LCS can allocate an immense amount of memory (the sizes of the two inputs multiplied by each other).
In the past this caused some possible security issues due to overflows, which we solved
and also added the use of `trymalloc` to return "Insufficient memory" instead of an OOM panic in zmalloc.

But in case overcommit is enabled, it could be that we won't get the OOM panic, and zmalloc
will succeed, and then we can get OOM killed by the kernel.

The solution here is to prevent LCS from allocating transient memory that's bigger than
`proto-max-bulk-len` config.
This config is not directly related to transient memory, but using a hard coded value as well as
introducing a specific config seems wrong.

This comes to solve an error in the corrupt-dump-fuzzer test that started appearing in the daily CI, see #9799.
2021-11-21 14:30:20 +02:00
Ozan Tezcan b91d8b289b
Add sanitizer support and clean up sanitizer findings (#9601)
- Added sanitizer support. `address`, `undefined` and `thread` sanitizers are available.  
- To build Redis with desired sanitizer : `make SANITIZER=undefined`
- There were some sanitizer findings, cleaned up codebase
- Added tests with address and undefined behavior sanitizers to daily CI.
- Added tests with address sanitizer to the per-PR CI (smoke out mem leaks sooner).

Basically, there are three types of issues : 

**1- Unaligned load/store** : Most probably, this issue may cause a crash on a platform that
does not support unaligned access. Redis does unaligned access only on supported platforms.

**2- Signed integer overflow.** Although signed overflow issues can be problematic from time to time
and change how the compiler generates code, the current findings are mostly about signed shift or simple
addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue
as far as I can tell (checked generated code on godbolt.org).

**3- Minor leak** (redis-cli), **use-after-free** (just before calling exit());

UB means nothing is guaranteed and it's risky to reason about program behavior, but I don't think any
of the fixes here are worth backporting. As sanitizers are now part of the CI, preventing new issues
will be the real benefit.
2021-11-11 13:51:33 +02:00
Oran Agra 0927a0dd24
Try solving test timeout on freebsd CI (#9768)
First, avoid using --accurate on the freebsd CI; we only care about
systematic issues there due to it being a different platform, not about
accuracy.

Secondly, when looking at the test which timed out it seems silly and
outdated:
- it used KEYS to attempt to trigger lazy expiry, but KEYS doesn't do
  that anymore.
- it used some hard coded sleeps rather than waiting for things to
  happen and exiting ASAP
2021-11-10 19:39:26 +02:00
YaacovHazan 03406fcb6c
fix short timeout in replication short read tests (#9763)
In both tests, "diskless loading short read" and "diskless loading short read with module",
the timeout of waiting for the replica to respond to a short read and log it, is too short.

Also, add --dump-logs in runtest-moduleapi for valgrind runs.
2021-11-09 22:37:18 +02:00
Eduardo Semprebon 91d0c758e5
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323)
For diskless replication in swapdb mode, considering we already spend replica memory
keeping a backup of the current db to restore in case of failure, we can get the following benefits
by instead swapping the database only when we have succeeded in transferring the db from the master:

- Avoid `LOADING` response during failed and successful synchronization for cases where the
  replica is already up and running with data.
- Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load
  time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping.
- This could be implemented also for disk replication with similar benefits if consumers are willing
  to spend the extra memory usage.

General notes:
- The concept of `backupDb` becomes `tempDb` for clarity.
- Async loading mode will only kick in if the replica is syncing from a master that has the same
  repl-id as the one it had before, i.e. the data it's getting belongs to a different time of the same timeline.
- New property in INFO: `async_loading` to differentiate from the blocking loading
- Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db
  and the tempDb that is passed around.
- Because this is affecting replicas only, we assume that if they are not read-only and accept write commands
  during replication, those writes are lost after SYNC the same way as before, but we're still denying CONFIG SET
  here anyway to avoid complications.

Considerations for review:
- We have many cases where server.loading flag is used and even though I tried my best, there may
  be cases where async_loading should be checked as well and cases where it shouldn't (would require
  very good understanding of whole code)
- Several places that had different behavior depending on the loading flag were actually meant to just
  handle commands coming from the AOF client differently than ones coming from real clients; they were changed
  to check CLIENT_ID_AOF instead.

**Additional for Release Notes**
- Bugfix - server.dirty was not incremented for any kind of diskless replication; as an effect it wouldn't
  contribute to triggering the next database SAVE
- New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
- Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
  Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED,
  ABORTED and COMPLETED.
- New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions
  to allow modules to declare they support the diskless replication with async loading (when absent, we fall
  back to disk-based loading).

Co-authored-by: Eduardo Semprebon <edus@saxobank.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 10:46:50 +02:00
menwen ccf8a651f3
Retry when a blocked connection system call is interrupted by a signal (#9629)
When repl-diskless-load is enabled, the connection is set to the blocking state.
The connection may be interrupted by a signal during a system call.
This would have resulted in a disconnection and possibly a reconnection loop.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 09:09:28 +02:00
perryitay f27083a4a8
Add support for list type to store elements larger than 4GB (#9357)
Redis lists are stored in quicklist, which is currently a linked list of ziplists.
Ziplists are limited to storing elements no larger than 4GB, so when bigger
items are added they get truncated.
This PR changes quicklists so that they're capable of storing large items
in quicklist nodes that are plain string buffers rather than ziplists.

As part of the PR there were a few other changes in redis:
1. new DEBUG sub-commands: 
   - QUICKLIST-PACKED-THRESHOLD - set the threshold for the node type to
     be plain or ziplist. Default 1GB. (illustrative examples after this list)
   - QUICKLIST <key> - Shows low level info about the quicklist encoding of <key>
2. rdb format change:
   - A new type was added - RDB_TYPE_LIST_QUICKLIST_2 . 
   - container type (packed / plain) was added to the beginning of the rdb object
     (before the actual node list).
3. testing:
   - Tests that require over 100MB will be skipped by default. A new flag was
     added to 'runtest' to run the large memory tests (not used by default).
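
Hedged examples of the new DEBUG sub-commands (argument values and key name are illustrative):
```
DEBUG QUICKLIST-PACKED-THRESHOLD 100
DEBUG QUICKLIST mylist
```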

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-03 20:47:18 +02:00
Binbin 58a1d16ff6
Fix timing issue in replication test (#9719)
So it looks like sampling set loglines [count_log_lines -2] was
executed too late, and the replication managed to complete before that.

```
*** [err]: diskless no replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '"*Diskless rdb transfer, done reading from pipe, 2 replicas still up*"' not found in ./tests/tmp/server.6124.69/stdout after line: 52 till line: 52
```

Changes:
1. when we search the master log file, we start to search from before we sent the REPLICAOF
  command, to prevent a race in which the replication completed before we sampled the log line count.
2. we don't need to sample the replica loglines since it's a fresh replica that's just been started, so the message
  we're looking for is the first occurrence in the log and we can start the search from 0.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-02 10:32:01 +02:00
Binbin cea7809cea
Fix race condition in psync2-pingoff test (#9712)
Test failed on freebsd:
```
*** [err]: Make the old master a replica of the new one and check conditions in tests/integration/psync2-pingoff.tcl
Expected '162' to be equal to '176' (context: type eval line 18 cmd {assert_equal [status $R(0) master_repl_offset] [status $R(1) master_repl_offset]} proc ::test)
```

There are two possible race conditions in the test.

1. The code waits for sync_full to increment, and assumes that means the
master did the fork. But in fact there are cases where the master will increment
the sync_full counter (after the replica asks for a sync), but will see that
there's already a fork running and will delay the fork creation.

In this case the INCR will be executed before the fork happens, so it'll
not be in the command stream. Solve that by waiting for `master_link_status: up`
on the replica before the INCR.

2. The repl-ping-replica-period is still high (1 second), so there's a chance the
master will send an additional PING between the two calls to INFO (the line that
fails is the one that samples INFO from both servers). So there's a chance one of
them will have an incremented offset due to PING and the other won't have it yet.

In theory we can wait for the repl_offset to match, but then we risk facing a
situation where that race will hide an offset mismatch. So instead, I think we
should just change repl-ping-replica-period to prevent further pings from being pushed.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-01 16:07:08 +02:00
Wang Yuan 68886de085
Fix timing issue in replication buffer test (#9697)
Introduced in #9166
2021-10-29 08:04:12 +03:00
Wang Yuan 37dc2f13b4
Fix not waiting for data loading to complete in AOF tests (#9683)
Fix timing issue of a new test introduced in #9326
2021-10-26 14:08:09 +03:00
Wang Yuan 9ec3294b97
Add timestamp annotations in AOF (#9326)
Add timestamp annotation in AOF, one part of #9325.

Enabled with the new `aof-timestamp-enabled` config option.

The timestamp annotation format is "#TS:${timestamp}\r\n". "TS" is short for timestamp, and this
compact form saves extra bytes in the AOF (see the illustrative fragment at the end of this message).

We can use the timestamp annotation for some special purposes:
- knowing the execution time of commands
- restoring data to a specific point in time (by using redis-check-aof to truncate the file)
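
An illustrative AOF fragment with the annotation enabled (hedged; the command and timestamp are made up):
```
#TS:1640995200
*3
$3
SET
$5
mykey
$7
myvalue
```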
2021-10-25 13:08:34 +03:00
Wang Yuan c1718f9d86
Replication backlog and replicas use one global shared replication buffer (#9166)
## Background
For a redis master, each replica uses one copy of the replication buffer; that is a big waste of memory,
more replicas mean more waste, and allocating/freeing memory for every reply list also costs a lot.
If we set client-output-buffer-limit small and write traffic is heavy, the master may disconnect
replicas and can't finish synchronization with them. If we set client-output-buffer-limit big,
the master may be OOM when there are many replicas that separately keep a lot of memory.
Because the replication buffers of different replica clients are the same, one simple idea is that
all replicas use only one replication buffer, which effectively saves memory.

Since the replication backlog content is the same as the replicas' output buffer, we can now
discard the replication backlog memory and use the global shared replication buffer
to implement the replication backlog mechanism.

## Implementation
I create one global "replication buffer" which contains the content of the replication stream.
The structure of the "replication buffer" is similar to the reply list that exists in every client.
But the node of the list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                              Replica_A     Replica_B
 * 
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed the replication stream into the replication backlog and all replicas, we only need
to feed the stream into the replication buffer via `feedReplicationBuffer`. In this function, we set some fields of
the replication backlog and replicas to references of the global replication buffer blocks. We also
need to check the replicas' output buffer limit and free them if they exceed `client-output-buffer-limit`, and trim
the replication backlog if it exceeds `repl-backlog-size`.

When sending replies to replicas, we also need to iterate the replication buffer blocks and send their
content; when we have fully sent one block to a replica, we decrease that node's refcount and
increase the next node's refcount, and then free blocks whose refcount is 0 from the
head of the replication buffer blocks.

Since we now use a linked list to manage the replication backlog, it may cost much time to iterate
all the linked list nodes to find the corresponding replication buffer node. So we create a rax tree to
index some of the nodes, but to avoid the rax tree occupying too much memory, i record
one node per 64 for the index.

Currently, to make partial resynchronization possible as much as we can, we always keep the replication
backlog as the last reference to the replication buffer blocks; the backlog size may exceed our setting
if slow replicas reference vast replication buffer blocks, but this method doesn't increase
memory usage since they share the replication buffer. To avoid freezing the server when freeing unreferenced
replication buffer blocks when we need to trim the backlog for exceeding the backlog size setting,
we trim the backlog incrementally (freeing 64 blocks per call now), and make it faster in
`beforeSleep` (freeing 640 blocks).

### Other changes
- `mem_total_replication_buffers`: we add this field in INFO command, it means the total
  memory of replication buffers used.
- `mem_clients_slaves`: now even if a replica is slow to replicate and its output buffer memory
  is not 0, the reported value may still be 0, since the replication backlog and replicas share one global
  replication buffer; only if the replication buffer memory is more than the repl backlog setting size do we
  consider the excess as the replicas' memory. Otherwise, we consider the replication buffer memory as
  consumption of the repl backlog.
- Key eviction
  Since all replicas and the replication backlog share the global replication buffer, we count only the
  part exceeding the backlog size as the extra, separate consumption of the replicas.
  Because we trim the backlog incrementally in the background, the backlog size may exceed our
  setting if slow replicas that reference vast replication buffer blocks disconnect.
  To avoid a massive eviction loop, we don't count the delayed-freed replication backlog into
  used memory even if there are no replicas, i.e. we also regard this memory as the replicas' memory.
- `client-output-buffer-limit` check for replica clients
  It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size
  config (partial sync will succeed and then replica will get disconnected). Such a configuration is
  ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption
  implications since the replica client will share the backlog buffers memory.
- Drop replication backlog after loading data if needed
  We always create the replication backlog if the server is a master; we need it because we put DELs in
  it when loading expired keys from the RDB. But if the RDB doesn't have replication info, or there is no RDB,
  it is not possible to support partial resynchronization, so to avoid the extra memory of the replication
  backlog, we drop it.
- Multi IO threads
  Since all replicas and the replication backlog use the global replication buffer, if I/O threads are enabled,
  to guarantee thread-safe data access we must let the main thread handle sending the output buffer
  to all replicas. Before, other I/O threads could handle sending the output buffers of all replicas.

## Other optimizations
This solution resolves some other problems:
- When replicas are disconnected from the master for exceeding the output buffer limit, releasing the output
  buffers of replicas may freeze the server if we set a big `client-output-buffer-limit` for replicas; now
  it doesn't cause freezing.
- This implementation may mitigate the cost of copying the reply list (which also freezes the server) when one
  replica has a huge reply buffer and another replica needs to copy it for full synchronization. Now we just copy
  the reference info, which is very light.
- If we set the replication backlog size big, it also may cost much time to copy the replication backlog into
  a replica's output buffer. This commit eliminates that problem too.
- Resizing the replication backlog doesn't empty its current content.
2021-10-25 09:24:31 +03:00
Oran Agra 276b460ea9
Attempt to fix a valgrind test failure due to timing (#9643)
in the past few days i've seen two failures in the valgrind daily test.

*** [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl
Replica didn't get into loading mode

can't reproduce it, but i'm hoping it's just too slow (to start loading within 5 seconds)
2021-10-18 10:45:45 +03:00
YaacovHazan 5becb7c9c6
improve the stability and correctness of "Test child sending info" (#9562)
Since we measure the COW size in this test by changing some keys and reading
the reported COW size, we need to ensure that the "dismiss mechanism" (#8974)
will not free memory and reduce the COW size.

For that, this commit changes the size of the keys to 512B (less than a page),
and because some keys may fall into the same page, we modify ten keys
on each iteration and check for at least a 50% change in the COW size.
2021-10-04 10:32:26 +03:00
Oran Agra 5a4ab7c7d2
Fix stream sanitization for non-int first value (#9553)
This was recently broken in #9321 when we validated stream IDs to be
integers but did that after stepping to the next record instead of before.
2021-09-26 18:46:22 +03:00
Binbin 14d6abd8e9
Add ZMPOP/BZMPOP commands. (#9484)
This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset.

Syntax for the new ZMPOP command:
`ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]`

Syntax for the new BZMPOP command:
`BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]`

Some background:
- ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements.
- BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key.
- ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key.

Note that while ZMPOP/BZMPOP can take multiple keys, they eventually operate on just one key.
And they will propagate as ZPOPMIN or ZPOPMAX with the COUNT option.

As new commands, if we cannot pop any elements, the response is:
- ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX which return an empty array.
- BZMPOP: Return a NIL in both RESP2 and RESP3 when the timeout is reached, like BZPOPMIN/BZPOPMAX.

The normal response is nested arrays in RESP2 and RESP3:
```
ZMPOP/BZMPOP
1) keyname
2) 1) 1) member1
      2) score1
   2) 1) member2
      2) score2

In RESP2:
1) "myzset"
2) 1) 1) "three"
      2) "3"
   2) 1) "two"
      2) "2"

In RESP3:
1) "myzset"
2) 1) 1) "three"
      2) (double) 3
   2) 1) "two"
      2) (double) 2
```
2021-09-23 08:34:40 +03:00
Oran Agra 16be742b08
fix replication test failure, probing the wrong log file (#9513) 2021-09-19 12:07:04 +03:00
filipe oliveira b5a879e1c2
Added URI support to redis-benchmark (cli and benchmark share the same uri-parsing methods) (#9314)
- Add `-u <uri>` command line option to support `redis://` URI scheme (usage sketch after this list).
- included server connection information object (`struct cliConnInfo`),
  used to describe an ip:port pair, db num user input, and user:pass to
  avoid a large number of function arguments.
- Using sds on connection info strings for redis-benchmark/redis-cli
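
A hedged usage sketch (host, credentials and workload flags are illustrative):
```
redis-benchmark -u redis://user:secret@127.0.0.1:6379/0 -t set,get -n 100000
```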

Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
2021-09-14 19:45:06 +03:00
zhaozhao.zz 794442b130
PSYNC2: make partial sync possible after master reboot (#8015)
The main idea is to allow a master to load replication info from an RDB file when rebooting; if the master can load the replication info, it means replicas may have a chance to psync with the master, which can save a lot of traffic.

The key point is that we need to guarantee safety and consistency, so there
are two differences between master and replica:

1. master would load the replication info as secondary ID and
   offset, in case other masters have the same replid.
2. when the master loads the RDB, it propagates expired keys as DEL
   commands to the replication backlog, so replicas can receive these
   commands to delete stale keys.
   p.s. the keys expired during RDB loading are useful info for users, so
   we show them as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence.

Moreover, after loading the replication info, the master should update
`no_replica_time` in case loading the RDB takes too long.
2021-09-13 15:39:11 +08:00
sundb 3ca6972ecd
Replace all usage of ziplist with listpack for t_zset (#9366)
Part two of implementing #8702 (zset), after #8887.

## Description of the feature
Replaced all uses of ziplist with listpack in t_zset, and optimized some of the code to improve performance.

## Rdb format changes
New `RDB_TYPE_ZSET_LISTPACK` rdb type.

## Rdb loading improvements:
1) Pre-expansion of dict for validation of duplicate data for listpack and ziplist.
2) Simplifying the release of empty key objects when RDB loading.
3) Unify ziplist and listpack data verify methods for zset and hash, and move code to rdb.c.

## Interface changes
1) New `zset-max-listpack-entries` config is an alias for `zset-max-ziplist-entries` (same with `zset-max-listpack-value`).
2) OBJECT ENCODING will return listpack instead of ziplist.

## Listpack improvements:
1) Add `lpDeleteRange` and `lpDeleteRangeWithEntry` functions to delete a range of entries from listpack.
2) Improve the performance of `lpCompare`, converting from string to integer is faster than converting from integer to string.
3) Replace `snprintf` with `ll2string` to improve performance in converting numbers to strings in `lpGet()`.

## Zset improvements:
1) Improve the performance of `zzlFind` method, use `lpFind` instead of `lpCompare` in a loop.
2) Use `lpDeleteRangeWithEntry` instead of `lpDelete` twice to delete a element of zset.

## Tests
1) Add some unittests for `lpDeleteRange` and `lpDeleteRangeWithEntry` function.
2) Add zset RDB loading test.
3) Add benchmark test for `lpCompare` and `ziplistCompare`.
4) Add empty listpack zset corrupt dump test.
2021-09-09 18:18:53 +03:00
Binbin c50af0aeba
Add LMPOP/BLMPOP commands. (#9373)
We want to add COUNT option for BLPOP.
But we can't do it without breaking compatibility due to the command arguments syntax.
So this commit introduce two new commands.

Syntax for the new LMPOP command:
`LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Syntax for the new BLMPOP command:
`BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Some background:
- LPOP takes one key, and can return multiple elements.
- BLPOP takes multiple keys, but returns one element from just one key.
- LMPOP can take multiple keys and return multiple elements from just one key.

Note that while LMPOP/BLMPOP can take multiple keys, they eventually operate on just one key.
And they will propagate as LPOP or RPOP with the COUNT option.

As new commands, they still return NIL if we can't pop any elements.
The normal response is nested arrays in RESP2 and RESP3, like:
```
LMPOP/BLMPOP 
1) keyname
2) 1) element1
   2) element2
```
I.e. unlike BLPOP that returns a key name and one element so it uses a flat array,
and LPOP that returns multiple elements with no key name, and again uses a flat array,
this one has to return a nested array, and it does so for both RESP2 and RESP3 (like SCAN does).

Some discussion can be seen in: #766 #8824
2021-09-09 12:02:33 +03:00
Wang Yuan cee3d67f50
Delay to discard cached master when full synchronization (#9398)
* Delay to discard cache master when full synchronization
* Don't disconnect with replicas before loading transferred RDB when full sync

Previously, once a replica needed to start a full synchronization with a master,
it would discard the cached master whether the full synchronization failed or
not.
Now we discard the cached master only when transferring the RDB is finished
and we start to change the data space; this makes it possible for the replica to start partial
resynchronization with another new master if the new master fails
during the full synchronization.
2021-09-09 11:32:29 +03:00
Viktor Söderqvist 547c3405d4
Optimize quicklistIndex to seek from the nearest end (#9454)
Until now, giving a negative index seeks from the end of a list and a
positive seeks from the beginning. This change makes it seek from
the nearest end, regardless of the sign of the given index.

quicklistIndex is used by all list commands which operate by index.

LINDEX key 999999 in a list of 1M elements is greatly optimized by
this change. Latency is cut by 75%.

LINDEX key -1000000 in a list of 1M elements, likewise.

LRANGE key -1 -1 is affected by this, since LRANGE converts the
indices to positive numbers before seeking.

The tests for corrupt dumps are updated to make sure the corrupt
data is seeked in the same direction as before.
2021-09-06 09:12:38 +03:00
Viktor Söderqvist 97dcf95cc8
redis-benchmark: improved help and warnings (#9419)
1. The output of --help:

  * On the Usage line, just write [OPTIONS] [COMMAND ARGS...] instead of listing
    only a few arbitrary options and no command.
  * For --cluster, describe that if the command is supplied on the command line,
    the key must contain "{tag}". Otherwise, the command will not be sent to the
    right cluster node.
  * For -r, add a note that if -r is omitted, all commands in a benchmark will
    use the same key. Also align the description.
  * For -t, describe that -t is ignored if a command is supplied on the command
    line.

2. Print a warning if -t is present when a specific command is supplied.

3. Print all warnings and errors to stderr.

4. Remove -e from calls in redis-benchmark test suite.
2021-08-29 14:31:08 +03:00
sundb 492d8d0961
Sanitize dump payload: fix double free after insert dup nodekey to stream rax and returns 0 (#9399) 2021-08-20 10:37:45 +03:00
Yossi Gottlieb 1d9c8d61d8
Skip OOM-related tests on incompatible platforms. (#9386)
We only run OOM related tests on x86_64 and aarch64, as jemalloc on other
platforms (notably s390x) may actually succeed very large allocations. As
a result the test may hang for a very long time at the cleanup phase,
iterating as many as 2^61 hash table slots.
2021-08-18 16:00:22 +03:00
sundb 02fd76b97c
Replace all usage of ziplist with listpack for t_hash (#8887)
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists at RDB loading time (an O(n) operation).

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`
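For example (a hypothetical session, assuming default thresholds), the new encoding and the config alias look like this:
```
127.0.0.1:6379> HSET myhash field value
(integer) 1
127.0.0.1:6379> OBJECT ENCODING myhash
"listpack"
127.0.0.1:6379> CONFIG GET hash-max-listpack-entries
1) "hash-max-listpack-entries"
2) "128"
```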

## Listpack improvements:
1. Support inserting and replacing integer elements directly (rather than converting back and forth from strings)
2. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
3. Optimize element length fetching, avoiding multiple calculations
4. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-10 09:18:49 +03:00
sundb cbda492909
Sanitize dump payload: handle remaining empty key when RDB loading and restore command (#9349)
This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading a quicklist, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.
2021-08-09 17:13:46 +03:00
Qu Chen e8eeba7bee
Allow master to replicate command longer than replica's query buffer limit (#9340)
Replication client no longer checks incoming command length against the client-query-buffer-limit. This makes the master able to replicate commands longer than the replica's configured client-query-buffer-limit.
2021-08-08 17:34:11 -07:00
Oran Agra 3f3f678a47
corrupt-dump-fuzzer test, avoid creating junk keys (#9302)
The execution of the RPOPLPUSH command by the fuzzer created junk keys,
that were later being selected by RANDOMKEY and modified.
This also meant that lists were statistically tested more than other
types.

Fix the fuzzer not to pass junk key names to RPOPLPUSH, and add a check
that detects that new keys are not added by the fuzzer to detect future
similar issues.
2021-08-05 22:57:05 +03:00
Oran Agra 0c90370e6d
Improvements to corrupt payload sanitization (#9321)
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <sundbcn@gmail.com>
2021-08-05 22:56:14 +03:00
sundb 8ea777a6a0
Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297)
When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When using the restore command, it will return a `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 22:42:20 +03:00
Binbin d0244bfc3d
Make sure execute SLAVEOF command in the right order in psync2 test. (#9316)
The psync2 test has failed several times recently.
In #9159 we only solved half of the problem.
i.e. reordering of the replica that's already connected to
the newly promoted master.

Consider this scenario:
0 slaveof 2
1 slaveof 2
3 slaveof 2
4 slaveof 1
0 slaveof no one, became a new master got a new replid
2 slaveof 0, partial resync and got the new replid
3 reconnect 2, inherit the new replid
3 slaveof 4, use the new replid and got a full resync

And another scenario:
1 slaveof 3
2 slaveof 4
3 slaveof 0
4 slaveof 0
4 slaveof no one, became a new master got a new replid
2 reconnect 4, inherit the new replid
2 slaveof 1, use the new replid and got a full resync

So maybe we should reattach replicas in the right order.
i.e. In the above example, if it would have reattached 1, 3 and 0 to
the new chain formed by 4 before trying to attach 2 to 1, it would succeed.

This commit breaks the SLAVEOF loop into two loops (ideas from Oran).

The first loop uses randomness to decide who replicates from whom.
The second loop executes the actual SLAVEOF commands.
In the second loop, we make sure to execute them in the right order,
and after each SLAVEOF, we wait for the link to be connected before proceeding.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 11:26:09 +03:00
Viktor Söderqvist 1c59567a7f
redis-cli ASK redirect test: Add retry loop to fix timing issue (#9315) 2021-08-05 08:20:30 +03:00
Wang Yuan d4bca53cd9
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Background
As we know, after `fork`, a process copies pages when it writes to them (CoW),
while the other process keeps the old pages, so together they consume more memory.
For Redis, we suffered from the fork child consuming a lot of memory while serializing
key/values, which may even cause OOM.

But in fact, in the Redis fork child process, the child does not need to keep some of the
memory that the parent may write to or update. For example, the child will never access
a key-value pair again after it has been serialized, while users may keep updating it in
the parent process. So we can reduce COW if the child releases the memory it no longer needs.

## Implementation
To release key-values in the child process, one might think of calling `decrRefCount` to free the
memory, but the fork child still used a lot of memory even when no data was written to Redis,
and it took much more time, slowing down bgsave. This is likely because the memory allocator
doesn't really release memory to the OS, and it may modify some of its internal data for the free
operation, especially when freeing small objects.

Moreover, CoW works at page granularity, so a simple approach is to only free memory chunks
that are no smaller than the kernel page size. madvise(MADV_DONTNEED) can quickly release the
pages of a specified region to the OS, bypassing the memory allocator; the allocator still considers
this memory in use and doesn't change its internal data.
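A rough C sketch of that idea (simplified; not the actual Redis implementation): release only the whole pages covered by a buffer, bypassing the allocator:
```c
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Give the whole pages inside [ptr, ptr+size) back to the OS without
 * freeing through the allocator, so the parent's later writes to these
 * pages no longer copy pages that the child still holds. */
static void release_whole_pages(void *ptr, size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = ((uintptr_t)ptr + page - 1) & ~(page - 1); /* round up   */
    uintptr_t end   = ((uintptr_t)ptr + size) & ~(page - 1);     /* round down */
    if (end > start)
        madvise((void *)start, end - start, MADV_DONTNEED);
}
```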

There are some buffers we can release in the fork child process:
- **Serialized key-values**
  The fork child never accesses key-values it has already serialized, so we try to free them.
  Since we can only release big chunks of memory, and it is time consuming to iterate all
  items/members/fields/entries of a complex data type, we iterate them and try to release
  them only when the average size of an item/member/field/entry is larger than the OS
  page size.
- **Replication backlog**
  The replication backlog is a circular buffer that changes quickly when Redis has heavy
  write traffic, but the fork child process doesn't need to access it.
- **Client buffers**
  If clients issue requests while the fork child exists, client buffers also change frequently.
  This memory includes the client query buffer, the output buffer, and the memory used by
  the client struct.

To get the child process's peak private dirty memory, we need to track the peak rather than
the last reported value, because the child process may keep releasing memory (until now COW
could only grow, so the last value was equivalent to the peak).
We also add a new `current_cow_peak` info field (to complement the existing
`current_cow_size`).

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 23:01:46 +03:00
Oran Agra 52df350fe5
Skip new redis-cli ASK test in TLS mode (#9312) 2021-08-03 13:19:03 -07:00
Huang Zhw cf61ad14cc
When redis-cli received ASK, it didn't handle it (#8930)
When redis-cli received ASK, its string matching was wrong, so it didn't
handle it.

When we access a slot that is in a migrating state, the server may
return ASK. After redirecting to the new node, we need to send an ASKING
command before retrying the command. With this PR, after redis-cli receives
ASK, it reconnects and sends an ASKING command before sending the original
command.
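For illustration (hypothetical addresses, slot number, and key), the redirect flow looks roughly like this:
```
127.0.0.1:7000> GET {tag}key
(error) ASK 3999 127.0.0.1:7001

127.0.0.1:7001> ASKING
OK
127.0.0.1:7001> GET {tag}key
"value"
```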

Other changes:
* Make redis-cli -u and -c (unix socket and cluster mode) incompatible 
  with one another.
* When sending a command fails, we avoid the 2nd reconnect retry and just
  print the error info. Users can decide what to do next.
  See #9277.
* Add a test faking two redis nodes in TCL to just send ASK and OK in 
  redis protocol to test ASK behavior. 

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-02 14:59:08 +03:00
Yossi Gottlieb 68b8b45cd5
Tests: avoid short reads on redis-cli output. (#9301)
In some cases large replies on slow systems may only be partially read
by the test suite, resulting in parsing errors.

This fix is still timing sensitive but should greatly reduce the chances
of this happening.
2021-08-01 15:07:27 +03:00
sundb 3db0f1a284
Fix missing check for sanitize_dump in corrupt-dump-fuzzer test (#9285)
this means the assertion that checks that when deep sanitization is enabled,
there are no crashes, was missing.
2021-07-29 11:53:21 +03:00
Mikhail Fesenko 1eb4baa5b8
Direct redis-cli repl prints to stderr, because --rdb can print to stdout. fflush stdout after responses (#9136)
1. redis-cli can output --rdb data to stdout
   but redis-cli also writes some messages to stdout which will mess up the rdb.

2. Make redis-cli flush stdout when printing a reply
  This was needed in order to fix a hang in the redis-cli test that uses
  --replica.
  Note that printf does flush when there's a newline, but fwrite does not.

3. fix the redis-cli --replica test which used to pass previously
   because it didn't really care what it read, and because redis-cli
   used printf to print these other things to stdout.

4. improve redis-cli --replica test to run with both diskless and disk-based.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Viktor Söderqvist <viktor@zuiderkwast.se>
2021-07-07 08:26:26 +03:00
Binbin 1d5aa37d68
Fix timing issue in psync2 test. (#9159)
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 8 cmd {assert {$sum == 4}} proc::test)

Sometimes the test got an unexpected full sync: when a replica switched to master,
before the new master propagated the new replid to all replicas,
a replica attempted to sync with it using the wrong replid and triggered a full resync.

Consider this scenario:
    1 slaveof 4 full resync
    0 slaveof 4 full resync
    2 slaveof 0 full resync
    3 slaveof 1 full resync

    1 slaveof no one, replid changed
    3 reconnect 1, did a partial resync and got the new replid

    Before 2 inherits the new replid.
    3 slaveof 2
    3 try to do a partial resync with 2.
    But their replication ids are inconsistent, so a full resync happens.

:) A special thank you to Oran for helping me with this test case.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-06-30 09:18:10 +03:00
Oran Agra d0819d618e
solve test timing issues in replication tests (#9121)
# replication-3.tcl
had a test timeout failure with valgrind on daily CI:
```
*** [err]: SLAVE can reload "lua" AUX RDB fields of duplicated scripts in tests/integration/replication-3.tcl
Replication not started.
```
replication took more than 70 seconds.
https://github.com/redis/redis/runs/2854037905?check_suite_focus=true

on my machine it takes only about 30, but I can see how 50 seconds isn't enough.

# replication.tcl
loading was over too quickly in freebsd daily CI:
```
*** [err]: slave fails full sync and diskless load swapdb recovers it in tests/integration/replication.tcl
Expected '0' to be equal to '1' (context: type eval line 44 cmd {assert_equal [s -1 loading] 1} proc ::start_server)
```

# rdb.tcl
loading was over too quickly.
increase the time loading takes, and decrease the amount of work we try to achieve in that time.
2021-06-22 11:10:11 +03:00
YaacovHazan 1677efb9da
cleanup around loadAppendOnlyFile (#9012)
Today, when we load the AOF on startup, loadAppendOnlyFile checks whether
the file can be opened for reading.
This check is redundant (dead code), as we open the AOF file for writing in initServer,
so the file will always exist when loadAppendOnlyFile runs.

In this commit:
- remove all the exit(1) calls from loadAppendOnlyFile, as it is the caller's
  responsibility to decide what to do in case of failure.
- move the opening of the AOF file for writing to after we load it.
- avoid returning -ERR in DEBUG LOADAOF when the AOF exists but is empty
2021-06-14 10:38:08 +03:00
Binbin 0bfccc55e2
Fixed some typos, add a spell check ci and others minor fix (#8890)
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.

Besides that, the PR also fixes many spelling mistakes and typos; not all of them were found by the spell checker we use.

Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comments. Such as:
    - In README: about the rdb, we used to say it creates a `thread`; changed to `process`
    - dbRandomKey function comment (about dictGetRandomKey, changed to dictGetFairRandomKey)
    - notifyKeyspaceEvent function comment (add type arg)
    - Some other minor fixes in comments (most of them incorrectly quoted variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
2021-06-10 15:39:33 +03:00
Yossi Gottlieb 8a86bca5ed
Improve test suite to handle external servers better. (#9033)
This commit revives and improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.

This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running specific tests.
Attempting to run larger chunks of the test suite experienced many issues:

* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting in `CROSSSLOT` errors).

This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.

The tests directory now contains a `README.md` file that describes how this
works.

This commit also includes additional cleanups and fixes:

* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
  global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
  (a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
  future we should have dedicated teardown code that executes even when tests
  fail).
* Fixed some tests that were flaky running on external servers.
2021-06-09 15:13:24 +03:00
Oran Agra b512dfe794
tests: add details when test fails on malformed info (#9042) 2021-06-03 20:34:54 +03:00
ny0312 53d1acd598
Always replicate time-to-live(TTL) as absolute timestamps in milliseconds (#8474)
Till now, on replica full-sync we used to transfer absolute time for TTL,
however when a command arrived (EXPIRE or EXPIREAT),
we used to propagate it as is to replicas (possibly with relative time),
but always translated it to EXPIREAT (absolute time) in the AOF.

This commit changes that and will always use absolute time for propagation.
see discussion in #8433

Furthermore, we introduce new commands: `EXPIRETIME/PEXPIRETIME`
that allow extracting the absolute TTL time from a key.
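For example (a hypothetical session; the returned timestamps are made up):
```
127.0.0.1:6379> SET mykey "hello" EX 100
OK
127.0.0.1:6379> EXPIRETIME mykey
(integer) 1640995200
127.0.0.1:6379> PEXPIRETIME mykey
(integer) 1640995200000
```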
2021-05-30 09:20:32 +03:00
YaacovHazan 501d775583
unregister AE_READABLE from the read pipe in backgroundSaveDoneHandlerSocket (#8991)
In diskless replication, we create a read pipe for the RDB, between the child and the parent.
When we close this pipe (fd), the read handler also needs to be removed from the event loop (if it is still registered).
Otherwise, the next time we use the same fd, the registration will fail (panic), because
we will use EPOLL_CTL_MOD (the fd is still registered in the event loop) on an fd that was already removed via epoll_ctl.
2021-05-26 14:51:53 +03:00
YaacovHazan 32a2584e07
stabilize tests that involved with load handlers (#8967)
When a test stops a 'load handler' by killing the process that generates the load,
some commands that are already in the input buffer might still be processed by the server.
This may cause instability in tests that count on no more commands being
processed after we stop the 'load handler'.

In this commit, a new proc 'wait_load_handlers_disconnected' is added, to verify that no more
commands from any 'load handler' are processed, by checking that the clients who
generate the load are disconnected.

Also, replace the dbsize check with wait_for_ofs_sync before comparing debug digests, as
it would fail in case the last key the workload wrote was an overridden key (not a new one).

Affected tests
Race fix:
- failover command to specific replica works
- Connect multiple replicas at the same time (issue #141), master diskless=$mdl, replica diskless=$sdl
- AOF rewrite during write load: RDB preamble=$rdbpre

Cleanup and speedup:
- Test replication with blocking lists and sorted sets operations
- Test replication with parallel clients writing in different DBs
- Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect
2021-05-20 15:29:43 +03:00
Oran Agra 8458baf6a9
longer timeout in replication test (#8963)
the test normally passes. but we saw one failure in a valgrind run in github actions
2021-05-18 17:13:59 +03:00
Oran Agra 611959eee5
fuzz tester, try to print hung command (#8837) 2021-04-25 13:08:46 +03:00
bugwz 761d7d2771
Print the number of abnormal line in AOF (#8823)
When redis-check-aof finds an error, it prints the line number for faster troubleshooting.
2021-04-20 21:51:24 +03:00
Oran Agra a9897b0084
Fix timing of new replication test (#8807)
In GitHub Actions CI with valgrind, I saw that even the fast replica
(the one that wasn't paused) didn't get to complete the replication fast
enough, and ended up getting disconnected by timeout.

Additionally, due to a typo in uname, we didn't get to actually run the
CPU efficiency part of the test.
2021-04-18 15:12:34 +03:00
guybe7 d63d02601f
Add a timeout mechanism for replicas stuck in fullsync (#8762)
Starting with Redis 6.0 (as part of the TLS feature), the diskless master uses a pipe from the fork
child so that the parent is the one sending data to the replicas.
This mechanism has an issue in which a hung replica will cause the master to wait
forever for it to read the data sent to it, thus preventing the fork child from terminating
and preventing the creation of any other forks.

This PR adds a timeout mechanism, much like the ACK-based timeout,
we disconnect replicas that aren't reading the RDB file fast enough.
2021-04-15 17:18:51 +03:00
Oran Agra cd81dcf18b
solve race conditions in psync2-pingoff test (#8720)
Another test race condition in the macOS tests.
The test was waiting for PINGs to be generated and put on the replication stream,
but waiting for 1 or 2 seconds doesn't really guarantee that.
Then the test, which expected 6 full syncs, found only 4.
2021-03-30 11:41:06 +03:00
Qu Chen 7de6451818
Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666) 2021-03-24 08:41:05 -07:00
Oran Agra f6e1a94e03
Corrupt stream key access to uninitialized memory (#8681)
The corrupt-dump-fuzzer test found a case where an access to a corrupt
stream would have caused access to uninitialized memory.
Now it'll panic instead.

The issue was that there was a stream that says it has more than 0
records, but looking for the max ID came back empty handed.

p.s. when sanitize-dump-payload is used, this corruption is detected,
and the RESTORE command is gracefully rejected.
2021-03-24 11:33:49 +02:00
Oran Agra a7c02b19bf
Fix race in replication test (#8679)
Since Redis 6.2, Redis immediately tries to connect to the master, without
waiting for the replication cron.

In the slow FreeBSD CI, this test failed because master_link_status was
already "up" when INFO was called.
2021-03-22 10:50:39 +02:00
Yossi Gottlieb 3c7d6a1853
Improve redis-cli non-binary safe string handling. (#8566)
* The `redis-cli --scan` output should honor output mode (set explicitly or implicitly), and quote key names when not in raw mode.
  * Technically this is a breaking change, but it should be very minor since raw mode is by default on for non-tty output.
  * It should only affect  TTY output (human users) or non-tty output if `--no-raw` is specified.

* Added `--quoted-input` option to treat all arguments as potentially quoted strings.
* Added `--quoted-pattern` option to accept a potentially quoted pattern.

Unquoting is applied to potentially quoted input only if single or double quotes are used. 

Fixes #8561, #8563
2021-03-04 15:03:49 +02:00
Yossi Gottlieb 5d180d2834
Fix potential replication-4 test race condition. (#8583)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-03-02 18:12:11 +02:00
Oran Agra 349ef3f6a0
fix stream deep sanitization with deleted records (#8568)
When sanitizing the stream listpack, we need to count the deleted records too.
otherwise the last line that checks the next pointer fails.

Add test to cover that state in the stream tests.
2021-03-01 17:23:29 +02:00
Yossi Gottlieb 95ea74549c
Fix failed tests on Linux Alpine and add a CI job. (#8532)
* Remove linux/version.h dependency.

This introduces unnecessary dependencies, and is generally not a good idea,
as the platform we build on may be different from the platform we run
on.

To determine if sync_file_range exists we can simply rely on header file
hints.

* Fix setproctitle() on libmusl.

The previous ifdef checks were a bit too strict for no apparent
reason.

* Fix tests failure on Linux with no backtrace.

* Add alpine daily CI job.
2021-02-23 12:57:45 +02:00
uriyage fd052d2a86
Adds INFO fields to track fork child progress (#8414)
* Adding current_save_keys_total and current_save_keys_processed info fields.
  Present in replication, BGSAVE and AOFRW.
* Changing RM_SendChildCOWInfo() to RM_SendChildHeartbeat(double progress)
* Adding new info field current_fork_perc. Present in Replication, BGSAVE, AOFRW,
  and module forks.
2021-02-16 16:06:51 +02:00
Yossi Gottlieb 141ac8df59
Escape unsafe field name characters in INFO. (#8492)
Fixes #8489
2021-02-15 17:08:53 +02:00
Oran Agra 30775bc3e3
solve race in replication-2 test - again (#8491)
this should make it timing independent and also faster in most cases
2021-02-15 12:50:23 +02:00
Oran Agra 02ab14cc2e
solve race in replication-2 test (#8461)
use SIGSTOP instead of DEBUG SLEEP, reduces the test
time by some 2 seconds and avoids failures on slow machines
2021-02-07 16:22:30 +02:00
Yossi Gottlieb de6f3ad017
Fix FreeBSD tests and CI Daily issues. (#8438)
* Add bash temporarily to allow sentinel fd leaks test to run.
* Use vmactions-freebsd rdist sync to work around bind permission denied
  and slow execution issues.
* Upgrade to tcl8.6 to be aligned with latest Ubuntu envs.
* Concat all command executions to avoid ignoring failures.
* Skip intensive fuzzer on FreeBSD. For some yet unknown reason, generate_fuzzy_traffic_on_key causes TCL to significantly bloat on FreeBSD, resulting in out of memory.
2021-02-03 17:35:28 +02:00
Oran Agra 5a7eb9c881
Fix test issues from introduction of HRANDFIELD (#8424)
* The corrupt dump fuzzer found a division by zero.
* in some cases the random fields from the HRANDFIELD tests produced
  fields with newlines and other special chars (due to \ char), this caused
  the TCL tests to see a bulk response that has a newline in it and add {}
  around it, later it can think this is a nested list. in fact the `alpha` random
  string generator isn't using spaces and newlines, so it should not use `\`
  either.
2021-01-31 12:13:45 +02:00
Allen Farris 0d18a1e85f
implement FAILOVER command (#8315)
Implement FAILOVER command, which coordinates failover
between the server and one of its replicas.
2021-01-28 13:18:05 -08:00
Raghav Muddur 0367a80819
GETEX, GETDEL and SET PXAT/EXAT (#8327)
This commit introduces two new command and two options for an existing command

GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp]
[PXAT milliseconds-timestamp]

The getexCommand() function implements extended options and variants of the GET
command. Unlike the GET command, this command is not read-only. Only one of the options
can be used at a given time.

1. PERSIST removes any TTL associated with the key.
2. EX Set expiry TTL in seconds.
3. PX Set expiry TTL in milliseconds.
4. EXAT Same as EX, but instead of specifying the number of seconds representing the
    TTL (time to live), it takes an absolute Unix timestamp
5. PXAT Same as PX, but instead of specifying the number of milliseconds representing the
    TTL (time to live), it takes an absolute Unix timestamp

Command would return either the bulk string, error or nil.

GETDEL <key>
Deletes the key after getting its value.
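For illustration (a hypothetical session with made-up keys):
```
127.0.0.1:6379> SET mykey "hello"
OK
127.0.0.1:6379> GETEX mykey EX 60
"hello"
127.0.0.1:6379> TTL mykey
(integer) 60
127.0.0.1:6379> GETDEL mykey
"hello"
127.0.0.1:6379> EXISTS mykey
(integer) 0
```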

SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>]
[EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>]

Two new options added here are EXAT and PXAT

Key implementation notes
- `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is
  specified (`PX/EX`), replication will always use `PX`.
- `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile`
  as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of
  correct AOF translation.
- `GETEX` without any optional argument behaves like `GET`.
- The `GETEX` command is never propagated as is; it is propagated as `PEXPIRE[AT]` or `PERSIST` instead.
- `GETDEL` command is propagated as `DEL`
- Combined the validation for `SET` and `GETEX` arguments. 
- Test cases to validate AOF/Replication propagation
2021-01-27 19:47:26 +02:00
Yossi Gottlieb 522d93607a
Add io-thread daily CI tests. (#8232)
This adds basic coverage to IO threads by running the cluster and a few selected Redis test suite tests with the IO threads enabled.

Also provides some necessary additional improvements to the test suite:

* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
2021-01-17 15:48:48 +02:00
Oran Agra 8dd16caec8
Fix last COW INFO report, Skip test on non-linux platforms (#8301)
- the last COW report wasn't always read from the pipe
  (receiveLastChildInfo wasn't used)
- but in fact, there's no reason we won't always try to drain that pipe
  so i'm unifying receiveLastChildInfo with receiveChildInfo
- adjust threshold of the COW test when run in accurate mode
- add some prints in case this test fails again
- fix indentation, page size, and PID! in MacOS proc info

p.s. it seems that pri_pages_dirtied is always 0
2021-01-08 23:35:30 +02:00
YaacovHazan ea930a352c Report child copy-on-write info continuously
Add INFO field, rdb_active_cow_size, to report COW of a live fork child while
it's active.
- once every 1024 keys check the time, and if more than one second has passed since
  the last report, send a report to the parent via the pipe.
- refactor the child_info_data struct, it's an implementation detail that
  shouldn't be in the server struct, and not used to communicate data between
  caller and callee
- remove the magic value from that struct (not sure what it was good for), and
  instead add handling of short reads.
- add another value to the structure, cow_type, to indicate if the report is
  for the new rdb_active_cow_size field, or it's the last report of a
  successful operation
- add new Module API to report the active COW
- add more asserts variants to test.tcl
2021-01-07 16:14:29 +02:00
Oran Agra cfb449cc80
Sanitize dump payload: excessive free on dup zset fields (#8189) 2020-12-14 17:10:31 +02:00
Oran Agra 7ca00d694d Sanitize dump payload: fail RESTORE if memory allocation fails
When RDB input attempts to make a huge memory allocation that fails,
RESTORE should fail gracefully rather than die with panic
2020-12-06 14:54:34 +02:00
Oran Agra 3716950cfc Sanitize dump payload: validate no duplicate records in hash/zset/intset
If RESTORE passes successfully with full sanitization, we can't afford
to crash later on an assertion due to duplicate records in a hash when
converting it from ziplist to dict.
This means that when doing full sanitization, we must make sure there
are no duplicate records in any of the collections.
2020-12-06 14:54:34 +02:00
Oran Agra c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMPs them, corrupts the payload
and RESTOREs it.
It utilizes the recently added use-exit-on-panic config to distinguish between
asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer dereference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have led to memory leaks in the
  RESTORE command (since it now responds with an error rather than a panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test doesn't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Oran Agra 01c13bddea Sanitize dump payload: improve tests of ziplist and stream encodings
- improve stream rdb encoding test to include more types of stream metadata
- add a test to cover various ziplist encoding entries (although it does
  look like the stress test above it is able to find some too)
- add another test for ziplist encoding for hash with full sanitization
- add similar ziplist encoding tests for list
2020-12-06 14:54:34 +02:00
Oran Agra ca1c182567 Sanitize dump payload: ziplist, listpack, zipmap, intset, stream
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This lets us later use this encoded size to make sure the various offsets
inside the encoded payload don't reach outside the allocation; if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by this sanitization, so I'm setting the default value to `no`,
but later on it should be set to `clients` by default.

changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
  slowed down by sanitation.
2020-12-06 14:54:34 +02:00
luhuachao 7885faf18b
Modify help msg PING_BULK to PING_MBULK in benchmark (#8109)
As described in the redis-benchmark help message, 'The test names are the same as the ones produced as output.' In the redis-benchmark output, we can only see PING_BULK, but the command `redis-benchmark -t ping_bulk` is not supported. We have to run it with ping_mbulk, which is not user friendly.
2020-12-02 13:17:25 +02:00
filipe oliveira 10b5006934
Enable specifying TLS ciphers(suites) in redis-cli/redis-benchmark (#8005)
Enable specifying the preferred ciphers and/or ciphersuites for redis-cli/redis-benchmark.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2020-11-04 14:49:15 +02:00
Meir Shpilraien (Spielrein) 762be79f0b
Disable new SIGABRT test on valgrind (#8013)
The crash reports cause false-positive warnings when run with valgrind.
2020-11-04 13:13:55 +02:00
Meir Shpilraien (Spielrein) f210e197f3
Added crash report on SIGABRT (#8004)
The reason we want to get a full crash report on SIGABRT
is that jemalloc, when detecting a corruption, calls abort().
This would cause Redis to exit silently without any report
and without any way to analyze what happened.
2020-11-03 14:59:21 +02:00
Oran Agra 9122379abc
Propagate GETSET and SET-GET as SET (#7957)
- Generates a more backwards compatible command stream
- Slightly more efficient execution in replica/AOF
- Add a test for coverage
2020-11-03 14:56:57 +02:00
Wang Yuan dc899c4c88
Fix timing dependence in replication tcl tests (#7969)
Remove 'fork child $pid' log in replication.tcl
2020-10-27 09:36:42 +02:00
filipe oliveira 01acfa71ca
redis-benchmark: add tests, --version, a minor bug fixes (#7947)
- add test suite coverage for redis-benchmark
- add --version (similar to what redis-cli has)
- fix bug sending more requests than intended when pipeline > 1.
- when done sending requests, avoid freeing client in the write handler, in theory before
  responses are received (probably dead code since the read handler will call clientDone first)

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-26 08:04:59 +02:00
Yossi Gottlieb 843a13e88f
Add a --no-latency tests flag. (#7939)
Useful for running tests on systems which may be way slower than usual.
2020-10-22 11:10:53 +03:00
Felipe Machado c3f9e01794
Adds new pop-push commands (LMOVE, BLMOVE) (#6929)
Adding [B]LMOVE <src> <dst> RIGHT|LEFT RIGHT|LEFT. deprecating [B]RPOPLPUSH.

Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH,
but on BLMOVE RIGHT LEFT we'll propagate an LMOVE
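For example (a hypothetical session with made-up key names):
```
127.0.0.1:6379> RPUSH src a b c
(integer) 3
127.0.0.1:6379> LMOVE src dst RIGHT LEFT
"c"
127.0.0.1:6379> LRANGE dst 0 -1
1) "c"
```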

improvement to existing tests
- Replace "after 1000" with "wait_for_condition" when wait for
  clients to block/unblock.
- Add a pre-existing element to target list on basic tests so
  that we can check if the new element was added to the correct
  side of the list.
- check command stats on the replica to make sure the right
  command was replicated

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-08 08:33:17 +03:00
Wang Yuan 1bb5794a1f
Kill disk-based fork child when all replicas drop and 'save' is not enabled (#7819)
When all replicas waiting for a bgsave get disconnected (possibly due to the output buffer limit),
it may be good to kill the bgsave child. In diskless replication this already happens, but in
disk-based replication the child may still serve some purpose (for persistence).

By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave.
We do that only if rdb persistence wasn't enabled in the configuration.

Btw, rdbRemoveTempFile in killRDBChild now won't block the server, so we can call killRDBChild safely.
2020-09-22 09:47:58 +03:00
Oran Agra 1c71038540
Squash merging 125 typo/grammar/comment/doc PRs (#7773)
List of squashed commits or PRs
===============================

commit 66801ea
Author: hwware <wen.hui.ware@gmail.com>
Date:   Mon Jan 13 00:54:31 2020 -0500

    typo fix in acl.c

commit 46f55db
Author: Itamar Haber <itamar@redislabs.com>
Date:   Sun Sep 6 18:24:11 2020 +0300

    Updates a couple of comments

    Specifically:

    * RM_AutoMemory completed instead of pointing to docs
    * Updated link to custom type doc

commit 61a2aa0
Author: xindoo <xindoo@qq.com>
Date:   Tue Sep 1 19:24:59 2020 +0800

    Correct errors in code comments

commit a5871d1
Author: yz1509 <pro-756@qq.com>
Date:   Tue Sep 1 18:36:06 2020 +0800

    fix typos in module.c

commit 41eede7
Author: bookug <bookug@qq.com>
Date:   Sat Aug 15 01:11:33 2020 +0800

    docs: fix typos in comments

commit c303c84
Author: lazy-snail <ws.niu@outlook.com>
Date:   Fri Aug 7 11:15:44 2020 +0800

    fix spelling in redis.conf

commit 1eb76bf
Author: zhujian <zhujianxyz@gmail.com>
Date:   Thu Aug 6 15:22:10 2020 +0800

    add a missing 'n' in comment

commit 1530ec2
Author: Daniel Dai <764122422@qq.com>
Date:   Mon Jul 27 00:46:35 2020 -0400

    fix spelling in tracking.c

commit e517b31
Author: Hunter-Chen <huntcool001@gmail.com>
Date:   Fri Jul 17 22:33:32 2020 +0800

    Update redis.conf

    Co-authored-by: Itamar Haber <itamar@redislabs.com>

commit c300eff
Author: Hunter-Chen <huntcool001@gmail.com>
Date:   Fri Jul 17 22:33:23 2020 +0800

    Update redis.conf

    Co-authored-by: Itamar Haber <itamar@redislabs.com>

commit 4c058a8
Author: 陈浩鹏 <chenhaopeng@heytea.com>
Date:   Thu Jun 25 19:00:56 2020 +0800

    Grammar fix and clarification

commit 5fcaa81
Author: bodong.ybd <bodong.ybd@alibaba-inc.com>
Date:   Fri Jun 19 10:09:00 2020 +0800

    Fix typos

commit 4caca9a
Author: Pruthvi P <pruthvi@ixigo.com>
Date:   Fri May 22 00:33:22 2020 +0530

    Fix typo eviciton => eviction

commit b2a25f6
Author: Brad Dunbar <dunbarb2@gmail.com>
Date:   Sun May 17 12:39:59 2020 -0400

    Fix a typo.

commit 12842ae
Author: hwware <wen.hui.ware@gmail.com>
Date:   Sun May 3 17:16:59 2020 -0400

    fix spelling in redis conf

commit ddba07c
Author: Chris Lamb <chris@chris-lamb.co.uk>
Date:   Sat May 2 23:25:34 2020 +0100

    Correct a "conflicts" spelling error.

commit 8fc7bf2
Author: Nao YONASHIRO <yonashiro@r.recruit.co.jp>
Date:   Thu Apr 30 10:25:27 2020 +0900

    docs: fix EXPIRE_FAST_CYCLE_DURATION to ACTIVE_EXPIRE_CYCLE_FAST_DURATION

commit 9b2b67a
Author: Brad Dunbar <dunbarb2@gmail.com>
Date:   Fri Apr 24 11:46:22 2020 -0400

    Fix a typo.

commit 0746f10
Author: devilinrust <63737265+devilinrust@users.noreply.github.com>
Date:   Thu Apr 16 00:17:53 2020 +0200

    Fix typos in server.c

commit 92b588d
Author: benjessop12 <56115861+benjessop12@users.noreply.github.com>
Date:   Mon Apr 13 13:43:55 2020 +0100

    Fix spelling mistake in lazyfree.c

commit 1da37aa
Merge: 2d4ba28 af347a8
Author: hwware <wen.hui.ware@gmail.com>
Date:   Thu Mar 5 22:41:31 2020 -0500

    Merge remote-tracking branch 'upstream/unstable' into expiretypofix

commit 2d4ba28
Author: hwware <wen.hui.ware@gmail.com>
Date:   Mon Mar 2 00:09:40 2020 -0500

    fix typo in expire.c

commit 1a746f7
Author: SennoYuki <minakami1yuki@gmail.com>
Date:   Thu Feb 27 16:54:32 2020 +0800

    fix typo

commit 8599b1a
Author: dongheejeong <donghee950403@gmail.com>
Date:   Sun Feb 16 20:31:43 2020 +0000

    Fix typo in server.c

commit f38d4e8
Author: hwware <wen.hui.ware@gmail.com>
Date:   Sun Feb 2 22:58:38 2020 -0500

    fix typo in evict.c

commit fe143fc
Author: Leo Murillo <leonardo.murillo@gmail.com>
Date:   Sun Feb 2 01:57:22 2020 -0600

    Fix a few typos in redis.conf

commit 1ab4d21
Author: viraja1 <anchan.viraj@gmail.com>
Date:   Fri Dec 27 17:15:58 2019 +0530

    Fix typo in Latency API docstring

commit ca1f70e
Author: gosth <danxuedexing@qq.com>
Date:   Wed Dec 18 15:18:02 2019 +0800

    fix typo in sort.c

commit a57c06b
Author: ZYunH <zyunhjob@163.com>
Date:   Mon Dec 16 22:28:46 2019 +0800

    fix-zset-typo

commit b8c92b5
Author: git-hulk <hulk.website@gmail.com>
Date:   Mon Dec 16 15:51:42 2019 +0800

    FIX: typo in cluster.c, onformation->information

commit 9dd981c
Author: wujm2007 <jim.wujm@gmail.com>
Date:   Mon Dec 16 09:37:52 2019 +0800

    Fix typo

commit e132d7a
Author: Sebastien Williams-Wynn <s.williamswynn.mail@gmail.com>
Date:   Fri Nov 15 00:14:07 2019 +0000

    Minor typo change

commit 47f44d5
Author: happynote3966 <01ssrmikururudevice01@gmail.com>
Date:   Mon Nov 11 22:08:48 2019 +0900

    fix comment typo in redis-cli.c

commit b8bdb0d
Author: fulei <fulei@kuaishou.com>
Date:   Wed Oct 16 18:00:17 2019 +0800

    Fix a spelling mistake of comments  in defragDictBucketCallback

commit 0def46a
Author: fulei <fulei@kuaishou.com>
Date:   Wed Oct 16 13:09:27 2019 +0800

    fix some spelling mistakes of comments in defrag.c

commit f3596fd
Author: Phil Rajchgot <tophil@outlook.com>
Date:   Sun Oct 13 02:02:32 2019 -0400

    Typo and grammar fixes

    Redis and its documentation are great -- just wanted to submit a few corrections in the spirit of Hacktoberfest. Thanks for all your work on this project. I use it all the time and it works beautifully.

commit 2b928cd
Author: KangZhiDong <worldkzd@gmail.com>
Date:   Sun Sep 1 07:03:11 2019 +0800

    fix typos

commit 33aea14
Author: Axlgrep <axlgrep@gmail.com>
Date:   Tue Aug 27 11:02:18 2019 +0800

    Fixed eviction spelling issues

commit e282a80
Author: Simen Flatby <simen@oms.no>
Date:   Tue Aug 20 15:25:51 2019 +0200

    Update comments to reflect prop name

    In the comments the prop is referenced as replica-validity-factor,
    but it is really named cluster-replica-validity-factor.

commit 74d1f9a
Author: Jim Green <jimgreen2013@qq.com>
Date:   Tue Aug 20 20:00:31 2019 +0800

    fix comment error, the code is ok

commit eea1407
Author: Liao Tonglang <liaotonglang@gmail.com>
Date:   Fri May 31 10:16:18 2019 +0800

    typo fix

    fix cna't to can't

commit 0da553c
Author: KAWACHI Takashi <tkawachi@gmail.com>
Date:   Wed Jul 17 00:38:16 2019 +0900

    Fix typo

commit 7fc8fb6
Author: Michael Prokop <mika@grml.org>
Date:   Tue May 28 17:58:42 2019 +0200

    Typo fixes

    s/familar/familiar/
    s/compatiblity/compatibility/
    s/ ot / to /
    s/itsef/itself/

commit 5f46c9d
Author: zhumoing <34539422+zhumoing@users.noreply.github.com>
Date:   Tue May 21 21:16:50 2019 +0800

    typo-fixes

    typo-fixes

commit 321dfe1
Author: wxisme <850885154@qq.com>
Date:   Sat Mar 16 15:10:55 2019 +0800

    typo fix

commit b4fb131
Merge: 267e0e6 3df1eb8
Author: Nikitas Bastas <nikitasbst@gmail.com>
Date:   Fri Feb 8 22:55:45 2019 +0200

    Merge branch 'unstable' of antirez/redis into unstable

commit 267e0e6
Author: Nikitas Bastas <nikitasbst@gmail.com>
Date:   Wed Jan 30 21:26:04 2019 +0200

    Minor typo fix

commit 30544e7
Author: inshal96 <39904558+inshal96@users.noreply.github.com>
Date:   Fri Jan 4 16:54:50 2019 +0500

    remove an extra 'a' in the comments

commit 337969d
Author: BrotherGao <yangdongheng11@gmail.com>
Date:   Sat Dec 29 12:37:29 2018 +0800

    fix typo in redis.conf

commit 9f4b121
Merge: 423a030 e504583
Author: BrotherGao <yangdongheng@xiaomi.com>
Date:   Sat Dec 29 11:41:12 2018 +0800

    Merge branch 'unstable' of antirez/redis into unstable

commit 423a030
Merge: 42b02b7 46a51cd
Author: 杨东衡 <yangdongheng@xiaomi.com>
Date:   Tue Dec 4 23:56:11 2018 +0800

    Merge branch 'unstable' of antirez/redis into unstable

commit 42b02b7
Merge: 68c0e6e b8febe6
Author: Dongheng Yang <yangdongheng11@gmail.com>
Date:   Sun Oct 28 15:54:23 2018 +0800

    Merge pull request #1 from antirez/unstable

    update local data

commit 714b589
Author: Christian <crifei93@gmail.com>
Date:   Fri Dec 28 01:17:26 2018 +0100

    fix typo "resulution"

commit e23259d
Author: garenchan <1412950785@qq.com>
Date:   Wed Dec 26 09:58:35 2018 +0800

    fix typo: segfauls -> segfault

commit a9359f8
Author: xjp <jianping_xie@aliyun.com>
Date:   Tue Dec 18 17:31:44 2018 +0800

    Fixed REDISMODULE_H spell bug

commit a12c3e4
Author: jdiaz <jrd.palacios@gmail.com>
Date:   Sat Dec 15 23:39:52 2018 -0600

    Fixes hyperloglog hash function comment block description

commit 770eb11
Author: 林上耀 <1210tom@163.com>
Date:   Sun Nov 25 17:16:10 2018 +0800

    fix typo

commit fd97fbb
Author: Chris Lamb <chris@chris-lamb.co.uk>
Date:   Fri Nov 23 17:14:01 2018 +0100

    Correct "unsupported" typo.

commit a85522d
Author: Jungnam Lee <jungnam.lee@oracle.com>
Date:   Thu Nov 8 23:01:29 2018 +0900

    fix typo in test comments

commit ade8007
Author: Arun Kumar <palerdot@users.noreply.github.com>
Date:   Tue Oct 23 16:56:35 2018 +0530

    Fixed grammatical typo

    Fixed typo for word 'dictionary'

commit 869ee39
Author: Hamid Alaei <hamid.a85@gmail.com>
Date:   Sun Aug 12 16:40:02 2018 +0430

    fix documentations: (ThreadSafeContextStart/Stop -> ThreadSafeContextLock/Unlock), minor typo

commit f89d158
Author: Mayank Jain <mayankjain255@gmail.com>
Date:   Tue Jul 31 23:01:21 2018 +0530

    Updated README.md with some spelling corrections.

    Made correction in spelling of some misspelled words.

commit 892198e
Author: dsomeshwar <someshwar.dhayalan@gmail.com>
Date:   Sat Jul 21 23:23:04 2018 +0530

    typo fix

commit 8a4d780
Author: Itamar Haber <itamar@redislabs.com>
Date:   Mon Apr 30 02:06:52 2018 +0300

    Fixes some typos

commit e3acef6
Author: Noah Rosamilia <ivoahivoah@gmail.com>
Date:   Sat Mar 3 23:41:21 2018 -0500

    Fix typo in /deps/README.md

commit 04442fb
Author: WuYunlong <xzsyeb@126.com>
Date:   Sat Mar 3 10:32:42 2018 +0800

    Fix typo in readSyncBulkPayload() comment.

commit 9f36880
Author: WuYunlong <xzsyeb@126.com>
Date:   Sat Mar 3 10:20:37 2018 +0800

    replication.c comment: run_id -> replid.

commit f866b4a
Author: Francesco 'makevoid' Canessa <makevoid@gmail.com>
Date:   Thu Feb 22 22:01:56 2018 +0000

    fix comment typo in server.c

commit 0ebc69b
Author: 줍 <jubee0124@gmail.com>
Date:   Mon Feb 12 16:38:48 2018 +0900

    Fix typo in redis.conf

    Fix `five behaviors` to `eight behaviors` in [this sentence ](antirez/redis@unstable/redis.conf#L564)

commit b50a620
Author: martinbroadhurst <martinbroadhurst@users.noreply.github.com>
Date:   Thu Dec 28 12:07:30 2017 +0000

    Fix typo in valgrind.sup

commit 7d8f349
Author: Peter Boughton <peter@sorcerersisle.com>
Date:   Mon Nov 27 19:52:19 2017 +0000

    Update CONTRIBUTING; refer doc updates to redis-doc repo.

commit 02dec7e
Author: Klauswk <klauswk1@hotmail.com>
Date:   Tue Oct 24 16:18:38 2017 -0200

    Fix typo in comment

commit e1efbc8
Author: chenshi <baiwfg2@gmail.com>
Date:   Tue Oct 3 18:26:30 2017 +0800

    Correct two spelling errors of comments

commit 93327d8
Author: spacewander <spacewanderlzx@gmail.com>
Date:   Wed Sep 13 16:47:24 2017 +0800

    Update the comment for OBJ_ENCODING_EMBSTR_SIZE_LIMIT's value

    The value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT is 44 now instead of 39.

commit 63d361f
Author: spacewander <spacewanderlzx@gmail.com>
Date:   Tue Sep 12 15:06:42 2017 +0800

    Fix <prevlen> related doc in ziplist.c

    According to the definition of ZIP_BIG_PREVLEN and other related code,
    the guard of single byte <prevlen> should be 254 instead of 255.

commit ebe228d
Author: hanael80 <hanael80@gmail.com>
Date:   Tue Aug 15 09:09:40 2017 +0900

    Fix typo

commit 6b696e6
Author: Matt Robenolt <matt@ydekproductions.com>
Date:   Mon Aug 14 14:50:47 2017 -0700

    Fix typo in LATENCY DOCTOR output

commit a2ec6ae
Author: caosiyang <caosiyang@qiyi.com>
Date:   Tue Aug 15 14:15:16 2017 +0800

    Fix a typo: form => from

commit 3ab7699
Author: caosiyang <caosiyang@qiyi.com>
Date:   Thu Aug 10 18:40:33 2017 +0800

    Fix a typo: replicationFeedSlavesFromMaster() => replicationFeedSlavesFromMasterStream()

commit 72d43ef
Author: caosiyang <caosiyang@qiyi.com>
Date:   Tue Aug 8 15:57:25 2017 +0800

    fix a typo: servewr => server

commit 707c958
Author: Bo Cai <charpty@gmail.com>
Date:   Wed Jul 26 21:49:42 2017 +0800

    redis-cli.c typo: conut -> count.

    Signed-off-by: Bo Cai <charpty@gmail.com>

commit b9385b2
Author: JackDrogon <jack.xsuperman@gmail.com>
Date:   Fri Jun 30 14:22:31 2017 +0800

    Fix some spell problems

commit 20d9230
Author: akosel <aaronjkosel@gmail.com>
Date:   Sun Jun 4 19:35:13 2017 -0500

    Fix typo

commit b167bfc
Author: Krzysiek Witkowicz <krzysiekwitkowicz@gmail.com>
Date:   Mon May 22 21:32:27 2017 +0100

    Fix #4008 small typo in comment

commit 2b78ac8
Author: Jake Clarkson <jacobwclarkson@gmail.com>
Date:   Wed Apr 26 15:49:50 2017 +0100

    Correct typo in tests/unit/hyperloglog.tcl

commit b0f1cdb
Author: Qi Luo <qiluo-msft@users.noreply.github.com>
Date:   Wed Apr 19 14:25:18 2017 -0700

    Fix typo

commit a90b0f9
Author: charsyam <charsyam@naver.com>
Date:   Thu Mar 16 18:19:53 2017 +0900

    fix typos

    fix typos

    fix typos

commit 8430a79
Author: Richard Hart <richardhart92@gmail.com>
Date:   Mon Mar 13 22:17:41 2017 -0400

    Fixed log message typo in listenToPort.

commit 481a1c2
Author: Vinod Kumar <kumar003vinod@gmail.com>
Date:   Sun Jan 15 23:04:51 2017 +0530

    src/db.c: Correct "save" -> "safe" typo

commit 586b4d3
Author: wangshaonan <wshn13@gmail.com>
Date:   Wed Dec 21 20:28:27 2016 +0800

    Fix typo they->the in helloworld.c

commit c1c4b5e
Author: Jenner <hypxm@qq.com>
Date:   Mon Dec 19 16:39:46 2016 +0800

    typo error

commit 1ee1a3f
Author: tielei <43289893@qq.com>
Date:   Mon Jul 18 13:52:25 2016 +0800

    fix some comments

commit 11a41fb
Author: Otto Kekäläinen <otto@seravo.fi>
Date:   Sun Jul 3 10:23:55 2016 +0100

    Fix spelling in documentation and comments

commit 5fb5d82
Author: francischan <f1ancis621@gmail.com>
Date:   Tue Jun 28 00:19:33 2016 +0800

    Fix outdated comments about redis.c file.
    It should now refer to server.c file.

commit 6b254bc
Author: lmatt-bit <lmatt123n@gmail.com>
Date:   Thu Apr 21 21:45:58 2016 +0800

    Refine the comment of dictRehashMilliseconds func

SLAVECONF->REPLCONF in comment - by andyli029

commit ee9869f
Author: clark.kang <charsyam@naver.com>
Date:   Tue Mar 22 11:09:51 2016 +0900

    fix typos

commit f7b3b11
Author: Harisankar H <harisankarh@gmail.com>
Date:   Wed Mar 9 11:49:42 2016 +0530

    Typo correction: "faield" --> "failed"

    Typo correction: "faield" --> "failed"

commit 3fd40fc
Author: Itamar Haber <itamar@redislabs.com>
Date:   Thu Feb 25 10:31:51 2016 +0200

    Fixes a typo in comments

commit 621c160
Author: Prayag Verma <prayag.verma@gmail.com>
Date:   Mon Feb 1 12:36:20 2016 +0530

    Fix typo in Readme.md

    Spelling mistakes -
    `eviciton` > `eviction`
    `familar` > `familiar`

commit d7d07d6
Author: WonCheol Lee <toctoc21c@gmail.com>
Date:   Wed Dec 30 15:11:34 2015 +0900

    Typo fixed

commit a4dade7
Author: Felix Bünemann <buenemann@louis.info>
Date:   Mon Dec 28 11:02:55 2015 +0100

    [ci skip] Improve supervised upstart config docs

    This mentions that "expect stop" is required for supervised upstart
    to work correctly. See http://upstart.ubuntu.com/cookbook/#expect-stop
    for an explanation.

commit d9caba9
Author: daurnimator <quae@daurnimator.com>
Date:   Mon Dec 21 18:30:03 2015 +1100

    README: Remove trailing whitespace

commit 72d42e5
Author: daurnimator <quae@daurnimator.com>
Date:   Mon Dec 21 18:29:32 2015 +1100

    README: Fix typo. th => the

commit dd6e957
Author: daurnimator <quae@daurnimator.com>
Date:   Mon Dec 21 18:29:20 2015 +1100

    README: Fix typo. familar => familiar

commit 3a12b23
Author: daurnimator <quae@daurnimator.com>
Date:   Mon Dec 21 18:28:54 2015 +1100

    README: Fix typo. eviciton => eviction

commit 2d1d03b
Author: daurnimator <quae@daurnimator.com>
Date:   Mon Dec 21 18:21:45 2015 +1100

    README: Fix typo. sever => server

commit 3973b06
Author: Itamar Haber <itamar@garantiadata.com>
Date:   Sat Dec 19 17:01:20 2015 +0200

    Typo fix

commit 4f2e460
Author: Steve Gao <fu@2token.com>
Date:   Fri Dec 4 10:22:05 2015 +0800

    Update README - fix typos

commit b21667c
Author: binyan <binbin.yan@nokia.com>
Date:   Wed Dec 2 22:48:37 2015 +0800

    delete redundancy color judge in sdscatcolor

commit 88894c7
Author: binyan <binbin.yan@nokia.com>
Date:   Wed Dec 2 22:14:42 2015 +0800

    the example output shoule be HelloWorld

commit 2763470
Author: binyan <binbin.yan@nokia.com>
Date:   Wed Dec 2 17:41:39 2015 +0800

    modify error word keyevente

    Signed-off-by: binyan <binbin.yan@nokia.com>

commit 0847b3d
Author: Bruno Martins <bscmartins@gmail.com>
Date:   Wed Nov 4 11:37:01 2015 +0000

    typo

commit bbb9e9e
Author: dawedawe <dawedawe@gmx.de>
Date:   Fri Mar 27 00:46:41 2015 +0100

    typo: zimap -> zipmap

commit 5ed297e
Author: Axel Advento <badwolf.bloodseeker.rev@gmail.com>
Date:   Tue Mar 3 15:58:29 2015 +0800

    Fix 'salve' typos to 'slave'

commit edec9d6
Author: LudwikJaniuk <ludvig.janiuk@gmail.com>
Date:   Wed Jun 12 14:12:47 2019 +0200

    Update README.md

    Co-Authored-By: Qix <Qix-@users.noreply.github.com>

commit 692a7af
Author: LudwikJaniuk <ludvig.janiuk@gmail.com>
Date:   Tue May 28 14:32:04 2019 +0200

    grammar

commit d962b0a
Author: Nick Frost <nickfrostatx@gmail.com>
Date:   Wed Jul 20 15:17:12 2016 -0700

    Minor grammar fix

commit 24fff01aaccaf5956973ada8c50ceb1462e211c6 (typos)
Author: Chad Miller <chadm@squareup.com>
Date:   Tue Sep 8 13:46:11 2020 -0400

    Fix faulty comment about operation of unlink()

commit 3cd5c1f3326c52aa552ada7ec797c6bb16452355
Author: Kevin <kevin.xgr@gmail.com>
Date:   Wed Nov 20 00:13:50 2019 +0800

    Fix typo in server.c.

From a83af59 Mon Sep 17 00:00:00 2001
From: wuwo <wuwo@wacai.com>
Date: Fri, 17 Mar 2017 20:37:45 +0800
Subject: [PATCH] falure to failure

From c961896 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=B7=A6=E6=87=B6?= <veficos@gmail.com>
Date: Sat, 27 May 2017 15:33:04 +0800
Subject: [PATCH] fix typo

From e600ef2 Mon Sep 17 00:00:00 2001
From: "rui.zou" <rui.zou@yunify.com>
Date: Sat, 30 Sep 2017 12:38:15 +0800
Subject: [PATCH] fix a typo

From c7d07fa Mon Sep 17 00:00:00 2001
From: Alexandre Perrin <alex@kaworu.ch>
Date: Thu, 16 Aug 2018 10:35:31 +0200
Subject: [PATCH] deps README.md typo

From b25cb67 Mon Sep 17 00:00:00 2001
From: Guy Korland <gkorland@gmail.com>
Date: Wed, 26 Sep 2018 10:55:37 +0300
Subject: [PATCH 1/2] fix typos in header

From ad28ca6 Mon Sep 17 00:00:00 2001
From: Guy Korland <gkorland@gmail.com>
Date: Wed, 26 Sep 2018 11:02:36 +0300
Subject: [PATCH 2/2] fix typos

commit 34924cdedd8552466fc22c1168d49236cb7ee915
Author: Adrian Lynch <adi_ady_ade@hotmail.com>
Date:   Sat Apr 4 21:59:15 2015 +0100

    Typos fixed

commit fd2a1e7
Author: Jan <jsteemann@users.noreply.github.com>
Date:   Sat Oct 27 19:13:01 2018 +0200

    Fix typos

    Fix typos

commit e14e47c1a234b53b0e103c5f6a1c61481cbcbb02
Author: Andy Lester <andy@petdance.com>
Date:   Fri Aug 2 22:30:07 2019 -0500

    Fix multiple misspellings of "following"

commit 79b948ce2dac6b453fe80995abbcaac04c213d5a
Author: Andy Lester <andy@petdance.com>
Date:   Fri Aug 2 22:24:28 2019 -0500

    Fix misspelling of create-cluster

commit 1fffde52666dc99ab35efbd31071a4c008cb5a71
Author: Andy Lester <andy@petdance.com>
Date:   Wed Jul 31 17:57:56 2019 -0500

    Fix typos

commit 204c9ba9651e9e05fd73936b452b9a30be456cfe
Author: Xiaobo Zhu <xiaobo.zhu@shopee.com>
Date:   Tue Aug 13 22:19:25 2019 +0800

    fix typos

Squashed commit of the following:

commit 1d9aaf8
Author: danmedani <danmedani@gmail.com>
Date:   Sun Aug 2 11:40:26 2015 -0700

README typo fix.

Squashed commit of the following:

commit 32bfa7c
Author: Erik Dubbelboer <erik@dubbelboer.com>
Date:   Mon Jul 6 21:15:08 2015 +0200

Fixed grammar

Squashed commit of the following:

commit b24f69c
Author: Sisir Koppaka <sisir.koppaka@gmail.com>
Date:   Mon Mar 2 22:38:45 2015 -0500

utils/hashtable/rehashing.c: Fix typos

Squashed commit of the following:

commit 4e04082
Author: Erik Dubbelboer <erik@dubbelboer.com>
Date:   Mon Mar 23 08:22:21 2015 +0000

Small config file documentation improvements

Squashed commit of the following:

commit acb8773
Author: ctd1500 <ctd1500@gmail.com>
Date:   Fri May 8 01:52:48 2015 -0700

Typo and grammar fixes in readme

commit 2eb75b6
Author: ctd1500 <ctd1500@gmail.com>
Date:   Fri May 8 01:36:18 2015 -0700

fixed redis.conf comment

Squashed commit of the following:

commit a8249a2
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date:   Fri Dec 11 11:39:52 2015 +0530

Revise correction of typos.

Squashed commit of the following:

commit 3c02028
Author: zhaojun11 <zhaojun11@jd.com>
Date:   Wed Jan 17 19:05:28 2018 +0800

Fix typos include two code typos in cluster.c and latency.c

Squashed commit of the following:

commit 9dba47c
Author: q191201771 <191201771@qq.com>
Date:   Sat Jan 4 11:31:04 2020 +0800

fix function listCreate comment in adlist.c

Update src/server.c

commit 2c7c2cb536e78dd211b1ac6f7bda00f0f54faaeb
Author: charpty <charpty@gmail.com>
Date:   Tue May 1 23:16:59 2018 +0800

    server.c typo: modules system dictionary type comment

    Signed-off-by: charpty <charpty@gmail.com>

commit a8395323fb63cb59cb3591cb0f0c8edb7c29a680
Author: Itamar Haber <itamar@redislabs.com>
Date:   Sun May 6 00:25:18 2018 +0300

    Updates test_helper.tcl's help with undocumented options

    Specifically:

    * Host
    * Port
    * Client

commit bde6f9ced15755cd6407b4af7d601b030f36d60b
Author: wxisme <850885154@qq.com>
Date:   Wed Aug 8 15:19:19 2018 +0800

    fix comments in deps files

commit 3172474ba991532ab799ee1873439f3402412331
Author: wxisme <850885154@qq.com>
Date:   Wed Aug 8 14:33:49 2018 +0800

    fix some comments

commit 01b6f2b6858b5cf2ce4ad5092d2c746e755f53f0
Author: Thor Juhasz <thor@juhasz.pro>
Date:   Sun Nov 18 14:37:41 2018 +0100

    Minor fixes to comments

    Found some parts a little unclear on a first read, which prompted me to have a better look at the file and fix some minor things I noticed.
    Fixing minor typos and grammar. There are no changes to configuration options.
    These changes are only meant to help the user better understand the explanations of the various configuration options.
2020-09-10 13:43:38 +03:00
Yossi Gottlieb 918abd7276
Tests: clean up stale .cli files. (#7768) 2020-09-09 12:30:43 +03:00
Oran Agra 573246f73c
if diskless repl child is killed, make sure to reap the pid (#7742)
Starting with Redis 6.0 and the changes we made to the diskless master to
make it suitable for TLS, I made the master avoid reaping (wait3) the pid
of the child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was killed (e.g. by the kernel OOM killer). In that case
there's a chance that we had disabled the read event on the rdb pipe,
since we're waiting for a replica to become writable again, and in that
scenario the master would never have realized the child exited, and the
replica would remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. This means that when the child exits,
for whatever reason, it is safe to reap it.

Besides that, I'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (5a47794).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has a chance to detect that the child exited, it should be the one to
call it so that we don't have to wait for cron (server.hz) to do that.
2020-09-06 16:43:57 +03:00
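A minimal sketch of the parent-to-child "safe to exit" pipe described in the commit above, with hypothetical names (`exit_pipe` is not the actual Redis identifier); the real code threads this through the existing rdb child machinery:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    int exit_pipe[2];                    /* parent writes, child reads */
    if (pipe(exit_pipe) == -1) return 1;

    pid_t pid = fork();
    if (pid == 0) {                      /* child: produces the rdb payload */
        close(exit_pipe[1]);
        /* ... write the rdb stream to the data pipe here ... */
        char c;
        /* Block until the parent says it's safe to exit, so the parent is
         * never surprised by an early exit while replicas are still reading. */
        read(exit_pipe[0], &c, 1);
        _exit(0);
    }

    close(exit_pipe[0]);                 /* parent only writes */
    /* ... serve the child's output to all replicas here ... */

    write(exit_pipe[1], "x", 1);         /* all replicas done: child may exit */
    close(exit_pipe[1]);

    int status;                          /* now it's always safe to reap */
    waitpid(pid, &status, 0);
    printf("child reaped, status=%d\n", WEXITSTATUS(status));
    return 0;
}
```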
Oran Agra 1b7ba44e79 test infra - wait_done_loading
reduce code duplication in aof.tcl.
move creation of clients into the test so that it can be skipped
2020-09-06 09:59:19 +03:00
Oran Agra e3e69c25fd test infra - reduce disk space usage
this is important when running a test with --loop
2020-09-06 09:59:19 +03:00
Yossi Gottlieb f80f3f492a
Tests: fix redis-cli with remote hosts. (#7693) 2020-08-23 10:17:43 +03:00
Oran Agra c17e597d05
Accelerate diskless master connections, and general re-connections (#6271)
Diskless master has some inherent latencies.
1) fork starts with a delay from cron rather than immediately
2) the replica is put online only after an ACK, but the ACK
   was sent only once a second.
3) even if the ACK arrived immediately, it would not register
   if cron hadn't yet detected that the fork is done.

Besides that, when a replica disconnects, it doesn't immediately
attempt to re-connect; it waits for replication cron (once per second).
In case it was already online, it may be important to try to re-connect
as soon as possible, so that the backlog at the master doesn't vanish.

In case it disconnected during rdb transfer, one can argue that it's
not very important to re-connect immediately, but this is needed for the
"diskless loading short read" test to be able to run 100 iterations in 5
seconds, rather than 3 (waiting for replication cron re-connection).

changes in this commit:
1) sync command starts a fork immediately if no sync_delay is configured
2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron)
3) when a replica unexpectedly disconnects, it immediately tries to
   re-connect rather than waiting 1s
4) when a child exits, if there is another replica waiting, we spawn a new
   one right away, instead of waiting for the 1s replicationCron.
5) added a call to connectWithMaster from replicationSetMaster, which is called
   from the REPLICAOF command but also in 3 places in cluster.c; in all of
   these the connection attempt will now be immediate instead of delayed by 1
   second.

side note:
we can add a call to rdbPipeReadHandler in replconfCommand when getting
a REPLCONF ACK from the replica to solve a race where the replica got
the entire rdb and EOF marker before we detected that the pipe was
closed.
In the test I did see this race happen in about one of some 300 runs,
but I concluded that this race is unlikely in real life (where the
replica is on another host and we're more likely to first detect the
pipe was closed).
The test runs 100 iterations in 3 seconds, so in some cases it'll take 4
seconds instead (waiting for another REPLCONF ACK).

Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave
Now that CheckChildrenDone is calling the new replicationStartPendingFork
(extracted from serverCron) there's actually no need to call
startBgsaveForReplication from updateSlavesWaitingForBgsave anymore,
since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is
calling replicationStartPendingFork, which handles that anyway.
The code in updateSlavesWaitingForBgsave had a bug in which it ignored
repl-diskless-sync-delay, but removing that code shows that this bug was
hiding another bug, which is that the max_idle check should have used >=
and not >; this one-second delay has a big impact on my new test.
2020-08-06 16:53:06 +03:00
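A small illustration of the >= versus > point in the commit above, with made-up names (`idle`, `max_idle`) rather than the real ones: with a clock that advances in whole seconds, `idle > max_idle` fires a full second later than `idle >= max_idle`.

```c
#include <stdio.h>

int main(void) {
    int max_idle = 3;                 /* allowed idle time, in seconds */

    /* Pretend cron runs once per second and 'idle' grows by one each tick. */
    for (int idle = 0; idle <= 5; idle++) {
        /* '>=' triggers at idle == 3, '>' only at idle == 4: one extra
         * second of delay before the pending work is started. */
        printf("idle=%d  '>' fires=%d  '>=' fires=%d\n",
               idle, idle > max_idle, idle >= max_idle);
    }
    return 0;
}
```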
Oran Agra 824bd2ac11
fix new rdb test failing on timing issues (#7604)
apparently on github actions sometimes 500ms is not enough
2020-08-04 08:53:50 +03:00
Oran Agra 109b5ccdcd
Fix failing tests due to issues with wait_for_log_message (#7572)
- the test now waits for a specific set of log messages rather than waiting
  for a timeout looking for just one message.
- we don't want to sample the current length of the log after an action; due
  to a race, we need to start the search from the line number of the last
  message we were waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
  where the replica manages to re-connect before we completed the set of
  actions that should force a full sync.
- fix verify_log_message which was broken and unused
2020-07-28 11:15:29 +03:00
Oran Agra 8a57969fd7
Stabilize bgsave test that sometimes fails with valgrind (#7559)
on ci.redis.io the test fails a lot, reporting that bgsave didn't end.
increasing the timeout we wait for that bgsave to get aborted.
in addition to that, I also verify that it indeed got aborted by
checking that the save counter wasn't reset.

add another test to verify that a successful bgsave indeed resets the
change counter.
2020-07-23 13:06:24 +03:00
Yossi Gottlieb f57e844b2e
Tests: drop TCL 8.6 dependency. (#7548)
This re-implements the redis-cli --pipe test so it no longer depends on a close feature available only in TCL 8.6.

Basically what this test does is run redis-cli --pipe: it generates a bunch of commands, pipes them through redis-cli, and inspects the result in both Redis and the redis-cli output.

To do that, we need to close stdin for redis-cli to indicate we're done so it can flush its buffers and exit. TCL channels are bi-directional, and TCL only offers a way to "one-way close" a channel starting with TCL 8.6. To work around that, we now generate the commands into a file and feed that file to redis-cli directly.

As we're writing to an actual file, the number of commands is now reduced.
2020-07-21 14:17:14 +03:00
Oran Agra 254c962554
redis-cli tests, fix valgrind timing issue (#7519)
this test when run with valgrind on github actions takes 160 seconds
2020-07-14 18:04:08 +03:00
Oran Agra e5227aab89
fix recently added time sensitive tests failing with valgrind (#7512)
interestingly the latency monitor test fails because valgrind is slow
enough so that the time inside PEXPIREAT command from the moment of
the first mstime() call to get the basetime until checkAlreadyExpired
calls mstime() again is more than 1ms, and that test was too sensitive.

using this opportunity to speed up the test (unrelated to the failure)
the fix is just the longer time passed to PEXPIRE.
2020-07-13 16:40:03 +03:00
Yossi Gottlieb d9f970d8d3
TLS: Add missing redis-cli options. (#7456)
* Tests: fix and reintroduce redis-cli tests.

These tests have been broken and disabled for 10 years now!

* TLS: add remaining redis-cli support.

This adds support for the redis-cli --pipe, --rdb and --replica options
previously unsupported in --tls mode.

* Fix writeConn().
2020-07-10 10:25:55 +03:00
Oran Agra 8e76e13472
stabilize tests that look for log lines (#7367)
tests were sensitive to additional log lines appearing in the log,
causing the search to come up empty-handed.

instead of just looking at the last n log lines, capture the number of log
lines before performing the action, and then search from that offset.
2020-07-10 08:28:22 +03:00
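A minimal sketch of the idea in the commit above, written in C with hypothetical names (the actual test code is Tcl): record how many lines the log has before performing the action, then only search the lines appended after that point.

```c
#include <stdio.h>
#include <string.h>

/* Count the lines currently in the log; recorded before the action. */
static long count_lines(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n') n++;
    fclose(f);
    return n;
}

/* Look for 'pattern' only in lines written after 'from_line', so earlier,
 * unrelated log lines can never satisfy the match. */
static int log_has_pattern_since(const char *path, long from_line,
                                 const char *pattern) {
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    char line[1024];
    long lineno = 0;
    int found = 0;
    while (fgets(line, sizeof(line), f)) {
        lineno++;
        if (lineno <= from_line) continue;
        if (strstr(line, pattern)) { found = 1; break; }
    }
    fclose(f);
    return found;
}

int main(void) {
    long before = count_lines("redis.log");
    /* ... perform the action that should produce the log message ... */
    printf("%s\n", log_has_pattern_since("redis.log", before,
           "Diskless rdb transfer") ? "found" : "not found");
    return 0;
}
```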
Oran Agra 69ade87325
tests/valgrind: don't use debug restart (#7404)
* tests/valgrind: don't use debug restart

DEBUG RESTART causes two issues:
1. it uses execve which replaces the original process, so valgrind doesn't
   have a chance to check for errors and leaks go unreported.
2. valgrind reports invalid calls to close() which we're unable to resolve.

So now the tests use the restart_server mechanism, which terminates
the old server and starts a new one: new PID, but same stdout and stderr.

since the stderr can contain two or more valgrind reports, it is not enough
to just check for the absence of leaks; we also need to check for some known
errors. We do both, and fail if we either find an error or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need a slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes
2020-07-10 08:26:52 +03:00
Oran Agra c480af9007 fix pingoff test race 2020-05-31 15:51:52 +03:00
antirez 23f2b4d0a8 Test: take PSYNC2 test master timeout high during switch.
This will likely avoid false positives due to trailing pings.
2020-05-28 10:47:30 +02:00
Oran Agra 2a8af8e675 adjust revived meaningful offset tests
these tests create several edge cases that are otherwise uncovered (at
least not consistently) by the test suite, so although they're no longer
testing what they were meant to test, it's still a good idea to keep
them in hope that they'll expose some issue in the future.
2020-05-28 09:10:51 +03:00
Oran Agra 90f3856fd5 revive meaningful offset tests 2020-05-28 08:21:24 +03:00
antirez 484cfc3d76 Another meaningful offset test removed. 2020-05-27 12:50:02 +02:00
antirez 32d0df0c1f Remove the PSYNC2 meaningful offset test. 2020-05-27 12:47:34 +02:00
antirez 091fb64681 Test: PSYNC2 test can now show server logs. 2020-05-25 20:26:29 +02:00
Qu Chen 42f5da5d2d Disconnect chained replicas when the replica performs PSYNC with the master always to avoid replication offset mismatch between master and chained replicas. 2020-05-21 18:42:10 -07:00
Oran Agra 75c11d7fec fix valgrind test failure in replication test
in b4416280c I added more keys to that test to make it run longer,
but with valgrind this now means the test times out; give valgrind more
time.
2020-05-18 10:26:53 +03:00
antirez 0ca2f4f824 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2020-05-17 18:24:48 +02:00
antirez 96bb0c9471 Improve the PSYNC2 test reliability. 2020-05-17 18:24:34 +02:00
Oran Agra 357aace895 add regression test for the race in #7205
with the original version of 6.0.0, this test detects an excessive full
sync.
with the fix in 1a7cd2c0e, this test detects memory corruption,
especially when using libc allocator with or without valgrind.
2020-05-17 18:26:02 +03:00
antirez c38fd1f661 Merge branch 'free_clients_during_loading' into unstable 2020-05-14 11:28:08 +02:00
Oran Agra b4416280cf fix unstable replication test
this test, which has coverage for various flows of the diskless master, was
failing randomly from time to time.

the failure was:
[err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '*Diskless rdb transfer, last replica dropped, killing fork child*' not found

what seems to have happened is that the master didn't detect that all
replicas had dropped by the time the replication ended; it thought that one
replica was still connected.

now the test takes a few seconds longer but it seems stable.
2020-05-12 08:59:09 +03:00
Oran Agra 905e28ee87 fix redis 6.0 not freeing closed connections during loading.
This bug was introduced by a recent change in which readQueryFromClient
is using freeClientAsync, and despite the fact that
freeClientsInAsyncFreeQueue is now in beforeSleep, that's not enough since
it's not called during loading in processEventsWhileBlocked.
Furthermore, afterSleep was called in that case but beforeSleep wasn't.

This bug also caused slowness since the level-triggered mode of epoll
kept signaling these connections as readable, causing us to keep doing
connRead again and again for all of these, which keep accumulating.

now both before and after sleep are called, but not all of their actions
are performed during loading, some are only reserved for the main loop.

fixes issue #7215
2020-05-11 11:33:46 +03:00
Oran Agra deee2c1ef2 add daily github actions with libc malloc and valgrind
* fix memory leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal

about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808==   too small or bad protection modes

it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed);
also, not sure if it's even valid without a call to sigaltstack().
2020-05-04 09:52:20 +03:00
Oran Agra d31c0c5264 fix loading race in psync2 tests 2020-04-28 09:18:01 +03:00
Oran Agra 4447ddc8bb Keep track of meaningful replication offset in replicas too
Now both the master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset
they try to use for psync.

the implication is that if a replica missed some pings, or even has
excessive pings relative to the promoted replica, it'll still be able to
psync (avoid a full sync).

the downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

however, consider this case:
master A has two replicas (B and C) replicating directly from it.
there's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs; when it disconnects and re-connects to A, it'll ask for something
that's not in the backlog anymore (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

so the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:23 +02:00
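A conceptual sketch of the meaningful-offset idea described in the commit above, with made-up names (`meaningful_offset`, `feed_repl_stream`; not the actual Redis code): track the last offset that carried non-PING traffic, and on demotion request a partial resync from that offset instead of the raw end of the backlog.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical replication-state tracking, for illustration only. */
static long long master_repl_offset = 0;   /* offset after everything written */
static long long meaningful_offset  = 0;   /* offset after the last non-PING  */

static void feed_repl_stream(const char *cmd, long long len) {
    master_repl_offset += len;
    /* Trailing PINGs carry no data a promoted replica must have,
     * so they don't advance the meaningful offset. */
    if (strcmp(cmd, "PING") != 0)
        meaningful_offset = master_repl_offset;
}

int main(void) {
    feed_repl_stream("SET", 31);   /* real write traffic */
    feed_repl_stream("PING", 14);  /* idle keepalive     */
    feed_repl_stream("PING", 14);  /* idle keepalive     */

    /* On demotion, ask to resume from the meaningful offset; the trailing
     * PINGs (master_repl_offset - meaningful_offset bytes) are excluded,
     * matching the "meaningful offset ... instead of ..." log line above. */
    printf("PSYNC request offset: %lld (instead of %lld)\n",
           meaningful_offset + 1, master_repl_offset + 1);
    return 0;
}
```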
antirez c4d7f30e25 PSYNC2: meaningful offset test. 2020-03-25 15:43:34 +01:00