Commit Graph

995 Commits

Author SHA1 Message Date
ranshid 9b15dd288e
Introduce debug command to disable reply buffer resizing (#10360)
In order to resolve some flaky tests which hard rely on examine memory footprint.
we introduce the following fixes:

# Fix in client-eviction test - by @yoav-steinberg 
Sometime the libc allocator can use different size client struct allocations.
this may cause unexpected memory calculations to fail the test.

# Introduce new DEBUG command for disabling reply buffer resizing
In order to eliminate reply buffer resizing during specific tests.
we introduced the ability to disable (and enable) the resizing cron job

Co-authored-by: yoav-steinberg yoav@redislabs.com
2022-03-01 14:40:29 +02:00
Madelyn Olson 4a45386e3c
Move most of the configuration to a hashtable (#10323)
* Moved configuration storage from a list to a hash table
* Configs are returned in a non-deterministic order. It's possible that a client was relying on order (hopefully not).
* Fixed an esoteric bug where if you did a set with an alias with an error, it would throw an error indicating a bug with the preferred name for that config.
2022-02-28 23:02:47 -08:00
Meir Shpilraien (Spielrein) aa856b39f2
Sort out the mess around Lua error messages and error stats (#10329)
This PR fix 2 issues on Lua scripting:
* Server error reply statistics (some errors were counted twice).
* Error code and error strings returning from scripts (error code was missing / misplaced).

## Statistics
a Lua script user is considered part of the user application, a sophisticated transaction,
so we want to count an error even if handled silently by the script, but when it is
propagated outwards from the script we don't wanna count it twice. on the other hand,
if the script decides to throw an error on its own (using `redis.error_reply`), we wanna
count that too.
Besides, we do count the `calls` in command statistics for the commands the script calls,
we we should certainly also count `failed_calls`.
So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call
to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once.

The PR changes the error object that is raised on errors. Instead of raising a simple Lua
string, Redis will raise a Lua table in the following format:

```
{
    err='<error message (including error code)>',
    source='<User source file name>',
    line='<line where the error happned>',
    ignore_error_stats_update=true/false,
}
```

The `luaPushError` function was modified to construct the new error table as describe above.
The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise
the table on the top of the Lua stack as the error object.
The reason is that since its functionality is changed, in case some Redis branch / fork uses it,
it's better to have a compilation error than a bug.

The `source` and `line` fields are enriched by the error handler (if possible) and the
`ignore_error_stats_update` is optional and if its not present then the default value is `false`.
If `ignore_error_stats_update` is true, the error will not be counted on the error stats.

When parsing Redis call reply, each error is translated to a Lua table on the format describe
above and the `ignore_error_stats_update` field is set to `true` so we will not count errors
twice (we counted this error when we invoke the command).

The changes in this PR might have been considered as a breaking change for users that used
Lua `pcall` function. Before, the error was a string and now its a table. To keep backward
comparability the PR override the `pcall` implementation and extract the error message from
the error table and return it.

Example of the error stats update:

```
127.0.0.1:6379> lpush l 1
(integer) 2
127.0.0.1:6379> eval "return redis.call('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1.

127.0.0.1:6379> info Errorstats
# Errorstats
errorstat_WRONGTYPE:count=1

127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1
cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0
cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1
```

## error message
We can now construct the error message (sent as a reply to the user) from the error table,
so this solves issues where the error message was malformed and the error code appeared
in the middle of the error message:

```diff
127.0.0.1:6379> eval "return redis.call('set','x','y')" 0
-(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'.
+(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479)
```

```diff
127.0.0.1:6379> eval "redis.call('get', 'l')" 0
-(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value
+(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1.
```

Notica that `redis.pcall` was not change:
```
127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value
```


## other notes
Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we
can not count on it to update the command stats. In order to be able to update those stats correctly
we needed to promote `realcmd` variable to be located on the client struct.

Tests was added and modified to verify the changes.

Related PR's: #10279, #10218, #10278, #10309

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-27 13:40:57 +02:00
Itamar Haber c81c7f51c3
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.

The proposed constant-time solution is in the spirit of "best-effort."

Partially addresses #8737.

## Description of approach

We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`.  It is essentially an all-time
counter of the entries added to the stream.

Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.

The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.

Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.

## Limitations

There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.

The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.

In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).

## API changes

* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
  for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
  for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
  number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
  the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
  for propagating the CG's offset and maximal tombstone ID of the stream.

## The generic unsolved problem

The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.

## A refactoring note

The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
filipe oliveira b857928ba7
Optimize deferred replies to use shared objects instead of sprintf (#10334)
Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects.
In some pipelined workloads this achieves about 10% improvement.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 18:15:12 +02:00
ranshid 47c51d0c78
introduce dynamic client reply buffer size - save memory on idle clients (#9822)
Current implementation simple idle client which serves no traffic still
use ~17Kb of memory. this is mainly due to a fixed size reply buffer
currently set to 16kb.

We have encountered some cases in which the server operates in a low memory environments.
In such cases a user who wishes to create large connection pools to support potential burst period,
will exhaust a large amount of memory  to maintain connected Idle clients.
Some users may choose to "sacrifice" performance in order to save memory.

This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on
periodic observed peak.
the algorithm works as follows:
1. each time a client reply buffer has been fully written, the last recorded peak is updated: 
new peak = MAX( last peak, current written size)
2. during clients cron we check for each client if the last observed peak was:
     a. matching the current buffer size - in which case we expend (resize) the buffer size by 100%
     b. less than half the buffer size - in which case we shrink the buffer size by 50%
3. In any case we will **not** resize the buffer in case:
    a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the
      current buffer usable size
    b. the value of (current buffer usable size/2) is less than 1Kib
    c. the value of  (current buffer usable size*2) is larger than 16Kib
4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new
   field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it
   passed since the last buffer peak reset.

### **Interface changes:**
**CIENT LIST** - now contains 2 new extra fields:
rbs= < the current size in bytes of the client reply buffer >
rbp=< the current value in bytes of the last observed buffer peak position >

**INFO STATS** - now contains 2 new statistics:
reply_buffer_shrinks = < total number of buffer shrinks performed >
reply_buffer_expends = < total number of buffer expends performed >

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Yoav Steinberg <yoav@redislabs.com>
2022-02-22 11:19:38 +02:00
Madelyn Olson 71204f9632
Implemented module getchannels api and renamed channel keyspec (#10299)
This implements the following main pieces of functionality:
* Renames key spec "CHANNEL" to be "NOT_KEY", and update the documentation to
  indicate it's for cluster routing and not for any other key related purpose.
* Add the getchannels-api, so that modules can now define commands that are subject to
  ACL channel permission checks. 
* Add 4 new flags that describe how a module interacts with a command (SUBSCRIBE, PUBLISH,
  UNSUBSCRIBE, and PATTERN). They are all technically composable, however not sure how a
  command could both subscribe and unsubscribe from a command at once, but didn't see
  a reason to add explicit validation there.
* Add two new module apis RM_ChannelAtPosWithFlags and RM_IsChannelsPositionRequest to
  duplicate the functionality provided by the keys position APIs.
* The RM_ACLCheckChannelPermissions (only released in 7.0 RC1) was changed to take flags
  rather than a boolean literal.
* The RM_ACLCheckKeyPermissions (only released in 7.0 RC1) was changed to take flags
  corresponding to keyspecs instead of custom permission flags. These keyspec flags mimic
  the flags for ACLCheckChannelPermissions.
2022-02-22 11:00:03 +02:00
Oran Agra fad0b0d2a6
Fix error stats and failed command stats for blocked clients (#10309)
This is a followup work for #10278, and a discussion about #10279

The changes:
- fix failed_calls in command stats for blocked clients that got error.
  including CLIENT UNBLOCK, and module replying an error from a thread.
- fix latency stats for XREADGROUP that filed with -NOGROUP

Theory behind which errors should be counted:
- error stats represents errors returned to the user, so an error handled by a
  module should not be counted.
- total error counter should be the same.
- command stats represents execution of commands (even with RM_Call, and if
  they fail or get rejected it counts these calls in commandstats, so it should
  also count failed_calls)

Some thoughts about Scripts:
for scripts it could be different since they're part of user code, not the infra (not an extension to redis)
we certainly want commandstats to contain all calls and errors
a simple script is like mult-exec transaction so an error inside it should be counted in error stats
a script that replies with an error to the user (using redis.error_reply) should also be counted in error stats
but then the problem is that a plain `return redis.call("SET")` should not be counted twice (once for the SET
and once for EVAL)
so that's something left to be resolved in #10279
2022-02-21 11:20:41 +02:00
yoav-steinberg 56fa48ffc1
aof rewrite and rdb save counters in info (#10178)
Add aof_rewrites and rdb_snapshots counters to info.
This is useful to figure our if a rewrite or snapshot happened since last check.
This was part of the (ongoing) effort to provide a safe backup solution for multipart-aof backups.
2022-02-17 14:32:48 +02:00
chenyang8094 ceeff6bf86
Remove unused code - leftover from script replication mechanisms (#10272)
append for PR #9812
2022-02-09 15:44:09 +02:00
mowenliunian 051cc3d2e6
fix grammar issue in a comment (#10269)
Fixed some syntax errors in the comments
2022-02-09 07:32:40 +02:00
Wen Hui 2e1bc942aa
Make INFO command variadic (#6891)
This is an enhancement for INFO command, previously INFO only support one argument
for different info section , if user want to get more categories information, either perform
INFO all / default or calling INFO for multiple times.

**Description of the feature**

The goal of adding this feature is to let the user retrieve multiple categories via the INFO
command, and still avoid emitting the same section twice.

A use case for this is like Redis Sentinel, which periodically calling INFO command to refresh
info from monitored Master/Slaves, only Server and Replication part categories are used for
parsing information. If the INFO command can return just enough categories that client side
needs, it can save a lot of time for client side parsing it as well as network bandwidth.

**Implementation**
To share code between redis, sentinel, and other users of INFO (DEBUG and modules),
we have a new `genInfoSectionDict` function that returns a dict and some boolean flags
(e.g. `all`) to the caller (built from user input).
Sentinel is later purging unwanted sections from that, and then it is forwarded to the info `genRedisInfoString`.

**Usage Examples**
INFO Server Replication   
INFO CPU Memory
INFO default commandstats

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-08 13:14:42 +02:00
Oran Agra 66be30f7fc
Handle key-spec flags with modules (#10237)
- add COMMAND GETKEYSANDFLAGS sub-command
- add RM_KeyAtPosWithFlags and GetCommandKeysWithFlags
- RM_KeyAtPos and RM_CreateCommand set flags requiring full access for keys
- RM_CreateCommand set VARIABLE_FLAGS
- expose `variable_flags` flag in COMMAND INFO key-specs
- getKeysFromCommandWithSpecs prefers key-specs over getkeys-api
- add tests for all of these
2022-02-08 10:01:35 +02:00
Binbin 7f4cca11dc
COMMAND DOCS avoid adding summary/since if they don't exist (#10252)
If summary or since is empty, we used to return NULL in
COMMAND DOCS. Currently all redis commands will have these
two fields.

But not for module command, summary and since are optional
for RM_SetCommandInfo. With the change in #10043, if a module
command doesn't have the summary or since, redis-cli will
crash (see #10250).

In this commit, COMMAND DOCS avoid adding summary or since
when they are missing.
2022-02-07 19:57:50 +02:00
Viktor Söderqvist 0a82fe8447
Command info module API (#10108)
Adds RM_SetCommandInfo, allowing modules to provide the following command info:

* summary
* complexity
* since
* history
* hints
* arity
* key specs
* args

This information affects the output of `COMMAND`, `COMMAND INFO` and `COMMAND DOCS`,
Cluster, ACL and is used to filter commands with the wrong number of arguments before
the call reaches the module code.

The recently added API functions for key specs (never released) are removed.

A minimalist example would look like so:
```c
    RedisModuleCommand *mycmd = RedisModule_GetCommand(ctx,"mymodule.mycommand");
    RedisModuleCommandInfo mycmd_info = {
        .version = REDISMODULE_COMMAND_INFO_VERSION,
        .arity = -5,
        .summary = "some description",
    };
    if (RedisModule_SetCommandInfo(mycmd, &mycmd_info) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
````

Notes:
* All the provided information (including strings) is copied, not keeping references to the API input data.
* The version field is actually a static struct that contains the sizes of the the structs used in arrays,
  so we can extend these in the future and old version will still be able to take the part they can support.
2022-02-04 21:09:36 +02:00
Binbin d7fcb3c5a1
Fix SENTINEL SET config rewrite test (#10232)
Change the sentinel config file to a directory in SENTINEL SET test.
So it will now fail on the `rename` in `rewriteConfigOverwriteFile`.

The test used to set the sentinel config file permissions to `000` to
simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151)

Other changes:
1. More error messages after the config rewrite failure.
2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in #9304)
3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)
2022-02-04 11:39:51 +02:00
Moti Cohen 52b2fbe970
Improve srand entropy (and fix Sentinel failures) (#10197)
As Sentinel relies upon consensus algorithm, all sentinel instances,
randomize a time to initiate their next attempt to become the
leader of the group. But time after time, all raffled the same value.

The problem is in the line `srand(time(NULL)^getpid())` such that
all spinned up containers get same time (in seconds) and same pid
which is always 1. Added material `tv_usec` and verify that even
consecutive calls brings different values and makes the difference.
2022-01-30 16:39:23 +02:00
guybe7 eedec155ac
Add key-specs notes (#10193)
Add optional `notes` to keyspecs.

Other changes:

1. Remove the "incomplete" flag from SORT and SORT_RO: it is misleading since "incomplete" means "this spec may not return all the keys it describes" but SORT and SORT_RO's specs (except the input key) do not return any keys at all.
So basically:
If a spec's begin_search is "unknown" you should not use it at all, you must use COMMAND KEYS;
if a spec itself is "incomplete", you can use it to get a partial list of keys, but if you want all of them you must use COMMAND GETKEYS;
otherwise, the spec will return all the keys

2. `getKeysUsingKeySpecs` handles incomplete specs internally
2022-01-30 12:00:03 +02:00
guybe7 c79389f032
Reply for command args should be an array, not a set (#10188) 2022-01-26 12:49:24 +02:00
yoav-steinberg 7eadc5ee70
Support function flags in script EVAL via shebang header (#10126)
In #10025 we added a mechanism for flagging certain properties for Redis Functions.
This lead us to think we'd like to "port" this mechanism to Redis Scripts (`EVAL`) as well. 

One good reason for this, other than the added functionality is because it addresses the
poor behavior we currently have in `EVAL` in case the script performs a (non DENY_OOM) write operation
during OOM state. See #8478 (And a previous attempt to handle it via #10093) for details.
Note that in Redis Functions **all** write operations (including DEL) will return an error during OOM state
unless the function is flagged as `allow-oom` in which case no OOM checking is performed at all.

This PR:
- Enables setting `EVAL` (and `SCRIPT LOAD`) script flags as defined in #10025.
- Provides a syntactical framework via [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) for
  additional script annotations and even engine selection (instead of just lua) for scripts.
- Provides backwards compatibility so scripts without the new annotations will behave as they did before.
- Appropriate tests.
- Changes `EVAL[SHA]/_RO` to be flagged as `STALE` commands. This makes it possible to flag individual
  scripts as `allow-stale` or not flag them as such. In backwards compatibility mode these commands will
  return the `MASTERDOWN` error as before.
- Changes `SCRIPT LOAD` to be flagged as a `STALE` command. This is mainly to make it logically
  compatible with the change to `EVAL` in the previous point. It enables loading a script on a stale server
  which is technically okay it doesn't relate directly to the server's dataset. Running the script does, but that
  won't work unless the script is explicitly marked as `allow-stale`.

Note that even though the LUA syntax doesn't support hash tag comments `.lua` files do support a shebang
tag on the top so they can be executed on Unix systems like any shell script. LUA's `luaL_loadfile` handles
this as part of the LUA library. In the case of `luaL_loadbuffer`, which is what Redis uses, I needed to fix the
input script in case of a shebang manually. I did this the same way `luaL_loadfile` does, by replacing the
first line with a single line feed character.
2022-01-24 16:50:02 +02:00
Binbin 23325c135f
sub-command support for ACL CAT and COMMAND LIST. redisCommand always stores fullname (#10127)
Summary of changes:
1. Rename `redisCommand->name` to `redisCommand->declared_name`, it is a
  const char * for native commands and SDS for module commands.
2. Store the [sub]command fullname in `redisCommand->fullname` (sds).
3. List subcommands in `ACL CAT`
4. List subcommands in `COMMAND LIST`
5. `moduleUnregisterCommands` now will also free the module subcommands.
6. RM_GetCurrentCommandName returns full command name

Other changes:
1. Add `addReplyErrorArity` and `addReplyErrorExpireTime`
2. Remove `getFullCommandName` function that now is useless.
3. Some cleanups about `fullname` since now it is SDS.
4. Delete `populateSingleCommand` function from server.h that is useless.
5. Added tests to cover this change.
6. Add some module unload tests and fix the leaks
7. Make error messages uniform, make sure they always contain the full command
  name and that it's quoted.
7. Fixes some typos

see the history in #9504, fixes #10124

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2022-01-23 10:05:06 +02:00
guybe7 a6fd2a46d1
Improved handling of subcommands (don't allow ACL on first-arg of a sub-command) (#10147)
Recently we added extensive support for sub-commands in for redis 7.0,
this meant that the old ACL mechanism for
sub-commands wasn't needed, or actually was improved (to handle both include
and exclude control, like for commands), but only for real sub-commands.
The old mechanism in ACL was renamed to first-arg, and was able to match the
first argument of any command (including sub-commands).
We now realized that we might wanna completely delete that first-arg feature some
day, so the first step was not to give it new capabilities in 7.0 and it didn't have before.

Changes:
1. ACL: Block the first-arg mechanism on subcommands (we keep if in non-subcommands
  for backward compatibility)
2. COMMAND: When looking up a command, insist the command name doesn't contain
  extra words. Example: When a user issues `GET key` we want `lookupCommand` to return
  `getCommand` but when if COMMAND calls `lookupCommand` with `get|key` we want it to fail.

Other changes:
1. ACLSetUser: prevent a redundant command lookup
2022-01-22 14:09:40 +02:00
Madelyn Olson 55c81f2cd3
ACL V2 - Selectors and key based permissions (#9974)
* Implemented selectors which provide multiple different sets of permissions to users
* Implemented key based permissions 
* Added a new ACL dry-run command to test permissions before execution
* Updated module APIs to support checking key based permissions

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-20 13:05:27 -08:00
guybe7 10bbeb6837
Add command tips to COMMAND DOCS (#10104)
Adding command tips (see https://redis.io/topics/command-tips) to commands.

Breaking changes:
1. Removed the "random" and "sort_for_script" flags. They are now command tips.
(this isn't affecting redis behavior since #9812, but could affect some client applications
that's relying on COMMAND command flags)

Summary of changes:
1. add BLOCKING flag (new flag) for all commands that could block. The ACL category with
  the same name is now implicit.
2. move RANDOM flag to a `nondeterministic_output` tip
3. move SORT_FOR_SCRIPT flag to `nondeterministic_output_order` tip
3. add REQUEST_POLICY and RESPONSE_POLICY where appropriate as documented in the tips
4. deprecate (ignore) the `random` flag for RM_CreateCommand

Other notes:
1. Proxies need to send `RANDOMKEY` to all shards and then select one key randomly.
  The other option is to pick a random shard and transfer `RANDOMKEY `to it, but that scheme
  fails if this specific shard is empty
2. Remove CMD_RANDOM from `XACK` (i.e. XACK does not have RANDOM_OUTPUT)
   It was added in 9e4fb96ca1, I guess by mistake.
   Also from `(P)EXPIRETIME` (new command, was flagged "random" by mistake).
3. Add `nondeterministic_output` to `OBJECT ENCODING` (for the same reason `XTRIM` has it:
   the reply may differ depending on the internal representation in memory)
4. RANDOM on `HGETALL` was wrong (there due to a limitation of the old script sorting logic), now
  it's `nondeterministic_output_order`
5. Unrelated: Hide CMD_PROTECTED from COMMAND
2022-01-20 11:32:11 +02:00
perryitay c4b788230c
Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status (#9963)
Some modules might perform a long-running logic in different stages of Redis lifetime, for example:
* command execution
* RDB loading
* thread safe context

During this long-running logic Redis is not responsive.

This PR offers 
1. An API to process events while a busy command is running (`RM_Yield`)
2. A new flag (`ALLOW_BUSY`) to mark the commands that should be handled during busy
  jobs which can also be used by modules (`allow-busy`)
3. In slow commands and thread safe contexts, this flag will start rejecting commands with -BUSY only
  after `busy-reply-threshold`
4. During loading (`rdb_load` callback), it'll process events right away (not wait for `busy-reply-threshold`),
  but either way, the processing is throttled to the server hz rate.
5. Allow modules to Yield to redis background tasks, but not to client commands

* rename `script-time-limit` to `busy-reply-threshold` (an alias to the pre-7.0 `lua-time-limit`)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-20 09:05:53 +02:00
Oran Agra eef9c6b0ee
New detailed key-spec flags (RO, RW, OW, RM, ACCESS, UPDATE, INSERT, DELETE) (#10122)
The new ACL key based permissions in #9974 require the key-specs (#8324) to have more
explicit flags rather than just READ and WRITE. See discussion in #10040

This PR defines two groups of flags:
One about how redis internally handles the key (mutually-exclusive).
The other is about the logical operation done from the user's point of view (3 mutually exclusive
write flags, and one read flag, all optional).
In both groups, if we can't explicitly flag something as explicit read-only, delete-only, or
insert-only, we flag it as `RW` or `UPDATE`.
here's the definition from the code:
```
/* Key-spec flags *
 * -------------- */
/* The following refer what the command actually does with the value or metadata
 * of the key, and not necessarily the user data or how it affects it.
 * Each key-spec may must have exaclty one of these. Any operation that's not
 * distinctly deletion, overwrite or read-only would be marked as RW. */
#define CMD_KEY_RO (1ULL<<0)     /* Read-Only - Reads the value of the key, but
                                  * doesn't necessarily returns it. */
#define CMD_KEY_RW (1ULL<<1)     /* Read-Write - Modifies the data stored in the
                                  * value of the key or its metadata. */
#define CMD_KEY_OW (1ULL<<2)     /* Overwrite - Overwrites the data stored in
                                  * the value of the key. */
#define CMD_KEY_RM (1ULL<<3)     /* Deletes the key. */
/* The follwing refer to user data inside the value of the key, not the metadata
 * like LRU, type, cardinality. It refers to the logical operation on the user's
 * data (actual input strings / TTL), being used / returned / copied / changed,
 * It doesn't refer to modification or returning of metadata (like type, count,
 * presence of data). Any write that's not INSERT or DELETE, would be an UPADTE.
 * Each key-spec may have one of the writes with or without access, or none: */
#define CMD_KEY_ACCESS (1ULL<<4) /* Returns, copies or uses the user data from
                                  * the value of the key. */
#define CMD_KEY_UPDATE (1ULL<<5) /* Updates data to the value, new value may
                                  * depend on the old value. */
#define CMD_KEY_INSERT (1ULL<<6) /* Adds data to the value with no chance of,
                                  * modification or deletion of existing data. */
#define CMD_KEY_DELETE (1ULL<<7) /* Explicitly deletes some content
                                  * from the value of the key. */
```

Unrelated changes:
- generate-command-code.py is only compatible with python3 (modified the shabang)
- generate-command-code.py print file on json parsing error
- rename `shard_channel` key-spec flag to just `channel`.
- add INCOMPLETE flag in input spec of SORT and SORT_RO
2022-01-18 16:00:00 +02:00
Ozan Tezcan 99ab4236af
Add event loop support to the module API (#10001)
Modules can now register sockets/pipe to the Redis main thread event loop and do network operations asynchronously. Previously, modules had to maintain an event loop and another thread for asynchronous network operations.

Also, if a module is calling API functions after doing some network operations, it had to synchronize its event loop thread's access with Redis main thread by locking the GIL, causing contention on the lock. After this commit, no synchronization is needed as module can operate in Redis main thread context. So, this commit may improve the performance for some use cases.

Added three functions to the module API:

* RedisModule_EventLoopAdd(int fd, int mask, RedisModuleEventLoopFunc func, void *user_data)
* RedisModule_EventLoopDel(int fd, int mask)
* RedisModule_EventLoopAddOneShot(RedisModuleEventLoopOneShotFunc func, void *user_data) - This function can be called from other threads to trigger callback on Redis main thread. Callback will be triggered only once. If Redis main thread is sleeping, this call will wake up the Redis main thread.
Event loop callbacks are called by Redis main thread after locking the GIL. Inside callbacks, modules can operate as if they are holding the GIL.

Added REDISMODULE_EVENT_EVENTLOOP event with two subevents:

* REDISMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP
* REDISMODULE_SUBEVENT_EVENTLOOP_AFTER_SLEEP

These events are for modules that want to participate in the before and after sleep action. e.g It might be useful to implement batching : Read data from the network, write all to a file in one go on BEFORE_SLEEP event.
2022-01-18 13:10:07 +02:00
zhaozhao.zz 90916f16a5
show subcommands latencystats (#10103)
since `info commandstats` already shows sub-commands, we should do the same in `info latencystats`.
similarly, the LATENCY HISTOGRAM command now shows sub-commands (with their full name) when:
* asking for all commands
* asking for a specific container command
* asking for a specific sub-command)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-17 12:32:32 +02:00
David CARLIER 7da7d2aa8e
checkTcpBacklogSettings check for solaris based systems. (#10109) 2022-01-15 20:39:05 +02:00
Meir Shpilraien (Spielrein) 4db4b43417
Function Flags support (no-writes, no-cluster, allow-state, allow-oom) (#10066)
# Redis Functions Flags

Following the discussion on #10025 Added Functions Flags support.
The PR is divided to 2 sections:
* Add named argument support to `redis.register_function` API.
* Add support for function flags

## `redis.register_function` named argument support

The first part of the PR adds support for named argument on `redis.register_function`, example:
```
redis.register_function{
    function_name='f1',
    callback=function()
        return 'hello'
    end,
    description='some desc'
}
```

The positional arguments is also kept, which means that it still possible to write:
```
redis.register_function('f1', function() return 'hello' end)
```

But notice that it is no longer possible to pass the optional description argument on the positional
argument version. Positional argument was change to allow passing only the mandatory arguments
(function name and callback). To pass more arguments the user must use the named argument version.

As with positional arguments, the `function_name` and `callback` is mandatory and an error will be
raise if those are missing. Also, an error will be raise if an unknown argument name is given or the
arguments type is wrong.

Tests was added to verify the new syntax.

## Functions Flags

The second part of the PR is adding functions flags support.
Flags are given to Redis when the engine calls `functionLibCreateFunction`, supported flags are:

* `no-writes` - indicating the function perform no writes which means that it is OK to run it on:
   * read-only replica
   * Using FCALL_RO
   * If disk error detected
   
   It will not be possible to run a function in those situations unless the function turns on the `no-writes` flag

* `allow-oom` - indicate that its OK to run the function even if Redis is in OOM state, if the function will
  not turn on this flag it will not be possible to run it if OOM reached (even if the function declares `no-writes`
  and even if `fcall_ro` is used). If this flag is set, any command will be allow on OOM (even those that is
  marked with CMD_DENYOOM). The assumption is that this flag is for advance users that knows its
  meaning and understand what they are doing, and Redis trust them to not increase the memory usage.
  (e.g. it could be an INCR or a modification on an existing key, or a DEL command)

* `allow-state` - indicate that its OK to run the function on stale replica, in this case we will also make
  sure the function is only perform `stale` commands and raise an error if not.

* `no-cluster` - indicate to disallow running the function if cluster is enabled.

Default behaviure of functions (if no flags is given):
1. Allow functions to read and write
2. Do not run functions on OOM
3. Do not run functions on stale replica
4. Allow functions on cluster

### Lua API for functions flags

On Lua engine, it is possible to give functions flags as `flags` named argument:

```
redis.register_function{function_name='f1', callback=function() return 1 end, flags={'no-writes', 'allow-oom'}, description='description'}
```

The function flags argument must be a Lua table that contains all the requested flags, The following
will result in an error:
* Unknown flag
* Wrong flag type

Default behaviour is the same as if no flags are used.

Tests were added to verify all flags functionality

## Additional changes
* mark FCALL and FCALL_RO with CMD_STALE flag (unlike EVAL), so that they can run if the function was
  registered with the `allow-stale` flag.
* Verify `CMD_STALE` on `scriptCall` (`redis.call`), so it will not be possible to call commands from script while
  stale unless the command is marked with the `CMD_STALE` flags. so that even if the function is allowed while
  stale we do not allow it to bypass the `CMD_STALE` flag of commands.
* Flags section was added to `FUNCTION LIST` command to provide the set of flags for each function:
```
> FUNCTION list withcode
1)  1) "library_name"
    2) "test"
    3) "engine"
    4) "LUA"
    5) "description"
    6) (nil)
    7) "functions"
    8) 1) 1) "name"
          2) "f1"
          3) "description"
          4) (nil)
          5) "flags"
          6) (empty array)
    9) "library_code"
   10) "redis.register_function{function_name='f1', callback=function() return 1 end}"
```
* Added API to get Redis version from within a script, The redis version can be provided using:
   1. `redis.REDIS_VERSION` - string representation of the redis version in the format of MAJOR.MINOR.PATH
   2. `redis.REDIS_VERSION_NUM` - number representation of the redis version in the format of `0x00MMmmpp`
      (`MM` - major, `mm` - minor,  `pp` - patch). The number version can be used to check if version is greater or less 
      another version. The string version can be used to return to the user or print as logs.

   This new API is provided to eval scripts and functions, it also possible to use this API during functions loading phase.
2022-01-14 14:02:02 +02:00
Binbin 20c33fe6a8
Show subcommand full name in error log / ACL LOG (#10105)
Use `getFullCommandName` to get the full name of the command.
It can also get the full name of the subcommand, like "script|help".

Before:
```
> SCRIPT HELP
(error) NOPERM this user has no permissions to run the 'help' command or its subcommand

> ACL LOG
    7) "object"
    8) "help"
```

After:
```
> SCRIPT HELP
(error) NOPERM this user has no permissions to run the 'script|help' command

> ACL LOG
    7) "object"
    8) "script|help"
```

Fix #10094
2022-01-12 20:05:14 +02:00
Ozan Tezcan 6790d848c5
Reuse temporary client objects for blocked clients by module (#9940)
Added a pool for temporary client objects to reuse in module operations.
By reusing temporary clients, we are avoiding expensive createClient()/freeClient()
calls and improving performance of RM_BlockClient() and  RM_GetThreadSafeContext() calls. 

This commit contains two optimizations: 

1 - RM_BlockClient() and RM_GetThreadSafeContext() calls create temporary clients and they are freed in
RM_UnblockClient() and RM_FreeThreadSafeContext() calls respectively. Creating/destroying client object
takes quite time. To avoid that, added a pool of temporary clients. Pool expands when more clients are needed.
Also, added a cron function to shrink the pool and free unused clients after some time. Pool starts with zero
clients in it. It does not have max size and can grow unbounded as we need it. We will keep minimum of 8
temporary clients in the pool once created. Keeping small amount of clients to avoid client allocation costs
if temporary clients are required after some idle period.

2 - After unblocking a client (RM_UnblockClient()), one byte is written to pipe to wake up Redis main thread.
If there are many clients that will be unblocked, each operation requires one write() call which is quite expensive.
Changed code to avoid subsequent calls if possible. 

There are a few more places that need temporary client objects (e.g RM_Call()). These are now using the same
temporary client pool to make things more centralized.
2022-01-11 19:00:56 +02:00
Oran Agra 3204a03574
Move doc metadata from COMMAND to COMMAND DOCS (#10056)
Syntax:
`COMMAND DOCS [<command name> ...]`

Background:
Apparently old version of hiredis (and thus also redis-cli) can't
support more than 7 levels of multi-bulk nesting.

The solution is to move all the doc related metadata from COMMAND to a
new COMMAND DOCS sub-command.

The new DOCS sub-command returns a map of commands (not an array like in COMMAND),
And the same goes for the `subcommands` field inside it (also contains a map)

Besides that, the remaining new fields of COMMAND (hints, key-specs, and
sub-commands), are placed in the outer array rather than a nested map.
this was done mainly for consistency with the old format.

Other changes:
---
* Allow COMMAND INFO with no arguments, which returns all commands, so that we can some day deprecated
  the plain COMMAND (no args)

* Reduce the amount of deferred replies from both COMMAND and COMMAND
  DOCS, especially in the inner loops, since these create many small
  reply objects, which lead to many small write syscalls and many small
  TCP packets.
  To make this easier, when populating the command table, we count the
  history, args, and hints so we later know their size in advance.
  Additionally, the movablekeys flag was moved into the flags register.
* Update generate-commands-json.py to take the data from both command, it
  now executes redis-cli directly, instead of taking input from stdin.
* Sub-commands in both COMMAND (and COMMAND INFO), and also COMMAND DOCS,
  show their full name. i.e. CONFIG 
*   GET will be shown as `config|get` rather than just `get`.
  This will be visible both when asking for `COMMAND INFO config` and COMMAND INFO config|get`, but is
  especially important for the later.
  i.e. imagine someone doing `COMMAND INFO slowlog|get config|get` not being able to distinguish between the two
  items in the array response.
2022-01-11 17:16:16 +02:00
Madelyn Olson e8e02f900c
Changed latency histogram output to omit trailing 0s and periods (#10075)
Changed latency percentile output to omit trailing 0s and periods
2022-01-09 17:04:18 -08:00
Binbin a84c964d37
Fix crash when error [sub]command name contains | (#10082)
The following error commands will crash redis-server:
```
> get|
Error: Server closed the connection
> get|set
Error: Server closed the connection
> get|other
```

The reason is in #9504, we use `lookupCommandBySds` for find the
container command. And it split the command (argv[0]) with `|`.
If we input something like `get|other`, after the split, `get`
will become a valid command name, pass the `ERR unknown command`
check, and finally crash in `addReplySubcommandSyntaxError`

In this case we do not need to split the command name with `|`
and just look in the commands dict to find if `argv[0]` is a
container command.

So this commit introduce a new function call `isContainerCommandBySds`
that it will return true if a command name is a container command.

Also with the old code, there is a incorrect error message:
```
> config|get set
(error) ERR Unknown subcommand or wrong number of arguments for 'set'. Try CONFIG|GET HELP.
```

The crash was reported in #10070.
2022-01-09 13:06:51 +02:00
Meir Shpilraien (Spielrein) 885f6b5ceb
Redis Function Libraries (#10004)
# Redis Function Libraries

This PR implements Redis Functions Libraries as describe on: https://github.com/redis/redis/issues/9906.

Libraries purpose is to provide a better code sharing between functions by allowing to create multiple
functions in a single command. Functions that were created together can safely share code between
each other without worrying about compatibility issues and versioning.

Creating a new library is done using 'FUNCTION LOAD' command (full API is described below)

This PR introduces a new struct called libraryInfo, libraryInfo holds information about a library:
* name - name of the library
* engine - engine used to create the library
* code - library code
* description - library description
* functions - the functions exposed by the library

When Redis gets the `FUNCTION LOAD` command it creates a new empty libraryInfo.
Redis passes the `CODE` to the relevant engine alongside the empty libraryInfo.
As a result, the engine will create one or more functions by calling 'libraryCreateFunction'.
The new funcion will be added to the newly created libraryInfo. So far Everything is happening
locally on the libraryInfo so it is easy to abort the operation (in case of an error) by simply
freeing the libraryInfo. After the library info is fully constructed we start the joining phase by
which we will join the new library to the other libraries currently exist on Redis.
The joining phase make sure there is no function collision and add the library to the
librariesCtx (renamed from functionCtx). LibrariesCtx is used all around the code in the exact
same way as functionCtx was used (with respect to RDB loading, replicatio, ...).
The only difference is that apart from function dictionary (maps function name to functionInfo
object), the librariesCtx contains also a libraries dictionary that maps library name to libraryInfo object.

## New API
### FUNCTION LOAD
`FUNCTION LOAD <ENGINE> <LIBRARY NAME> [REPLACE] [DESCRIPTION <DESCRIPTION>] <CODE>`
Create a new library with the given parameters:
* ENGINE - REPLACE Engine name to use to create the library.
* LIBRARY NAME - The new library name.
* REPLACE - If the library already exists, replace it.
* DESCRIPTION - Library description.
* CODE - Library code.

Return "OK" on success, or error on the following cases:
* Library name already taken and REPLACE was not used
* Name collision with another existing library (even if replace was uses)
* Library registration failed by the engine (usually compilation error)

## Changed API
### FUNCTION LIST
`FUNCTION LIST [LIBRARYNAME <LIBRARY NAME PATTERN>] [WITHCODE]`
Command was modified to also allow getting libraries code (so `FUNCTION INFO` command is no longer
needed and removed). In addition the command gets an option argument, `LIBRARYNAME` allows you to
only get libraries that match the given `LIBRARYNAME` pattern. By default, it returns all libraries.

### INFO MEMORY
Added number of libraries to `INFO MEMORY`

### Commands flags
`DENYOOM` flag was set on `FUNCTION LOAD` and `FUNCTION RESTORE`. We consider those commands
as commands that add new data to the dateset (functions are data) and so we want to disallows
to run those commands on OOM.

## Removed API
* FUNCTION CREATE - Decided on https://github.com/redis/redis/issues/9906
* FUNCTION INFO - Decided on https://github.com/redis/redis/issues/9899

## Lua engine changes
When the Lua engine gets the code given on `FUNCTION LOAD` command, it immediately runs it, we call
this run the loading run. Loading run is not a usual script run, it is not possible to invoke any
Redis command from within the load run.
Instead there is a new API provided by `library` object. The new API's: 
* `redis.log` - behave the same as `redis.log`
* `redis.register_function` - register a new function to the library

The loading run purpose is to register functions using the new `redis.register_function` API.
Any attempt to use any other API will result in an error. In addition, the load run is has a time
limit of 500ms, error is raise on timeout and the entire operation is aborted.

### `redis.register_function`
`redis.register_function(<function_name>, <callback>, [<description>])`
This new API allows users to register a new function that will be linked to the newly created library.
This API can only be called during the load run (see definition above). Any attempt to use it outside
of the load run will result in an error.
The parameters pass to the API are:
* function_name - Function name (must be a Lua string)
* callback - Lua function object that will be called when the function is invokes using fcall/fcall_ro
* description - Function description, optional (must be a Lua string).

### Example
The following example creates a library called `lib` with 2 functions, `f1` and `f1`, returns 1 and 2 respectively:
```
local function f1(keys, args)
    return 1
end

local function f2(keys, args)
    return 2
end

redis.register_function('f1', f1)
redis.register_function('f2', f2)
```

Notice: Unlike `eval`, functions inside a library get the KEYS and ARGV as arguments to the
functions and not as global.

### Technical Details

On the load run we only want the user to be able to call a white list on API's. This way, in
the future, if new API's will be added, the new API's will not be available to the load run
unless specifically added to this white list. We put the while list on the `library` object and
make sure the `library` object is only available to the load run by using [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv) API. This API allows us to set
the `globals` of a function (and all the function it creates). Before starting the load run we
create a new fresh Lua table (call it `g`) that only contains the `library` API (we make sure
to set global protection on this table just like the general global protection already exists
today), then we use [lua_setfenv](https://www.lua.org/manual/5.1/manual.html#lua_setfenv)
to set `g` as the global table of the load run. After the load run finished we update `g`
metatable and set `__index` and `__newindex` functions to be `_G` (Lua default globals),
we also pop out the `library` object as we do not need it anymore.
This way, any function that was created on the load run (and will be invoke using `fcall`) will
see the default globals as it expected to see them and will not have the `library` API anymore.

An important outcome of this new approach is that now we can achieve a distinct global table
for each library (it is not yet like that but it is very easy to achieve it now). In the future we can
decide to remove global protection because global on different libraries will not collide or we
can chose to give different API to different libraries base on some configuration or input.

Notice that this technique was meant to prevent errors and was not meant to prevent malicious
user from exploit it. For example, the load run can still save the `library` object on some local
variable and then using in `fcall` context. To prevent such a malicious use, the C code also make
sure it is running in the right context and if not raise an error.
2022-01-06 13:39:38 +02:00
Ozan Tezcan 568c2e039b
Set errno to EEXIST in redisFork() if child process exists (#10059)
Callers of redisFork() are logging `strerror(errno)` on failure.
`errno` is not set when there is already a child process, causing printing
current value of errno which was set before `redisFork()` call. 

Setting errno to EEXIST on this failure to provide more meaningful error message.
2022-01-06 09:54:21 +02:00
filipe oliveira 5dd15443ac
Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462)
# Short description

The Redis extended latency stats track per command latencies and enables:
- exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command.
  **( percentile distribution is not mergeable between cluster nodes ).**
- exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command.
  Using the cumulative distribution of latencies we can merge several stats from different cluster nodes
  to calculate aggregate metrics .

By default, the extended latency monitoring is enabled since the overhead of keeping track of the
command latency is very small.
 
If you don't want to track extended latency metrics, you can easily disable it at runtime using the command:
 - `CONFIG SET latency-tracking no`

By default, the exported latency percentiles are the p50, p99, and p999.
You can alter them at runtime using the command:
- `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"`


## Some details:
- The total size per histogram should sit around 40 KiB. We only allocate those 40KiB when a command
  was called for the first time.
- With regards to the WRITE overhead As seen below, there is no measurable overhead on the achievable
  ops/sec or full latency spectrum on the client. Including also the measured redis-benchmark for unstable
  vs this branch. 
- We track from 1 nanosecond to 1 second ( everything above 1 second is considered +Inf )

## `INFO LATENCYSTATS` exposition format

   - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` 

## `LATENCY HISTOGRAM [command ...]` exposition format

Return a cumulative distribution of latencies in the format of a histogram for the specified command names.

The histogram is composed of a map of time buckets:
- Each representing a latency range, between 1 nanosecond and roughly 1 second.
- Each bucket covers twice the previous bucket's range.
- Empty buckets are not printed.
- Everything above 1 sec is considered +Inf.
- At max there will be log2(1000000000)=30 buckets

We reply a map for each command in the format:
`<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ...  } }`

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-05 14:01:05 +02:00
yoav-steinberg 65a7635793
`redis-cli --replica` reads dummy empty rdb instead of full snapshot (#10044)
This makes redis-cli --replica much faster and reduces COW/fork risks on server side.
This commit also improves the RDB filtering via REPLCONF rdb-filter-only to support no "include" specifiers at all.
2022-01-04 17:09:22 +02:00
guybe7 ac84b1cd82
Ban snapshot-creating commands and other admin commands from transactions (#10015)
Creating fork (or even a foreground SAVE) during a transaction breaks the atomicity of the transaction.
In addition to that, it could mess up the propagated transaction to the AOF file.

This change blocks SAVE, PSYNC, SYNC and SHUTDOWN from being executed inside MULTI-EXEC.
It does that by adding a command flag, so that modules can flag their commands with that flag too.

Besides it changes BGSAVE, BGREWRITEAOF, and CONFIG SET appendonly, to turn the
scheduled flag instead of forking righ taway.

Other changes:
* expose `protected`, `no-async-loading`, and `no_multi` flags in COMMAND command
* add a test to validate propagation of FLUSHALL inside a transaction.
* add a test to validate how CONFIG SET that errors reacts in a transaction

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 13:37:47 +02:00
chenyang8094 87789fae0b
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.

The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)

The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
  it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
  one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
  incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
  `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
  It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
  we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
  delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
  period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
  manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
  `aof-load-truncated` is enabled.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-03 19:14:13 +02:00
Meir Shpilraien (Spielrein) 78a62c0124
Fix OOM error not raised of functions (#10048)
OOM Error did not raise on functions due to a bug.
Added test to verify the fix.
2022-01-03 19:04:29 +02:00
Madelyn Olson 5460c10047
Implement clusterbus message extensions and cluster hostname support (#9530)
Implement the ability for cluster nodes to advertise their location with extension messages.
2022-01-02 19:48:29 -08:00
Harkrishn Patro 9f8885760b
Sharded pubsub implementation (#8621)
This commit implements a sharded pubsub implementation based off of shard channels.

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-01-02 16:54:47 -08:00
Joey from AWS 09c668f220
Report slot to keys map size in MEMORY STATS in cluster mode (#10017)
Report slot to keys map size in MEMORY STATS in cluster mode
Report dictMetadataSize in MEMORY USAGE command as well
2022-01-02 10:39:59 +02:00
Viktor Söderqvist 45a155bd0f
Wait for replicas when shutting down (#9872)
To avoid data loss, this commit adds a grace period for lagging replicas to
catch up the replication offset.

Done:

* Wait for replicas when shutdown is triggered by SIGTERM and SIGINT.

* Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new
  blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients
  to call SHUTDOWN in parallel.
  Note that they don't expect a response unless an error happens and shutdown is aborted.

* Log warning for each replica lagging behind when finishing shutdown.

* CLIENT_PAUSE_WRITE while waiting for replicas.

* Configurable grace period 'shutdown-timeout' in seconds (default 10).

* New flags for the SHUTDOWN command:

    - NOW disables the grace period for lagging replicas.

    - FORCE ignores errors writing the RDB or AOF files which would normally
      prevent a shutdown.

    - ABORT cancels ongoing shutdown. Can't be combined with other flags.

* New field in the output of the INFO command: 'shutdown_in_milliseconds'. The
  value is the remaining maximum time to wait for lagging replicas before
  finishing the shutdown. This field is present in the Server section **only**
  during shutdown.

Not directly related:

* When shutting down, if there is an AOF saving child, it is killed **even** if AOF
  is disabled. This can happen if BGREWRITEAOF is used when AOF is off.

* Client pause now has end time and type (WRITE or ALL) per purpose. The
  different pause purposes are *CLIENT PAUSE command*, *failover* and
  *shutdown*. If clients are unpaused for one purpose, it doesn't affect client
  pause for other purposes. For example, the CLIENT UNPAUSE command doesn't
  affect client pause initiated by the failover or shutdown procedures. A completed
  failover or a failed shutdown doesn't unpause clients paused by the CLIENT
  PAUSE command.

Notes:

* DEBUG RESTART doesn't wait for replicas.

* We already have a warning logged when a replica disconnects. This means that
  if any replica connection is lost during the shutdown, it is either logged as
  disconnected or as lagging at the time of exit.

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:50:15 +02:00
yoav-steinberg 1bf6d6f11e
Generate RDB with Functions only via redis-cli --functions-rdb (#9968)
This is needed in order to ease the deployment of functions for ephemeral cases, where user
needs to spin up a server with functions pre-loaded.

#### Details:

* Added `--functions-rdb` option to _redis-cli_.
* Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space
  separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only
  "functions db:3 key-patten:user*"` and a complementing `rdb-filter-exclude` `REPLCONF`
  can also be added.
* Handle "slave requirements" specification to RDB saving code so we can use the same RDB
  when different slaves express the same requirements (like functions-only) and not share the
  RDB when their requirements differ. This is currently just a flags `int`, but can be extended to
  a more complex structure with various filter fields.
* make sure to support filters only in diskless replication mode (not to override the persistence file),
  we do that by forcing diskless (even if disabled by config)

other changes:
* some refactoring in rdb.c (extract portion of a big function to a sub-function)
* rdb_key_save_delay used in AOFRW too
* sendChildInfo takes the number of updated keys (incremental, rather than absolute)

Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-02 09:39:01 +02:00
perryitay 43229e4f10
Safe and organized exit when receiving sigterm while loading (#10003)
on the signal handler we are enabling server.shutdown_asap flag instead of just doing `exit()`,
and then catch it on the whileBlockedCron() where we prepare for shutdown correctly.

this is a more ore organized and safe termination, the old approach was missing these for example:
1. removal of the pidfile
2. shutdown event to modules
2021-12-28 13:25:56 +02:00
David CARLIER 2c5573e894
Check somaxconn system settings on macOS, FreeBSD and OpenBSD. (#9972)
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2021-12-28 09:20:10 +02:00
Itamar Haber f810510bb2
Adds utils/gen-commands-json.py (#9958)
Following #9656, this script generates a "commands.json" file from the output
of the new COMMAND. The output of this script is used in redis/redis-doc#1714
and by redis/redis-io#259. This also converts a couple of rogue dashes (in 
'key-specs' and 'multiple-token' flags) to underscores (continues #9959).
2021-12-27 19:31:13 +02:00
guybe7 7ac213079c
Sort out mess around propagation and MULTI/EXEC (#9890)
The mess:
Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()),
causing edge cases, ugly/hacky code, and the tendency for bugs

The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the
top-most call() is responsible for going over that list and actually propagating them (and wrapping
them in MULTI/EXEC if there's more than one command). This is done in the new function,
propagatePendingCommands.

Callers to propagatePendingCommands:
1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most
   one to propagate them) - via `afterCommand`
2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 
3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the
   expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate
   the deletion explicitly.
4. cron stuff: active-expire and eviction may also propagate stuff
5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications,
   threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one
   place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module
   context may cause propagation.
6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module
   must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when
   releasing the GIL.

A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl):
   When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order:
   first all the commands from RM_Call, and then the ones from RM_Replicate

Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one
write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant.
not anymore.

This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs.
propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function.

Optimizations:
1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas
2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove

Bugfixes:
1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules.
   we need to prevent it from propagating to AOF/replicas
2. We need to set current_client in RM_Call. buggy scenario:
   - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call
   - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE
3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands
   (we always send a notification before propagating the command)
2021-12-23 00:03:48 +02:00
Oran Agra 41e6e05dee
Allow most CONFIG SET during loading, block some commands in async-loading (#9878)
## background
Till now CONFIG SET was blocked during loading.
(In the not so distant past, GET was disallowed too)

We recently (not released yet) added an async-loading mode, see #9323,
and during that time it'll serve CONFIG SET and any other command.
And now we realized (#9770) that some configs, and commands are dangerous
during async-loading.

## changes
* Allow most CONFIG SET during loading (both on async-loading and normal loading)
* Allow CONFIG REWRITE and CONFIG RESETSTAT during loading
* Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`)
* Block a few commands during loading (list below)

## the blocked commands:
* SAVE - obviously we don't wanna start a foregreound save during loading 8-)
* BGSAVE - we don't mind to schedule one, but we don't wanna fork now
* BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now
* MODULE - we obviously don't wanna unload a module during replication / rdb loading
  (MODULE HELP and MODULE LIST are not blocked)
* SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync
  requests now.
* REPLICAOF / SLAVEOF - we're in the middle of replicating, maybe it makes sense to let
  the user abort it, but he couldn't do that so far, i don't wanna take any risk of bugs due to odd state.
* CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT,
  GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo

## other fixes
* processEventsWhileBlocked had an issue when being nested, this could happen with a busy script
  during async loading (new), but also in a busy script during AOF loading (old). this lead to a crash in
  the scenario described in #6988
2021-12-22 14:11:16 +02:00
zhugezy 1b0968df46
Remove EVAL script verbatim replication, propagation, and deterministic execution logic (#9812)
# Background

The main goal of this PR is to remove relevant logics on Lua script verbatim replication,
only keeping effects replication logic, which has been set as default since Redis 5.0.
As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default
configuration from users' point of view.

There are lots of reasons to remove verbatim replication.
Antirez has listed some of the benefits in Issue #5292:

>1. No longer need to explain to users side effects into scripts.
    They can do whatever they want.
>2. No need for a cache about scripts that we sent or not to the slaves.
>3. No need to sort the output of certain commands inside scripts
    (SMEMBERS and others): this both simplifies and gains speed.
>4. No need to store scripts inside the RDB file in order to startup correctly.
>5. No problems about evicting keys during the script execution.

When looking back at Redis 5.0, antirez and core team decided to set the config
`lua-replicate-commands yes` by default instead of removing verbatim replication
directly, in case some bad situations happened. 3 years later now before Redis 7.0,
it's time to remove it formally.

# Changes

- configuration for lua-replicate-commands removed
  - created config file stub for backward compatibility
- Replication script cache removed
  - this is useless under script effects replication
  - relevant statistics also removed
- script persistence in RDB files is also removed
- Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed
- Deterministic execution logic in scripts removed (i.e. don't run write commands
  after random ones, and sorting output of commands with random order)
  - the flags indicating which commands have non-deterministic results are kept as hints to clients.
- `redis.replicate_commands()` & `redis.set_repl()` changed
  - now `redis.replicate_commands()` does nothing and return an 1
  - ...and then `redis.set_repl()` can be issued before `redis.replicate_commands()` now
- Relevant TCL cases adjusted
- DEBUG lua-always-replicate-commands removed

# Other changes
- Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-12-21 08:32:42 +02:00
Binbin febc3f63b2
Fix recent daily CI test failures (#9966)
Recent PRs have introduced some failures, this commit
try to fix these CI failures. Here are the changes:

1. Enable debug-command in sentinel test.
```
Master reboot in very short time: ERR DEBUG command not allowed. If the
enable-debug-command option is set to "local", you can run it from a
local connection, otherwise you need to set this option in the
configuration file, and then restart the server.
```

2. Enable protected-config in sentinel test.
```
SDOWN is triggered by misconfigured instance replying with errors: ERR
CONFIG SET failed (possibly related to argument 'dir') - can't set
protected config
```

3. Enable debug-command in cluster test.
```
Verify slaves consistency: ERR DEBUG command not allowed. If the
enable-debug-command option is set to "local", you can run it from a
local connection, otherwise you need to set this option in the
configuration file, and then restart the server.
```

4. quicklist fill should be signed int.
The reason for the modification is to eliminate the warning.
Modify `int fill: QL_FILL_BITS` to `signed int fill: QL_FILL_BITS`

The first three were introduced at #9920 (same issue).
And the last one was introduced at #9962.
2021-12-20 12:31:13 +02:00
YaacovHazan ae2f5b7b2e
Protected configs and sensitive commands (#9920)
Block sensitive configs and commands by default.

* `enable-protected-configs` - block modification of configs with the new `PROTECTED_CONFIG` flag.
   Currently we add this flag to `dbfilename`, and `dir` configs,
   all of which are non-mutable configs that can set a file redis will write to.
* `enable-debug-command` - block the `DEBUG` command
* `enable-module-command` - block the `MODULE` command

These have a default value set to `no`, so that these features are not
exposed by default to client connections, and can only be set by modifying the config file.

Users can change each of these to either `yes` (allow all access), or `local` (allow access from
local TCP connections and unix domain connections)

Note that this is a **breaking change** (specifically the part about MODULE command being disabled by default).
I.e. we don't consider DEBUG command being blocked as an issue (people shouldn't have been using it),
and the few configs we protected are unlikely to have been set at runtime anyway.
On the other hand, it's likely to assume some users who use modules, load them from the config file anyway.
Note that's the whole point of this PR, for redis to be more secure by default and reduce the attack surface on
innocent users, so secure defaults will necessarily mean a breaking change.
2021-12-19 10:46:16 +02:00
guybe7 5df070ba39
COMMAND: Use underscores instead of hyphens in attributes (#9959)
some languages can build a json-like object by parsing a textual json,
but it works poorly when attributes contain hyphens

example in JS:
```
let j = JSON.parse(json)
j['key-name'] <- works
j.key-name <= illegal syntax
```
2021-12-18 09:00:42 +02:00
ny0312 792afb4432
Introduce memory management on cluster link buffers (#9774)
Introduce memory management on cluster link buffers:
 * Introduce a new `cluster-link-sendbuf-limit` config that caps memory usage of cluster bus link send buffers.
 * Introduce a new `CLUSTER LINKS` command that displays current TCP links to/from peers.
 * Introduce a new `mem_cluster_links` field under `INFO` command output, which displays the overall memory usage by all current cluster links.
 * Introduce a new `total_cluster_links_buffer_limit_exceeded` field under `CLUSTER INFO` command output, which displays the accumulated count of cluster links freed due to `cluster-link-sendbuf-limit`.
2021-12-16 21:56:59 -08:00
ranshid 28b5a6537d
Throw error on too long unix domain socket file path (#9826)
* Fix too long unix domain socket file path

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2021-12-15 21:38:45 -08:00
guybe7 867816003e
Auto-generate the command table from JSON files (#9656)
Delete the hardcoded command table and replace it with an auto-generated table, based
on a JSON file that describes the commands (each command must have a JSON file).

These JSON files are the SSOT of everything there is to know about Redis commands,
and it is reflected fully in COMMAND INFO.

These JSON files are used to generate commands.c (using a python script), which is then
committed to the repo and compiled.

The purpose is:
* Clients and proxies will be able to get much more info from redis, instead of relying on hard coded logic.
* drop the dependency between Redis-user and the commands.json in redis-doc.
* delete help.h and have redis-cli learn everything it needs to know just by issuing COMMAND (will be
  done in a separate PR)
* redis.io should stop using commands.json and learn everything from Redis (ultimately one of the release
  artifacts should be a large JSON, containing all the information about all of the commands, which will be
  generated from COMMAND's reply)
* the byproduct of this is:
  * module commands will be able to provide that info and possibly be more of a first-class citizens
  * in theory, one may be able to generate a redis client library for a strictly typed language, by using this info.

### Interface changes

#### COMMAND INFO's reply change (and arg-less COMMAND)

Before this commit the reply at index 7 contained the key-specs list
and reply at index 8 contained the sub-commands list (Both unreleased).
Now, reply at index 7 is a map of:
- summary - short command description
- since - debut version
- group - command group
- complexity - complexity string
- doc-flags - flags used for documentation (e.g. "deprecated")
- deprecated-since - if deprecated, from which version?
- replaced-by - if deprecated, which command replaced it?
- history - a list of (version, what-changed) tuples
- hints - a list of strings, meant to provide hints for clients/proxies. see https://github.com/redis/redis/issues/9876
- arguments - an array of arguments. each element is a map, with the possibility of nesting (sub-arguments)
- key-specs - an array of keys specs (already in unstable, just changed location)
- subcommands - a list of sub-commands (already in unstable, just changed location)
- reply-schema - will be added in the future (see https://github.com/redis/redis/issues/9845)

more details on these can be found in https://github.com/redis/redis-doc/pull/1697

only the first three fields are mandatory 

#### API changes (unreleased API obviously)

now they take RedisModuleCommand opaque pointer instead of looking up the command by name

- RM_CreateSubcommand
- RM_AddCommandKeySpec
- RM_SetCommandKeySpecBeginSearchIndex
- RM_SetCommandKeySpecBeginSearchKeyword
- RM_SetCommandKeySpecFindKeysRange
- RM_SetCommandKeySpecFindKeysKeynum

Currently, we did not add module API to provide additional information about their commands because
we couldn't agree on how the API should look like, see https://github.com/redis/redis/issues/9944.

### Somehow related changes
1. Literals should be in uppercase while placeholder in lowercase. Now all the GEO* command
   will be documented with M|KM|FT|MI and can take both lowercase and uppercase

### Unrelated changes
1. Bugfix: no_madaory_keys was absent in COMMAND's reply
2. expose CMD_MODULE as "module" via COMMAND
3. have a dedicated uint64 for ACL categories (instead of having them in the same uint64 as command flags)

Co-authored-by: Itamar Haber <itamar@garantiadata.com>
2021-12-15 21:23:15 +02:00
丽媛自己动 95f943add6
in line 3749 resetServerSaveParams will set the param to null,so no need (#9943)
to do here
2021-12-15 16:32:14 +02:00
Binbin a7726cdf51
Fix SENTINEL subcommands's arity (#9909)
For `SENTINEL SET`, we can use in these ways:
1. SENTINEL SET mymaster quorum 3
2. SENTINEL SET mymaster quorum 5 parallel-syncs 1

For `SENTINEL SIMULATE-FAILURE`, although it is only used for testing:
1. SENTINEL SIMULATE-FAILURE CRASH-AFTER-ELECTION
2. SENTINEL SIMULATE-FAILURE CRASH-AFTER-ELECTION CRASH-AFTER-PROMOTION
2021-12-08 08:59:02 +02:00
yoav-steinberg 1736fa4d22
Don't write oom score adj to proc unless we're managing it. (#9904)
When disabling redis oom-score-adj managment we restore the
base value read before enabling oom-score-adj management.

This fixes an issue introduced in #9748 where updating
`oom-score-adj-values` while `oom-score-adj` was set to `no`
would write the base oom score adj value read on startup to `/proc`.
This is a bug since while `oom-score-adj` is disabled we should
never write to proc and let external processes manage it.

Added appropriate tests.
2021-12-07 16:05:51 +02:00
meir@redislabs.com cbd463175f Redis Functions - Added redis function unit and Lua engine
Redis function unit is located inside functions.c
and contains Redis Function implementation:
1. FUNCTION commands:
  * FUNCTION CREATE
  * FCALL
  * FCALL_RO
  * FUNCTION DELETE
  * FUNCTION KILL
  * FUNCTION INFO
2. Register engine

In addition, this commit introduce the first engine
that uses the Redis Function capabilities, the
Lua engine.
2021-12-02 19:35:52 +02:00
meir@redislabs.com fc731bc67f Redis Functions - Introduce script unit.
Script unit is a new unit located on script.c.
Its purpose is to provides an API for functions (and eval)
to interact with Redis. Interaction includes mostly
executing commands, but also functionalities like calling
Redis back on long scripts or check if the script was killed.

The interaction is done using a scriptRunCtx object that
need to be created by the user and initialized using scriptPrepareForRun.

Detailed list of functionalities expose by the unit:
1. Calling commands (including all the validation checks such as
   acl, cluster, read only run, ...)
2. Set Resp
3. Set Replication method (AOF/REPLICATION/NONE)
4. Call Redis back to on long running scripts to allow Redis reply
   to clients and perform script kill

The commit introduce the new unit and uses it on eval commands to
interact with Redis.
2021-12-01 23:54:23 +02:00
meir@redislabs.com e0cd580aef Redis Functions - Move Lua related variable into luaCtx struct
The following variable was renamed:
1. lua_caller 			-> script_caller
2. lua_time_limit 		-> script_time_limit
3. lua_timedout 		-> script_timedout
4. lua_oom 			-> script_oom
5. lua_disable_deny_script 	-> script_disable_deny_script
6. in_eval			-> in_script

The following variables was moved to lctx under eval.c
1.  lua
2.  lua_client
3.  lua_cur_script
4.  lua_scripts
5.  lua_scripts_mem
6.  lua_replicate_commands
7.  lua_write_dirty
8.  lua_random_dirty
9.  lua_multi_emitted
10. lua_repl
11. lua_kill
12. lua_time_start
13. lua_time_snapshot

This commit is in a low risk of introducing any issues and it
is just moving varibales around and not changing any logic.
2021-12-01 23:31:08 +02:00
yoav-steinberg 0e5b813ef9
Multiparam config set (#9748)
We can now do: `config set maxmemory 10m repl-backlog-size 5m`

## Basic algorithm to support "transaction like" config sets:

1. Backup all relevant current values (via get).
2. Run "verify" and "set" on everything, if we fail run "restore".
3. Run "apply" on everything (optional optimization: skip functions already run). If we fail run "restore".
4. Return success.

### restore
1. Run set on everything in backup. If we fail log it and continue (this puts us in an undefined
   state but we decided it's better than the alternative of panicking). This indicates either a bug
   or some unsupported external state.
2. Run apply on everything in backup (optimization: skip functions already run). If we fail log
   it (see comment above).
3. Return error.

## Implementation/design changes:
* Apply function are idempotent (have no effect if they are run more than once for the same config).
* No indication in set functions if we're reading the config or running from the `CONFIG SET` command
   (removed `update` argument).
* Set function should set some config variable and assume an (optional) apply function will use that
   later to apply. If we know this setting can be safely applied immediately and can always be reverted
   and doesn't depend on any other configuration we can apply immediately from within the set function
   (and not store the setting anywhere). This is the case of this `dir` config, for example, which has no
   apply function. No apply function is need also in the case that setting the variable in the `server` struct
   is all that needs to be done to make the configuration take effect. Note that the original concept of `update_fn`,
   which received the old and new values was removed and replaced by the optional apply function.
* Apply functions use settings written to the `server` struct and don't receive any inputs.
* I take care that for the generic (non-special) configs if there's no change I avoid calling the setter (possible
   optimization: avoid calling the apply function as well).
* Passing the same config parameter more than once to `config set` will fail. You can't do `config set my-setting
   value1 my-setting value2`.

Note that getting `save` in the context of the conf file parsing to work here as before was a pain.
The conf file supports an aggregate `save` definition, where each `save` line is added to the server's
save params. This is unlike any other line in the config file where each line overwrites any previous
configuration. Since we now support passing multiple save params in a single line (see top comments
about `save` in https://github.com/redis/redis/pull/9644) we should deprecate the aggregate nature of
this config line and perhaps reduce this ugly code in the future.
2021-12-01 10:15:11 +02:00
sundb 4d8700786e
Fix COMMAND GETKEYS on LCS (#9852)
Remove lcsGetKeys to clean up the remaining STRALGO after #9733.
i.e. it still used a getkeys_proc which was still looking for the KEYS or STRINGS arguments
2021-11-28 09:02:38 +02:00
sundb 4512905961
Replace ziplist with listpack in quicklist (#9740)
Part three of implementing #8702, following #8887 and #9366 .

## Description of the feature
1. Replace the ziplist container of quicklist with listpack.
2. Convert existing quicklist ziplists on RDB loading time. an O(n) operation.

## Interface changes
1. New `list-max-listpack-size` config is an alias for `list-max-ziplist-size`.
2. Replace `debug ziplist` command with `debug listpack`.

## Internal changes
1. Add `lpMerge` to merge two listpacks . (same as `ziplistMerge`)
2. Add `lpRepr` to print info of listpack which is used in debugCommand and `quicklistRepr`. (same as `ziplistRepr`)
3. Replace `QUICKLIST_NODE_CONTAINER_ZIPLIST` with `QUICKLIST_NODE_CONTAINER_PACKED`(following #9357 ).
    It represent that a quicklistNode is a packed node, as opposed to a plain node.
4. Remove `createZiplistObject` method, which is never used.
5. Calculate listpack entry size using overhead overestimation in `quicklistAllowInsert`.
    We prefer an overestimation, which would at worse lead to a few bytes below the lowest limit of 4k.

## Improvements
1. Calling `lpShrinkToFit` after converting Ziplist to listpack, which was missed at #9366.
2. Optimize `quicklistAppendPlainNode` to avoid memcpy data.

## Bugfix
1. Fix crash in `quicklistRepr` when ziplist is compressed, introduced from #9366.

## Test
1. Add unittest for `lpMerge`.
2. Modify the old quicklist ziplist corrupt dump test.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-24 13:34:13 +02:00
guybe7 b161cff5f9
QUIT is a command, HOST: and POST are not (#9798)
Some people complain that QUIT is missing from help/command table.
Not appearing in COMMAND command, command stats, ACL, etc.
and instead, there's a hack in processCommand with a comment that looks outdated.
Note that it is [documented](https://redis.io/commands/quit)

At the same time, HOST: and POST are there in the command table although these are not real commands.
They would appear in the COMMAND command, and even in commandstats.

Other changes:
1. Initialize the static logged_time static var in securityWarningCommand
2. add `no-auth` flag to RESET so it can always be executed.
2021-11-23 10:38:25 +02:00
Eduardo Semprebon 1a255e3150
Reject PING with MASTERDOWN when replica-serve-stale-data=no (#9757)
Currently PING returns different status when server is not serving data,
for example when `LOADING` or `BUSY`.
But same was not true for `MASTERDOWN`
This commit makes PING reply with `MASTERDOWN` when
replica-serve-stale-data=no and link is MASTER is down.
2021-11-18 10:53:17 +02:00
guybe7 af7489886d
Obliterate STRALGO! add LCS (which only works on keys) (#9799)
Drop the STRALGO command, now LCS is a command of its own and it only works on keys (not input strings).
The motivation is that STRALGO's syntax was really messed-up...
- assumes all (future) string algorithms will take similar arguments
- mixes command that takes keys and one that doesn't in the same command.
- make it nearly impossible to expose the right key spec in COMMAND INFO (issues cluster clients)
- hard for cluster clients to determine the key names (firstkey, lastkey, etc)
- hard for ACL / flags (is it a read command?)

This is a breaking change.
2021-11-18 10:47:49 +02:00
sundb e725d737fb
Add --large-memory flag for REDIS_TEST to enable tests that consume more than 100mb (#9784)
This is a preparation step in order to add a new test in quicklist.c see #9776
2021-11-16 08:55:10 +02:00
guoxiang1996 aba70df48f
insufficient size for cached client flags in call() (#9783)
The client flags is a 64 bit integer, but the temporary cached value on the stack of call() is 32 bit.
luckily this doesn't lead to any bugs since the only flags used against this variables are below 32 bit.
2021-11-16 08:21:23 +02:00
yoav-steinberg 79ac57561f
Refactor config.c for generic setter interface (#9644)
This refactors all `CONFIG SET`s and conf file loading arguments go through
the generic config handling interface.

Refactoring changes:
- All config params go through the `standardConfig` interface (some stuff which
  is only related to the config file and not the `CONFIG` command still has special
  handling for rewrite/config file parsing, `loadmodule`, for example.) .
- Added `MULTI_ARG_CONFIG` flag for configs to signify they receive a variable
  number of arguments instead of a single argument. This is used to break up space
  separated arguments to `CONFIG SET` so the generic setter interface can pass
  multiple arguments to the setter function. When parsing the config file we also break
  up anything after the config name into multiple arguments to the setter function.

Interface changes:
- A side effect of the above interface is that the `bind` argument in the config file can
  be empty (no argument at all) this is treated the same as passing an single empty
  string argument (same as `save` already used to work).
- Support rewrite and setting `watchdog-period` from config file (was only supported
  by the CONFIG command till now).
- Another side effect is that the `save T X` config argument now supports multiple
  Time-Changes pairs in a single line like its `CONFIG SET` counterpart. So in the
  config file you can either do:
  ```
  save 3600 1
  save 600 10
  ```
  or do
  ```
  save 3600 1 600 10
  ```

Co-authored-by: Bjorn Svensson <bjorn.a.svensson@est.tech>
2021-11-07 13:40:08 +02:00
Eduardo Semprebon 91d0c758e5
Replica keep serving data during repl-diskless-load=swapdb for better availability (#9323)
For diskless replication in swapdb mode, considering we already spend replica memory
having a backup of current db to restore in case of failure, we can have the following benefits
by instead swapping database only in case we succeeded in transferring db from master:

- Avoid `LOADING` response during failed and successful synchronization for cases where the
  replica is already up and running with data.
- Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load
  time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping.
- This could be implemented also for disk replication with similar benefits if consumers are willing
  to spend the extra memory usage.

General notes:
- The concept of `backupDb` becomes `tempDb` for clarity.
- Async loading mode will only kick in if the replica is syncing from a master that has the same
  repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. 
- New property in INFO: `async_loading` to differentiate from the blocking loading
- Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db
  and the tempDb that is passed around.
- Because this is affecting replicas only, we assume that if they are not readonly and write commands
  during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET
  here anyways to avoid complications.

Considerations for review:
- We have many cases where server.loading flag is used and even though I tried my best, there may
  be cases where async_loading should be checked as well and cases where it shouldn't (would require
  very good understanding of whole code)
- Several places that had different behavior depending on the loading flag where actually meant to just
  handle commands coming from the AOF client differently than ones coming from real clients, changed
  to check CLIENT_ID_AOF instead.

**Additional for Release Notes**
- Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't
  contribute on triggering next database SAVE
- New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
- Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
  Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED,
  ABORTED and COMPLETED.
- New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions
  to allow modules to declare they support the diskless replication with async loading (when absent, we fall
  back to disk-based loading).

Co-authored-by: Eduardo Semprebon <edus@saxobank.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-11-04 10:46:50 +02:00
guybe7 f11a2d4dd7
Fix COMMAND GETKEYS on EVAL without keys (#9733)
Add new no-mandatory-keys flag to support COMMAND GETKEYS of commands
which have no mandatory keys.

In the past we would have got this error:
```
127.0.0.1:6379> command getkeys eval "return 1" 0
(error) ERR Invalid arguments specified for command
```
2021-11-03 14:38:26 +02:00
zhaozhao.zz d08f0552ee
rebuild replication backlog index when master restart (#9720)
After PR #9166 , replication backlog is not a real block of memory, just contains a
reference points to replication buffer's block and the blocks index (to accelerate
search offset when partial sync), so we need update both replication buffer's block's
offset and replication backlog blocks index's offset when master restart from RDB,
since the `server.master_repl_offset` is changed.
The implications of this bug was just a slow search, but not a replication failure.
2021-11-02 10:53:52 +02:00
guybe7 975f51fe16
Add new SLOTSRANGE to subcommands table (#9689) 2021-10-27 10:44:14 +03:00
Wang Yuan 9ec3294b97
Add timestamp annotations in AOF (#9326)
Add timestamp annotation in AOF, one part of #9325.

Enabled with the new `aof-timestamp-enabled` config option.

Timestamp annotation format is "#TS:${timestamp}\r\n"."
TS" is short of timestamp and this method could save extra bytes in AOF.

We can use timestamp annotation for some special functions. 
- know the executing time of commands
- restore data to a specific point-in-time (by using redis-check-rdb to truncate the file)
2021-10-25 13:08:34 +03:00
Itamar Haber 00362f2a94
Removes admin acl category from CLIENT TRACKINGINFO (#9662)
overlooked in #9504
2021-10-25 11:33:37 +03:00
Wang Yuan c1718f9d86
Replication backlog and replicas use one global shared replication buffer (#9166)
## Background
For redis master, one replica uses one copy of replication buffer, that is a big waste of memory,
more replicas more waste, and allocate/free memory for every reply list also cost much.
If we set client-output-buffer-limit small and write traffic is heavy, master may disconnect with
replicas and can't finish synchronization with replica. If we set  client-output-buffer-limit big,
master may be OOM when there are many replicas that separately keep much memory.
Because replication buffers of different replica client are the same, one simple idea is that
all replicas only use one replication buffer, that will effectively save memory.

Since replication backlog content is the same as replicas' output buffer, now we
can discard replication backlog memory and use global shared replication buffer
to implement replication backlog mechanism.

## Implementation
I create one global "replication buffer" which contains content of replication stream.
The structure of "replication buffer" is similar to the reply list that exists in every client.
But the node of list is `replBufBlock`, which has `id, repl_offset, refcount` fields.
```c
/* Replication buffer blocks is the list of replBufBlock.
 *
 * +--------------+       +--------------+       +--------------+
 * | refcount = 1 |  ...  | refcount = 0 |  ...  | refcount = 2 |
 * +--------------+       +--------------+       +--------------+
 *      |                                            /       \
 *      |                                           /         \
 *      |                                          /           \
 *  Repl Backlog                               Replia_A      Replia_B
 * 
 * Each replica or replication backlog increments only the refcount of the
 * 'ref_repl_buf_node' which it points to. So when replica walks to the next
 * node, it should first increase the next node's refcount, and when we trim
 * the replication buffer nodes, we remove node always from the head node which
 * refcount is 0. If the refcount of the head node is not 0, we must stop
 * trimming and never iterate the next node. */

/* Similar with 'clientReplyBlock', it is used for shared buffers between
 * all replica clients and replication backlog. */
typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;
```
So now when we feed replication stream into replication backlog and all replicas, we only need
to feed stream into replication buffer `feedReplicationBuffer`. In this function, we set some fields of
replication backlog and replicas to references of the global replication buffer blocks. And we also
need to check replicas' output buffer limit to free if exceeding `client-output-buffer-limit`, and trim
replication backlog if exceeding `repl-backlog-size`.

When sending reply to replicas, we also need to iterate replication buffer blocks and send its
content, when totally sending one block for replica, we decrease current node count and
increase the next current node count, and then free the block which reference is 0 from the
head of replication buffer blocks.

Since now we use linked list to manage replication backlog, it may cost much time for iterating
all linked list nodes to find corresponding replication buffer node. So we create a rax tree to
store some nodes  for index, but to avoid rax tree occupying too much memory, i record
one per 64 nodes for index.

Currently, to make partial resynchronization as possible as much, we always let replication
backlog as the last reference of replication buffer blocks, backlog size may exceeds our setting
if slow replicas that reference vast replication buffer blocks, and this method doesn't increase
memory usage since they share replication buffer. To avoid freezing server for freeing unreferenced
replication buffer blocks when we need to trim backlog for exceeding backlog size setting,
we trim backlog incrementally (free 64 blocks per call now), and make it faster in
`beforeSleep` (free 640 blocks).

### Other changes
- `mem_total_replication_buffers`: we add this field in INFO command, it means the total
  memory of replication buffers used.
- `mem_clients_slaves`:  now even replica is slow to replicate, and its output buffer memory
  is not 0, but it still may be 0, since replication backlog and replicas share one global replication
  buffer, only if replication buffer memory is more than the repl backlog setting size, we consider
  the excess as replicas' memory. Otherwise, we think replication buffer memory is the consumption
  of repl backlog.
- Key eviction
  Since all replicas and replication backlog share global replication buffer, we think only the
  part of exceeding backlog size the extra separate consumption of replicas.
  Because we trim backlog incrementally in the background, backlog size may exceeds our
  setting if slow replicas that reference vast replication buffer blocks disconnect.
  To avoid massive eviction loop, we don't count the delayed freed replication backlog into
  used memory even if there are no replicas, i.e. we also regard this memory as replicas's memory.
- `client-output-buffer-limit` check for replica clients
  It doesn't make sense to set the replica clients output buffer limit lower than the repl-backlog-size
  config (partial sync will succeed and then replica will get disconnected). Such a configuration is
  ignored (the size of repl-backlog-size will be used). This doesn't have memory consumption
  implications since the replica client will share the backlog buffers memory.
- Drop replication backlog after loading data if needed
  We always create replication backlog if server is a master, we need it because we put DELs in
  it when loading expired keys in RDB, but if RDB doesn't have replication info or there is no rdb,
  it is not possible to support partial resynchronization, to avoid extra memory of replication backlog,
  we drop it.
- Multi IO threads
 Since all replicas and replication backlog use global replication buffer,  if I/O threads are enabled,
  to guarantee data accessing thread safe, we must let main thread handle sending the output buffer
  to all replicas. But before, other IO threads could handle sending output buffer of all replicas.

## Other optimizations
This solution resolve some other problem:
- When replicas disconnect with master since of out of output buffer limit, releasing the output
  buffer of replicas may freeze server if we set big `client-output-buffer-limit` for replicas, but now,
  it doesn't cause freezing.
- This implementation may mitigate reply list copy cost time(also freezes server) when one replication
  has huge reply buffer and another replica can copy buffer for full synchronization. now, we just copy
  reference info, it is very light.
- If we set replication backlog size big, it also may cost much time to copy replication backlog into
  replica's output buffer. But this commit eliminates this problem.
- Resizing replication backlog size doesn't empty current replication backlog content.
2021-10-25 09:24:31 +03:00
Oran Agra 6b297cd646
Improve errno reporting on fork and fopen rdbLoad failures (#9649)
I moved a bunch of stats in redisFork to be executed only on successful
fork, since they seem wrong to be done when it failed.
I guess when fork fails it does that immediately, no latency spike.
2021-10-24 16:52:44 +03:00
Itamar Haber 48e4d77099
Fixes `CLUSTER COUNTKEYSINSLOT` (#9672)
Introduced via typo in #9504. 
Also adds a sanity test for coverage.
2021-10-24 12:32:53 +03:00
guybe7 8f745da159
Fix sentinel commands, ACL dictIter leak (#9661) 2021-10-21 12:50:58 +03:00
Oran Agra 7d6744c739
fix new cluster tests issues (#9657)
Following #9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled.
  for now i'm just skipping them since the tests we run there don't really need that kind of coverage
* cluster port binding failures
  note that `find_available_port` already looks for a free cluster port
  but the code in `wait_server_started` couldn't detect the failure of binding
  (the text it greps for wasn't found in the log)
2021-10-20 15:40:28 +03:00
guybe7 43e736f79b
Treat subcommands as commands (#9504)
## Intro

The purpose is to allow having different flags/ACL categories for
subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't)

We create a small command table for every command that has subcommands
and each subcommand has its own flags, etc. (same as a "regular" command)

This commit also unites the Redis and the Sentinel command tables

## Affected commands

CONFIG
Used to have "admin ok-loading ok-stale no-script"
Changes:
1. Dropped "ok-loading" in all except GET (this doesn't change behavior since
there were checks in the code doing that)

XINFO
Used to have "read-only random"
Changes:
1. Dropped "random" in all except CONSUMERS

XGROUP
Used to have "write use-memory"
Changes:
1. Dropped "use-memory" in all except CREATE and CREATECONSUMER

COMMAND
No changes.

MEMORY
Used to have "random read-only"
Changes:
1. Dropped "random" in PURGE and USAGE

ACL
Used to have "admin no-script ok-loading ok-stale"
Changes:
1. Dropped "admin" in WHOAMI, GENPASS, and CAT

LATENCY
No changes.

MODULE
No changes.

SLOWLOG
Used to have "admin random ok-loading ok-stale"
Changes:
1. Dropped "random" in RESET

OBJECT
Used to have "read-only random"
Changes:
1. Dropped "random" in ENCODING and REFCOUNT

SCRIPT
Used to have "may-replicate no-script"
Changes:
1. Dropped "may-replicate" in all except FLUSH and LOAD

CLIENT
Used to have "admin no-script random ok-loading ok-stale"
Changes:
1. Dropped "random" in all except INFO and LIST
2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY

STRALGO
No changes.

PUBSUB
No changes.

CLUSTER
Changes:
1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots

SENTINEL
No changes.

(note that DEBUG also fits, but we decided not to convert it since it's for
debugging and anyway undocumented)

## New sub-command
This commit adds another element to the per-command output of COMMAND,
describing the list of subcommands, if any (in the same structure as "regular" commands)
Also, it adds a new subcommand:
```
COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)]
```
which returns a set of all commands (unless filters), but excluding subcommands.

## Module API
A new module API, RM_CreateSubcommand, was added, in order to allow
module writer to define subcommands

## ACL changes:
1. Now, that each subcommand is actually a command, each has its own ACL id.
2. The old mechanism of allowed_subcommands is redundant
(blocking/allowing a subcommand is the same as blocking/allowing a regular command),
but we had to keep it, to support the widespread usage of allowed_subcommands
to block commands with certain args, that aren't subcommands (e.g. "-select +select|0").
3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference.
4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands
(e.g. "+client -client|kill"), which wasn't possible in the past.
5. It is also possible to use the allowed_firstargs mechanism with subcommand.
For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except
for setting the log level.
6. All of the ACL changes above required some amount of refactoring.

## Misc
1. There are two approaches: Either each subcommand has its own function or all
   subcommands use the same function, determining what to do according to argv[0].
   For now, I took the former approaches only with CONFIG and COMMAND,
   while other commands use the latter approach (for smaller blamelog diff).
2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec.
4. Bugfix: GETNAME was missing from CLIENT's help message.
5. Sentinel and Redis now use the same table, with the same function pointer.
   Some commands have a different implementation in Sentinel, so we redirect
   them (these are ROLE, PUBLISH, and INFO).
6. Command stats now show the stats per subcommand (e.g. instead of stats just
   for "config" you will have stats for "config|set", "config|get", etc.)
7. It is now possible to use COMMAND directly on subcommands:
   COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and
   can be used in functions lookupCommandBySds and lookupCommandByCString)
8. STRALGO is now a container command (has "help")

## Breaking changes:
1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 11:52:57 +03:00
Bjorn Svensson c9fabc2ef0
Move config `unixsocketperm` to generic configs (#9607)
Since the size of mode_t is platform dependant we handle the
`unixsocketperm` configuration as a generic int type.
mode_t is either an unsigned int or unsigned short (macOS) and
the range-limits allows for a simple cast to a mode_t.
2021-10-18 23:58:52 -07:00
Madelyn Olson a6b5d518a9
Improved the reliability of cluster replica sync tests (#9628)
Improved the reliability of cluster replica sync tests
2021-10-13 00:06:53 -07:00
Bjorn Svensson b874c6f1fc
Move config logfile to generic config (#9592)
Move config `logfile` to generic configs
2021-10-07 22:33:08 -07:00
Bjorn Svensson 54d01e363a
Move config `cluster-config-file` to generic configs (#9597) 2021-10-07 22:32:40 -07:00
Huang Zhw fd135f3e2d
Make tracking invalidation messages always after command's reply (#9422)
Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.
2021-10-07 15:13:42 +03:00
Andy Pan 2391aefd82
Implement anetPipe() to combine creating pipe and setting flags (#9511)
Implement createPipe() to combine creating pipe and setting flags, also reduce
system calls by prioritizing pipe2() over pipe().

Without createPipe(), we have to call pipe() to create a pipe and then call some
functions (like anetCloexec() and anetNonBlock()) of anet.c to set flags respectively,
which leads to some extra system calls, now we can leverage pipe2() to combine
them and make the process of creating pipe more convergent in createPipe().

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-10-06 16:08:13 +03:00
Meir Shpilraien (Spielrein) 4fb39b6700
Added module-acquire-GIL latency stats (#9608)
The new value indicates how long Redis wait to
acquire the GIL after sleep. This can help identify
problems where a module perform some background
operation for a long time (with the GIL held) and
blocks the Redis main thread.
2021-10-06 11:33:01 +03:00
tzongw f5160ed0aa
improve latency when a client is unblocked by module timer (#9593)
Scenario:
1. client block on command `XREAD BLOCK 0 STREAMS mystream  $`
2. in a module, calling `XADD mystream * field value` via lua from a timer callback
3. client will receive response after some latency up to 100ms

Reason:
When `XADD` signal the key `mystream` as ready, `beforeSleep` in next eventloop will call
`handleClientsBlockedOnKeys` to unblock the client and add pending data to write but not
actually install a write handler, so next redis will block in `aeApiPoll` up to 100ms given `hz`
config as default 10, pending data will be sent in another next eventloop by
`handleClientsWithPendingWritesUsingThreads`.

Calling `handleClientsBlockedOnKeys` before `handleClientsWithPendingWritesUsingThreads`
in `beforeSleep` solves the problem.
2021-10-06 10:15:03 +03:00
Oran Agra fba15850e5
Prevent unauthenticated client from easily consuming lots of memory (CVE-2021-32675) (#9588)
This change sets a low limit for multibulk and bulk length in the
protocol for unauthenticated connections, so that they can't easily
cause redis to allocate massive amounts of memory by sending just a few
characters on the network.
The new limits are 10 arguments of 16kb each (instead of 1m of 512mb)
2021-10-04 12:10:31 +03:00
yoav-steinberg 6600253046
Client eviction ci issues (#9549)
Fixing CI test issues introduced in #8687
- valgrind warnings in readQueryFromClient when client was freed by processInputBuffer
- adding DEBUG pause-cron for tests not to be time dependent.
- skipping a test that depends on socket buffers / events not compatible with TLS
- making sure client got subscribed by not using deferring client
2021-09-26 17:45:02 +03:00
yoav-steinberg 2753429c99
Client eviction (#8687)
### Description
A mechanism for disconnecting clients when the sum of all connected clients is above a
configured limit. This prevents eviction or OOM caused by accumulated used memory
between all clients. It's a complimentary mechanism to the `client-output-buffer-limit`
mechanism which takes into account not only a single client and not only output buffers
but rather all memory used by all clients.

#### Design
The general design is as following:
* We track memory usage of each client, taking into account all memory used by the
  client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date
  after reading from the socket, after processing commands and after writing to the socket.
* Based on the used memory we sort all clients into buckets. Each bucket contains all
  clients using up up to x2 memory of the clients in the bucket below it. For example up
  to 1m clients, up to 2m clients, up to 4m clients, ...
* Before processing a command and before sleep we check if we're over the configured
  limit. If we are we start disconnecting clients from larger buckets downwards until we're
  under the limit.

#### Config
`maxmemory-clients` max memory all clients are allowed to consume, above this threshold
we disconnect clients.
This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB
suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%`
would mean 10% of `maxmemory`).

#### Important code changes
* During the development I encountered yet more situations where our io-threads access
  global vars. And needed to fix them. I also had to handle keeps the clients sorted into the
  memory buckets (which are global) while their memory usage changes in the io-thread.
  To achieve this I decided to simplify how we check if we're in an io-thread and make it
  much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking
  if the client is in an io-thread (it wasn't used for anything else) and just used the global
  `io_threads_op` variable the same way to check during writes.
* I optimized the cleanup of the client from the `clients_pending_read` list on client freeing.
  We now store a pointer in the `client` struct to this list so we don't need to search in it
  (`pending_read_list_node`).
* Added `evicted_clients` stat to `INFO` command.
* Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the
  client eviction mechanism. Added corrosponding 'e' flag in the client info string.
* Added `multi-mem` field in the client info string to show how much memory is used up
  by buffered multi commands.
* Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and
  channels (partially), tracking prefixes (partially).
* CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so
  clients will be disconnected between processing different clients and not only before sleep.
  This new function can be used in the future for work we want to do outside the command
  processing loop but don't want to wait for all clients to be processed before we get to it.
  Specifically I wanted to handle output-buffer-limit related closing before we process client
  eviction in case the two race with each other.
* Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction
  buckets.
* Each client now holds a pointer to the client eviction memory usage bucket it belongs to
  and listNode to itself in that bucket for quick removal.
* Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value
  indicating no io-threading is currently being executed.
* In order to track memory used by each clients in real-time we can't rely on updating
  these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()`
  (used to be `clientsCronTrackClientsMemUsage()`) after command processing, after
  writing data to pubsub clients, after writing the output buffer and after reading from the
  socket (and maybe other places too). The function is written to be fast.
* Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before
  processing a command (before performing oom-checks and key-eviction).
* All clients memory usage buckets are grouped as follows:
  * All clients using less than 64k.
  * 64K..128K
  * 128K..256K
  * ...
  * 2G..4G
  * All clients using 4g and up.
* Added client-eviction.tcl with a bunch of tests for the new mechanism.
* Extended maxmemory.tcl to test the interaction between maxmemory and
  maxmemory-clients settings.
* Added an option to flag a numeric configuration variable as a "percent", this means that
  if we encounter a '%' after the number in the config file (or config set command) we
  consider it as valid. Such a number is store internally as a negative value. This way an
  integer value can be interpreted as either a percent (negative) or absolute value (positive).
  This is useful for example if some numeric configuration can optionally be set to a percentage
  of something else.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 14:02:16 +03:00
YaacovHazan a56d4533b7
Adding ACL support for modules (#9309)
This commit introduced a new flag to the RM_Call:
'C' - Check if the command can be executed according to the ACLs associated with it.

Also, three new API's added to check if a command, key, or channel can be executed or accessed
by a user, according to the ACLs associated with it.
- RM_ACLCheckCommandPerm
- RM_ACLCheckKeyPerm
- RM_ACLCheckChannelPerm

The user for these API's is a RedisModuleUser object, that for a Module user returned by the RM_CreateModuleUser API, or for a general ACL user can be retrieved by these two new API's:
- RM_GetCurrentUserName - Retrieve the user name of the client connection behind the current context.
- RM_GetModuleUserFromUserName - Get a RedisModuleUser from a user name

As a result of getting a RedisModuleUser from name, it can now also access the general ACL users (not just ones created by the module).
This mean the already existing API RM_SetModuleUserACL(), can be used to change the ACL rules for such users.
2021-09-23 08:52:56 +03:00
Binbin 14d6abd8e9
Add ZMPOP/BZMPOP commands. (#9484)
This is similar to the recent addition of LMPOP/BLMPOP (#9373), but zset.

Syntax for the new ZMPOP command:
`ZMPOP numkeys [<key> ...] MIN|MAX [COUNT count]`

Syntax for the new BZMPOP command:
`BZMPOP timeout numkeys [<key> ...] MIN|MAX [COUNT count]`

Some background:
- ZPOPMIN/ZPOPMAX take only one key, and can return multiple elements.
- BZPOPMIN/BZPOPMAX take multiple keys, but return only one element from just one key.
- ZMPOP/BZMPOP can take multiple keys, and can return multiple elements from just one key.

Note that ZMPOP/BZMPOP can take multiple keys, it eventually operates on just on key.
And it will propagate as ZPOPMIN or ZPOPMAX with the COUNT option.

As new commands, if we can not pop any elements, the response like:
- ZMPOP: Return a NIL in both RESP2 and RESP3, unlike ZPOPMIN/ZPOPMAX return emptyarray.
- BZMPOP: Return a NIL in both RESP2 and RESP3 when timeout is reached, like BZPOPMIN/BZPOPMAX.

For the normal response is nested arrays in RESP2 and RESP3:
```
ZMPOP/BZMPOP
1) keyname
2) 1) 1) member1
      2) score1
   2) 1) member2
      2) score2

In RESP2:
1) "myzset"
2) 1) 1) "three"
      2) "3"
   2) 1) "two"
      2) "2"

In RESP3:
1) "myzset"
2) 1) 1) "three"
      2) (double) 3
   2) 1) "two"
      2) (double) 2
```
2021-09-23 08:34:40 +03:00
Binbin f898a9e97d
Adds limit to SINTERCARD/ZINTERCARD. (#9425)
Implements the [LIMIT limit] variant of SINTERCARD/ZINTERCARD.
Now with the LIMIT, we can stop the searching when cardinality
reaching the limit, and return the cardinality ASAP.

Note that in SINTERCARD, the old synatx was: `SINTERCARD key [key ...]`
In order to add a optional parameter, we must break the old synatx.
So the new syntax of SINTERCARD will be consistent with ZINTERCARD.
New syntax: `SINTERCARD numkeys key [key ...] [LIMIT limit]`.

Note that this means that SINTERCARD has a different syntax than
SINTER and SINTERSTORE (taking numkeys argument)

As for ZINTERCARD, we can easily add a optional parameter to it.
New syntax: `ZINTERCARD numkeys key [key ...] [LIMIT limit]`
2021-09-16 14:07:08 +03:00
guybe7 08f4e1335c
createSharedObjects: zopomin and zpopmax apeear twice (#9505)
Introduced by https://github.com/redis/redis/pull/9502
2021-09-15 15:29:35 +03:00
guybe7 7759ec7c43
Cleanup: propagate and alsoPropagate do not need redisCommand (#9502)
The `cmd` argument was completely unused, and all the code that bothered to pass it was unnecessary.
This is a prepartion for a future commit that treats subcommands as commands
2021-09-15 12:53:42 +03:00
guybe7 03fcc211de
A better approach for COMMAND INFO for movablekeys commands (#8324)
Fix #7297

The problem:

Today, there is no way for a client library or app to know the key name indexes for commands such as
ZUNIONSTORE/EVAL and others with "numkeys", since COMMAND INFO returns no useful info for them.

For cluster-aware redis clients, this requires to 'patch' the client library code specifically for each of these commands or to
resolve each execution of these commands with COMMAND GETKEYS.

The solution:

Introducing key specs other than the legacy "range" (first,last,step)

The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates
the command doesn't take any key arguments or may contain one or more key-specs, each one may leads to the discovery
of 0 or more key arguments.

A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will
obviously remain unchanged.

A client that supports this key-specs feature needs only to look at the key-specs array. If it finds an unrecognized spec, it
must resort to using COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order
to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no
need to use GETKEYS.

Each spec is an array of arguments, first one is the spec name, the second is an array of flags, and the third is an array
containing details about the spec (specific meaning for each spec type)
The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write.
clients should ignore any unfamiliar flags.

In order to easily find the positions of keys in a given array of args we introduce keys specs. There are two logical steps of
key specs:
1. `start_search`: Given an array of args, indicate where we should start searching for keys
2. `find_keys`: Given the output of start_search and an array of args, indicate all possible indices of keys.

### start_search step specs
- `index`: specify an argument index explicitly
  - `index`: 0 based index (1 means the first command argument)
- `keyword`: specify a string to match in `argv`. We should start searching for keys just after the keyword appears.
  - `keyword`: the string to search for
  - `start_search`: an index from which to start the keyword search (can be negative, which means to search from the end)

Examples:
- `SET` has start_search of type `index` with value `1`
- `XREAD` has start_search of type `keyword` with value `[“STREAMS”,1]`
- `MIGRATE` has start_search of type `keyword` with value `[“KEYS”,-2]`

### find_keys step specs
- `range`: specify `[count, step, limit]`.
  - `lastkey`: index of the last key. relative to the index returned from begin_search. -1 indicating till the last argument, -2 one before the last
  - `step`: how many args should we skip after finding a key, in order to find the next one
  - `limit`: if count is -1, we use limit to stop the search by a factor. 0 and 1 mean no limit. 2 means ½ of the remaining args, 3 means ⅓, and so on.
- “keynum”: specify `[keynum_index, first_key_index, step]`.
  - `keynum_index`: is relative to the return of the `start_search` spec.
  - `first_key_index`: is relative to `keynum_index`.
  - `step`: how many args should we skip after finding a key, in order to find the next one

Examples:
- `SET` has `range` of `[0,1,0]`
- `MSET` has `range` of `[-1,2,0]`
- `XREAD` has `range` of `[-1,1,2]`
- `ZUNION` has `start_search` of type `index` with value `1` and `find_keys` of type `keynum` with value `[0,1,1]`
- `AI.DAGRUN` has `start_search` of type `keyword` with value `[“LOAD“,1]` and `find_keys` of type `keynum` with value
  `[0,1,1]` (see https://oss.redislabs.com/redisai/master/commands/#aidagrun)

Note: this solution is not perfect as the module writers can come up with anything, but at least we will be able to find the key
args of the vast majority of commands.
If one of the above specs can’t describe the key positions, the module writer can always fall back to the `getkeys-api` option.

Some keys cannot be found easily (`KEYS` in `MIGRATE`: Imagine the argument for `AUTH` is the string “KEYS” - we will
start searching in the wrong index). 
The guarantee is that the specs may be incomplete (`incomplete` will be specified in the spec to denote that) but we never
report false information (assuming the command syntax is correct).
For `MIGRATE` we start searching from the end - `startfrom=-1` - and if one of the keys is actually called "keys" we will
report only a subset of all keys - hence the `incomplete` flag.
Some `incomplete` specs can be completely empty (i.e. UNKNOWN begin_search) which should tell the client that
COMMAND GETKEYS (or any other way to get the keys) must be used (Example: For `SORT` there is no way to describe
the STORE keyword spec, as the word "store" can appear anywhere in the command).

We will expose these key specs in the `COMMAND` command so that clients can learn, on startup, where the keys are for
all commands instead of holding hardcoded tables or use `COMMAND GETKEYS` in runtime.

Comments:
1. Redis doesn't internally use the new specs, they are only used for COMMAND output.
2. In order to support the current COMMAND INFO format (reply array indices 4, 5, 6) we created a synthetic range, called
   legacy_range, that, if possible, is built according to the new specs.
3. Redis currently uses only getkeys_proc or the legacy_range to get the keys indices (in COMMAND GETKEYS for
   example).

"incomplete" specs:
the command we have issues with are MIGRATE, STRALGO, and SORT
for MIGRATE, because the token KEYS, if exists, must be the last token, we can search in reverse. it one of the keys is
actually the string "keys" will return just a subset of the keys (hence, it's "incomplete")
for SORT and STRALGO we can use this heuristic (the keys can be anywhere in the command) and therefore we added a
key spec that is both "incomplete" and of "unknown type"

if a client encounters an "incomplete" spec it means that it must find a different way (either COMMAND GETKEYS or have
its own parser) to retrieve the keys.
please note that all commands, apart from the three mentioned above, have "complete" key specs
2021-09-15 11:10:29 +03:00
zhaozhao.zz 794442b130
PSYNC2: make partial sync possible after master reboot (#8015)
The main idea is how to allow a master to load replication info from RDB file when rebooting, if master can load replication info it means that replicas may have the chance to psync with master, it can save much traffic.

The key point is we need guarantee safety and consistency, so there
are two differences between master and replica:

1. master would load the replication info as secondary ID and
   offset, in case other masters have the same replid.
2. when master loading RDB, it would propagate expired keys as DEL
   command to replication backlog, then replica can receive these
   commands to delete stale keys.
   p.s. the expired keys when RDB loading is useful for users, so
   we show it as `rdb_last_load_keys_expired` and `rdb_last_load_keys_loaded` in info persistence.

Moreover, after load replication info, master should update
`no_replica_time` in case loading RDB cost too long time.
2021-09-13 15:39:11 +08:00
zhaozhao.zz d7fa44f4da
init client pause value in more appropriate place (#9479) 2021-09-10 14:02:45 +08:00
yvette903 f560531d5b
Fix: client pause uses an old timeout (#9477)
A write request may be paused unexpectedly because `server.client_pause_end_time` is old.

**Recreate this:**
redis-cli -p 6379
127.0.0.1:6379> client pause 500000000 write
OK
127.0.0.1:6379> client unpause
OK
127.0.0.1:6379> client pause 10000 write
OK
127.0.0.1:6379> set key value

The write request `set key value` is paused util  the timeout of 500000000 milliseconds was reached.

**Fix:**
reset `server.client_pause_end_time` = 0 in `unpauseClients`
2021-09-09 13:44:48 +03:00
Binbin c50af0aeba
Add LMPOP/BLMPOP commands. (#9373)
We want to add COUNT option for BLPOP.
But we can't do it without breaking compatibility due to the command arguments syntax.
So this commit introduce two new commands.

Syntax for the new LMPOP command:
`LMPOP numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Syntax for the new BLMPOP command:
`BLMPOP timeout numkeys [<key> ...] LEFT|RIGHT [COUNT count]`

Some background:
- LPOP takes one key, and can return multiple elements.
- BLPOP takes multiple keys, but returns one element from just one key.
- LMPOP can take multiple keys and return multiple elements from just one key.

Note that LMPOP/BLMPOP  can take multiple keys, it eventually operates on just one key.
And it will propagate as LPOP or RPOP with the COUNT option.

As a new command, it still return NIL if we can't pop any elements.
For the normal response is nested arrays in RESP2 and RESP3, like:
```
LMPOP/BLMPOP 
1) keyname
2) 1) element1
   2) element2
```
I.e. unlike BLPOP that returns a key name and one element so it uses a flat array,
and LPOP that returns multiple elements with no key name, and again uses a flat array,
this one has to return a nested array, and it does for for both RESP2 and RESP3 (like SCAN does)

Some discuss can see: #766 #8824
2021-09-09 12:02:33 +03:00
Huang Zhw 216f168b2b
Add INFO total_active_defrag_time and current_active_defrag_time (#9377)
Add two INFO metrics:
```
total_active_defrag_time:12345
current_active_defrag_time:456
```
`current_active_defrag_time` if greater than 0, means how much time has
passed since active defrag started running. If active defrag stops, this metric is reset to 0.
`total_active_defrag_time` means total time the fragmentation
was over the defrag threshold since the server started.

This is a followup PR for #9031
2021-09-09 11:38:10 +03:00
zhaozhao.zz 1b83353dc3
Fix wrong offset when replica pause (#9448)
When a replica paused, it would not apply any commands event the command comes from master, if we feed the non-applied command to replication stream, the replication offset would be wrong, and data would be lost after failover(since replica's `master_repl_offset` grows but command is not applied).

To fix it, here are the changes:
* Don't update replica's replication offset or propagate commands to sub-replicas when it's paused in `commandProcessed`.
* Show `slave_read_repl_offset` in info reply.
* Add an assert to make sure master client should never be blocked unless pause or module (some modules may use block way to do background (parallel) processing and forward original block module command to the replica, it's not a good way but it can work, so the assert excludes module now, but someday in future all modules should rewrite block command to propagate like what `BLPOP` does).
2021-09-08 16:07:25 +08:00
guybe7 6aa2285e32
Fix two minor bugs (MIGRATE key args and getKeysUsingCommandTable) (#9455)
1. MIGRATE has a potnetial key arg in argv[3]. It should be reflected in the command table.
2. getKeysUsingCommandTable should never free getKeysResult, it is always freed by the caller)
   The reason we never encountered this double-free bug is that almost always getKeysResult
   uses the statis buffer and doesn't allocate a new one.
2021-09-02 17:19:27 +03:00
Viktor Söderqvist f24c63a292
Slot-to-keys using dict entry metadata (#9356)
* Enhance dict to support arbitrary metadata carried in dictEntry

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

* Rewrite slot-to-keys mapping to linked lists using dict entry metadata

This is a memory enhancement for Redis Cluster.

The radix tree slots_to_keys (which duplicates all key names prefixed with their
slot number) is replaced with a linked list for each slot. The dict entries of
the same cluster slot form a linked list and the pointers are stored as metadata
in each dict entry of the main DB dict.

This commit also moves the slot-to-key API from db.c to cluster.c.

Co-authored-by: Jim Brunner <brunnerj@amazon.com>
2021-08-30 23:25:36 -07:00
Garen Chan 945a83d406
Fix boundary problem of adjusting open files limit. (#5722)
When `decr_step` is greater than `oldlimit`, the final `bestlimit` may be invalid.

    For example, oldlimit = 10, decr_step = 16.
    Current bestlimit = 15 and setrlimit() failed. Since bestlimit  is less than decr_step , then exit the loop.
    The final bestlimit is larger than oldlimit but is invalid.

Note that this only matters if the system fd limit is below 16, so unlikely to have any actual effect.
2021-08-24 22:54:21 +03:00
Yossi Gottlieb 1221f7cd5e
Improve setup operations order after fork. (#9365)
The order of setting things up follows some reasoning: Setup signal
handlers first because a signal could fire at any time. Adjust OOM score
before everything else to assist the OOM killer if memory resources are
low.

The trigger for this is a valgrind test failure which resulted with the
child catching a SIGUSR1 before initializing the handler.
2021-08-12 14:31:12 +03:00
DarrenJiang13 8ab33c18e4
fix a compilation error around madvise when make with jemalloc on MacOS (#9350)
We only use MADV_DONTNEED on Linux, that's were it was tested.
2021-08-10 11:32:27 +03:00
sundb 02fd76b97c
Replace all usage of ziplist with listpack for t_hash (#8887)
Part one of implementing #8702 (taking hashes first before other types)

## Description of the feature
1. Change ziplist encoded hash objects to listpack encoding.
2. Convert existing ziplists on RDB loading time. an O(n) operation.

## Rdb format changes
1. Add RDB_TYPE_HASH_LISTPACK rdb type.
2. Bump RDB_VERSION to 10

## Interface changes
1. New `hash-max-listpack-entries` config is an alias for `hash-max-ziplist-entries` (same with `hash-max-listpack-value`)
2. OBJECT ENCODING will return `listpack` instead of `ziplist`

## Listpack improvements:
1. Support direct insert, replace integer element (rather than convert back and forth from string)
3. Add more listpack capabilities to match the ziplist ones (like `lpFind`, `lpRandomPairs` and such)
4. Optimize element length fetching, avoid multiple calculations
5. Use inline to avoid function call overhead.

## Tests
1. Add a new test to the RDB load time conversion
2. Adding the listpack unit tests. (based on the one in ziplist.c)
3. Add a few "corrupt payload: fuzzer findings" tests, and slightly modify existing ones.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-10 09:18:49 +03:00
Eduardo Semprebon d3356bf614
Add SORT_RO command (#9299)
Add a readonly variant of the STORE command, so it can be used on
read-only workloads (replica, ACL, etc)
2021-08-09 09:40:29 +03:00
yoav-steinberg 0a9377535b
Ignore resize threshold on idle qbuf resizing (#9322)
Also update qbuf tests to verify both idle and peak based resizing logic.
And delete unused function: getClientsMaxBuffers
2021-08-06 20:50:34 +03:00
yoav-steinberg 5e908a290c
dict struct memory optimizations (#9228)
Reduce dict struct memory overhead
on 64bit dict size goes down from jemalloc's 96 byte bin to its 56 byte bin.

summary of changes:
- Remove `privdata` from callbacks and dict creation. (this affects many files, see "Interface change" below).
- Meld `dictht` struct into the `dict` struct to eliminate struct padding. (this affects just dict.c and defrag.c)
- Eliminate the `sizemask` field, can be calculated from size when needed.
- Convert the `size` field into `size_exp` (exponent), utilizes one byte instead of 8.

Interface change: pass dict pointer to dict type call back functions.
This is instead of passing the removed privdata field. In the future if
we'd like to have private data in the callbacks we can extract it from
the dict type. We can extend dictType to include a custom dict struct
allocator and use it to allocate more data at the end of the dict
struct. This data can then be used to store private data later acccessed
by the callbacks.
2021-08-05 08:25:58 +03:00
Wang Yuan d4bca53cd9
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.

But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.

## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.

Moreover, CoW is based on  pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.

There are some buffers we can release in the fork child process:
- **Serialized key-values**
  the fork child process never access serialized key-values, so we try to free them.
  Because we only can release big bulk memory, and it is time consumed to iterate all
  items/members/fields/entries of complex data type. So we decide to iterate them and
  try to release them only when their average size of item/member/field/entry is more
  than page size of OS.
- **Replication backlog**
  Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
  write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
  If clients have requests during having the fork child process, clients' buffer also be changed
  frequently. The memory includes client query buffer, output buffer, and client struct used memory.

To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-04 23:01:46 +03:00
Jonah H. Harris 432c92d8df
Add SINTERCARD/ZINTERCARD Commands (#8946)
Add SINTERCARD and ZINTERCARD commands that are similar to
ZINTER and SINTER but only return the cardinality with minimum
processing and memory overheads.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-03 11:45:27 +03:00
Binbin 4000cb7d34
Modify some error logs printing level. (#9306)
1. In sendBulkToSlave, we used LL_VERBOSE in the past, changed to
LL_WARNING. (all the other places that do freeClient(slave) use LL_WARNING)
2. The old style LOG_WARNING, chang it to LL_WARNING. Introduced in an
old pr (#1690).
2021-08-02 11:18:35 +03:00
Ning Sun f74af0e61d
Add NX/XX/GT/LT options to EXPIRE command group (#2795)
Add NX, XX, GT, and LT flags to EXPIRE, PEXPIRE, EXPIREAT, PEXAPIREAT.
- NX - only modify the TTL if no TTL is currently set 
- XX - only modify the TTL if there is a TTL currently set 
- GT - only increase the TTL (considering non-volatile keys as infinite expire time)
- LT - only decrease the TTL (considering non-volatile keys as infinite expire time)
return value of the command is 0 when the operation was skipped due to one of these flags.

Signed-off-by: Ning Sun <sunng@protonmail.com>
2021-08-02 08:57:49 +03:00
Huang Zhw 17511df59b
Add INFO stat total_eviction_exceeded_time and current_eviction_exceeded_time (#9031)
Add two INFO metrics:
```
total_eviction_exceeded_time:69734
current_eviction_exceeded_time:10230
```
`current_eviction_exceeded_time` if greater than 0, means how much time current used memory is greater than `maxmemory`. And we are still over the maxmemory. If used memory is below `maxmemory`, this metric is reset to 0.
`total_eviction_exceeded_time` means total time used memory is greater than `maxmemory` since server startup. 
The units of these two metrics are ms.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-26 10:07:20 +03:00
Oran Agra 32e61ee295
Fix ACL category for SELECT, WAIT, ROLE, LASTSAVE, READONLY, READWRITE, ASKING (#9208)
- SELECT and WAIT don't read or write from the keyspace (unlike DEL, EXISTS, EXPIRE, DBSIZE, KEYS, etc).
they're more similar to AUTH and HELLO (and maybe PING and COMMAND).
they only affect the current connection, not the server state, so they should be `@connection`, not `@keyspace`

- ROLE, like LASTSAVE is `@admin` (and `@dangerous` like INFO) 

- ASKING, READONLY, READWRITE are `@connection` too (not `@keyspace`)

- Additionally, i'm now documenting the exact meaning of each ACL category so it's clearer which commands belong where.
2021-07-20 21:48:43 +03:00
perryitay ac8b1df885
Fail EXEC command in case a watched key is expired (#9194)
There are two issues fixed in this commit: 
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-07-11 13:17:23 +03:00
Huang Zhw 9a22457e8f
Start redis-check-aof/redis-check-rdb only if executable name contains (#9215)
redis-check-aof/redis-check-rdb.

Related to #9176. Before this commit, redis-server starts as
redis-check-aof/redis-check-rdb if the directory it is started from
contains the string redis-check-aof/redis-check-rdb. We check the
executable name instead of directory.
2021-07-09 21:45:58 +03:00
Oran Agra ec582cc7ad Query buffer shrinking improvements (#5013)
when tracking the peak, don't reset the peak to 0, reset it to the
maximum of the current used, and the planned to be used by the current
arg.

when shrining, split the two separate conditions.
the idle time shrinking will remove all free space.
but the peak based shrinking will keep room for the current arg.

when we resize due to a peak (rahter than idle time), don't trim all
unused space, let the qbuf keep a size that's sufficient for the
currently process bulklen, and the current peak.

Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: yoav-steinberg <yoav@monfort.co.il>
2021-07-05 09:30:16 +03:00
zhaozhao.zz 2248eaacac resize query buffer more accurately
1. querybuf_peak has not been updated correctly in readQueryFromClient.
2. qbuf shrinking uses sdsalloc instead of sdsAllocSize

see more details in issue #4983
2021-07-05 09:30:16 +03:00
Madelyn Olson 8f59f131e5
Update incrDecrCommand to use addReplyLongLong (#9188)
Update incrDecrCommand to use addReplyLongLong
2021-07-03 10:51:53 -05:00
Wang Yuan 16e04ed944
Don't start in sentinel mode if only the folder name contains redis-sentinel (#9176)
Before this commit, redis-server starts in sentinel mode if the first startup
argument has the string redis-sentinel, so redis also starts in sentinel mode
if the directory it was started from contains the string redis-sentinel.
Now we check the executable name instead of directory.

Some examples:
1. Execute ./redis-sentinel/redis/src/redis-sentinel, starts in sentinel mode.
2. Execute ./redis-sentinel/redis/src/redis-server, starts in server mode,
   but before, redis will start in sentinel mode.
3. Execute ./redis-sentinel/redis/src/redis-server --sentinel, of course, like
   before, starts in sentinel mode.
2021-07-01 08:05:18 +03:00
Yossi Gottlieb f233c4c59d
Add bind-source-addr configuration argument. (#9142)
In the past, the first bind address that was explicitly specified was
also used to bind outgoing connections. This could result with some
problems. For example: on some systems using `bind 127.0.0.1` would
result with outgoing connections also binding to `127.0.0.1` and failing
to connect to remote addresses.

With the recent change to the way `bind` is handled, this presented
other issues:

* The default first bind address is '*' which is not a valid address.
* We make no distinction between user-supplied config that is identical
to the default, and the default config.

This commit addresses both these issues by introducing an explicit
configuration parameter to control the bind address on outgoing
connections.
2021-06-24 19:48:18 +03:00
Yossi Gottlieb 07b0d144ce
Improve bind and protected-mode config handling. (#9034)
* Specifying an empty `bind ""` configuration prevents Redis from listening on any TCP port. Before this commit, such configuration was not accepted.
* Using `CONFIG GET bind` will always return an explicit configuration value. Before this commit, if a bind address was not specified the returned value was empty (which was an anomaly).

Another behavior change is that modifying the `bind` configuration to a non-default value will NO LONGER DISABLE protected-mode implicitly.
2021-06-22 12:50:17 +03:00
gourav 6248127c43
Use exitFromChild to exit from child process in linuxMadvFreeForkBugCheck (#9101) 2021-06-17 13:37:38 -07:00
Uri Shachar c7e502a07b
Cleaning up the cluster interface by moving almost all related declar… (#9080)
* Cleaning up the cluster interface by moving almost all related declarations into cluster.h
(no logic change -- just moving declarations/definitions around)

This initial effort leaves two items out of scope - the configuration parsing into the server
struct and the internals exposed by the clusterNode struct.

* Remove unneeded declarations of dictSds*
Ideally all the dictSds functionality would move from server.c into a dedicated module
so we can avoid the duplication in redis-benchmark/cli

* Move crc16 back into server.h, will be moved out once we create a seperate header file for
hashing functions
2021-06-15 20:35:13 -07:00
sundb e5d8a5eb85
Fix the wrong reisze of querybuf (#9003)
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).

## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
2021-06-15 14:46:19 +03:00
YaacovHazan 1677efb9da
cleanup around loadAppendOnlyFile (#9012)
Today when we load the AOF on startup, the loadAppendOnlyFile checks if
the file is openning for reading.
This check is redundent (dead code) as we open the AOF file for writing at initServer,
and the file will always be existing for the loadAppendOnlyFile.

In this commit:
- remove all the exit(1) from loadAppendOnlyFile, as it is the caller
  responsibility to decide what to do in case of failure.
- move the opening of the AOF file for writing, to be after we loading it.
- avoid return -ERR in DEBUG LOADAOF, when the AOF is existing but empty
2021-06-14 10:38:08 +03:00
Binbin 0bfccc55e2
Fixed some typos, add a spell check ci and others minor fix (#8890)
This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes.
This spell checker is based on blacklist of common spelling mistakes, so it will not catch everything,
but at least it is also unlikely to cause false positives.

Besides that, the PR also fixes many spelling mistakes and types, not all are a result of the spell checker we use.

Here's a summary of other changes:
1. Scanned the entire source code and fixes all sorts of typos and spelling mistakes (including missing or extra spaces).
2. Outdated function / variable / argument names in comments
3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString.
4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751
5. Some outdated https link URLs.
6. Fix some outdated comment. Such as:
    - In README: about the rdb, we used to said create a `thread`, change to `process`
    - dbRandomKey function coment (about the dictGetRandomKey, change to dictGetFairRandomKey)
    - notifyKeyspaceEvent fucntion comment (add type arg)
    - Some others minor fix in comment (Most of them are incorrectly quoted by variable names)
7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
2021-06-10 15:39:33 +03:00
ny0312 53d1acd598
Always replicate time-to-live(TTL) as absolute timestamps in milliseconds (#8474)
Till now, on replica full-sync we used to transfer absolute time for TTL,
however when a command arrived (EXPIRE or EXPIREAT),
we used to propagate it as is to replicas (possibly with relative time),
but always translate it to EXPIREAT (absolute time) to AOF.

This commit changes that and will always use absolute time for propagation.
see discussion in #8433

Furthermore, we Introduce new commands: `EXPIRETIME/PEXPIRETIME`
that allow extracting the absolute TTL time from a key.
2021-05-30 09:20:32 +03:00
Madelyn Olson a59e75a475
Hide migrate command from slowlog if they include auth (#8859)
Redact commands that include sensitive data from slowlog and monitor
2021-05-19 08:23:54 -07:00
Oran Agra fbc0e2b834
Reset lazyfreed_objects info field with RESETSTAT, test for stream lazyfree (#8934)
And also add tests to cover lazy free of streams with various types of
metadata (see #8932)
2021-05-17 16:54:37 +03:00
Raghav Muddur 31edc22ecc
EVALSHA_RO and EVAL_RO Commands (#8820)
* EVALSHA_RO and EVAL_RO Commands

Added new readonly versions of EVAL
and EVALSHA.
2021-05-12 21:07:34 -07:00
yoav-steinberg 152fce5e2c
Enforce client output buffer soft limit when no traffic. (#8833)
When client breached the output buffer soft limit but then went idle,
we didn't disconnect on soft limit timeout, now we do.
Note this also resolves some sporadic test failures in due to Linux
buffering data which caused tests to fail if during the test we went
back under the soft COB limit.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: sundb <sundbcn@gmail.com>
2021-05-04 13:45:08 +03:00
Binbin 9c927e9de9
Delete some unimplemented prototype. (#8882)
Remove forward declarations from header files to functions that do not exist.
2021-04-29 08:25:10 +03:00
Oran Agra 46f4ebbe84
Prevent replicas from sending commands that interact with keyspace (#8868)
This solves an issue reported in #8712 in which a replica would bypass
the client write pause check and cause an assertion due to executing a
write command during failover.

The fact is that we don't expect replicas to execute any command other
than maybe REPLCONF and PING, etc. but matching against the ADMIN
command flag is insufficient, so instead i just block keyspace access
for now.
2021-04-27 08:15:10 +03:00
Ikko Ashimine ac36161694
Fix typo in server.c (#8809) 2021-04-21 13:43:38 +03:00
Wen Hui 0413fbc7d0
fix invalid master_link_down_since_seconds in info repication (#8785)
When replica never successfully connect to master, server.repl_down_since
will be initialized to 0, therefore, the info master_link_down_since_seconds
was showing the current unix timestamp, which does not make much sense.

This commit fixes the issue by showing master_link_down_since_seconds to -1.
means the replica never connect to master before.

This commit also resets this variable back to 0 when a replica is turned into
a master, so that it'll behave the same if the master is later turned into a
replica again.

The implication of this change is that if some app is checking if the value is > 60
do something, like conclude the replica is stale, this could case harm (changing
a big positive number with a small one).
2021-04-19 09:34:21 +03:00
Wen Hui 38da8d07d0
Clean up no-conf server warning for sentinel mode (#8769) 2021-04-13 16:28:54 +03:00
Wang Yuan a0e19e3cf1
Fix wrong check for aof fsync and handle aof fsync errno (#8751)
The bio aof fsync fd may be closed by main thread (AOFRW done handler)
and even possibly reused for another socket, pipe, or file.
This can can an EBADF or EINVAL fsync error, which will lead to -MISCONF errors failing all writes.
We just ignore these errno because aof fsync did not really fail.

We handle errno when fsyncing aof in bio, so we could know the real reason
when users get -MISCONF Errors writing to the AOF file error

Issue created with #8419
2021-04-11 08:14:31 +03:00
Wang Yuan 1eb85249e7
Handle remaining fsync errors (#8419)
In `aof.c`, we call fsync when stop aof, and now print a log to let user know that if fail.
In `cluster.c`, we now return error, the calling function already handles these write errors.
In `redis-cli.c`, users hope to save rdb, we now print a message if fsync failed.
In `rio.c`, we now treat fsync errors like we do for write errors. 
In `server.c`, we try to fsync aof file when shutdown redis, we only can print one log if fail.
In `bio.c`, if failing to fsync aof file, we will set `aof_bio_fsync_status` to error , and reject writing just like last writing aof error,  moreover also set INFO command field `aof_last_write_status` to error.
2021-04-01 12:45:15 +03:00
Wen Hui d5935bb0a4
generalize config file check for sentinel (#8730)
The implications of this change is just that in the past when a config file was missing,
in some cases it was exiting before printing the sever startup prints and sometimes after,
and now it'll always exit before printing them.
2021-04-01 09:01:05 +03:00
Jérôme Loyet 91f4f41665
Add replica-announced config option (#8653)
The 'sentinel replicas <master>' command will ignore replicas with
`replica-announced` set to no.

The goal of disabling the config setting replica-announced is to allow ghost
replicas. The replica is in the cluster, synchronize with its master, can be
promoted to master and is not exposed to sentinel clients. This way, it is
acting as a live backup or living ghost.

In addition, to prevent the replica to be promoted as master, set
replica-priority to 0.
2021-03-30 23:40:22 +03:00
Huang Zhw e138698e54
make processCommand check publish channel permissions. (#8534)
Add publish channel permissions check in processCommand.

processCommand didn't check publish channel permissions, so we can
queue a publish command in a transaction. But when exec the transaction,
it will fail with -NOPERM.

We also union keys/commands/channels permissions check togegher in
ACLCheckAllPerm. Remove pubsubCheckACLPermissionsOrReply in 
publishCommand/subscribeCommand/psubscribeCommand. Always 
check permissions in processCommand/execCommand/
luaRedisGenericCommand.
2021-03-26 14:10:01 +03:00
Oran Agra 497351ad07
Fix SLOWLOG for blocked commands (#8632)
* SLOWLOG didn't record anything for blocked commands because the client
  was reset and argv was already empty. there was a fix for this issue
  specifically for modules, now it works for all blocked clients.
* The original command argv (before being re-written) was also reset
  before adding the slowlog on behalf of the blocked command.
* Latency monitor is now updated regardless of the slowlog flags of the
  command or its execution (their purpose is to hide sensitive info from
  the slowlog, not hide the fact the latency happened).
* Latency monitor now uses real_cmd rather than c->cmd (which may be
  different if the command got re-written, e.g. GEOADD)

Changes:
* Unify shared code between slowlog insertion in call() and
  updateStatsOnUnblock(), hopefully prevent future bugs from happening
  due to the later being overlooked.
* Reset CLIENT_PREVENT_LOGGING in resetClient rather than after command
  processing.
* Add a test for SLOWLOG and BLPOP

Notes:
- real_cmd == c->lastcmd, except inside MULTI and Lua.
- blocked commands never happen in these cases (MULTI / Lua)
- real_cmd == c->cmd, except for when the command is rewritten (e.g.
  GEOADD)
- blocked commands (currently) are never rewritten
- other than the command's CLIENT_PREVENT_LOGGING, and the
  execution flag CLIENT_PREVENT_LOGGING, other cases that we want to
  avoid slowlog are on AOF loading (specifically CMD_CALL_SLOWLOG will
  be off when executed from execCommand that runs from an AOF)
2021-03-25 10:20:27 +02:00
Qu Chen 7de6451818
Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666) 2021-03-24 08:41:05 -07:00
yoav-steinberg d026647f4f
Avoid evaluating log arguments when log filtered by level. (#8685) 2021-03-24 08:22:12 +02:00
Yossi Gottlieb c3df27d1ea
Fix slowdown due to child reporting CoW. (#8645)
Reading CoW from /proc/<pid>/smaps can be slow with large processes on
some platforms.

This measures the time it takes to read CoW info and limits the duty
cycle of future updates to roughly 1/100.

As current_cow_size no longer represnets a current, fixed interval value
there is also a new current_cow_size_age field that provides information
about the age of the size value, in seconds.
2021-03-22 13:25:58 +02:00
Huang Zhw a19c4058be
When tests exit normally, some processes may still be alive (#8647)
In certain scenario start_server may think it failed to start a redis server
although it started successfully. in these cases, it'll not terminate it, and
it'll remain running when the test is over.

In start_server if config doesn't have bind (the minimal.conf in introspection.tcl),
it will try to bind ipv4 and ipv6. One may success while other fails. It will
output "Could not create server TCP listening socket".
wait_server_started uses this message to check whether instance started
successfully. So it will consider that it failed even though redis started successfully.

Additionally, in some cases it wasn't clear to users why the server exited,
since the warning message printed to the log, could in some cases be harmless,
and in some cases fatal.

This PR adds makes a clear distinction between a warning log message and
a fatal one, and changes the test suite to look for the fatal message.
2021-03-16 17:25:30 +02:00
Yossi Gottlieb df5f543b65
Server won't start on alpine/libmusl without IPv6. (#8655)
listenToPort attempts to gracefully handle and ignore certain errors but does not store errno prior to logging, which in turn calls several libc functions that may overwrite errno.

This has been discovered due to libmusl strftime() always returning with errno set to EINVAL, which resulted with docker-library/redis#273.
2021-03-16 11:20:03 +02:00
Madelyn Olson e1d98bca5a
Redact slowlog entries for config with sensitive data. (#8584)
Redact config set requirepass/masterauth/masteruser from slowlog in addition to showing ACL commands without sensitive values.
2021-03-15 22:00:29 -07:00
guybe7 dba33a943d
Missing EXEC on modules propagation after failed EVAL execution (#8654)
1. moduleReplicateMultiIfNeeded should use server.in_eval like
   moduleHandlePropagationAfterCommandCallback
2. server.in_eval could have been set to 1 and not reset back
   to 0 (a lot of missed early-exits after in_eval is already 1)

Note: The new assertions in processCommand cover (2) and I added
two module tests to cover (1)

Implications:
If an EVAL that failed (and thus left server.in_eval=1) runs before a module
command that replicates, the replication stream will contain MULTI (because
moduleReplicateMultiIfNeeded used to check server.lua_caller which is NULL
at this point) but not EXEC (because server.in_eval==1)
This only affects modules as module.c the only user of server.in_eval.

Affects versions 6.2.0, 6.2.1
2021-03-15 21:19:57 +02:00
Guillem Jover 3a5905fa85
Send the readiness notification when we are ready to accept connections (#8409)
On a replica we do accept connections, even though commands accessing
the database will operate in read-only mode. But the server is still
already operational and processing commands.

Not sending the readiness notification means that on a HA setup where
the nodes all start as replicas (with replicaof in the config) with
a replica that cannot connect to the master server and which might not
come back in a predictable amount of time or at all, the service
supervisor will end up timing out the service and terminating it, with
no option to promote it to be the main instance. This seems counter to
what the readiness notification is supposed to be signaling.

Instead send the readiness notification when we start accepting
commands, and then send the various server status changes as that.

Fixes: commit 641c64ada1
Fixes: commit dfb598cf33
2021-03-14 08:46:26 +02:00
Oran Agra 4ae1bea323
allow RESET during busy scripts (#8629)
same as DISCARD, this command (RESET) is about client state, not server state
2021-03-11 07:52:27 +02:00
guybe7 3d0b427c30
Fix some issues with modules and MULTI/EXEC (#8617)
Bug 1:
When a module ctx is freed moduleHandlePropagationAfterCommandCallback
is called and handles propagation. We want to prevent it from propagating
commands that were not replicated by the same context. Example:
1. module1.foo does: RM_Replicate(cmd1); RM_Call(cmd2); RM_Replicate(cmd3)
2. RM_Replicate(cmd1) propagates MULTI and adds cmd1 to also_propagagte
3. RM_Call(cmd2) create a new ctx, calls call() and destroys the ctx.
4. moduleHandlePropagationAfterCommandCallback is called, calling
   alsoPropagates EXEC (Note: EXEC is still not written to socket),
   setting server.in_trnsaction = 0
5. RM_Replicate(cmd3) is called, propagagting yet another MULTI (now
   we have nested MULTI calls, which is no good) and then cmd3

We must prevent RM_Call(cmd2) from resetting server.in_transaction.
REDISMODULE_CTX_MULTI_EMITTED was revived for that purpose.

Bug 2:
Fix issues with nested RM_Call where some have '!' and some don't.
Example:
1. module1.foo does RM_Call of module2.bar without replication (i.e. no '!')
2. module2.bar internally calls RM_Call of INCR with '!'
3. at the end of module1.foo we call RM_ReplicateVerbatim

We want the replica/AOF to see only module1.foo and not the INCR from module2.bar

Introduced a global replication_allowed flag inside RM_Call to determine
whether we need to replicate or not (even if '!' was specified)

Other changes:
Split beforePropagateMultiOrExec to beforePropagateMulti afterPropagateExec
just for better readability
2021-03-10 18:02:17 +02:00
Oran Agra 7778f1b4b0
strip % sign from current_fork_perc info field (#8628)
`master_sync_perc` and `loading_loaded_perc` don't have that sign,
and i think the info field should be a raw floating point number (the name suggests its units).

we already have `used_memory_peak_perc` and `used_memory_dataset_perc` which do add the `%` sign, but:
1) i think it was a mistake but maybe too late to fix now, and maybe not too late to fix for `current_fork_perc`
2) it is more important to be consistent with the two other "progress" "prec" metrics, and not with the "utilization" metric.
2021-03-10 15:25:16 +02:00
sundb 95d6297db8
Add run all test support with define REDIS_TEST (#8570)
1. Add `redis-server test all` support to run all tests.
2. Add redis test to daily ci.
3. Add `--accurate` option to run slow tests for more iterations (so that
   by default we run less cycles (shorter time, and less prints).
4. Move dict benchmark to REDIS_TEST.
5. fix some leaks in tests
6. make quicklist tests run on a specific fill set of options rather than huge ranges
7. move some prints in quicklist test outside their loops to reduce prints
8. removing sds.h from dict.c since it is now used in both redis-server and
   redis-cli (uses hiredis sds)
2021-03-10 09:13:11 +02:00
Huang Zhw f491eee2a1
Cleanup: dictEncObjHash remove redundant conditional statement (#8488) 2021-03-07 18:09:12 +02:00
yoav-steinberg fba391ae66
Fix client_recent_max_input/output_buffer in 'INFO CLIENTS' when all clients drop. (#8588)
When no connected clients variables stopped being updated instead of being zeroed over time.

The other changes are cleanup and optimization (avoiding an unnecessary division per client)
2021-03-03 09:36:27 +02:00
YaacovHazan a031d268b1
Make port, tls-port and bind configurations modifiable (#8510)
Add ability to modify port, tls-port and bind configurations by CONFIG SET command.

To simplify the code and make it cleaner, a new structure
added, socketFds, which contains the file descriptors array and its counter,
and used for TCP, TLS and Cluster sockets file descriptors.
2021-03-01 16:04:44 +02:00
Madelyn Olson 4a474843fb
Moved requirepass and querybuf length to generic configs (#8557)
Moved additional configs to generic infrastructure.
2021-02-25 21:00:27 -08:00
guybe7 aba61d9ebd
XTRIM should take at least 4 args (#8549)
Fix command arity.
Other than key name it must at least take the trimming strategy and it's limit
2021-02-24 16:41:00 +02:00
Yossi Gottlieb d828f90c26
Fix allowed length for REPLCONF ip-address. (#8517)
Originally this was limited to IPv6 address length, but effectively it
has been used for host names and now that Sentinel accepts that as well
we need to be able to store full hostnames.

Fixes #8507
2021-02-21 11:22:36 +02:00
Harkrishn Patro 303465af35
Remove redundant pubsub list to store the patterns. (#8472)
Remove redundant pubsub list to store the patterns.
2021-02-17 14:13:50 -08:00
uriyage fd052d2a86
Adds INFO fields to track fork child progress (#8414)
* Adding current_save_keys_total and current_save_keys_processed info fields.
  Present in replication, BGSAVE and AOFRW.
* Changing RM_SendChildCOWInfo() to RM_SendChildHeartbeat(double progress)
* Adding new info field current_fork_perc. Present in Replication, BGSAVE, AOFRW,
  and module forks.
2021-02-16 16:06:51 +02:00
Yossi Gottlieb b1929bb2b6
Fix incorrect shared ping length. (#8498) 2021-02-15 17:09:33 +02:00
Yossi Gottlieb 141ac8df59
Escape unsafe field name characters in INFO. (#8492)
Fixes #8489
2021-02-15 17:08:53 +02:00
Madelyn Olson 899c85ae67
Moved most static strings into the shared structure (#8411)
Moved most static strings into the shared structure
2021-02-09 11:52:28 -08:00
filipe oliveira f0c5052aa8
Enabled background and reply time tracking on blocked on keys/blocked on background work clients (#7491)
This commit enables tracking time of the background tasks and on replies,
opening the door for properly tracking commands that rely on blocking / background
 work via the slowlog, latency history, and commandstats. 

Some notes:
- The time spent blocked waiting for key changes, or blocked on synchronous
  replication is not accounted for. 

- **This commit does not affect latency tracking of commands that are non-blocking
  or do not have background work.** ( meaning that it all stays the same with exception to
  `BZPOPMIN`,`BZPOPMAX`,`BRPOP`,`BLPOP`, etc... and module's commands that rely
  on background threads ). 

-  Specifically for latency history command we've added a new event class named
  `command-unblocking` that will enable latency monitoring on commands that spawn
  background threads to do the work.

- For blocking commands we're now considering the total time of a command as the
  time spent on call() + the time spent on replying when unblocked.

- For Modules commands that rely on background threads we're now considering the
  total time of a command as the time spent on call (main thread) + the time spent on
  the background thread ( if marked within `RedisModule_MeasureTimeStart()` and
  `RedisModule_MeasureTimeEnd()` ) + the time spent on replying (main thread)

To test for this feature we've added a `unit/moduleapi/blockonbackground` test that relies on
a module that blocks the client and sleeps on the background for a given time. 
- check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time
- check blocked command that uses RedisModule_MeasureTimeStart() is tracking background time even in timeout
- check blocked command with multiple calls RedisModule_MeasureTimeStart()  is tracking the total background time
- check blocked command without calling RedisModule_MeasureTimeStart() is not reporting background time
2021-01-29 15:38:30 +02:00
Yang Bodong b9a0500f16
Add HRANDFIELD and ZRANDMEMBER. improvements to SRANDMEMBER (#8297)
New commands:
`HRANDFIELD [<count> [WITHVALUES]]`
`ZRANDMEMBER [<count> [WITHSCORES]]`
Algorithms are similar to the one in SRANDMEMBER.

Both return a simple bulk response when no arguments are given, and an array otherwise.
In case values/scores are requested, RESP2 returns a long array, and RESP3 a nested array.
note: in all 3 commands, the only option that also provides random order is the one with negative count.

Changes to SRANDMEMBER
* Optimization when count is 1, we can use the more efficient algorithm of non-unique random
* optimization: work with sds strings rather than robj

Other changes:
* zzlGetScore: when zset needs to convert string to double, we use safer memcpy (in
  case the buffer is too small)
* Solve a "bug" in SRANDMEMBER test: it intended to test a positive count (case 3 or
  case 4) and by accident used a negative count

Co-authored-by: xinluton <xinluton@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-29 10:47:28 +02:00
zhaozhao.zz 49b3663332
AOF: recover from last write error after turn on appendonly again (#8030)
The key point is how to recover from last AOF write error, for example:

1. start redis with appendonly yes, and append some write commands

2. short write or something else error happen, `server.aof_last_write_status` changed to `C_ERR`, now redis doesn't accept write commands

3. execute `CONFIG SET appendonly no` to avoid the above problem, now redis can accept write commands again

4. disk error resolved, and execute `CONFIG SET appendonly yes` to reopen AOF, but `server.aof_last_write_status` cannot be changed to `C_OK` (if background aof rewrite run less then 1 second, it will free `server.aof_buf` and then serverCron cannot fix `aof_last_write_status`), then redis cannot accept write commands forever.

This PR use a simple way to fix it:

1. just free `server.aof_buf` when stop appendonly to save memory, if error happens in `flushAppendOnlyFile(1)`, the `server.aof_buf` may contains some data which has not be written to aof, I think we can ignore it because we turn off the appendonly.

2. reset fsync status after stop appendonly and call `flushAppendOnlyFile` only when `aof_state` is ON

3. reset `server.last_write_status` when reopen aof to accept write commands
2021-01-29 14:35:10 +08:00
Allen Farris 0d18a1e85f
implement FAILOVER command (#8315)
Implement FAILOVER command, which coordinates failover
between the server and one of its replicas.
2021-01-28 13:18:05 -08:00
Yossi Gottlieb 4bb5ccbefb
Add proc-title-template option. (#8397)
Make it possible to customize the process title, i.e. include custom
strings, immutable configuration like port, tls-port, unix socket name,
etc.
2021-01-28 18:17:39 +02:00
guybe7 01cbf17ba2
Modules: Add event for fork child birth and termination (#8289)
Useful to avoid doing background jobs that can cause excessive COW
2021-01-28 16:38:49 +02:00
Z. Liu 17b34c7309
Add 'set-proc-title' config so that this mechanism can be disabled (#3623)
if option `set-proc-title' is no, then do nothing for proc title.

The reason has been explained long ago, see following:

We update redis to 2.8.8, then found there are some side effect when
redis always change the process title.

We run several slave instance on one computer, and all these salves
listen on unix socket only, then ps will show:

  1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server *:0
  1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server *:0

for redis 2.6 the output of ps is like following:

  1 S redis 18036 1 0 80 0 - 56130 ep_pol 14:02 ? 00:00:31 /usr/sbin/redis-server /etc/redis/a.conf
  1 S redis 23949 1 0 80 0 - 11074 ep_pol 15:41 ? 00:00:00 /usr/sbin/redis-server /etc/redis/b.conf

Later is more informational in our case. The situation
is worse when we manage the config and process running
state by salt. Salt check the process by running "ps |
grep SIG" (for Gentoo System) to check the running
state, where SIG is the string to search for when
looking for the service process with ps. Previously, we
define sig as "/usr/sbin/redis-server
/etc/redis/a.conf". Since the ps output is identical for
our case, so we have no way to check the state of
specified redis instance.

So, for our case, we prefer the old behavior, i.e, do
not change the process title for the main redis process.
Or add an option such as "set-proc-title [yes|no]" to
control this behavior.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-28 11:12:39 +02:00
Yossi Gottlieb 3a5049042a
Avoid assertions when testing arm64 cow bug. (#8405)
At least in one case the arm64 cow kernel bug test triggers an assert, which is a problem because it cannot be ignored like cases where the bug is found.

On older systems (Linux <4.5) madvise fails because MADV_FREE is not supported. We treat these failures as an indication the system is not affected.

Fixes #8351, #8406
2021-01-28 10:33:45 +02:00
Raghav Muddur 0367a80819
GETEX, GETDEL and SET PXAT/EXAT (#8327)
This commit introduces two new command and two options for an existing command

GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp]
[PXAT milliseconds-timestamp]

The getexCommand() function implements extended options and variants of the GET
command. Unlike GET command this command is not read-only. Only one of the options
can be used at a given time.

1. PERSIST removes any TTL associated with the key.
2. EX Set expiry TTL in seconds.
3. PX Set expiry TTL in milliseconds.
4. EXAT Same like EX instead of specifying the number of seconds representing the
    TTL (time to live), it takes an absolute Unix timestamp
5. PXAT Same like PX instead of specifying the number of milliseconds representing the
    TTL (time to live), it takes an absolute Unix timestamp

Command would return either the bulk string, error or nil.

GETDEL <key>
Would delete the key after getting.

SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>]
[EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>]

Two new options added here are EXAT and PXAT

Key implementation notes
- `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is
  specified (`PX/EX`), replication will always use `PX`.
- `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile`
  as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of
  correct AOF translation.
- `GETEX` without any optional argument behaves like `GET`.
- `GETEX` command is never propagated, It is either propagated as `PEXPIRE[AT], or PERSIST`.
- `GETDEL` command is propagated as `DEL`
- Combined the validation for `SET` and `GETEX` arguments. 
- Test cases to validate AOF/Replication propagation
2021-01-27 19:47:26 +02:00
Wen Hui 1aad55b66f
Sentinel: Fix Config Dependency and Rewrite Sequence (#8271)
This commit fixes a well known and an annoying issue in Sentinel mode.

Cause of this issue:
Currently, Redis rewrite process works well in server mode, however in sentinel mode,
the sentinel config has variant semantics for different configurations, in example configuration
https://github.com/redis/redis/blob/unstable/sentinel.conf, we put comments on these.
However the rewrite process only treat the sentinel config as a single option. During rewrite
process, it will mess up with the lines and comments.

Approaches:
In order to solve this issue, we need to differentiate different subconfig options in sentinel separately,
for example, sentinel monitor <master-name> <ip> <redis-port> <quorum>
we can treat it as sentinel monitor option, instead of the sentinel option.

This commit also fixes the dependency issue when putting configurations in sentinel.conf.
For example before this commit,we must put
`sentinel monitor <master-name> <ip> <redis-port> <quorum>` before
`sentinel auth-pass <master-name> <password>` for a single master,
otherwise the server cannot start and will return error. This commit fixes this issue, as long as
the monitoring master was configured, no matter the sequence is, the sentinel can start and run properly.
2021-01-26 09:31:54 +02:00
Yossi Gottlieb b548ffabbe
CONFIG REWRITE should honor umask settings. (#8371)
Fixes a regression introduced due to a new (safer) way of rewriting configuration files. In the past the file was simply overwritten (same inode), but now Redis creates a new temporary file and later renames it over the old one.

The temp file typically gets created with 0600 permissions so we later fchmod it to fix that. Unlike open with O_CREAT, fchmod doesn't consider umask so we have to do that explicitly.

Fixes #8369
2021-01-20 21:57:24 +02:00
Guy Korland ac5f21d613
Add CI for FreeBSD (#8292)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-20 14:07:09 +02:00
guybe7 baf92f3f1c
Fix firstkey,lastkey,step in COMMAND command for some commands (#8367)
The output for COMMAND command was wrong for some commands.
clients can use firstkey,lastkey,step to find (some) key name arguments, and the
"movablekeys" flag to know that they can't know all (or any) of the key name arguments.

These commands had the wrong output:
1. GEORADIUS*_RO used to have "movablekeys" (which it doesn't really need)
2. XREAD and XREADGROUP used to have (1,1,1). but that's completely wrong.
3. Z*STORE used to have (0,0,0) but it can at lest give the index of the dstkey (1,1,1)
2021-01-20 13:56:45 +02:00
Andy Pan fb66e2e249
Use FD_CLOEXEC in Sentinel, so that FDs don't leak to the scripts it runs (#8242)
Sentinel uses execve to run scripts, so it needs to use FD_CLOEXEC
on all file descriptors, so that they're not accessible by the script it runs.

This commit includes a change to the sentinel tests, which verifies no
FDs are left opened when the script is executed.
2021-01-19 22:57:30 +02:00
sundb 593cde16bc
Fix units in log message of copy-on-write (#8320) 2021-01-13 11:38:02 +02:00
Madelyn Olson b24b490393
Fix issues in wait test (#8310)
This fixes three issues:
1.  Using debug SLEEP was impacting the subsequent test, and causing it to pass reliably even though it should have failed. There was exactly 5 seconds of artificial pause (after 1000, wait 3000, wait 1000) between the debug sleep 5 and when we needed to unblock the client in the subsequent test. Now the test properly makes sure the client is unblocked, and the subsequent test is fixed.
2. Minor, the client pause types were using & comparisons instead of ==, since it was previously a flag.
3. Test is faster now that some of the hand wavy time is removed.
2021-01-12 09:46:24 +02:00
Madelyn Olson 47579bdf5c
Add support for client pause WRITE (#8170)
Implementation of client pause WRITE and client unpause
2021-01-07 23:36:54 -08:00
George Prekas b02780c41d
Add check for the MADV_FREE/fork arm64 Linux kernel bug (#8224)
Older arm64 Linux kernels have a bug that could lead to data corruption during
background save under the following scenario:

1) jemalloc uses MADV_FREE on a page,
2) jemalloc reuses and writes the page,
3) Redis forks the background save process, and
4) Linux performs page reclamation.

Under these conditions, Linux will reclaim the page wrongfully and the
background save process will read zeros when it tries to read the page.

The bug has been fixed in Linux with commit:
ff1712f953e27f0b0718762ec17d0adb15c9fd0b ("arm64: pgtable: Ensure dirty bit is
preserved across pte_wrprotect()")

This Commit adds an ignore-warnings config, when not found, redis will
print a warning and exit on startup (default behavior).

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-07 17:06:05 +02:00
YaacovHazan ea930a352c Report child copy-on-write info continuously
Add INFO field, rdb_active_cow_size, to report COW of a live fork child while
it's active.
- once in 1024 keys check the time, and if there's more than one second since
  the last report send a report to the parent via the pipe.
- refactor the child_info_data struct, it's an implementation detail that
  shouldn't be in the server struct, and not used to communicate data between
  caller and callee
- remove the magic value from that struct (not sure what it was good for), and
  instead add handling of short reads.
- add another value to the structure, cow_type, to indicate if the report is
  for the new rdb_active_cow_size field, or it's the last report of a
  successful operation
- add new Module API to report the active COW
- add more asserts variants to test.tcl
2021-01-07 16:14:29 +02:00
YaacovHazan f9dacf8aac Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
  child.
- redisFork is now responsible of updating the server struct with the pid,
  which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
  repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
  Module), and one that can create several forks to coexist in parallel (LDB,
  but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
  unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
  doesn't clear the pid variable or call wait4, so checkChildrenDone does
  the cleanup for it.
  This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
  updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2021-01-07 16:14:29 +02:00
Jonah H. Harris b5029dfdad
Add ZRANGESTORE command, and improve ZSTORE command (#7844)
Add ZRANGESTORE command, and improve ZSTORE command to deprecated Z[REV]RANGE[BYSCORE|BYLEX].

Syntax for the new ZRANGESTORE command:
ZRANGESTORE [BYSCORE | BYLEX] [REV] [LIMIT offset count]

New syntax for ZRANGE:
ZRANGE [BYSCORE | BYLEX] [REV] [WITHSCORES] [LIMIT offset count]

Old syntax for ZRANGE:
ZRANGE [WITHSCORES]

Other ZRANGE commands remain unchanged.

The implementation uses common code for all of these, by utilizing a consumer interface that in one
command response to the client, and in the other command stores a zset key.

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-07 10:58:53 +02:00
guybe7 714e103ac3
Add XAUTOCLAIM (#7973)
New command: XAUTOCLAIM <key> <group> <consumer> <min-idle-time> <start> [COUNT <count>] [JUSTID]

The purpose is to claim entries from a stale consumer without the usual
XPENDING+XCLAIM combo which takes two round trips.

The syntax for XAUTOCLAIM is similar to scan: A cursor is returned (streamID)
by each call and should be used as start for the next call. 0-0 means the scan is complete.

This PR extends the deferred reply mechanism for any bulk string (not just counts)

This PR carries some unrelated test code changes:
- Renames the term "client" into "consumer" in the stream-cgroups test
- And also changes DEBUG SLEEP into "after"

Co-authored-by: Oran Agra <oran@redislabs.com>
2021-01-06 10:34:27 +02:00
Itamar Haber 9dcdc7e79a
HELP subcommand, continued (#5531)
* man-like consistent long formatting
* Uppercases commands, subcommands and options
* Adds 'HELP' to HELP for all
* Lexicographical order
* Uses value notation and other .md likeness
* Moves const char *help to top
* Keeps it under 80 chars
* Misc help typos, consistent conjuctioning (i.e return and not returns)
* Uses addReplySubcommandSyntaxError(c) all over

Signed-off-by: Itamar Haber <itamar@redislabs.com>
2021-01-04 17:02:57 +02:00
Meir Shpilraien (Spielrein) ecd5351870
Fix assertion on loading AOF with timed out script. (#8284)
If AOF file contains a long Lua script that timed out, then the `evalCommand` calls
`blockingOperationEnds` which sets `server.blocked_last_cron` to 0. later on,
the AOF `whileBlockedCron` function asserts that this value is not 0.

The fix allows nesting call to `blockingOperationStarts` and `blockingOperationEnds`.

The issue was first introduce in this commit: 9ef8d2f67 (Redis 6.2 RC1)
2021-01-04 13:42:17 +02:00
Oran Agra 7896debe6c
Fix rare assertion as a result of: active defrag while loading (#8281)
In #7726 (part of 6.2), we added a mechanism for whileBlockedCron, this
mechanism has an assertion to make sure the timestamp in
whileBlockedCron was always set correctly before the blocking operation
starts.

I now found (thanks to our CI) two bugs in that area:
1) CONFIG RESETSTAT (if it was allowed during loading) would have
   cleared this var
2) the call stopLoading (which calls whileBlockedCron) was made too
   early, while the rio is still in use, in which case the update_cksum
   (rdbLoadProgressCallback) may still be called and whileBlockedCron
   can assert.
2021-01-03 16:09:29 +02:00
Oran Agra 71fbe6e800
Fix leak in new errorstats commit, and a flaky test (#8278) 2021-01-02 08:37:19 +02:00
filipe oliveira 90b9f08e5d
Add errorstats info section, Add failed_calls and rejected_calls to commandstats (#8217)
This Commit pushes forward the observability on overall error statistics and command statistics within redis-server:

It extends INFO COMMANDSTATS to have
- failed_calls in - so we can keep track of errors that happen from the command itself, broken by command.
- rejected_calls - so we can keep track of errors that were triggered outside the commmand processing per se

Adds a new section to INFO, named ERRORSTATS that enables keeping track of the different errors that
occur within redis ( within processCommand and call ) based on the reply Error Prefix ( The first word
after the "-", up to the first space ).

This commit also fixes RM_ReplyWithError so that it can be correctly identified as an error reply.
2020-12-31 16:53:43 +02:00
Oran Agra 049cf8cdf4
Fix memory leaks in error replies due to recent change (#8249)
Recently efaf09ee4 started using addReplyErrorSds in place of
addReplySds the later takes ownership of the string but the former did
not.
This introduced memory leaks when a script returns an error to redis,
and also in clusterRedirectClient (two new usages of
addReplyErrorSds which was mostly unused till now.

This commit chagnes two thanks.
1. change addReplyErrorSds to take ownership of the error string.
2. scripting.c doesn't actually need to use addReplyErrorSds, it's a
perfect match for addReplyErrorFormat (replaces newlines with spaces)
2020-12-27 21:40:12 +02:00
Oran Agra 19d4705ffd
Make the protocol-version argument of HELLO optional (#7377) 2020-12-27 16:37:27 +02:00
Itamar Haber f44186e575
Adds count to L/RPOP (#8179)
Adds: `L/RPOP <key> [count]`

Implements no. 2 of the following strategies:

1. Loop on listTypePop - this would result in multiple calls for memory freeing and allocating (see 769167a079)
2. Iterate the range to build the reply, then call quickListDelRange - this requires two iterations and **is the current choice**
3. Refactor quicklist to have a pop variant of quickListDelRange - probably optimal but more complex

Also:
* There's a historical check for NULL after calling listTypePop that was converted to an assert.
* This refactors common logic shared between LRANGE and the new form of LPOP/RPOP into addListRangeReply (adds test for b/w compat)
* Consequently, it may have made sense to have `LRANGE l -1 -2` and `LRANGE l 9 0` be legit and return a reverse reply. Due to historical reasons that would be, however, a breaking change.
* Added minimal comments to existing commands to adhere to the style, make core dev life easier and get commit karma, naturally.
2020-12-25 21:49:24 +02:00
xhe 2e8f8c9b0c HELLO without protover
Signed-off-by: xhe <xw897002528@gmail.com>
2020-12-24 13:35:41 +08:00
Madelyn Olson efaf09ee4b
Flow through the error handling path for most errors (#8226)
Properly throw errors for invalid replication stream and support https://github.com/redis/redis/pull/8217
2020-12-23 19:06:25 -08:00
Greg Femec 266949c7fc
Fix random element selection for large hash tables. (#8133)
When a database on a 64 bit build grows past 2^31 keys, the underlying hash table expands to 2^32 buckets. After this point, the algorithms for selecting random elements only return elements from half of the available buckets because they use random() which has a range of 0 to 2^31 - 1. This causes problems for eviction policies which use dictGetSomeKeys or dictGetRandomKey. Over time they cause the hash table to become unbalanced because, while new keys are spread out evenly across all buckets, evictions come from only half of the available buckets. Eventually this half of the table starts to run out of keys and it takes longer and longer to find candidates for eviction. This continues until no more evictions can happen.

This solution addresses this by using a 64 bit PRNG instead of libc random().

Co-authored-by: Greg Femec <gfemec@google.com>
2020-12-23 15:52:07 +02:00
Oran Agra 2426aaa099
fix valgrind warning created by recent pidfile fix (#8235)
This isn't a leak, just an warning due to unreachable
allocation on the fork child.
Problem created by 92a483b
2020-12-23 12:55:05 +02:00
Meir Shpilraien (Spielrein) 92a483bca2
Fix issue where fork process deletes the parent pidfile (#8231)
Turns out that when the fork child crashes, the crash log was deleting
the pidfile from the disk (although the parent is still running.

Now we set the pidfile of the fork process to NULL so the fork process
will never deletes it.
2020-12-22 15:17:39 +02:00
Oran Agra 411c18bbce
Remove read-only flag from non-keyspace cmds, different approach for EXEC to propagate MULTI (#8216)
In the distant history there was only the read flag for commands, and whatever
command that didn't have the read flag was a write one.
Then we added the write flag, but some portions of the code still used !read
Also some commands that don't work on the keyspace at all, still have the read
flag.

Changes in this commit:
1. remove the read-only flag from TIME, ECHO, ROLE and LASTSAVE

2. EXEC command used to decides if it should propagate a MULTI by looking at
   the command flags (!read & !admin).
   When i was about to change it to look at the write flag instead, i realized
   that this would cause it not to propagate a MULTI for PUBLISH, EVAL, and
   SCRIPT, all 3 are not marked as either a read command or a write one (as
   they should), but all 3 are calling forceCommandPropagation.

   So instead of introducing a new flag to denote a command that "writes" but
   not into the keyspace, and still needs propagation, i decided to rely on
   the forceCommandPropagation, and just fix the code to propagate MULTI when
   needed rather than depending on the command flags at all.

   The implication of my change then is that now it won't decide to propagate
   MULTI when it sees one of these: SELECT, PING, INFO, COMMAND, TIME and
   other commands which are neither read nor write.

3. Changing getNodeByQuery and clusterRedirectBlockedClientIfNeeded in
   cluster.c to look at !write rather than read flag.
   This should have no implications, since these code paths are only reachable
   for commands which access keys, and these are always marked as either read
   or write.

This commit improve MULTI propagation tests, for modules and a bunch of
other special cases, all of which used to pass already before that commit.
the only one that test change that uncovered a change of behavior is the
one that DELs a non-existing key, it used to propagate an empty
multi-exec block, and no longer does.
2020-12-22 12:03:49 +02:00
valentinogeron 4c13945c37
Fix PFDEBUG commands flag (#8222)
- Mark it as a @hyperloglog command (ACL)
- Should not be allowed in OOM
- Add firstkey, lastkey, step
- Add comment that explains the 'write' flag
2020-12-21 15:40:20 +02:00
sundb 51eb0da824
Fix command reset's arity (#8212) 2020-12-18 14:55:39 +02:00
filipe oliveira 19d46f8f2f
Expose Redis main thread cpu time, and scrape system time via INFO command. (#8132)
Exposes the main thread CPU info via info modules ( linux specific only )
(used_cpu_sys_main_thread and used_cpu_user_main_thread). This is important for:

- distinguish between main thread and io-threads cpu time total cpu time consumed ( check
  what is the first bottleneck on the used config )
- distinguish between main thread and modules threads total cpu time consumed

Apart from it, this commit also exposes the server_time_usec within the Server section so that we can
properly differentiate consecutive collection and calculate for example the CPU% and or / cpu time vs
wall time, etc...
2020-12-13 19:14:46 +02:00
Wang Yuan e3ff414513
Add total_forks to INFO STATS (#8155) 2020-12-13 10:01:18 +02:00
杨博东 4d06d99bf8
Add GEOSEARCH / GEOSEARCHSTORE commands (#8094)
Add commands to query geospatial data with bounding box.

Two new commands that replace the existing 4 GEORADIUS* commands.

GEOSEARCH key [FROMMEMBER member] [FROMLOC long lat] [BYRADIUS radius
unit] [BYBOX width height unit] [WITHCORD] [WITHDIST] [WITHASH] [COUNT
count] [ASC|DESC]

GEOSEARCHSTORE dest_key src_key [FROMMEMBER member] [FROMLOC long lat]
[BYRADIUS radius unit] [BYBOX width height unit] [WITHCORD] [WITHDIST]
[WITHASH] [COUNT count] [ASC|DESC] [STOREDIST]

- Add two types of CIRCULAR_TYPE and RECTANGLE_TYPE to achieve different searches
- Judge whether the point is within the rectangle, refer to:
geohashGetDistanceIfInRectangle
2020-12-12 02:21:05 +02:00
Oran Agra c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Oran Agra ca1c182567 Sanitize dump payload: ziplist, listpack, zipmap, intset, stream
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to `no`,
but later on it should be set to `clients` by default.

changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
  slowed down by sanitation.
2020-12-06 14:54:34 +02:00
guybe7 1df5bb5687
Make sure we do not propagate nested MULTI/EXEC (#8097)
One way this was happening is when a module issued an RM_Call which would inject MULTI.
If the module command that does that was itself issued by something else that already did
added MULTI (e.g. another module, or a Lua script), it would have caused nested MULTI.

In fact the MULTI state in the client or the MULTI_EMITTED flag in the context isn't
the right indication that we need to propagate MULTI or not, because on a nested calls
(possibly a module action called by a keyspace event of another module action), these
flags aren't retained / reflected.

instead there's now a global propagate_in_transaction flag for that.

in addition to that, we now have a global in_eval and in_exec flags, to serve the flags
of RM_GetContextFlags, since their dependence on the current client is wrong for the same
reasons mentioned above.
2020-12-06 13:14:18 +02:00
Wang Yuan 75f9dec644
Limit the main db and expires dictionaries to expand (#7954)
As we know, redis may reject user's requests or evict some keys if
used memory is over maxmemory. Dictionaries expanding may make
things worse, some big dictionaries, such as main db and expires dict,
may eat huge memory at once for allocating a new big hash table and be
far more than maxmemory after expanding.
There are related issues: #4213 #4583

More details, when expand dict in redis, we will allocate a new big
ht[1] that generally is double of ht[0], The size of ht[1] will be
very big if ht[0] already is big. For db dict, if we have more than
64 million keys, we need to cost 1GB for ht[1] when dict expands.

If the sum of used memory and new hash table of dict needed exceeds
maxmemory, we shouldn't allow the dict to expand. Because, if we
enable keys eviction, we still couldn't add much more keys after
eviction and rehashing, what's worse, redis will keep less keys when
redis only remains a little memory for storing new hash table instead
of users' data. Moreover users can't write data in redis if disable
keys eviction.

What this commit changed ?

Add a new member function expandAllowed for dict type, it provide a way
for caller to allow expand or not. We expose two parameters for this
function: more memory needed for expanding and dict current load factor,
users can implement a function to make a decision by them.
For main db dict and expires dict type, these dictionaries may be very
big and cost huge memory for expanding, so we implement a judgement
function: we can stop dict to expand provisionally if used memory will
be over maxmemory after dict expands, but to guarantee the performance
of redis, we still allow dict to expand if dict load factor exceeds the
safe load factor.
Add test cases to verify we don't allow main db to expand when left
memory is not enough, so that avoid keys eviction.

Other changes:

For new hash table size when expand. Before this commit, the size is
that double used of dict and later _dictNextPower. Actually we aim to
control a dict load factor between 0.5 and 1.0. Now we replace *2 with
+1, since the first check is that used >= size, the outcome of before
will usually be the same as _dictNextPower(used+1). The only case where
it'll differ is when dict_can_resize is false during fork, so that later
the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e.
_dictNextPower(1025*2) will return 4096).
Fix rehash test cases due to changing algorithm of new hash table size
when expand.
2020-12-06 11:53:04 +02:00
Itamar Haber c1b1e8c329
Adds pub/sub channel patterns to ACL (#7993)
Fixes #7923.

This PR appropriates the special `&` symbol (because `@` and `*` are taken),
followed by a literal value or pattern for describing the Pub/Sub patterns that
an ACL user can interact with. It is similar to the existing key patterns
mechanism in function (additive) and implementation (copy-pasta). It also adds
the allchannels and resetchannels ACL keywords, naturally.

The default user is given allchannels permissions, whereas new users get
whatever is defined by the acl-pubsub-default configuration directive. For
backward compatibility in 6.2, the default of this directive is allchannels but
this is likely to be changed to resetchannels in the next major version for
stronger default security settings.

Unless allchannels is set for the user, channel access permissions are checked
as follows :
* Calls to both PUBLISH and SUBSCRIBE will fail unless a pattern matching the
  argumentative channel name(s) exists for the user.
* Calls to PSUBSCRIBE will fail unless the pattern(s) provided as an argument
  literally exist(s) in the user's list.

Such failures are logged to the ACL log.

Runtime changes to channel permissions for a user with existing subscribing
clients cause said clients to disconnect unless the new permissions permit the
connections to continue. Note, however, that PSUBSCRIBErs' patterns are matched
literally, so given the change bar:* -> b*, pattern subscribers to bar:* will be
disconnected.

Notes/questions:
* UNSUBSCRIBE, PUNSUBSCRIBE and PUBSUB remain unprotected due to lack of reasons
  for touching them.
2020-12-01 14:21:39 +02:00
Oran Agra bbc2c44541
INFO client_recent_max_input_buffer includes argv array (#8065)
this metric already includes the argv bytes, like what clientsCronTrackClientsMemUsage does, but it's missing the array itself.

p.s. For the purpose of tracking expensive clients we don't need to include the size of the client struct and the static reply buffer in it.
2020-11-25 23:39:01 +02:00
Oran Agra e6fa47380a
Fix bug with module GIL being released prematurely (#8061)
This is hopefully usually harmles.
The server.ready_keys will usually be empty so the code after releasing
the GIL will soon be done.
The only case where it'll actually process things is when a module
releases a client (or module) blocked on a key, by triggering this NOT
from within a command (e.g. a timer event).

This bug was introduced in redis 6.0.9, see #7903
2020-11-22 14:00:51 +02:00
Oran Agra 61954951ed
Fix oom-score-adj-values range, abs options, and bug when used in config file (#8046)
Fix: When oom-score-adj-values is provided in the config file after
oom-score-adj yes, it'll take an immediate action, before
readOOMScoreAdj was acquired, resulting in an error (out of range score
due to uninitialized value. delay the reaction the real call is made by
main().

Since the values are clamped to -1000..1000, and they're
applied as an offset from the value at startup (which may be -1000), we
need to allow the offsets to reach to +2000 so that a value of +1000 is
achievable in case the value at startup was -1000.

Adding an option for absolute values rather than relative ones.
2020-11-22 13:57:56 +02:00
Meir Shpilraien (Spielrein) d87a0d0286
Unified MULTI, LUA, and RM_Call with respect to blocking commands (#8025)
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.

Today, LUA and MULTI have a special (and different) treatment to blocking commands:

LUA   - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.

Disadvantages of the current state are:

No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.

This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).

The new flag is checked on the following commands:

List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).

To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.

To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.

The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.

Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-17 18:58:55 +02:00
Yossi Gottlieb d638b05834
Improve and clean up supervised process support. (#8036)
* Configuration file default should now be "auto".
* Expose "process_supervised" as an info field.
* Log messages improvements (clarify required systemd config, report
  auto-detected supervision mode, etc.)
* Set server.supervised properly, so it can take precedence of
  "daemonize" configuration.
* Produce clear warning if systemd is detected/requested but executable
  is compiled without support for it, instead of silently ignoring.
* Handle systemd notification error on startup, and turn off supervised
  mode if it failed.
2020-11-17 12:52:49 +02:00
swamp0407 ea7cf737a1
Add COPY command (#7953)
Syntax:
COPY <key> <new-key> [DB <dest-db>] [REPLACE]

No support for module keys yet.

Co-authored-by: tmgauss
Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-17 12:03:05 +02:00
chenyangyang c1aaad06d8
Modules callbacks for lazy free effort, and unlink (#7912)
Add two optional callbacks to the RedisModuleTypeMethods structure, which is `free_effort`
and `unlink`. the `free_effort` callback indicates the effort required to free a module memory.
Currently, if the effort exceeds LAZYFREE_THRESHOLD, the module memory may be released
asynchronously. the `unlink` callback indicates the key has been removed from the DB by redis, and
may soon be freed by a background thread.

Add `lazyfreed_objects` info field, which represents the number of objects that have been
lazyfreed since redis was started.

Add `RM_GetTypeMethodVersion` API, which return the current redis-server runtime value of
`REDISMODULE_TYPE_METHOD_VERSION`. You can use that when calling `RM_CreateDataType` to know
which fields of RedisModuleTypeMethods are gonna be supported and which will be ignored.
2020-11-16 10:34:04 +02:00
Felipe Machado d8fd48c436
Add new commands ZDIFF and ZDIFFSTORE (#7961)
- Add ZDIFF and ZDIFFSTORE which work similarly to SDIFF and SDIFFSTORE
- Make sure the new WITHSCORES argument that was added for ZUNION isn't considered valid for ZUNIONSTORE

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-11-15 14:14:25 +02:00
Madelyn Olson 3feff7d78a
Rewritten commands are logged as their original command (#8006)
* Rewritten commands are logged as their original command

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2020-11-10 13:50:03 -08:00
Oran Agra 7ace7231c6
Better INFO fields to track diskless and disk-based replication progress (#7981)
Expose new `loading_rdb_used_mem` showing the used memory of the server
that saved the RDB file we're currently using.
This is useful in diskless replication when the total size of the rdb is
unkown, and can be used as a rought estimation of progres.

Use that new field to calculate the "user friendly"
`loading_loaded_perc` and `loading_eta_seconds`.

Expose `master_sync_total_bytes` and `master_sync_total_bytes` to complement
on the existing `master_sync_total_bytes` (which cannot be used on its own
to calculate progress).

Add "user friendly" field for `master_sync_perc`
2020-11-05 11:46:16 +02:00
Yossi Gottlieb 1fd456f91a
Add RESET command. (#7982)
Perform full reset of all client connection states, is if the client was
disconnected and re-connected. This affects:

* MULTI state
* Watched keys
* MONITOR mode
* Pub/Sub subscription
* ACL/Authenticated state
* Client tracking state
* Cluster read-only/asking state
* RESP version (reset to 2)
* Selected database
* CLIENT REPLY state

The response is +RESET to make it easily distinguishable from other
responses.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-05 10:51:26 +02:00
Oran Agra a698a6391a
Add maxclients and cluster_connections to INFO CLIENTS (#7979)
Few config settings are also reflected by the INFO command.
these are mainly ones that are important for either an instant view of
the server status (to compare a metric to it's limit config),
Important configurations that are necessary in the crash log (which
currently doesn't print the config),
And things that are important for monitoring solutions (such as
Prometheus), which rely on INFO to collect their data.

Add cluster_connections to INFO CLUSTER:
This makes it possible to be combined together with connected_clients
and connected_slaves and be matched against maxclients
2020-11-04 09:53:43 +02:00
Wang Yuan 89c78a9808
Disable rehash when redis has child process (#8007)
In redisFork(), we don't set child pid, so updateDictResizePolicy()
doesn't take effect, that isn't friendly for copy-on-write.

The bug was introduced this in redis 6.0: 56258c6
2020-11-03 17:16:11 +02:00
Meir Shpilraien (Spielrein) f210e197f3
Added crash report on SIGABRT (#8004)
The reason that we want to get a full crash report on SIGABRT
is that the jmalloc, when detecting a corruption, calls abort().
This will cause the Redis to exist silently without any report
and without any way to analyze what happened.
2020-11-03 14:59:21 +02:00
Oran Agra 441bfa2dfb
Optionally (default) fail to start if requested bind address is not available (#7936)
Background:
#3467 (redis 4.0.0), started ignoring ENOPROTOOPT, but did that only for
the default bind (in case bind config wasn't explicitly set).
#5598 (redis 5.0.3), added that for bind addresses explicitly set
(following bug reports in Debian for redis 4.0.9 and 5.0.1), it
also ignored a bunch of other errors like EPROTONOSUPPORT which was
requested in #3894, and also added EADDRNOTAVAIL (wasn't clear why).

This (ignoring EADDRNOTAVAIL) makes redis start successfully, even if a
certain network interface isn't up yet , in which case we rather redis
fail and will be re-tried when the NIC is up, see #7933.

However, it turns out that when IPv6 is disabled (supported but unused),
the error we're getting is EADDRNOTAVAIL. and in many systems the
default config file tries to bind to localhost for both v4 and v6 and
would like to silently ignore the error on v6 if disabled.
This means that we sometimes want to ignore EADDRNOTAVAIL and other times
we wanna fail.

So this commit changes these main things:
1. Ignore all the errors we ignore for both explicitly requested bind
   address and a default implicit one.
2. Add a '-' prefix to allow EADDRNOTAVAIL be ignored (by default that's
   different than the previous behavior).
3. Restructure that function in a more readable and maintainable way see
   below.
4. Make the default behavior of listening to all achievable by setting
  a bind config directive to * (previously only possible by omitting
  it)
5. document everything.

The old structure of this function was that even if there are no bind
addresses requested, the loop that runs though the bind addresses runs
at least once anyway!
In that one iteration of the loop it binds to both v4 and v6 addresses,
handles errors for each of them separately, and then eventually at the
if-else chain, handles the error of the last bind attempt again!
This was very hard to read and very error prone to maintain, instead now
when the bind info is missing we create one with two entries, and run
the simple loop twice.
2020-10-28 21:09:15 +02:00
WuYunlong 66037309c6 Fix waste of CPU time about server log in serverCron.
When all the work is just adding logs, we could pull
the condition out so as to use less CPU time when
loglevel is bigger than LL_VERBOSE.
2020-10-27 11:15:14 -07:00
zhenwei pi a9c0602149
Disable THP if enabled (#7381)
In case redis starts and find that THP is enabled ("always"), instead
of printing a log message, which might go unnoticed, redis will try to
disable it (just for the redis process).

Note: it looks like on self-bulit kernels THP is likely be set to "always" by default.

Some discuss about THP side effect on Linux:
according to http://www.antirez.com/news/84, we can see that
redis latency spikes are caused by linux kernel THP feature.
I have tested on E3-2650 v3, and found that 2M huge page costs
about 0.25ms to fix COW page fault.

Add a new config 'disable-thp', the recommended setting is 'yes',
(default) the redis tries to disable THP by prctl syscall. But
users who really want THP can set it to "no"

Thanks to Oran & Yossi for suggestions.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2020-10-27 15:04:18 +02:00
Wen Hui 0047702aab
Support ACL for Sentinel Mode (#7888)
This commit implements ACL for Sentinel mode, main work of this PR includes:

- Update Sentinel command table in order to better support ACLs.
- Fix couple of things which currently blocks the support for ACL on sentinel mode.
- Provide "sentinel sentinel-user" and "sentinel sentinel-pass " configuration in order to let sentinel authenticate with a specific user in other sentinels.
- requirepass is kept just for compatibility with old config files

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-19 07:33:55 +03:00
Oran Agra 457b7073b5
INFO report peak memory before eviction (#7894)
In some cases one command added a very big bulk of memory, and this
would be "resolved" by the eviction before the next command.

Seeing an unexplained mass eviction we would wish to
know the highest momentary usage too.

Tracking it in call() and beforeSleep() adds some hooks in AOF and RDB
loading.

The fix in clientsCronTrackExpansiveClients is related to #7874
2020-10-18 16:56:43 +03:00
guybe7 addf47dcac
Minor improvements to module blocked on keys (#7903)
- Clarify some documentation comments
- Make sure blocked-on-keys client privdata is accessible
  from withing the timeout callback
- Handle blocked clients in beforeSleep - In case a key
  becomes "ready" outside of processCommand

See #7879 #7880
2020-10-12 17:13:38 +03:00
Yossi Gottlieb 9b7f8ba84b Introduce getKeysResult for getKeysFromCommand.
Avoid using a static buffer for short key index responses, and make it
caller's responsibility to stack-allocate a result type. Responses that
don't fit are still allocated on the heap.
2020-10-11 16:04:14 +03:00
Uri Shachar dab5ec9b8d
Support getting configuration from both stdin and file at the same time (#7893)
This allows supplying secret configuration (for example - masterauth) via a secure channel
instead of having it in a plaintext file / command line param, while still allowing for most
of the configuration to reside there.

Also, remove 'special' case handling for --check-rdb which hasn't been relevant
since 4.0.0.
2020-10-11 13:43:23 +03:00
Felipe Machado c3f9e01794
Adds new pop-push commands (LMOVE, BLMOVE) (#6929)
Adding [B]LMOVE <src> <dst> RIGHT|LEFT RIGHT|LEFT. deprecating [B]RPOPLPUSH.

Note that when receiving a BRPOPLPUSH we'll still propagate an RPOPLPUSH,
but on BLMOVE RIGHT LEFT we'll propagate an LMOVE

improvement to existing tests
- Replace "after 1000" with "wait_for_condition" when wait for
  clients to block/unblock.
- Add a pre-existing element to target list on basic tests so
  that we can check if the new element was added to the correct
  side of the list.
- check command stats on the replica to make sure the right
  command was replicated

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-10-08 08:33:17 +03:00
Oran Agra bea40e6a41
memory reporting of clients argv (#7874)
track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.

in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.

This memory is now added to the total clients memory usage, as well as
the client list.
2020-10-05 11:15:36 +03:00
Oran Agra eb6241a3dd
Include internal sds fragmentation in MEMORY reporting (#7864)
The MEMORY command is used for debugging memory usage, so it should include internal
fragmentation, same as used_memory
2020-10-01 11:30:22 +03:00
WuYunlong c2e5546071
Normalize sds test mechanism together with some compile warnings. (#7854) 2020-09-28 11:27:26 +03:00
Wang Yuan 57709c4bc6
Don't write replies if close the client ASAP (#7202)
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.

Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.

We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.

We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.

We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-24 16:01:41 +03:00
bodong.ybd e08bf16637 Add ZINTER/ZUNION command
Syntax: ZINTER/ZUNION numkeys key [key ...] [WEIGHTS weight [weight ...]]
[AGGREGATE SUM|MIN|MAX] [WITHSCORES]

see #7624
2020-09-24 08:59:14 +03:00