Commit Graph

1189 Commits

Author SHA1 Message Date
Oran Agra 67750ce3b3 Fix failing tests due to issues with wait_for_log_message (#7572)
- the test now waits for specific set of log messages rather than wait for
  timeout looking for just one message.
- we don't wanna sample the current length of the log after an action, due
  to a race, we need to start the search from the line number of the last
  message we where waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
  where the replica manages to re-connect before we completed the set of
  actions that should force a full sync.
- fix verify_log_message which was broken and unused

(cherry picked from commit 109b5ccdcd)
2020-09-01 09:27:58 +03:00
Jiayuan Chen 096285ab64 Add optional tls verification (#7502)
Adds an `optional` value to the previously boolean `tls-auth-clients` configuration keyword.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
(cherry picked from commit f31260b044)
2020-09-01 09:27:58 +03:00
Oran Agra 6daa8b9adb Stabilize bgsave test that sometimes fails with valgrind (#7559)
on ci.redis.io the test fails a lot, reporting that bgsave didn't end.
increaseing the timeout we wait for that bgsave to get aborted.
in addition to that, i also verify that it indeed got aborted by
checking that the save counter wasn't reset.

add another test to verify that a successful bgsave indeed resets the
change counter.

(cherry picked from commit 8a57969fd7)
2020-09-01 09:27:58 +03:00
Oran Agra e8aa5583d0 testsuite may leave servers alive on error (#7549)
in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.

luckly, in most cases the pattern is:
start_server {
  test name {
  }
}

(cherry picked from commit 36b9494385)
2020-09-01 09:27:58 +03:00
Yossi Gottlieb f1d5d5d28e Tests: drop TCL 8.6 dependency. (#7548)
This re-implements the redis-cli --pipe test so it no longer depends on a close feature available only in TCL 8.6.

Basically what this test does is run redis-cli --pipe, generates a bunch of commands and pipes them through redis-cli, and inspects the result in both Redis and the redis-cli output.

To do that, we need to close stdin for redis-cli to indicate we're done so it can flush its buffers and exit. TCL has bi-directional channels can only offers a way to "one-way close" a channel with TCL 8.6. To work around that, we now generate the commands into a file and feed that file to redis-cli directly.

As we're writing to an actual file, the number of commands is now reduced.

(cherry picked from commit f57e844b2e)
2020-09-01 09:27:58 +03:00
Remi Collet af907e4b6d Fix deprecated tail syntax in tests (#7543)
(cherry picked from commit 3f2fbc4c61)
2020-09-01 09:27:58 +03:00
WuYunlong b1a01fda91 Fix command help for unexpected options (#7476)
(cherry picked from commit 93bdbf5aa4)
2020-07-20 21:08:26 +03:00
Oran Agra 05f8975d21 redis-cli tests, fix valgrind timing issue (#7519)
this test when run with valgrind on github actions takes 160 seconds

(cherry picked from commit 254c962554)
2020-07-20 21:08:26 +03:00
WuYunlong 4780cc5efa Fix out of update help info in tcl tests. (#7516)
Before this commit, the output of "./runtest-cluster --help" is incorrect.
After this commit, the format of the following 3 output is consistent:
./runtest --help
./runtest-cluster --help
./runtest-sentinel --help

(cherry picked from commit 8128d39737)
2020-07-20 21:08:26 +03:00
Oran Agra aea4db2f5a fix recently added time sensitive tests failing with valgrind (#7512)
interestingly the latency monitor test fails because valgrind is slow
enough so that the time inside PEXPIREAT command from the moment of
the first mstime() call to get the basetime until checkAlreadyExpired
calls mstime() again is more than 1ms, and that test was too sensitive.

using this opportunity to speed up the test (unrelated to the failure)
the fix is just the longer time passed to PEXPIRE.

(cherry picked from commit e5227aab89)
2020-07-20 21:08:26 +03:00
Oran Agra b5c5f870a4 runtest --stop pause stops before terminating the redis server (#7513)
in the majority of the cases (on this rarely used feature) we want to
stop and be able to connect to the shard with redis-cli.
since these are two different processes interracting with the tty we
need to stop both, and we'll have to hit enter twice, but it's not that
bad considering it is rarely used.

(cherry picked from commit 02ef355f98)
2020-07-20 21:08:26 +03:00
WuYunlong f838df92d2 Add missing latency-monitor tcl test to test_helper.tcl. (#6782)
(cherry picked from commit d792db7948)
2020-07-20 21:08:26 +03:00
Yossi Gottlieb 7a536c2912 TLS: Session caching configuration support. (#7420)
* TLS: Session caching configuration support.
* TLS: Remove redundant config initialization.

(cherry picked from commit 3e6f2b1a45)
2020-07-20 21:08:26 +03:00
Yossi Gottlieb b057ff81ee TLS: Add missing redis-cli options. (#7456)
* Tests: fix and reintroduce redis-cli tests.

These tests have been broken and disabled for 10 years now!

* TLS: add remaining redis-cli support.

This adds support for the redis-cli --pipe, --rdb and --replica options
previously unsupported in --tls mode.

* Fix writeConn().

(cherry picked from commit d9f970d8d3)
2020-07-20 21:08:26 +03:00
Oran Agra 95ba01b538 RESTORE ABSTTL won't store expired keys into the db (#7472)
Similarly to EXPIREAT with TTL in the past, which implicitly deletes the
key and return success, RESTORE should not store key that are already
expired into the db.
When used together with REPLACE it should emit a DEL to keyspace
notification and replication stream.

(cherry picked from commit 5977a94842)
2020-07-20 21:08:26 +03:00
Oran Agra 33ca884cc5 skip a test that uses +inf on valgrind (#7440)
On some platforms strtold("+inf") with valgrind returns a non-inf result

[err]: INCRBYFLOAT does not allow NaN or Infinity in tests/unit/type/incr.tcl
Expected 'ERR*would produce*' to equal or match '1189731495357231765085759.....'

(cherry picked from commit 909bc97c52)
2020-07-20 21:08:26 +03:00
Oran Agra 2b5f23197c stabilize tests that look for log lines (#7367)
tests were sensitive to additional log lines appearing in the log
causing the search to come empty handed.

instead of just looking for the n last log lines, capture the log lines
before performing the action, and then search from that offset.

(cherry picked from commit 8e76e13472)
2020-07-20 21:08:26 +03:00
Oran Agra 1104113c07 tests/valgrind: don't use debug restart (#7404)
* tests/valgrind: don't use debug restart

DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
   have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.

So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.

since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes

(cherry picked from commit 69ade87325)
2020-07-20 21:08:26 +03:00
antirez 17aaf5ec97 LPOS: option FIRST renamed RANK.
(cherry picked from commit a5a3a7bbc6)
2020-07-20 21:08:26 +03:00
Oran Agra 05e483cbb3 EXEC always fails with EXECABORT and multi-state is cleared
In order to support the use of multi-exec in pipeline, it is important that
MULTI and EXEC are never rejected and it is easy for the client to know if the
connection is still in multi state.

It was easy to make sure MULTI and DISCARD never fail (done by previous
commits) since these only change the client state and don't do any actual
change in the server, but EXEC is a different story.

Since in the past, it was possible for clients to handle some EXEC errors and
retry the EXEC, we now can't affort to return any error on EXEC other than
EXECABORT, which now carries with it the real reason for the abort too.

Other fixes in this commit:
- Some checks that where performed at the time of queuing need to be re-
  validated when EXEC runs, for instance if the transaction contains writes
  commands, it needs to be aborted. there was one check that was already done
  in execCommand (-READONLY), but other checks where missing: -OOM, -MISCONF,
  -NOREPLICAS, -MASTERDOWN
- When a command is rejected by processCommand it was rejected with addReply,
  which was not recognized as an error in case the bad command came from the
  master. this will enable to count or MONITOR these errors in the future.
- make it easier for tests to create additional (non deferred) clients.
- add tests for the fixes of this commit.

(cherry picked from commit 65a3307bc9)
2020-07-20 21:08:26 +03:00
meir@redislabs.com 51e178454d Fix RM_ScanKey module api not to return int encoded strings
The scan key module API provides the scan callback with the current
field name and value (if it exists). Those arguments are RedisModuleString*
which means it supposes to point to robj which is encoded as a string.
Using createStringObjectFromLongLong function might return robj that
points to an integer and so break a module that tries for example to
use RedisModule_StringPtrLen on the given field/value.

The PR introduces a fix that uses the createObject function and sdsfromlonglong function.
Using those function promise that the field and value pass to the to the
scan callback will be Strings.

The PR also changes the Scan test module to use RedisModule_StringPtrLen
to catch the issue. without this, the issue is hidden because
RedisModule_ReplyWithString knows to handle integer encoding of the
given robj (RedisModuleString).

The PR also introduces a new test to verify the issue is solved.

(cherry picked from commit a89bf734a9)
2020-07-20 21:08:26 +03:00
antirez e0022d8cfe LPOS: tests + crash fix. 2020-06-12 12:08:06 +02:00
antirez a7d3670da5 Adapt EVAL+busy script test to new behavior. 2020-06-09 12:19:30 +02:00
zhaozhao.zz 1d7bf208ce AOF: append origin SET if no expire option 2020-06-09 11:53:01 +02:00
Oran Agra f33de403ed fix pingoff test race 2020-06-06 11:44:21 +02:00
antirez 59cd4c9f65 Test: take PSYNC2 test master timeout high during switch.
This will likely avoid false positives due to trailing pings.
2020-05-28 10:56:14 +02:00
antirez 6c1bb7b194 Test: add the tracking unit as default. 2020-05-28 10:22:29 +02:00
Oran Agra 1aee695e52 tests: find_available_port start search from next port
i.e. don't start the search from scratch hitting the used ones again.
this will also reduce the likelihood of collisions (if there are any
left) by increasing the time until we re-use a port we did use in the
past.
2020-05-28 10:09:51 +02:00
Oran Agra a2ae463520 tests: each test client work on a distinct port range
apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.

this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop

see Probmem 2 in #7314
2020-05-28 10:09:51 +02:00
Oran Agra 86e562d691 32bit CI needs to build modules correctly 2020-05-28 10:09:51 +02:00
Oran Agra ab2984b1e2 adjust revived meaningful offset tests
these tests create several edge cases that are otherwise uncovered (at
least not consistently) by the test suite, so although they're no longer
testing what they were meant to test, it's still a good idea to keep
them in hope that they'll expose some issue in the future.
2020-05-28 10:09:51 +02:00
Oran Agra 1ff5a222de revive meaningful offset tests 2020-05-28 10:09:51 +02:00
antirez 3f8d113f1b Another meaningful offset test removed. 2020-05-28 10:09:51 +02:00
antirez d4541349dc Remove the PSYNC2 meaningful offset test. 2020-05-28 10:09:51 +02:00
antirez 8f10137227 Test: PSYNC2 test can now show server logs. 2020-05-28 10:09:51 +02:00
Qu Chen 58fc456cbd Disconnect chained replicas when the replica performs PSYNC with the master always to avoid replication offset mismatch between master and chained replicas. 2020-05-22 12:37:59 +02:00
Oran Agra 436be34986 fix a rare active defrag edge case bug leading to stagnation
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation), this happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made have a lower utilization.

this commit fixes it by removing the current slab from the overall
average utilization of the bin, and also eliminate any precision loss in
the utilization calculation and move the decision about the defrag to
reside inside jemalloc.

and also add a test that consistently reproduce this issue.
2020-05-22 12:37:49 +02:00
WuYunlong 2e41827435 Handle keys with hash tag when computing hash slot using tcl cluster client. 2020-05-22 12:37:49 +02:00
WuYunlong eb2c8b2c61 Add a test to prove current tcl cluster client can not handle keys with hash tag. 2020-05-22 12:37:49 +02:00
Oran Agra 00d8b92b89 fix valgrind test failure in replication test
in b4416280c i added more keys to that test to make it run longer
but in valgrind this now means the test times out, give valgrind more
time.
2020-05-22 12:37:49 +02:00
Oran Agra 5e17e6276c add regression test for the race in #7205
with the original version of 6.0.0, this test detects an excessive full
sync.
with the fix in 1a7cd2c0e, this test detects memory corruption,
especially when using libc allocator with or without valgrind.
2020-05-22 12:37:49 +02:00
antirez 96e7c011e2 Improve the PSYNC2 test reliability. 2020-05-22 12:37:49 +02:00
Yossi Gottlieb 16ba33c05b TLS: Fix test failures on recent Debian/Ubuntu.
Seems like on some systems choosing specific TLS v1/v1.1 versions no
longer works as expected. Test is reduced for v1.2 now which is still
good enough to test the mechansim, and matters most anyway.
2020-05-15 22:23:24 +02:00
Oran Agra 9da134cd88 fix redis 6.0 not freeing closed connections during loading.
This bug was introduced by a recent change in which readQueryFromClient
is using freeClientAsync, and despite the fact that now
freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since
it's not called during loading in processEventsWhileBlocked.
furthermore, afterSleep was called in that case but beforeSleep wasn't.

This bug also caused slowness sine the level-triggered mode of epoll
kept signaling these connections as readable causing us to keep doing
connRead again and again for ll of these, which keep accumulating.

now both before and after sleep are called, but not all of their actions
are performed during loading, some are only reserved for the main loop.

fixes issue #7215
2020-05-14 11:29:43 +02:00
antirez f7f219a137 Regression test for #7249. 2020-05-14 11:29:43 +02:00
Oran Agra 5c41802d55 fix unstable replication test
this test which has coverage for varoius flows of diskless master was
failing randomly from time to time.

the failure was:
[err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '*Diskless rdb transfer, last replica dropped, killing fork child*' not found

what seemed to have happened is that the master didn't detect that all
replicas dropped by the time the replication ended, it thought that one
replica is still connected.

now the test takes a few seconds longer but it seems stable.
2020-05-14 11:29:43 +02:00
antirez e48c37316e Test: --dont-clean should do first cleanup. 2020-05-08 10:37:36 +02:00
zhenwei pi d6436eb7cf Support setcpuaffinity on linux/bsd
Currently, there are several types of threads/child processes of a
redis server. Sometimes we need deeply optimise the performance of
redis, so we would like to isolate threads/processes.

There were some discussion about cpu affinity cases in the issue:
https://github.com/antirez/redis/issues/2863

So implement cpu affinity setting by redis.conf in this patch, then
we can config server_cpulist/bio_cpulist/aof_rewrite_cpulist/
bgsave_cpulist by cpu list.

Examples of cpulist in redis.conf:
server_cpulist 0-7:2      means cpu affinity 0,2,4,6
bio_cpulist 1,3           means cpu affinity 1,3
aof_rewrite_cpulist 8-11  means cpu affinity 8,9,10,11
bgsave_cpulist 1,10-11    means cpu affinity 1,10,11

Test on linux/freebsd, both work fine.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2020-05-08 10:37:35 +02:00
Oran Agra 3d3861dd88 add daily github actions with libc malloc and valgrind
* fix memlry leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal

about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808==   too small or bad protection modes

it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
also, not sure if it's even valid without a call to sigaltstack()
2020-05-08 10:37:35 +02:00
Guy Benoish 6c0bc608a1 Extend XINFO STREAM output
Introducing XINFO STREAM <key> FULL
2020-04-30 13:02:58 +02:00
Oran Agra ea63aea72d fix loading race in psync2 tests 2020-04-28 11:20:15 +02:00
Oran Agra e4d2bb62b2 Keep track of meaningful replication offset in replicas too
Now both master and replicas keep track of the last replication offset
that contains meaningful data (ignoring the tailing pings), and both
trim that tail from the replication backlog, and the offset with which
they try to use for psync.

the implication is that if someone missed some pings, or even have
excessive pings that the promoted replica has, it'll still be able to
psync (avoid full sync).

the downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings form it's
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings

Background:
The mearningful offset on the master was added recently to solve a problem were
the master is left all alone, injecting PINGs into it's backlog when no one is
listening and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

however, consider this case:
master A has two replicas (B and C) replicating directly from it.
there's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. now B gets promoted, A becomes a replica of B, and C
remains a replica of A. when A gets demoted, it trims the pings from its
backlog, and successfully replicate from B. however, C is still aware of
these PINGs, when it'll disconnect and re-connect to A, it'll ask for something
that's not in the backlog anymore (since A trimmed the tail of it's backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there, it
turns out the reason were PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1
now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

so the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas which did get the last ping.
2020-04-27 15:52:49 +02:00
Guy Benoish 43329c9b64 Add the stream tag to XSETID tests 2020-04-27 15:52:49 +02:00
antirez 3722f89f49 LCS -> STRALGO LCS.
STRALGO should be a container for mostly read-only string
algorithms in Redis. The algorithms should have two main
characteristics:

1. They should be non trivial to compute, and often not part of
programming language standard libraries.
2. They should be fast enough that it is a good idea to have optimized C
implementations.

Next thing I would love to see? A small strings compression algorithm.
2020-04-24 16:49:27 +02:00
yanhui13 f03f1fad67 add tcl test for cluster slots 2020-04-24 10:15:04 +02:00
antirez 06917e581c Tracking: test expired keys notifications. 2020-04-24 10:14:48 +02:00
antirez e434b2ce4f Tracking: NOLOOP tests. 2020-04-24 10:14:48 +02:00
antirez 5c4c73e2c2 A few comments and name changes for #7103. 2020-04-17 13:02:40 +02:00
Oran Agra 6148f94939 testsuite run the defrag latency test solo
this test is time sensitive and it sometimes fail to pass below the
latency threshold, even on strong machines.

this test was the reson we're running just 2 parallel tests in the
github actions CI, revering this.
2020-04-17 13:02:40 +02:00
antirez 5719b3054a LCS: more tests. 2020-04-07 16:52:57 +02:00
antirez 0b16f8d444 LCS tests. 2020-04-07 16:52:57 +02:00
Oran Agra 02b594f6ad diffrent fix for runtest --host --port 2020-04-07 16:52:28 +02:00
Guy Benoish f695d18306 Try to fix time-sensitive tests in blockonkey.tcl
There is an inherent race between the deferring client and the
"main" client of the test: While the deferring client issues a blocking
command, we can't know for sure that by the time the "main" client
tries to issue another command (Usually one that unblocks the deferring
client) the deferring client is even blocked...
For lack of a better choice this commit uses TCL's 'after' in order
to give some time for the deferring client to issues its blocking
command before the "main" client does its thing.
This problem probably exists in many other tests but this commit
tries to fix blockonkeys.tcl
2020-04-07 16:52:28 +02:00
Valentino Geron 3e0d209625 XREAD and XREADGROUP should not be allowed from scripts when BLOCK option is being used 2020-04-07 16:52:04 +02:00
Guy Benoish 024c380b9d Fix no-negative-zero test 2020-04-07 16:52:04 +02:00
Guy Benoish 7764996bec Make sure Redis does not reply with negative zero 2020-04-07 16:52:04 +02:00
Valentino Geron 8cdc153f58 XACK should be executed in a "all or nothing" fashion.
First, we must parse the IDs, so that we abort ASAP.
The return value of this command cannot be an error if
the client successfully acknowledged some messages,
so it should be executed in a "all or nothing" fashion.
2020-04-07 16:52:04 +02:00
Guy Benoish 193fc241ca Fix memory corruption in moduleHandleBlockedClients
By using a "circular BRPOPLPUSH"-like scenario it was
possible the get the same client on db->blocking_keys
twice (See comment in moduleTryServeClientBlockedOnKey)

The fix was actually already implememnted in
moduleTryServeClientBlockedOnKey but it had a bug:
the funxction should return 0 or 1 (not OK or ERR)

Other changes:
1. Added two commands to blockonkeys.c test module (To
   reproduce the case described above)
2. Simplify blockonkeys.c in order to make testing easier
3. cast raxSize() to avoid warning with format spec
2020-04-07 16:52:03 +02:00
Guy Benoish a509400d58 Modules: Test MULTI/EXEC replication of RM_Replicate
Makse sure call() doesn't wrap replicated commands with
a redundant MULTI/EXEC

Other, unrelated changes:
1. Formatting compiler warning in INFO CLIENTS
2. Use CLIENT_ID_AOF instead of UINT64_MAX
2020-03-31 17:12:19 +02:00
Guy Benoish 805c8c94a9 RENAME can unblock XREADGROUP
Other changes:
Support stream in serverLogObjectDebugInfo
2020-03-31 16:57:20 +02:00
antirez 97b80b57e9 Fix the propagate Tcl test after module changes. 2020-03-31 16:57:20 +02:00
antirez 4f6b6b8008 Modify the propagate unit test to show more cases. 2020-03-31 16:57:20 +02:00
antirez 616b1cb7ac Fix module commands propagation double MULTI bug.
37a10cef introduced automatic wrapping of MULTI/EXEC for the
alsoPropagate API. However this collides with the built-in mechanism
already present in module.c. To avoid complex changes near Redis 6 GA
this commit introduces the ability to exclude call() MUTLI/EXEC wrapping
for also propagate in order to continue to use the old code paths in
module.c.
2020-03-31 16:57:20 +02:00
Oran Agra 0f7dfc378c AOFRW on an empty stream created with MKSTREAM loads badkly
the AOF will be loaded successfully, but the stream will be missing,
i.e inconsistencies with the original db.

this was because XADD with id of 0-0 would error.

add a test to reproduce.
2020-03-31 16:57:20 +02:00
antirez e257f121e1 PSYNC2: meaningful offset test. 2020-03-25 15:55:24 +01:00
Oran Agra e43cd8316f MULTI/EXEC during LUA script timeout are messed up
Redis refusing to run MULTI or EXEC during script timeout may cause partial
transactions to run.

1) if the client sends MULTI+commands+EXEC in pipeline without waiting for
response, but these arrive to the shards partially while there's a busy script,
and partially after it eventually finishes: we'll end up running only part of
the transaction (since multi was ignored, and exec would fail).

2) similar to the above if EXEC arrives during busy script, it'll be ignored and
the client state remains in a transaction.

the 3rd test which i added for a case where MULTI and EXEC are ok, and
only the body arrives during busy script was already handled correctly
since processCommand calls flagTransaction
2020-03-25 15:55:24 +01:00
antirez 70a98a43ea Fix BITFIELD_RO test. 2020-03-25 15:55:24 +01:00
bodong.ybd b3e4abf06e Added BITFIELD_RO variants for read-only operations. 2020-03-25 15:55:24 +01:00
WuYunlong 299f1d0258 Add 14-consistency-check.tcl to prove there is a data consistency issue. 2020-03-25 15:54:34 +01:00
antirez 61b98f32a2 Regression test for #7011. 2020-03-25 15:54:34 +01:00
bodong.ybd 76d57161d9 Fix bug of tcl test using external server 2020-03-25 15:54:34 +01:00
Oran Agra 6173815408 fix for flaky psync2 test
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 6 cmd {assert {$sum == 4}} proc ::test)

issue was that sometime the test got an unexpected full sync since it
tried to switch to the replica before it was in sync with it's master.
2020-03-12 15:53:47 +01:00
Oran Agra 12626ce9bb fix race in module api test for fork
in some cases we were trying to kill the fork before it got created
2020-02-27 18:02:30 +01:00
Oran Agra 635321d47e fix github actions failing latency test for active defrag - part 2
it seems that running two clients at a time is ok too, resuces action
time from 20 minutes to 10. we'll use this for now, and if one day it
won't be enough we'll have to run just the sensitive tests one by one
separately from the others.

this commit also fixes an issue with the defrag test that appears to be
very rare.
2020-02-27 18:02:30 +01:00
Oran Agra 0b988fa9ec fix github actions failing latency test for active defrag
seems that github actions are slow, using just one client to reduce
false positives.

also adding verbose, testing only on latest ubuntu, and building on
older one.

when doing that, i can reduce the test threshold back to something saner
2020-02-27 18:02:30 +01:00
Oran Agra 60096bc1a1 Fix latency sensitivity of new defrag test
I saw that the new defag test for list was failing in CI recently, so i
reduce it's threshold from 12 to 60.

besides that, i add / improve the latency test for that other two defrag
tests (add a sensitive latency and digest / save checks)

and fix bad usage of debug populate (can't overrides existing keys).
this was the original intention, which creates higher fragmentation.
2020-02-27 18:02:30 +01:00
antirez ef3551d149 Test engine: experimental change to avoid busy port problems. 2020-02-27 18:02:30 +01:00
antirez 72c053519c Test engine: detect timeout when checking for Redis startup. 2020-02-27 18:02:30 +01:00
antirez 294c9af469 Test engine: better tracking of what workers are doing. 2020-02-27 18:00:47 +01:00
Guy Benoish 5d0890c0a8 Fix memory leak in test_ld_conv 2020-02-27 18:00:47 +01:00
Oran Agra 349aa24511 Defrag big lists in portions to avoid latency and freeze
When active defrag kicks in and finds a big list, it will create a bookmark to
a node so that it is able to resume iteration from that node later.

The quicklist manages that bookmark, and updates it in case that node is deleted.

This will increase memory usage only on lists of over 1000 (see
active-defrag-max-scan-fields) quicklist nodes (1000 ziplists, not 1000 items)
by 16 bytes.

In 32 bit build, this change reduces the maximum effective config of
list-compress-depth and list-max-ziplist-size (from 32767 to 8191)
2020-02-27 18:00:47 +01:00
Guy Benoish b4ddc7b7ba XGROUP DESTROY should unblock XREADGROUP with -NOGROUP 2020-02-27 18:00:47 +01:00
antirez b6129f86c1 Test is more complex now, increase default timeout. 2020-02-27 18:00:47 +01:00
antirez b7cb28d501 Tracking: first set of tests for the feature. 2020-02-27 18:00:46 +01:00
antirez 51c1a9f8fb ACL LOG: make max log entries configurable. 2020-02-12 14:15:35 +01:00
antirez ea1e1b12c9 ACL LOG: test for AUTH reason. 2020-02-12 14:15:35 +01:00
antirez 9f6e84f6be ACL LOG: implement a few basic tests. 2020-02-12 14:15:35 +01:00
WuYunlong d2509811b7 Add tcl regression test in scripting.tcl to reproduce memory leak. 2020-02-04 10:23:48 +01:00
Guy Benoish 6fe55c2f29 ld2string should fail if string contains \0 in the middle
This bug affected RM_StringToLongDouble and HINCRBYFLOAT.
I added tests for both cases.

Main changes:
1. Fixed string2ld to fail if string contains \0 in the middle
2. Use string2ld in getLongDoubleFromObject - No point of
   having duplicated code here

The two changes above broke RM_SaveLongDouble/RM_LoadLongDouble
because the long double string was saved with length+1 (An innocent
mistake, but it's actually a bug - The length passed to
RM_SaveLongDouble should not include the last \0).
2020-02-04 10:23:48 +01:00
Guy Benoish 40295fb3fe Modules: Fix blocked-client-related memory leak
If a blocked module client times-out (or disconnects, unblocked
by CLIENT command, etc.) we need to call moduleUnblockClient
in order to free memory allocated by the module sub-system
and blocked-client private data

Other changes:
Made blockedonkeys.tcl tests a bit more aggressive in order
to smoke-out potential memory leaks
2020-02-04 10:23:48 +01:00
Guy Benoish 1b5bf40c20 Blocking XREAD[GROUP] should always reply with valid data (or timeout)
This commit solves the following bug:
127.0.0.1:6379> XGROUP CREATE x grp $ MKSTREAM
OK
127.0.0.1:6379> XADD x 666 f v
"666-0"
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
   2) 1) 1) "666-0"
         2) 1) "f"
            2) "v"
127.0.0.1:6379> XADD x 667 f v
"667-0"
127.0.0.1:6379> XDEL x 667
(integer) 1
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
   2) (empty array)

The root cause is that we use s->last_id in streamCompareID
while we should use the last *valid* ID
2020-01-10 13:16:14 +01:00
Guy Benoish cddf1da2e9 Stream: Handle streamID-related edge cases
This commit solves several edge cases that are related to
exhausting the streamID limits: We should correctly calculate
the succeeding streamID instead of blindly incrementing 'seq'
This affects both XREAD and XADD.

Other (unrelated) changes:
Reply with a better error message when trying to add an entry
to a stream that has exhausted last_id
2019-12-29 15:46:31 +01:00
Oran Agra 52ea44e53b config.c adjust config limits and mutable
- make lua-replicate-commands mutable (it never was, but i don't see why)
- make tcp-backlog immutable (fix a recent refactory mistake)
- increase the max limit of a few configs to match what they were before
the recent refactory
2019-12-29 15:46:31 +01:00
antirez e6e58e455c Revert "Geo: output 10 chars of geohash, not 11."
This reverts commit 009862ab7e.
2019-12-18 12:54:46 +01:00
zhaozhao.zz 58554396d6 incrbyfloat: fix issue #5256 ttl lost after propagate 2019-12-18 15:44:51 +08:00
zhaozhao.zz 24044f3356 add a new SET option KEEPTTL that doesn't remove expire time 2019-12-18 15:20:36 +08:00
Madelyn Olson a511a37bb7 Fixed some documentation 2019-12-17 07:49:21 +00:00
Madelyn Olson 67aa527b22 Added some documentation and fixed a test 2019-12-17 07:15:04 +00:00
Madelyn Olson 034dcf185c Add module APIs for custom authentication 2019-12-17 06:59:59 +00:00
Yossi Gottlieb 0283db5883 Improve RM_ModuleTypeReplaceValue() API.
With the previous API, a NULL return value was ambiguous and could
represent either an old value of NULL or an error condition. The new API
returns a status code and allows the old value to be returned
by-reference.

This commit also includes test coverage based on
tests/modules/datatype.c which did not exist at the time of the original
commit.
2019-12-12 18:50:11 +02:00
Oran Agra 5661b19005 fix crash in module short read test 2019-12-02 19:17:35 +02:00
Salvatore Sanfilippo ce7ec725e3
Merge pull request #6624 from oranagra/config_c_step_3
Additional config.c refractory and bugfixes
2019-12-02 08:59:36 +01:00
Oran Agra 07b365b7d7 revert an accidental test code change done as part of the tls project
it seems that commit b087dd1db6 accidentially changed
gen_write_load to not use deferred client, which causes them to be slower and not
generate high load which they should, making some tests less effecitive
2019-12-01 16:10:09 +02:00
Oran Agra 18e72c5cc7 Converting more configs to use generic infra, and moving defaults to config.c
Changes in behavior:
- Change server.stream_node_max_entries from int64_t to long long, so that it can be used by the generic infra
- standard error reply instead of "repl-backlog-size must be 1 or greater" and such
- tls-port and a few TLS booleans were readable (config get) even when USE_OPENSSL was off (now they aren't)
- syslog-enabled, syslog-ident, cluster-enabled, appendfilename, and supervised didn't have a get (now they do)
- pidfile was initialized to NULL in InitServerConfig but had CONFIG_DEFAULT_PID_FILE in rewriteConfig (so the real default was "", but rewrite would cause it to be set), fixed the rewrite.
- TLS config in server.h was uninitialized (if no tls config args were provided)

Adding test for sanity and coverage
2019-11-28 11:24:57 +02:00
Salvatore Sanfilippo a1b654819c
Merge pull request #6598 from oranagra/module-hook-test
try to fix an unstable test (module hook for loading progress)
2019-11-25 17:54:21 +01:00
Salvatore Sanfilippo 64c2508ee3
Merge branch 'unstable' into rm_get_server_info 2019-11-21 10:06:15 +01:00
Salvatore Sanfilippo 808394b77d
Merge pull request #6603 from daidaotong/typofix
fix typo in scripting.acl
2019-11-20 10:06:33 +01:00
Daniel Dai 99b5696390 fix typo 2019-11-19 20:14:59 -05:00
Oran Agra ed2269762b try to fix an unstable test (module hook for loading progress)
there were two lssues, one is taht BGREWRITEAOF failed since the initial one was still in progress
the solution for this one is to enable appendonly from the server startup so there's no initial aofrw.

the other problem was 0 loading progress events, theory is that on some
platforms a sleep of 1 will cause a much greater delay due to the context
switch, but on other platform it doesn't. in theory a sleep of 100 micro
for 1k keys whould take 100ms, and with hz of 500 we should be gettering
50 events (one every 2ms). in practise it doesn't work like that, so trying
to find a sleep that would be long enough but still not cause the test to take
too long.
2019-11-19 15:01:51 +02:00
Salvatore Sanfilippo e7144fbed8
Merge branch 'unstable' into module-long-double 2019-11-19 12:15:45 +01:00
Salvatore Sanfilippo e916058f0b
Merge pull request #6557 from oranagra/rm_lru_lfu_revized
rename RN_SetLRUOrLFU -> RM_SetLRU and RN_SetLFU
2019-11-19 11:58:07 +01:00
antirez 446f24e66d Merge branch 'unstable' of github.com:/antirez/redis into unstable 2019-11-19 11:52:40 +01:00
Salvatore Sanfilippo b311a368e0
Merge pull request #6558 from oranagra/module_testrdb_leak
fix leak in module api rdb test
2019-11-19 11:49:43 +01:00
antirez 936e01e5bb Fix stream test after addition of 0-0 ID test. 2019-11-19 11:49:05 +01:00
Salvatore Sanfilippo 397a8b57cc
Merge pull request #6513 from oranagra/test_assertions
test infra: improve prints on failed assertions
2019-11-19 11:34:11 +01:00
Salvatore Sanfilippo 656e40eed2
Merge branch 'unstable' into scan_module_impl 2019-11-19 11:08:02 +01:00
Salvatore Sanfilippo 3d89210477
Merge pull request #3383 from yossigo/datatype_load_save
Redis Module API calls to allow re-use of data type RDB save/load.
2019-11-19 10:55:42 +01:00
Guy Benoish 4a12047c61 XADD with ID 0-0 stores an empty key
Calling XADD with 0-0 or 0 would result in creating an
empty key and storing it in the database.
Even worse, because XADD will reply with error the action
will not be replicated, creating a master-replica
inconsistency
2019-11-13 16:47:30 +05:30
Oran Agra 0f8692b464 Add RM_ScanKey to scan hash, set, zset, changes to RM_Scan API
- Adding RM_ScanKey
- Adding tests for RM_ScanKey
- Refactoring RM_Scan API

Changes in RM_Scan
- cleanup in docs and coding convention
- Moving out of experimantal Api
- Adding ctx to scan callback
- Dont use cursor of -1 as an indication of done (can be a valid cursor)
- Set errno when returning 0 for various reasons
- Rename Cursor to ScanCursor
- Test filters key that are not strings, and opens a key if NULL
2019-11-11 16:05:55 +02:00
meir@redislabs.com 11c6ce812a Added scan implementation to module api.
The implementation expose the following new functions:
1. RedisModule_CursorCreate - allow to create a new cursor object for
keys scanning
2. RedisModule_CursorRestart - restart an existing cursor to restart the
scan
3. RedisModule_CursorDestroy - destroy an existing cursor
4. RedisModule_Scan - scan keys

The RedisModule_Scan function gets a cursor object, a callback and void*
(used as user private data).
The callback will be called for each key in the database proving the key
name and the value as RedisModuleKey.
2019-11-11 16:01:41 +02:00
Oran Agra 28c20b4ef9 rename RN_SetLRUOrLFU -> RM_SetLRU and RN_SetLFU
- the API name was odd, separated to two apis one for LRU and one for LFU
- the LRU idle time was in 1 second resolution, which might be ok for RDB
  and RESTORE, but i think modules may need higher resolution
- adding tests for LFU and for handling maxmemory policy mismatch
2019-11-10 09:27:01 +02:00
Oran Agra 02f21113ab fix leak in module api rdb test
recently added more reads into that function, if a later read fails, i must
either free what's already allocated, or return the pointer so that the free
callback will release it.
2019-11-10 09:21:19 +02:00
Yossi Gottlieb 5350e7669e Add ModuleDataType to/from string serialization.
Add two new functions that leverage the RedisModuleDataType mechanism
for RDB serialization/deserialization and make it possible to use it
to/from arbitrary strings:

* RM_SaveDataTypeToString()
* RM_LoadDataTypeFromString()
2019-11-04 21:53:59 +02:00
Salvatore Sanfilippo a15a5d7097
Merge pull request #6540 from oranagra/simple_module_api_tests
Test coverage for new module APIs: dbsize, flushall, randomkey, lru get/set
2019-11-04 18:52:31 +01:00
artix a68c19df6c RM_CreateStringFromLongDouble: use new ld2string 'mode' type 2019-11-04 18:49:56 +01:00
artix 060af1858d Fix RedisModule_ReplyWithLongDouble ptr definition, add tests 2019-11-04 18:49:50 +01:00
Oran Agra 3adf10b809 Test coverage for new module APIs: dbsize, flushall, randomkey, lru get/set 2019-11-04 19:30:31 +02:00
Salvatore Sanfilippo 576a7b8caf
Merge pull request #6539 from kristoff-it/unstable
fix unreported overflow in autogerenared stream IDs
2019-11-04 17:37:28 +01:00
Salvatore Sanfilippo bbdbfabf9e
Merge pull request #6531 from oranagra/rm_save_long_double
Module API for loading and saving long double
2019-11-04 17:33:35 +01:00
Loris Cro b12d2f65d6 fix unreported overflow in autogerenared stream IDs 2019-11-04 16:36:06 +01:00
Salvatore Sanfilippo 23c681b51d
Merge pull request #6535 from guybe7/module_block_on_keys_tests
Modules: Test RedisModule_BlockClientOnKeys
2019-11-04 10:53:59 +01:00
Guy Benoish b81f486c2f Modules: Test RedisModule_BlockClientOnKeys 2019-11-04 14:20:24 +05:30
Oran Agra 0423309768 Add RM_ServerInfoGetFieldUnsigned
rename RM_ServerInfoGetFieldNumerical RM_ServerInfoGetFieldSigned
move string2ull to util.c
fix leak in RM_GetServerInfo when duplicate info fields exist
2019-11-04 08:50:29 +02:00
Oran Agra deebed23e1 Adding RM_ServerInfoGetFieldC 2019-11-04 07:57:52 +02:00
Oran Agra 779aebc91c Module API for loading and saving long double
looks like each platform implements long double differently (different bit count)
so we can't save them as binary, and we also want to avoid creating a new RDB
format version, so we save these are hex strings using "%La".

This commit includes a change in the arguments of ld2string to support this.
as well as tests for coverage and short reads.

coded by @guybe7
2019-11-03 16:42:31 +02:00
Oran Agra 4d580438b0 Add module api for looking into INFO fields
- Add RM_GetServerInfo and friends
- Add auto memory for new opaque struct
- Add tests for new APIs

other minor fixes:
- add const in various char pointers
- requested_section in modulesCollectInfo was actually not sds but char*
- extract new string2d out of getDoubleFromObject for code reuse

Add module API for
2019-11-03 15:02:25 +02:00
Oran Agra 51c3ff8d75 Modules hooks: complete missing hooks for the initial set of hooks
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above

startLoading() now gets flag indicating what is loaded.
stopLoading() now gets an indication of success or failure.
adding startSaving() and stopSaving() with similar args and role.
2019-10-29 17:59:09 +02:00
Oran Agra a0cfd519e3 test infra: improve prints on failed assertions
sometimes we have several assertions with the same condition in the same test
at different stages, and when these fail (the ones that print the condition
text) you don't know which one it was. other assertions didn't print the
condition text (variable names), just the expected and unexpected values.

So now, all assertions print context line, and conditin text.

besides, one of the major differences between 'assert' and 'assert_equal',
is that the later is able to print the value that doesn't match the expected.
if there is a rare non-reproducible failure, it is helpful to know what was
the value the test encountered and how far it was from the threshold.

So now, adding assert_lessthan and assert_range that can be used in some places.
were we used just 'assert { a > b }' so far.
2019-10-29 17:38:12 +02:00
Oran Agra 0399b5a27e Module api tests for RM_Call
Adding a test for coverage for RM_Call in a new "misc" unit
to be used for various short simple tests

also solves compilation warnings in redismodule.h and fork.c
2019-10-28 12:39:57 +02:00
Salvatore Sanfilippo c328c807e7
Merge pull request #6496 from oranagra/module_tests_valgrind
Make module tests pass with valgrind, and fix a leak in diskless load
2019-10-28 10:05:52 +01:00