* man-like consistent long formatting
* Uppercases commands, subcommands and options
* Adds 'HELP' to HELP for all
* Lexicographical order
* Uses value notation and other .md likeness
* Moves const char *help to top
* Keeps it under 80 chars
* Misc help typos, consistent conjuctioning (i.e return and not returns)
* Uses addReplySubcommandSyntaxError(c) all over
Signed-off-by: Itamar Haber <itamar@redislabs.com>
The crash log attempts to print the current client info, and when it
does that it attempts to check if the first argument happens to be a key
but it did so for commands with no arguments too, which caused the crash
log to crash half way and not reach its end.
Turns out that when the fork child crashes, the crash log was deleting
the pidfile from the disk (although the parent is still running.
Now we set the pidfile of the fork process to NULL so the fork process
will never deletes it.
* Allow runtest-moduleapi use a different 'make', for systems where GNU Make is 'gmake'.
* Fix issue with builds on Solaris re-building everything from scratch due to CFLAGS/LDFLAGS not stored.
* Fix compile failure on Solaris due to atomicvar and a bunch of warnings.
* Fix garbled log timestamps on Solaris.
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.
It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.
Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
assertion, so that it doesn't fail the test. (since we set the server to use
`exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress
test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
The bug was introduced by #5021 which only attempted avoid EXIST on an
already expired key from returning 1 on a replica.
Before that commit, dbExists was used instead of
lookupKeyRead (which had an undesired effect to "touch" the LRU/LFU)
Other than that, this commit fixes OBJECT to also come empty handed on
expired keys in replica.
And DEBUG DIGEST-VALUE to behave like DEBUG OBJECT (get the data from
the key regardless of it's expired state)
* Debug Populate: Add checks for count and keysize to avoid crash
* provide getRangeLongFromObjectOrReply and getPositiveLongFromObjectOrReply for range check
The reason that we want to get a full crash report on SIGABRT
is that the jmalloc, when detecting a corruption, calls abort().
This will cause the Redis to exist silently without any report
and without any way to analyze what happened.
If one thread got SIGSEGV, function sigsegvHandler() would be triggered,
it would call bioKillThreads(). But call pthread_cancel() to cancel itself
would make it block. Also note that if SIGSEGV is caught by bio thread, it
should kill the main thread in order to give a positive report.
This is a catch-all test to confirm that that rewrite produces a valid
output for all parameters and that this process does not introduce
undesired configuration changes.
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.
Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
of a microsecond sleep
DEBUG ZIPLIST <key> currently returns the following error string if the
key is not a ziplist: "ERR Not an sds encoded string.". This looks like
an accidental copy/paste error from the error returned in the else if
branch above where this string is returned if the key is not an sds
string. The command was added in
ac61f90625 and looking at the commit,
nothing indicates that it is not an accidental typo.
The error string now returns a correct error: "Not a ziplist encoded
object", which accurately describes the error.
The previous fix using _Atomic was insufficient, since we check and set it in
different places.
The implications of this bug are just that a portion of the bug report will be shown
twice, in the race case of two concurrent crashes.
this race would only happen when two threads paniced at the same time,
and even then the only consequence is some extra log lines.
race reported in #7391
This makes it possible to add tests that generate assertions, and run
them with valgrind, making sure that there are no memory violations
prior to the assertion.
New config options:
- crash-log-enabled - can be disabled for cleaner core dumps
- crash-memcheck-enabled - useful for faster termination after a crash
- use-exit-on-panic - to be used by the test suite so that valgrind can
detect leaks and memory corruptions
Other changes:
- Crash log is printed even on system that dont HAVE_BACKTRACE, i.e. in
both SIGSEGV and assert / panic
- Assertion and panic won't print registers and code around EIP (which
was useless), but will do fast memory test (which may still indicate
that the assertion was due to memory corrpution)
I had to reshuffle code in order to re-use it, so i extracted come code
into function without actually doing any changes to the code:
- logServerInfo
- logModulesInfo
- doFastMemoryTest (with the exception of it being conditional)
- dumpCodeAroundEIP
changes to the crash report on segfault:
- logRegisters is called right after the stack trace (before info) done
just in order to have more re-usable code
- stack trace skips the first two items on the stack (the crash log and
signal handler functions)
valsize was not modified during the for loop below instead of getting from c->argv[4], therefore there is no need to put inside the for loop.. Moreover, putting the check outside loop will also avoid memory leaking, decrRefCount(key) should be called in the original code if we put the check in for loop
* tests/valgrind: don't use debug restart
DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.
So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.
since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.
other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
watching the startup message in the log), and serve the restarted server too.
* squashme - CR fixes
* fix memlry leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal
about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808== too small or bad protection modes
it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
also, not sure if it's even valid without a call to sigaltstack()
* replication hooks: role change, master link status, replica online/offline
* persistence hooks: saving, loading, loading progress
* misc hooks: cron loop, shutdown, module loaded/unloaded
* change the way hooks test work, and add tests for all of the above
startLoading() now gets flag indicating what is loaded.
stopLoading() now gets an indication of success or failure.
adding startSaving() and stopSaving() with similar args and role.
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
of aarch64.
The content comes from the definition of the sigcontext and tested on
my aarch64 server.
sigcontext defined at the linux kernel code:
arch/arm64/include/uapi/asm/sigcontext.h
jemalloc 5 doesn't immediately release memory back to the OS, instead there's a decaying
mechanism, which doesn't work when there's no traffic (no allocations).
this is most evident if there's no traffic after flushdb, the RSS will remain high.
1) enable jemalloc background purging
2) explicitly purge in flushdb
__x86_64__ is defined on the on the x32 architecture and the conditionals in
debug.c therefore assume the size of (void*) etc:
debug.c: In function 'getMcontextEip':
debug.c:757:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
return (void*) uc->uc_mcontext.gregs[16]; /* Linux 64 */
^
debug.c: In function 'logRegisters':
debug.c:920:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
logStackContent((void**)uc->uc_mcontext.gregs[15]);
We can remedy this by checking for __ILP32__ first. See:
https://wiki.debian.org/ArchitectureSpecificsMemo
... for more info.
and will not be inconsistent after we call debug loadaof.
Before this commit, there were 2 problems:
1, When appendonly is set to no and there is not a appendonly file,
redis-server will crash if we call DEBUG LOADAOF.
2, When appendonly is set to no and there is a appendonly file,
redis-server will hold different data after loading appendonly
file.
The main change introduced by this commit is pretending that help
arrays are more text than code, thus indenting them at level 0. This
improves readability, and is an old practice when defining arrays of
C strings describing text.
Additionally a few useless return statements are removed, and the HELP
subcommand capitalized when printed to the user.
With PSYNC2 to force a full SYNC in tests is hard. With this new DEBUG
subcommand we just need to call it and then CLIENT KILL TYPE master in
the slave.
This adds a new `addReplyHelp` helper that's used by commands
when returning a help text. The following commands have been
touched: DEBUG, OBJECT, COMMAND, PUBSUB, SCRIPT and SLOWLOG.
WIP
Fix entry command table entry for OBJECT for HELP option.
After #4472 the command may have just 2 arguments.
Improve OBJECT HELP descriptions.
See #4472.
WIP 2
WIP 3
This commit is a reinforcement of commit c1c99e9.
1. Replication information can be stored when the RDB file is
generated by a mater using server.slaveseldb when server.repl_backlog
is not NULL, or set repl_stream_db be -1. That's safe, because
NULL server.repl_backlog will trigger full synchronization,
then master will send SELECT command to replicaiton stream.
2. Only do rdbSave* when rsiptr is not NULL,
if we do rdbSave* without rdbSaveInfo, slave will miss repl-stream-db.
3. Save the replication informations also in the case of
SAVE command, FLUSHALL command and DEBUG reload.
This change attempts to switch to an hash function which mitigates
the effects of the HashDoS attack (denial of service attack trying
to force data structures to worst case behavior) while at the same time
providing Redis with an hash function that does not expect the input
data to be word aligned, a condition no longer true now that sds.c
strings have a varialbe length header.
Note that it is possible sometimes that even using an hash function
for which collisions cannot be generated without knowing the seed,
special implementation details or the exposure of the seed in an
indirect way (for example the ability to add elements to a Set and
check the return in which Redis returns them with SMEMBERS) may
make the attacker's life simpler in the process of trying to guess
the correct seed, however the next step would be to switch to a
log(N) data structure when too many items in a single bucket are
detected: this seems like an overkill in the case of Redis.
SPEED REGRESION TESTS:
In order to verify that switching from MurmurHash to SipHash had
no impact on speed, a set of benchmarks involving fast insertion
of 5 million of keys were performed.
The result shows Redis with SipHash in high pipelining conditions
to be about 4% slower compared to using the previous hash function.
However this could partially be related to the fact that the current
implementation does not attempt to hash whole words at a time but
reads single bytes, in order to have an output which is endian-netural
and at the same time working on systems where unaligned memory accesses
are a problem.
Further X86 specific optimizations should be tested, the function
may easily get at the same level of MurMurHash2 if a few optimizations
are performed.
This is of great interest because allows us to print debugging
informations that could be of useful when debugging, like in the
following example:
serverPanic("Unexpected encoding for object %d, %d",
obj->type, obj->encoding);
The commit improves ziplistRepr() and adds a new debugging subcommand so
that we can trigger the dump directly from the Redis API.
This command capability was used while investigating issue #3684.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases when it was impossible
in the past. For instance:
1. When a slave is promoted to mastrer, the slaves of the old master can
partially resynchronize with the new master.
2. Chained slalves (slaves of slaves) can be moved to replicate to other
slaves or the master itsef, without requiring a full resync.
3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.
In order to obtain this, the following main changes were operated:
* Slaves also take a replication backlog, not just masters.
* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlong, and can partially resynchronize with its
slaves (that were previously slaves of the old master).
* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).
* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.
* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.
* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.
* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order to don't interfer with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.
An old design document is available here:
https://gist.github.com/antirez/ae068f95c0d084891305
However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things working well.
The old test, designed to do a transformation on the bits that was
invertible, in order to avoid touching the original memory content, was
not effective as it was redis-server --test-memory. The former often
reported OK while the latter was able to spot the error.
So the test was substituted with one that may perform better, however
the new one must backup the memory tested, so it tests memory in small
pieces. This limits the effectiveness because of the CPU caches. However
some attempt is made in order to trash the CPU cache between the fill
and the check stages, but not for the addressing test unfortunately.
We'll see if this test will be able to find errors where the old failed.
There are some cases of printing unsigned integer with %d conversion
specificator and vice versa (signed integer with %u specificator).
Patch by Sergey Polovko. Backported to Redis from Disque.
Currently this feature is only accessible via DEBUG for testing, since
otherwise depending on the instance configuration a given script works
or is broken, which is against the Redis philosophy.
The command reports information about the hash table internal state
representing the specified database ID.
This can be used in order to investigate rehashings, memory usage issues
and for other debugging purposes.
It's possible large objects could be larger than 'int', so let's
upgrade all size counters to ssize_t.
This also fixes rdbSaveObject serialized bytes calculation.
Since entire serializations of data structures can be large,
so we don't want to limit their calculated size to a 32 bit signed max.
This commit increases object size calculation and
cascades the change back up to serializedlength printing.
Before:
127.0.0.1:6379> debug object hihihi
... encoding:quicklist serializedlength:-2147483559 ...
After:
127.0.0.1:6379> debug object hihihi
... encoding:quicklist serializedlength:2147483737 ...
Adds: ql_compressed (boolean, 1 if compression enabled for list, 0
otherwise)
Adds: ql_uncompressed_size (actual uncompressed size of all quicklistNodes)
Adds: ql_ziplist_max (quicklist max ziplist fill factor)
Compression ratio of the list is then ql_uncompressed_size / serializedlength
We report ql_uncompressed_size for all quicklists because serializedlength
is a _compressed_ representation anyway.
Sample output from a large list:
127.0.0.1:6379> llen abc
(integer) 38370061
127.0.0.1:6379> debug object abc
Value at:0x7ff97b51d140 refcount:1 encoding:quicklist serializedlength:19878335 lru:9718164 lru_seconds_idle:5 ql_nodes:21945 ql_avg_node:1748.46 ql_ziplist_max:-2 ql_compressed:0 ql_uncompressed_size:1643187761
(1.36s)
The 1.36s result time is because rdbSavedObjectLen() is serializing the
object, not because of any new stats reporting.
If we run DEBUG OBJECT on a compressed list, DEBUG OBJECT takes almost *zero*
time because rdbSavedObjectLen() reuses already-compressed ziplists:
127.0.0.1:6379> debug object abc
Value at:0x7fe5c5800040 refcount:1 encoding:quicklist serializedlength:19878335 lru:9718109 lru_seconds_idle:5 ql_nodes:21945 ql_avg_node:1748.46 ql_ziplist_max:-2 ql_compressed:1 ql_uncompressed_size:1643187761
Let user set how many nodes to *not* compress.
We can specify a compression "depth" of how many nodes
to leave uncompressed on each end of the quicklist.
Depth 0 = disable compression.
Depth 1 = only leave head/tail uncompressed.
- (read as: "skip 1 node on each end of the list before compressing")
Depth 2 = leave head, head->next, tail->prev, tail uncompressed.
- ("skip 2 nodes on each end of the list before compressing")
Depth 3 = Depth 2 + head->next->next + tail->prev->prev
- ("skip 3 nodes...")
etc.
This also:
- updates RDB storage to use native quicklist compression (if node is
already compressed) instead of uncompressing, generating the RDB string,
then re-compressing the quicklist node.
- internalizes the "fill" parameter for the quicklist so we don't
need to pass it to _every_ function. Now it's just a property of
the list.
- allows a runtime-configurable compression option, so we can
expose a compresion parameter in the configuration file if people
want to trade slight request-per-second performance for up to 90%+
memory savings in some situations.
- updates the quicklist tests to do multiple passes: 200k+ tests now.
Added field 'ql_nodes' and 'ql_avg_per_node'.
ql_nodes is the number of quicklist nodes in the quicklist.
ql_avg_node is the average fill level in each quicklist node. (LLEN / QL_NODES)
Sample output:
127.0.0.1:6379> DEBUG object b
Value at:0x7fa42bf2fed0 refcount:1 encoding:quicklist serializedlength:18489 lru:8983768 lru_seconds_idle:3 ql_nodes:430 ql_avg_per_node:511.73
127.0.0.1:6379> llen b
(integer) 220044
Slaves key expire is orchestrated by the master. Sometimes the master
will send the synthesized DEL to expire keys on the slave with a non
trivial delay (when the key is not accessed, only the incremental expiry
algorithm will expire it in background).
During that time, a key is logically expired, but slaves still return
the key if you GET (or whatever) it. This is a bad behavior.
However we can't simply trust the slave view of the key, since we need
the master to be able to send write commands to update the slave data
set, and DELs should only happen when the key is expired in the master
in order to ensure consistency.
However 99.99% of the issues with this behavior is when a client which
is not a master sends a read only command. In this case we are safe and
can consider the key as non existing.
This commit does a few changes in order to make this sane:
1. lookupKeyRead() is modified in order to return NULL if the above
conditions are met.
2. Calls to lookupKeyRead() in commands actually writing to the data set
are repliaced with calls to lookupKeyWrite().
There are redundand checks, so for example, if in "2" something was
overlooked, we should be still safe, since anyway, when the master
writes the behavior is to don't care about what expireIfneeded()
returns.
This commit is related to #1768, #1770, #2131.