Commit Graph

3723 Commits

Author SHA1 Message Date
antirez e88e6a6334 Cluster: use long long for timestamps in clusterGenNodesDescription().
Ping sent and pong received fields need to be cast to long long to be
printed correctly on 32-bit systems.
2013-12-17 09:38:11 +01:00
antirez 2dfc5e35a9 Makefile.dep updated. 2013-12-13 13:10:05 +01:00
antirez b1ba58f341 SDIFF iterator misuse bug regression test added.
See commit c00453d for more info about the bug.
2013-12-13 11:37:13 +01:00
antirez c00453da1d SDIFF iterator misuse fixed in diff algorithm #1.
The bug could be easily triggered by:

    SADD foo a b c 1 2 3 4 5 6
    SDIFF foo foo

When the key was the same in two sets, an unsafe iterator was used to
check existence of elements in the same set we were iterating.
Usually this would just result in wrong output; however, with the
dict.c API misuse protection we have in place, the result was actually
a failed assertion, triggered by the CI test while creating
random datasets for the "MASTER and SLAVE consistency" test.
2013-12-13 11:34:21 +01:00
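The case described in commit c00453da1d can be illustrated with a small Python sketch. It is not the Redis C code: the function name `sdiff_algo1` and the identity check are assumptions made for illustration, showing one way a "diff algorithm #1" style loop (scan the first set, look each element up in the others) can avoid querying the very set it is iterating.

```python
def sdiff_algo1(sets):
    """Return the elements of sets[0] that appear in none of the other sets.

    Hypothetical sketch: iterate the first set and probe every other set.
    """
    first = sets[0]
    result = set()
    for element in first:
        for other in sets[1:]:
            if other is first:
                # Same object as the set being iterated: the element is
                # trivially a member, so skip the lookup on that set.
                break
            if element in other:
                break
        else:
            # No break happened: the element was found in no other set.
            result.add(element)
    return result


foo = {"a", "b", "c", "1", "2", "3", "4", "5", "6"}
print(sdiff_algo1([foo, foo]))  # set()
print(sorted(sdiff_algo1([foo, {"a", "1"}])))
```

The first call mirrors `SDIFF foo foo` from the commit message: the difference of a set with itself is empty.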
antirez 5320148883 Sentinel: dead code removed. 2013-12-13 11:01:13 +01:00
antirez 452dea30f6 Makefile: remove odd syntax not compatible with some make versions.
See issue #1448.
2013-12-12 15:19:39 +01:00
Salvatore Sanfilippo 62e4956936 Merge pull request #1415 from Dieken/fix-typo
fix typo in redis.conf and sentinel.conf
2013-12-12 02:30:11 -08:00
Salvatore Sanfilippo a99c751d6c Merge pull request #1460 from codeeply/simplify2
comment mistake fixed
2013-12-12 02:23:44 -08:00
codeeply 0f06f8df07 comment mistake fixed 2013-12-12 16:33:29 +08:00
antirez a5ec247f13 Replication: publish the slave_repl_offset when disconnected from master.
When a slave was disconnected from its master the replication offset was
reported as -1. Now it is reported as the replication offset of the
previous master, so that failover can be performed using this value in
order to try to select a slave with more processed data from a set of
slaves of the old master.
2013-12-11 15:23:15 +01:00
Salvatore Sanfilippo 0a89d9a0b1 Merge pull request #1451 from yossigo/unbalanced-quotes-fix
Return proper error on requests with an unbalanced number of quotes.
2013-12-11 03:06:18 -08:00
Yossi Gottlieb 88a5cede88 Fix wrong repldboff type which causes dropped replication in rare cases. 2013-12-11 11:38:02 +01:00
Yubao Liu 6d5fa2e06c fix typo in redis.conf and sentinel.conf 2013-12-11 15:46:42 +08:00
antirez 11120689c4 Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to do
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

Unfortunately it is hard to write a regression test for this issue. More
debugging support would be needed in the Redis core, in the form of
functionality to simulate slow DB loading / deletion.
2013-12-10 18:47:31 +01:00
antirez 2eb781b35b dict.c: added optional callback to dictEmpty().
The Redis hash table implementation has many non-blocking features like
incremental rehashing; however, while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optional callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:46:24 +01:00
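The idea in commit 2eb781b35b can be sketched in Python as follows. This is an illustration only, not dict.c: `empty_table`, `CALLBACK_INTERVAL` and the use of a Python dict are assumptions, and 65536 is an assumed reading of "65k".

```python
CALLBACK_INTERVAL = 65536  # assumed reading of "65k" in the commit message


def empty_table(table, callback=None):
    """Delete every entry, invoking the optional callback at a fixed interval."""
    deleted = 0
    while table:
        table.popitem()
        deleted += 1
        # Give the caller a chance to do incremental work, such as
        # sending a heartbeat, without waiting for the whole deletion.
        if callback is not None and deleted % CALLBACK_INTERVAL == 0:
            callback()
    return deleted


calls = []
table = {i: None for i in range(200_000)}
deleted = empty_table(table, callback=lambda: calls.append(1))
print(deleted, len(calls))
```

Passing `callback=None` keeps the plain blocking behavior, which matches the callback being optional.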
antirez 2c4ab8a534 Log empty DB + Loading data into two separated messages. 2013-12-10 18:43:25 +01:00
antirez 7c531eb5ad Don't send more than 1 newline/sec while loading RDB. 2013-12-10 18:43:19 +01:00
antirez 27db38d069 Slaves heartbeat while loading RDB files.
Starting with Redis 2.8 masters are able to detect timed out slaves,
while before 2.8 only slaves were able to detect a timed out master.

Now that timeout detection is bi-directional the following problem
happens as described "in the field" by issue #1449:

1) Master and slave setup with big dataset.
2) Slave performs the first synchronization, or a full sync
   after a failed partial resync.
3) Master sends the RDB payload to the slave.
4) Slave loads this payload.
5) Master detects the slave as timed out since it does not receive back
   the REPLCONF ACK acknowledgements.

Here the problem is that the master has no way to know how long the
slave will take to load the RDB file in memory. The obvious solution is
to use a greater replication timeout setting, but this is a shame since
for the sake of 0.1% of operation time we are forced to use a timeout
that is not suited to the other 99.9% of operation time.

This commit tries to fix this problem with a solution that is a bit of
a hack, but that modifies little of the replication internals, so that
it can be back ported to 2.8 safely.

During the RDB loading time, we send the master newlines to avoid
being sensed as timed out. This is the same thing the master already
does while saving the RDB file, to still signal its presence to the slave.

The single newline is used because:

1) It can't desync the protocol, as it is only transmitted all or
nothing.
2) It can be safely sent while we don't have a client structure for the
master or in similar situations just with write(2).
2013-12-09 20:26:00 +01:00
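The heartbeat described in commit 27db38d069, together with the one-newline-per-second limit from the later commit 7c531eb5ad, might be sketched like this in Python. It is not the Redis implementation: the class `LoadingHeartbeat`, its parameters, and the use of a local socket pair in place of a real master link are all assumptions for illustration.

```python
import socket


class LoadingHeartbeat:
    """Send a single newline to the peer, at most once per interval."""

    def __init__(self, sock, min_interval_s=1.0):
        self.sock = sock
        self.min_interval_s = min_interval_s
        self.last_sent = None  # time of the last newline, None if never sent

    def maybe_send(self, now_s):
        # Skip the write if the previous newline was sent too recently.
        if self.last_sent is not None and now_s - self.last_sent < self.min_interval_s:
            return False
        # A one-byte payload is transmitted all or nothing.
        self.sock.send(b"\n")
        self.last_sent = now_s
        return True


# A connected socket pair stands in for the slave/master link.
slave_side, master_side = socket.socketpair()
heartbeat = LoadingHeartbeat(slave_side)
sent = [heartbeat.maybe_send(t) for t in (0.0, 0.2, 0.9, 1.0, 1.5, 2.1)]
received = master_side.recv(16)
slave_side.close()
master_side.close()
print(sent)  # [True, False, False, True, False, True]
```

The clock is passed in as `now_s` so the throttling can be exercised without sleeping; a loader would call `maybe_send` from its progress callback with the current time.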
antirez eaf1bfb88b Handle inline requests terminated with just \n. 2013-12-09 13:28:39 +01:00
Yossi Gottlieb 6e70c01148 Return proper error on requests with an unbalanced number of quotes. 2013-12-08 12:58:12 +02:00
antirez c590549e40 Sentinel: fix reported role info sampling.
The way the role change was recorded was not sane and was too
convoluted, causing the role information to be not always updated.

This commit fixes issue #1445.
2013-12-06 12:46:56 +01:00
antirez 2b414a4b5f Sentinel: fix reported role fields when master is reset.
When there is a master address switch, the reported role must be set to
master so that we have a chance to re-sample the INFO output to check if
the new address is reporting the right role.

Otherwise, if the role was wrong, it will be sensed as wrong even after
the address switch, and for long enough, according to the role change
time, for Sentinel to consider the master SDOWN.

This fixes issue #1446, which describes the effects of this bug in
practice.
2013-12-06 11:37:46 +01:00
antirez 8534a290d3 Fixed typo in redis.conf. 2013-12-06 10:48:46 +01:00
Salvatore Sanfilippo 2ef57f8d47 Merge pull request #1439 from AnuragRamdasan/patch-3
Grammar fix.
2013-12-05 09:53:45 -08:00
Anurag Ramdasan 839ed7a60b Grammar fix. 2013-12-05 23:15:47 +05:30
Salvatore Sanfilippo 026e561446 Merge pull request #1438 from AnuragRamdasan/patch-2
fixed typo
2013-12-05 08:18:20 -08:00
Anurag Ramdasan fb6b9b14bd fixed typo 2013-12-05 21:47:17 +05:30
Salvatore Sanfilippo cbaad0b26f Merge pull request #1437 from AnuragRamdasan/patch-1
Fixed grammar: 'usually' to 'usual'
2013-12-05 07:42:05 -08:00
Anurag Ramdasan 74431b80a3 Fixed grammar: 'usually' to 'usual' 2013-12-05 21:09:31 +05:30
antirez 11e81a1e9a Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez 74da4a574f Fixed typos in redis.conf file. 2013-12-05 16:28:35 +01:00
antirez 58713c6b13 Fix clients timeout handling.
During the refactoring of blocking operations, commit
82b672f633, a bug was introduced where
a time in milliseconds is compared to a time in seconds, so all clients
always appear to time out if the timeout is set to a non-zero value.

Thanks to Jonathan Leibiusky for finding the bug and helping verify
the cause and fix.
2013-12-05 14:55:07 +01:00
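The unit mix-up fixed by commit 58713c6b13 can be shown with a short Python sketch. The function and parameter names are hypothetical and the actual Redis check may be structured differently; the point is only that subtracting seconds from milliseconds always yields a huge value.

```python
def is_idle_buggy(now_ms, last_interaction_s, max_idle_s):
    # Bug: milliseconds minus seconds is always enormous, so any
    # non-zero timeout is exceeded immediately.
    return max_idle_s != 0 and (now_ms - last_interaction_s) > max_idle_s


def is_idle_fixed(now_ms, last_interaction_s, max_idle_s):
    # Convert to a common unit (seconds) before comparing.
    now_s = now_ms // 1000
    return max_idle_s != 0 and (now_s - last_interaction_s) > max_idle_s


now_ms = 1_386_251_707_000
last_s = now_ms // 1000 - 1  # the client was active one second ago

print(is_idle_buggy(now_ms, last_s, 300))  # True
print(is_idle_fixed(now_ms, last_s, 300))  # False
```

With a timeout of zero both versions report the client as not idle, which matches the commit's note that only non-zero timeouts were affected.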
antirez c5618e7fdd WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez c2f305545a blocked.c API commented. 2013-12-03 18:03:15 +01:00
antirez 82b672f633 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez 2e027c48e5 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:06 +01:00
antirez e4025ea926 Grammar fix in freeClient(). 2013-12-03 13:40:41 +01:00
antirez f80cf7363a Sentinel: don't write HZ when flushing config.
See issue #1419.
2013-12-02 15:56:10 +01:00
antirez dffebbc904 Sentinel: better time desynchronization.
Sentinels are now desynchronized in a better way, changing the time
handler frequency between 10 and 20 HZ. This way on average a
desynchronization of 25 milliseconds is produced, which should be large
enough compared to network latency, avoiding most split-brain conditions
during the vote.

Now that the clocks are desynchronized, larger random delays when
performing operations can be easily achieved in the following way.
Take as an example the function that starts the failover, which is
called with a frequency between 10 and 20 HZ and will start the
failover every time the conditions are met. By just adding, as an
additional condition, something like rand()%4 == 0, we can amplify the
desynchronization between Sentinel instances easily.

See issue #1419.
2013-12-02 12:29:42 +01:00
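The two mechanisms in commit dffebbc904 can be sketched in Python. This is a hedged illustration rather than Sentinel code: the function names are hypothetical, and `rng.randrange(4) == 0` stands in for the `rand()%4 == 0` condition quoted in the message.

```python
import random


def next_timer_period_ms(rng):
    """Pick a timer frequency between 10 and 20 HZ; return its period in ms."""
    hz = rng.randint(10, 20)
    return 1000.0 / hz


def should_start_failover(conditions_met, rng):
    """Start only on roughly one tick in four, even when conditions hold."""
    return conditions_met and rng.randrange(4) == 0


rng = random.Random(1419)  # fixed seed so the demo is repeatable
periods = [next_timer_period_ms(rng) for _ in range(5)]
ticks_until_start = 1
while not should_start_failover(True, rng):
    ticks_until_start += 1
print(periods)
print(ticks_until_start)
```

Each instance draws its own periods and its own gate results, so two Sentinels that observe the same condition at the same moment are unlikely to act on the same tick.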
antirez 6fa42b7507 Cluster: nodes re-addition blacklist API. 2013-12-02 11:12:23 +01:00
antirez 8f18345ef0 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez 3db825fde4 Cluster: some code about clusterHandleSlaveFailover() marginally improved.
80 cols friendly, plus some minor changes to make the code simpler.
2013-11-29 16:17:05 +01:00
antirez 55f90b11c9 Stop writes on MISCONF only if instance is a master.
From the point of view of the slave, not accepting writes from the master
can only create a bigger consistency issue.
2013-11-28 16:29:26 +01:00
antirez 60817bb262 Reply to PING with error when there is a MISCONF state. 2013-11-28 16:17:10 +01:00
antirez 0addf8aff1 Sentinel: log vote received from other Sentinels. 2013-11-28 15:23:46 +01:00
Salvatore Sanfilippo c7136269c6 Merge pull request #1401 from huangz1990/fix_sentinel_bug
fix a bug in sentinel.c about pub/sub link
2013-11-26 06:15:25 -08:00
huangz1990 86a540a66e fix a bug in sentinel.c about pub/sub link 2013-11-26 19:55:51 +08:00
antirez 6f4fd55762 Sentinel: fixes inverted strcmp() test preventing config updates.
The result of this one-char bug was pretty serious: if the new master
had the same port as the previous master, but just a different IP
address, non-leader Sentinels would not be able to recognize the
configuration change.

This commit fixes issue #1394.

Many thanks to @shanemadden, who reported the bug and helped
investigate it.
2013-11-25 10:59:53 +01:00
antirez 8d547ebd56 Sentinel: fix type specifier for Hello msg generation.
This fixes issue #1395.
2013-11-25 10:24:34 +01:00
antirez fcebd9b0f9 Fix false positive in memory efficiency test.
Fixes issue #1298.
2013-11-25 10:21:46 +01:00