Commit Graph

4053 Commits

Author SHA1 Message Date
antirez e257ab2bfe Sentinel: sentinelFlushConfig() to CONFIG REWRITE + fsync. 2013-11-19 10:13:04 +01:00
antirez 5998769c28 Sentinel: CONFIG REWRITE support for Sentinel config. 2013-11-19 09:48:12 +01:00
antirez 47df12d5d9 Sentinel: can-failover option removed, many comments fixed. 2013-11-19 09:28:47 +01:00
antirez cd4ff9992b Fix typo 'configuraiton' in rewriteConfigRewriteLine() comment. 2013-11-18 18:18:10 +01:00
antirez 232cdb95ab Sentinel: added config options useful to take state on config rewrite.
We'll use CONFIG REWRITE (internally) in order to store the new
configuration of a Sentinel after the internal state changes. In order
to do so, we need configuration options (that usually the user will not
touch at all) about config epoch of the master, Sentinels and Slaves
known for this master, and so forth.
2013-11-18 16:03:03 +01:00
antirez 3a374b0511 Sentinel: failover abort function simplified. 2013-11-18 11:43:35 +01:00
antirez e0750acf11 Sentinel: slaves reconfig delay modified.
The time Sentinel waits since the slave is detected to be configured to
the wrong master, before reconfiguring it, is now the failover_timeout
time as this makes more sense in order to give the Sentinel performing
the failover enough time to reconfigure the slaves slowly (if required
by the configuration).

Also we now PUBLISH the new configuration more frequently, as this allows
the reappearing master to be switched back to slave faster.
2013-11-18 11:37:24 +01:00
antirez 83316f515c Sentinel: failover restart time is now multiple of failover timeout.
Also the default failover timeout changed to 3 minutes, as the failover is a
fairly fast procedure most of the time, unless there is a very large
number of slaves and the user chose to configure them sequentially (in
that case the user should change the failover timeout accordingly).
2013-11-18 11:30:08 +01:00
antirez 3a56013acb Sentinel: state machine and timeouts simplified. 2013-11-18 11:12:58 +01:00
antirez 4be53b1c5d Sentinel: election timeout define. 2013-11-18 10:08:06 +01:00
antirez 69d826a354 Sentinel: fix address of master in Hello messages.
Once we switched configuration during a failover, we should advertise
the new address.

This was a serious race condition as the Sentinel performing the
failover for a moment advertised the old address with the new
configuration epoch: once transmitted to the other Sentinels the broken
configuration would remain there forever, until the next failover
(because a greater configuration epoch is required to overwrite an older
one).
2013-11-14 10:25:55 +01:00
antirez e4c65e72c6 Sentinel: master address selection in get-master-address refactored. 2013-11-14 10:23:54 +01:00
antirez c0d7229364 Sentinel: fix conditional to only affect slaves with wrong master. 2013-11-14 10:23:05 +01:00
antirez dfbd9c5aeb Sentinel: simplify and refactor slave reconfig code. 2013-11-14 00:36:43 +01:00
antirez 64ad6648a8 Sentinel: reconfigure slaves to right master. 2013-11-14 00:29:38 +01:00
antirez 3e27d678da Sentinel: remember last time slave changed master. 2013-11-14 00:20:15 +01:00
antirez 8297745fa6 Sentinel: redirect-to-master is not ok with new algorithm.
Now Sentinel believes the current configuration is always the winner and
should be applied by Sentinels instead of trying to adapt our view of
the cluster based on what we observe.

So the only way to modify what a Sentinel believes to be the truth is to
win an election and advertise the new configuration via Pub/Sub with a
greater configuration epoch.
2013-11-13 17:03:48 +01:00
antirez 76a88f56e5 Sentinel: safer slave reconfig, master reported role should match. 2013-11-13 17:02:09 +01:00
antirez ddaad9fe2d Sentinel: role reporting fixed and added in SENTINEL output. 2013-11-13 16:39:57 +01:00
antirez a0afa66f4b Sentinel: being a master and reporting as slave is considered SDOWN. 2013-11-13 16:28:56 +01:00
antirez 17718fdcba Sentinel: make sure role_reported is always updated. 2013-11-13 16:21:58 +01:00
antirez 46a053d34b Sentinel: track role change time. Wait before reconfigurations. 2013-11-13 16:18:23 +01:00
antirez 9e40c46f5e Sentinel: fix no-down check in master->slave conversion code. 2013-11-13 13:43:59 +01:00
antirez ae35b7e240 Sentinel: readd slaves back after a master reset. 2013-11-13 13:01:11 +01:00
antirez 6bd4f6bffe Sentinel: sentinelResetMaster() new flag to avoid removing set of sentinels.
This commit also removes some dead code and cleanup generic flags.
2013-11-13 10:30:45 +01:00
antirez 1569af1f23 Sentinel: receive Pub/Sub messages from slaves. 2013-11-12 23:07:33 +01:00
antirez dfa5f8b777 Sentinel: change event name when converting master to slave. 2013-11-12 23:00:17 +01:00
antirez 24158d1488 Sentinel: added config-epoch to SENTINEL masters output. 2013-11-12 17:22:04 +01:00
antirez d2bc6dc39a Sentinel: new failover algo, desync slaves and update config epoch. 2013-11-12 17:07:31 +01:00
antirez 4a128b949d Sentinel: when starting failover seek for votes ASAP. 2013-11-12 16:38:02 +01:00
antirez e6b9d5e97e Sentinel: +new-epoch events. 2013-11-12 13:35:25 +01:00
antirez 54c447be52 Sentinel: wait some time between failover attempts. 2013-11-12 13:30:31 +01:00
antirez ab4b2ec88f Sentinel: allow to vote for myself. 2013-11-12 11:32:40 +01:00
antirez b6b65b29c0 Sentinel: fix PUBLISH to masters and slaves. 2013-11-12 11:12:48 +01:00
antirez 90ab62fd5e Sentinel: epoch introduced in leader vote. 2013-11-12 11:09:35 +01:00
antirez 8c1bf9a2bd Sentinel: leadership handling changes WIP.
Changes to leadership handling.

Now the leader gets selected by every Sentinel, for a specified epoch,
when the SENTINEL is-master-down-by-addr is sent.

This command now includes the runid and the currentEpoch of the instance
seeking for a vote. The Sentinel only votes a single time in a given
epoch.

Still a work in progress, does not even compile at this stage.
2013-11-11 18:30:14 +01:00
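
The "votes a single time in a given epoch" rule from the commit above can be illustrated with a minimal Python sketch. It is not the Sentinel C code: the `Voter` class, its `request_vote` method, and the run IDs are hypothetical names chosen for illustration, and rejection of stale epochs is left out on purpose.

```python
class Voter:
    """Toy model of a Sentinel that grants at most one vote per epoch."""

    def __init__(self):
        self.current_epoch = 0
        self.votes = {}  # epoch -> runid that received our vote (hypothetical layout)

    def request_vote(self, runid, epoch):
        # Adopt the requester's epoch if it is ahead of ours.
        self.current_epoch = max(self.current_epoch, epoch)
        # The first requester in an epoch gets the vote; later requests
        # for the same epoch are answered with the vote already given.
        return self.votes.setdefault(epoch, runid)


v = Voter()
first = v.request_vote("aaa", 1)   # vote granted to "aaa"
second = v.request_vote("bbb", 1)  # same epoch: still reports "aaa"
third = v.request_vote("bbb", 2)   # new epoch: "bbb" can win the vote
```

Returning the already-chosen run ID, rather than an error, lets every requester learn who holds the vote for that epoch.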
antirez 0bac36d0a1 Sentinel: handle Hello messages received via slaves correctly.
Even when messages are received via the slave, we should perform
operations (like adding a new Sentinel) in the context of the master.
2013-11-11 17:12:27 +01:00
antirez 9e1b27d49e Sentinel: remove code not useful in the new design. 2013-11-11 12:06:11 +01:00
antirez b93b0adc89 Sentinel: epoch introduced.
Sentinel state now includes the idea of current epoch and config epoch.
In the Hello message, that is now published both on masters and slaves,
a Sentinel no longer just advertises itself but also broadcasts its
current view of the configuration: the master name / ip / port and its
current epoch.

Sentinels receiving such information switch to the new master if the
configuration epoch received is newer and the ip / port of the master
are indeed different compared to the previous ones.
2013-11-11 11:05:58 +01:00
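
The switch rule described in the commit above (a newer configuration epoch and a different ip / port) can be sketched as follows. This is a rough illustration only, not the actual Sentinel implementation: `MasterView` and `apply_hello` are hypothetical names, and what happens when the epoch is newer but the address is unchanged is an assumption (the local view is left untouched).

```python
from dataclasses import dataclass


@dataclass
class MasterView:
    """A Sentinel's view of one master, as carried in a Hello message."""
    name: str
    ip: str
    port: int
    config_epoch: int


def apply_hello(local: MasterView, hello: MasterView) -> bool:
    """Switch `local` to the advertised master; return True if it switched."""
    if hello.name != local.name:
        return False  # the Hello message is about another master
    newer = hello.config_epoch > local.config_epoch
    moved = (hello.ip, hello.port) != (local.ip, local.port)
    if newer and moved:
        local.ip, local.port = hello.ip, hello.port
        local.config_epoch = hello.config_epoch
        return True
    return False  # assumption: otherwise the local view is kept as-is


local = MasterView("mymaster", "10.0.0.1", 6379, config_epoch=1)
switched = apply_hello(local, MasterView("mymaster", "10.0.0.2", 6379, 2))
stale = apply_hello(local, MasterView("mymaster", "10.0.0.3", 6379, 1))
```

In this sketch an older or equal epoch never overwrites the local view, which matches the later remark in the log that a greater configuration epoch is required to replace an older one.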
antirez b2f834390c Log to what master a slave is going to connect to. 2013-11-11 09:25:36 +01:00
antirez a5e7358a12 Cluster: removed not needed newline at end of redisLog() msg. 2013-11-08 17:28:02 +01:00
antirez 28071caf38 Cluster: send a single UPDATE packet for now. 2013-11-08 17:25:49 +01:00
antirez d289c628b1 Cluster: replace hardcoded 4096 for bus msg len with sizeof(). 2013-11-08 17:19:19 +01:00
antirez 94a07d5901 Cluster: slots update refactored + UPDATE msg processing.
Now there is a function that handles the update of the local slot
configuration every time we have some new info about a node and its set
of served slots and configEpoch.

Moreover, the UPDATE packets are now processed when received (it was a
work in progress in the previous commit).
2013-11-08 17:02:10 +01:00
antirez dc43f66eac Cluster: UPDATE msg data structure and sending function. 2013-11-08 16:26:50 +01:00
antirez 6c6572be95 Cluster: refactoring of slots update code and more.
The commit also introduces detection of nodes publishing an outdated
configuration. More work is in progress to send an UPDATE packet to inform
them of the config change.
2013-11-08 10:32:16 +01:00
antirez d3588dc194 Fix broken rdbWriteRaw() return value check in rdb.c.
Thanks to @PhoneLi for reporting.
2013-11-07 23:53:18 +01:00
antirez d75ab87226 redis-trib: fixed slot allocation when --replicas is used. 2013-11-07 16:12:06 +01:00
antirez e1a1d904de Merge branch 'unstable' of github.com:/antirez/redis into unstable 2013-11-06 11:30:44 +01:00
antirez 80da056c29 Sentinel: sentinelSendSlaveOf() was missing a var and the prototype. 2013-11-06 11:23:53 +01:00