Solve a race between BGSAVE and FLUSHALL messing up the dirty counter (#13361)

If we run FLUSHALL when the 'save' config is set, and there's a fork
child ding BGSAVE, there's a chance the child is already finished, and
the parent process is unaware of it. in that case the child will not get
the kill signal and will finish successfully, but the parent process
thinks it killed it and will reset the dirty counter to 0, then the
backgroundSaveDoneHandlerDisk method can set the dirty counter to a
negative value.
This commit is contained in:
Oran Agra 2024-07-01 09:04:52 +03:00 committed by GitHub
parent 69b7137d32
commit 799c5e5f51
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 25 additions and 1 deletions

View File

@ -3765,6 +3765,12 @@ void killRDBChild(void) {
* This includes:
* - resetChildState
* - rdbRemoveTempFile */
/* However, there's a chance the child already exited, and will not receive the signal,
* in that case it could have been resulsted in success and the done handler will override some
* server metrics (e.g. the dirty counter) which it shouldn't (e.g. in case of FLUSHALL).
* so we just for completion once (don't wait for it). */
checkChildrenDone();
}
/* Spawn an RDB child that writes the RDB to the sockets of the slaves

View File

@ -1576,7 +1576,8 @@ void rdbPipeReadHandler(struct aeEventLoop *eventLoop, int fd, void *clientData,
}
/* Remove the pipe read handler if at least one write handler was set. */
if (server.rdb_pipe_numconns_writing || stillAlive == 0) {
aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE);
if (server.rdb_pipe_read != -1)
aeDeleteFileEvent(server.el, server.rdb_pipe_read, AE_READABLE);
break;
}
}
@ -1599,6 +1600,9 @@ void updateSlavesWaitingBgsave(int bgsaveerr, int type) {
while((ln = listNext(&li))) {
client *slave = ln->value;
/* We can get here via freeClient()->killRDBChild()->checkChildrenDone(). skip disconnected slaves. */
if (!slave->conn) continue;
if (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_END) {
struct redis_stat buf;

View File

@ -75,6 +75,20 @@ start_server {tags {"other"}} {
r flushall
assert_equal [s rdb_changes_since_last_save] 0
}
test {FLUSHALL and bgsave} {
r config set save "3600 1 300 100 60 10000"
r set x y
r bgsave
r set x y
r multi
r debug sleep 1
# by the time we'll get to run flushall, the child will finish,
# but the parent will be unaware of it, and it could wrongly set the dirty counter.
r flushall
r exec
assert_equal [s rdb_changes_since_last_save] 0
}
}
test {BGSAVE} {