# Overview
Users utilize the `FLUSHDB SYNC` and `FLUSHALL SYNC` commands for a variety of
reasons. The main issue with this command is that if the database becomes
substantial in size, the server will be unresponsive for an extended period.
Other than freezing application traffic, this may also lead some clients making
incorrect judgments about the server's availability. For instance, a watchdog may
erroneously decide to terminate the process, resulting in potential adverse
outcomes. While a `FLUSH* ASYNC` can address these issues, it might not be used
for two reasons: firstly, it's not the default, and secondly, in some cases, the
client issuing the flush wants to wait for its completion before repopulating the
database.
Between the option of triggering FLUSH* asynchronously in the background without
indication for completion versus running it synchronously in the foreground by
the main thread, there is another more appealing option. We can block the
client that requested the flush, execute the flush command in the background, and
once done, unblock the client and return notification for completion. This approach
ensures the server remains responsive to other clients, and the blocked client
receives the expected response only after the flush operation has been successfully
carried out.
# Implementation details
Instead of defining yet another flavor to the flush command, we can modify
`FLUSHALL SYNC` and `FLUSHDB SYNC` always run in this new mode.
## Extending BIO Threads capabilities
Today jobs that are carried out by BIO threads don't have the capability to
indicate completion to the main thread. We can add this infrastructure by having
an additional dummy job, coined as completion-job, that eventually will be written
by BIO threads to a response-queue. The main thread will take care to consume items
from the response-queue and call the provided callback function of each
completion-job.
## FLUSH* SYNC to run as blocking ASYNC
Command `FLUSH* SYNC` will be modified to create one or more async jobs to flush
DB(s) and afterward will push additional completion-job request. By sending the
completion job request only at the end, the main thread will be called back only
after all the preceding jobs completed their task in the background. During that
time, the client of the command is suspended and marked as `BLOCKED_LAZYFREE`
whereas any other client will be able to communicate with the server without any
issue.
The External tests started failing recently for unclear reason:
```
*** [err]: Tracking invalidation message of eviction keys should be before response in tests/unit/tracking.tcl
Expected '0' to be equal to 'invalidate volatile-key' (context: type eval line 21 cmd {assert_equal $res {invalidate volatile-key}} proc ::test)
```
I suspect the issue is that the used_memory sample is taken while a lazy free is still being processed.
i've seen this CI failure a couple of times on MacOS:
*** [err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
lazyfree isn't done
only reason i can think of is that 500ms is sometimes not enough on slow systems.
This commit revives the improves the ability to run the test suite against
external servers, instead of launching and managing `redis-server` processes as
part of the test fixture.
This capability existed in the past, using the `--host` and `--port` options.
However, it was quite limited and mostly useful when running a specific tests.
Attempting to run larger chunks of the test suite experienced many issues:
* Many tests depend on being able to start and control `redis-server` themselves,
and there's no clear distinction between external server compatible and other
tests.
* Cluster mode is not supported (resulting with `CROSSSLOT` errors).
This PR cleans up many things and makes it possible to run the entire test suite
against an external server. It also provides more fine grained controls to
handle cases where the external server supports a subset of the Redis commands,
limited number of databases, cluster mode, etc.
The tests directory now contains a `README.md` file that describes how this
works.
This commit also includes additional cleanups and fixes:
* Tests can now be tagged.
* Tag-based selection is now unified across `start_server`, `tags` and `test`.
* More information is provided about skipped or ignored tests.
* Repeated patterns in tests have been extracted to common procedures, both at a
global level and on a per-test file basis.
* Cleaned up some cases where test setup was based on a previous test executing
(a major anti-pattern that repeats itself in many places).
* Cleaned up some cases where test teardown was not part of a test (in the
future we should have dedicated teardown code that executes even when tests
fail).
* Fixed some tests that were flaky running on external servers.
I recently saw this failure:
[err]: lazy free a stream with all types of metadata in tests/unit/lazyfree.tcl
Expected '2' to be equal to '1' (context: type eval line 23 cmd {assert_equal [s lazyfreed_objects] 1} proc ::test)
The only explanation for such a thing is that the async flushdb wasn't
done before we did the resetstat