Mechanisms for Coordination Between Garbage Collector and Mutator
Most garbage collection work can proceed concurrently with the client or mutator Java threads. But in certain places, for example while tracing from thread stacks, the garbage collector needs to ensure that Java data processed by the collector is consistent and complete. At these points, the mutators should not hold references to the heap that are invisible to the garbage collector. And they should not be modifying the data that is visible to the collector.
Logically, the collector and mutator share a reader-writer lock on the Java heap and associated data structures. Mutators hold the lock in reader or shared mode while running Java code or touching heap-related data structures. The collector holds the lock in writer or exclusive mode while it needs the heap data structures to be stable. However, this reader-writer lock has a very customized implementation that also provides additional facilities, such as the ability to exclude only a single thread, so that we can specifically examine its heap references.
In order to ensure consistency of the Java data, the compiler inserts "suspend points", sometimes also called "safe points", into the code. These allow a thread to respond to external requests.
Whenever a thread is runnable, i.e. whenever a thread logically holds the mutator lock in shared mode, it is expected to regularly execute such a suspend point, and check for pending requests. Requests are currently implemented by setting a flag in the thread structure [1], which is then explicitly tested by the compiler-generated code.
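The flag test described above can be sketched roughly as follows. This is a toy model, not ART code: `ToyThread`, `ToyFlag`, `ToySuspendPoint()` and `ToyMutatorLoop()` are hypothetical names, and the flag values are assumptions that do not match ART's real `ThreadFlag` entries.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical request bits for this sketch; ART's real ThreadFlag values differ.
enum ToyFlag : uint32_t {
  kToySuspendRequest = 1u << 0,
  kToyCheckpointRequest = 1u << 1,
};

struct ToyThread {
  std::atomic<uint32_t> flags{0};  // set by other threads, tested by the owner
  int suspend_points_hit = 0;      // how many suspend points were executed
  int slow_path_entries = 0;       // how many of them found a pending request
};

// Roughly what compiler-generated code does at a suspend point:
// a cheap flag test, with a slow path only when something is pending.
inline void ToySuspendPoint(ToyThread& self) {
  ++self.suspend_points_hit;
  uint32_t pending = self.flags.load(std::memory_order_acquire);
  if (pending != 0) {
    // Slow path: a real runtime would suspend or run a checkpoint here.
    // The toy only clears the bits it observed and counts the visit.
    self.flags.fetch_and(~pending, std::memory_order_acq_rel);
    ++self.slow_path_entries;
  }
}

// A mutator loop with one suspend point per iteration, so that a
// runnable thread notices requests within a bounded amount of work.
inline int ToyMutatorLoop(ToyThread& self, int iterations) {
  int sum = 0;
  for (int i = 0; i < iterations; ++i) {
    sum += i;
    ToySuspendPoint(self);
  }
  return sum;
}
```

A requester would set a bit with something like `t.flags.fetch_or(kToySuspendRequest)`. The next suspend point executed by the target thread then takes the slow path once, and later suspend points return to the cheap path.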
A thread responds to suspend requests only when it is "runnable", i.e. logically running Java code. When it runs native code, or is blocked in a kernel call, it logically releases the mutator lock. When the garbage collector needs mutator cooperation, and the thread is not runnable, it is assured that the mutator is not touching Java data, and hence the collector can safely perform the required action itself, on the mutator thread's behalf.
Normally, when a thread makes a JNI call, it is not considered runnable while
executing native code. This makes the transitions to and from running native JNI
code somewhat expensive (see below). But these transitions are necessary to
ensure that such code, which does not execute "suspend points", and can thus not
cooperate with the GC, doesn't delay GC completion. @FastNative and
@CriticalNative calls avoid these transitions, instead allowing the thread to
remain "runnable", at the expense of potentially delaying GC operations for the
duration of the call.
Although we say that a thread is "suspended" when it is not running Java code, it may in fact still be running native code and touching data structures that are not considered "Java data". This distinction can be a fine line. For example, a Java thread blocked on a Java monitor will normally be "suspended" and blocked on a mutex contained in the monitor data structure. But it may wake up for reasons beyond ART's control, which will normally result in touching the mutex. The monitor code must be quite careful to ensure that this does not cause problems, especially if the ART runtime was shut down in the interim and the monitor data structure has been reclaimed.
Calls to change thread state
When a thread changes between running Java and native code, it has to
correspondingly change its state between "runnable" and one of several
other states, all of which are considered to be "suspended" for our purposes.
When a Java thread starts to execute native code, and may thus not respond
promptly to suspend requests, it will normally create an object of type
ScopedThreadSuspension. ScopedThreadSuspension's constructor changes state to
the "suspended" state given as an argument, logically releasing the mutator lock
and promising to no longer touch Java data structures. It also handles any
pending suspension requests that slid in just before it changed state.
Conversely, ScopedThreadSuspension's destructor waits until the GC has finished
any actions it is currently performing on the thread's behalf and effectively
released the mutator exclusive lock, and then returns to runnable state,
re-acquiring the mutator lock.
Occasionally a thread running native code needs to temporarily access Java
data structures again, performing the above transitions in the opposite order.
ScopedObjectAccess is a similar RAII object whose constructor and destructor
perform those transitions in the reverse order from ScopedThreadSuspension.
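The mirrored RAII pattern can be illustrated with a small model. This is only a sketch under simplifying assumptions: `ToyThread`, `ToyState`, `ToyScopedSuspension` and `ToyScopedAccess` are hypothetical stand-ins. The toy omits what the real classes do around the state change, i.e. handling pending suspension requests and waiting for the GC to finish acting on the thread's behalf.

```cpp
#include <cassert>
#include <vector>

// Hypothetical thread states; everything except kRunnable counts as "suspended".
enum class ToyState { kRunnable, kNative, kWaiting };

struct ToyThread {
  ToyState state = ToyState::kRunnable;
  std::vector<ToyState> history;  // every state entered, in order
  void SetState(ToyState s) {
    state = s;
    history.push_back(s);
  }
};

// Analogue of ScopedThreadSuspension: runnable -> suspended in the
// constructor, suspended -> runnable in the destructor.
class ToyScopedSuspension {
 public:
  ToyScopedSuspension(ToyThread& self, ToyState suspended_state) : self_(self) {
    assert(self_.state == ToyState::kRunnable);
    assert(suspended_state != ToyState::kRunnable);
    self_.SetState(suspended_state);  // logically releases the mutator lock
  }
  ~ToyScopedSuspension() { self_.SetState(ToyState::kRunnable); }
  ToyScopedSuspension(const ToyScopedSuspension&) = delete;
  ToyScopedSuspension& operator=(const ToyScopedSuspension&) = delete;

 private:
  ToyThread& self_;
};

// Analogue of ScopedObjectAccess: suspended -> runnable in the
// constructor, then back to the previous state in the destructor.
class ToyScopedAccess {
 public:
  explicit ToyScopedAccess(ToyThread& self) : self_(self), old_state_(self.state) {
    self_.SetState(ToyState::kRunnable);  // logically acquires the mutator lock
  }
  ~ToyScopedAccess() { self_.SetState(old_state_); }
  ToyScopedAccess(const ToyScopedAccess&) = delete;
  ToyScopedAccess& operator=(const ToyScopedAccess&) = delete;

 private:
  ToyThread& self_;
  ToyState old_state_;
};

// A Java thread enters native code, then briefly touches Java data again.
inline std::vector<ToyState> ToyNativeCallWithCallback() {
  ToyThread self;
  {
    ToyScopedSuspension sts(self, ToyState::kNative);
    {
      ToyScopedAccess soa(self);  // Java data may be accessed in this scope only
    }
  }
  return self.history;
}
```

In this model `ToyNativeCallWithCallback()` records the sequence native, runnable, native, runnable, which shows how the two scopes nest.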
Mutator lock implementation
The mutator lock is not implemented as a conventional mutex. But it plays by the
rules of our normal static thread-safety analysis. Thus a function that is
expected to be called in runnable state, with the ability to access Java data,
should be annotated with REQUIRES_SHARED(Locks::mutator_lock_).
There is an explicit mutator_lock_ object, of type MutatorMutex. MutatorMutex is
seemingly a minor refinement of ReaderWriterMutex, but it is used entirely
differently. It is acquired explicitly by clients that need to hold it
exclusively, and in a small number of cases, it is acquired in shared mode, e.g.
via SharedTryLock(), or by the GC itself. However, more commonly
MutatorMutex::TransitionFromSuspendedToRunnable() is used to logically acquire
the mutator mutex, e.g. as part of ScopedObjectAccess construction.
TransitionFromSuspendedToRunnable() does not physically acquire the
ReaderWriterMutex in shared mode. Thus any thread acquiring the lock in exclusive mode
must, in addition, explicitly arrange for mutator threads to be suspended via the
thread suspension mechanism, and then make them runnable again on release.
Logically the mutator lock is held in shared/reader mode if either the underlying reader-writer lock is held in shared mode, or if a mutator is in runnable state.
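The "logically held" rule can be written down as a predicate. The following is a single-threaded toy model under stated assumptions, not the real MutatorMutex: `ToyMutatorLock`, `ToyThread` and all member names are hypothetical. Where the real exclusive holder would actively suspend the mutators, the toy simply refuses exclusive acquisition while any thread is still runnable.

```cpp
#include <cassert>
#include <vector>

struct ToyThread {
  bool runnable = false;  // true while logically running Java code
};

class ToyMutatorLock {
 public:
  // Physical shared acquisition, the less common path in the text above.
  void SharedLock() {
    assert(!exclusive_);
    ++physical_readers_;
  }
  void SharedUnlock() {
    assert(physical_readers_ > 0);
    --physical_readers_;
  }

  // Held in shared mode if the underlying lock has a physical reader,
  // or if any mutator thread is in runnable state.
  bool IsLogicallySharedHeld(const std::vector<ToyThread>& threads) const {
    if (physical_readers_ > 0) return true;
    for (const ToyThread& t : threads) {
      if (t.runnable) return true;
    }
    return false;
  }

  // Exclusive mode is only sound once no thread holds the lock logically.
  // Toy simplification: fail instead of suspending the runnable threads.
  bool TryExclusiveLock(const std::vector<ToyThread>& threads) {
    if (exclusive_ || IsLogicallySharedHeld(threads)) return false;
    exclusive_ = true;
    return true;
  }
  void ExclusiveUnlock() {
    assert(exclusive_);
    exclusive_ = false;
  }

 private:
  int physical_readers_ = 0;
  bool exclusive_ = false;
};
```

The point of the model is that a runnable thread blocks exclusive acquisition even though it never touched the physical lock, which is why exclusive holders also need the thread suspension mechanism.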
Suspension and checkpoint API
Suspend point checks enable three kinds of communication with mutator threads:
- Checkpoints
  Checkpoint requests are used to get a thread to perform an action on our behalf. RequestCheckpoint() asks a specific thread to execute the closure supplied as an argument at its leisure. RequestSynchronousCheckpoint() in addition waits for the thread to complete running the closure, and handles suspended threads by running the closure on their behalf. In addition to these functions provided by Thread, ThreadList provides the RunCheckpoint() function that runs a checkpoint function on behalf of each thread, either by using RequestCheckpoint() to run it inside a running thread, or by ensuring that a suspended thread stays suspended, and then running the function on its behalf. RunCheckpoint() does not wait for completion of the function calls triggered by the resulting RequestCheckpoint() invocations.
- Empty Checkpoints
  ThreadList provides RunEmptyCheckpoint(), which waits until all threads have either passed a suspend point, or have been suspended. This ensures that no thread is still executing Java code inside the same suspend-point-delimited code interval it was executing before the call. For example, a read-barrier started before a RunEmptyCheckpoint() call will have finished before the call returns.
- Thread suspension
  ThreadList provides a number of SuspendThread...() calls and a SuspendAll() call to suspend one or all threads until they are resumed by Resume() or ResumeAll(). The Suspend... calls guarantee that the target thread(s) are suspended (again, only in the sense of not running Java code) when the call returns.
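The checkpoint behaviour described in the list can be sketched as a toy model. All names here (`ToyThread`, `ToyRequestCheckpoint()`, `ToySuspendPoint()`, `ToyRunCheckpoint()`) are hypothetical, and so are the signatures, the boolean return value and the per-thread mutex standing in for `suspend_count_lock_`. They are assumptions for illustration, not ART's actual API. The toy also assumes a suspended thread stays suspended while a closure runs on its behalf, which the real code has to ensure explicitly.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

struct ToyThread {
  bool runnable = true;
  std::mutex lock;  // toy stand-in for suspend_count_lock_
  std::vector<std::function<void(ToyThread&)>> pending;  // queued closures
  int value = 0;  // some per-thread data for closures to touch
};

using ToyClosure = std::function<void(ToyThread&)>;

// Ask a runnable thread to run `fn` at its next suspend point.
// Returns false if the target is not runnable (assumed convention).
inline bool ToyRequestCheckpoint(ToyThread& target, ToyClosure fn) {
  std::lock_guard<std::mutex> guard(target.lock);
  if (!target.runnable) return false;
  target.pending.push_back(std::move(fn));
  return true;
}

// Executed by the target thread itself: take the queued closures under
// the lock, then run them without holding it.
inline void ToySuspendPoint(ToyThread& self) {
  std::vector<ToyClosure> todo;
  {
    std::lock_guard<std::mutex> guard(self.lock);
    todo.swap(self.pending);
  }
  for (ToyClosure& fn : todo) fn(self);
}

// Run `fn` for every thread: queue it for runnable threads, run it
// directly on behalf of suspended ones. Does not wait for the queued
// closures. Returns how many closures were run on a thread's behalf.
inline int ToyRunCheckpoint(const std::vector<ToyThread*>& threads,
                            const ToyClosure& fn) {
  int run_on_behalf = 0;
  for (ToyThread* t : threads) {
    if (!ToyRequestCheckpoint(*t, fn)) {
      fn(*t);
      ++run_on_behalf;
    }
  }
  return run_on_behalf;
}
```

With one runnable and one suspended thread, `ToyRunCheckpoint()` returns after running the closure only for the suspended thread. The runnable thread's closure runs later, when that thread calls `ToySuspendPoint()`. This mirrors the statement above that RunCheckpoint() does not wait for completion.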
[1] Some comments in the code refer to a not-yet-really-implemented scheme in which the compiler-generated code would load through the address at tlsPtr_.suspend_trigger. A thread suspension is requested by setting this to null, triggering a SIGSEGV, causing that thread to check for GC cooperation requests. The real mechanism instead sets an appropriate ThreadFlag entry to request suspension or a checkpoint. Note that the actual checkpoint function value is set, along with the flag, while holding suspend_count_lock_. If the target thread notices that a checkpoint is requested, it then acquires the suspend_count_lock_ to read the checkpoint function.