217 lines
9.3 KiB
HTML
217 lines
9.3 KiB
HTML
<html>
|
|
<head>
|
|
<title>Dalvik Bytecode Verifier Notes</title>
|
|
</head>
|
|
|
|
<body>
|
|
<h1>Dalvik Bytecode Verifier Notes</h1>
|
|
|
|
<p>
|
|
The bytecode verifier in the Dalvik VM attempts to provide the same sorts
|
|
of checks and guarantees that other popular virtual machines do. We
|
|
perform generally the same set of checks as are described in _The Java
|
|
Virtual Machine Specification, Second Edition_, including the updates
|
|
planned for the Third Edition.
|
|
|
|
<p>
|
|
Verification can be enabled for all classes, disabled for all, or enabled
|
|
only for "remote" (non-bootstrap) classes. It should be performed for any
|
|
class that will be processed with the DEX optimizer, and in fact the
|
|
default VM behavior is to only optimize verified classes.
|
|
|
|
|
|
<h2>Why Verify?</h2>
|
|
|
|
<p>
|
|
The verification process adds additional time to the build and to
|
|
the installation of new applications. It's fairly quick for app-sized
|
|
DEX files, but rather slow for the big "core" and "framework" files.
|
|
Why do it all, when our system relies on UNIX processes for security?
|
|
<p>
|
|
<ol>
|
|
<li>Optimizations. The interpreter can ignore a lot of potential
|
|
error cases because the verifier guarantees that they are impossible.
|
|
Also, we can optimize the DEX file more aggressively if we start
|
|
with a stronger set of assumptions about the bytecode.
|
|
<li>"Precise" GC. The work peformed during verification has significant
|
|
overlap with the work required to compute register use maps for
|
|
type-precise GC.
|
|
<li>Intra-application security. If an app wants to download bits
|
|
of interpreted code over the network and execute them, it can safely
|
|
do so using well-established security mechanisms.
|
|
<li>3rd party app failure analysis. We have no way to control the
|
|
tools and post-processing utilities that external developers employ,
|
|
so when we get bug reports with a weird exception or native crash
|
|
it's very helpful to start with the assumption that the bytecode
|
|
is valid.
|
|
</ol>
|
|
<p>
|
|
It's also a convenient framework to deal with certain situations, notably
|
|
replacement of instructions that access volatile 64-bit fields with
|
|
more rigorous versions that guarantee atomicity.
|
|
|
|
|
|
<h2>Verifier Differences</h2>
|
|
|
|
<p>
|
|
There are a few checks that the Dalvik bytecode verifier does not perform,
|
|
because they're not relevant. For example:
|
|
<ul>
|
|
<li>Type restrictions on constant pool references are not enforced,
|
|
because Dalvik does not have a pool of typed constants. (Dalvik
|
|
uses a simple index into type-specific pools.)
|
|
<li>Verification of the operand stack size is not performed, because
|
|
Dalvik does not have an operand stack.
|
|
<li>Limitations on <code>jsr</code> and <code>ret</code> do not apply,
|
|
because Dalvik doesn't support subroutines.
|
|
</ul>
|
|
|
|
In some cases they are implemented differently, e.g.:
|
|
<ul>
|
|
<li>In a conventional VM, backward branches and exceptions are
|
|
forbidden when a local variable holds an uninitialized reference. The
|
|
restriction was changed to mark registers as invalid when they hold
|
|
references to the uninitialized result of a previous invocation of the
|
|
same <code>new-instance</code> instruction.
|
|
This solves the same problem -- trickery potentially allowing
|
|
uninitialized objects to slip past the verifier -- without unduly
|
|
limiting branches.
|
|
</ul>
|
|
|
|
There are also some new ones, such as:
|
|
<ul>
|
|
<li>The <code>move-exception</code> instruction can only appear as
|
|
the first instruction in an exception handler.
|
|
<li>The <code>move-result*</code> instructions can only appear
|
|
immediately after an appropriate <code>invoke-*</code>
|
|
or <code>filled-new-array</code> instruction.
|
|
</ul>
|
|
|
|
<p>
|
|
The VM is permitted but not required to enforce "structured locking"
|
|
constraints, which are designed to ensure that, when a method returns, all
|
|
monitors locked by the method have been unlocked an equal number of times.
|
|
This is not currently implemented.
|
|
|
|
<p>
|
|
The Dalvik verifier is more restrictive than other VMs in one area:
|
|
type safety on sub-32-bit integer widths. These additional restrictions
|
|
should make it impossible to, say, pass a value outside the range
|
|
[-128, 127] to a function that takes a <code>byte</code> as an argument.
|
|
|
|
|
|
<h2>Monitor Verification</h2>
|
|
|
|
<p>
|
|
If a method locks an object with a <code>synchronized</code> statement, the
|
|
object must be unlocked before the method returns. At the bytecode level,
|
|
this means the method must execute a matching <code>monitor-exit</code>
|
|
for every <code>monitor-enter</code> instruction, whether the function
|
|
completes normally or abnormally. The bytecode verifier optionally
|
|
enforces this.
|
|
|
|
<p>
|
|
The verifier uses a fairly simple-minded model. If you enter a monitor
|
|
held in register N, you can exit the monitor using register N or any
|
|
subsequently-made copies of register N. The verifier does not attempt
|
|
to identify previously-made copies, track loads and stores through
|
|
fields, or recognize identical constant values (for example, the result
|
|
values from two <code>const-class</code> instructions on the same class
|
|
will be the same reference, but the verifier doesn't recognize this).
|
|
|
|
<p>
|
|
Further, you may only exit the monitor most recently entered. "Hand
|
|
over hand" locking techniques, e.g. "lock A; lock B; unlock A; unlock B",
|
|
are not allowed.
|
|
|
|
<p>
|
|
This means that there are a number of situations in which the verifier
|
|
will throw an exception on code that would execute correctly at run time.
|
|
This is not expected to be an issue for compiler-generated bytecode.
|
|
|
|
<p>
|
|
For implementation convenience, the maximum nesting depth of
|
|
<code>synchronized</code> statements has been set to 32. This is not
|
|
a limitation on the recursion count. The only way to trip this would be
|
|
to have a single method with more than 32 nested <code>synchronized</code>
|
|
statements, something that is unlikely to occur.
|
|
|
|
|
|
<h2>Verification Failures</h2>
|
|
|
|
<p>
|
|
The verifier may reject a class immediately, or it may defer throwing
|
|
an exception until the code is actually used. For example, if a class
|
|
attempts to perform an illegal access on a field, the VM should throw
|
|
an IllegalAccessError the first time the instruction is encountered.
|
|
On the other hand, if a class contains an invalid bytecode, it should be
|
|
rejected immediately with a VerifyError.
|
|
|
|
<p>
|
|
Immediate VerifyErrors are accompanied by detailed, if somewhat cryptic,
|
|
information in the log file. From this it's possible to determine the
|
|
exact instruction that failed, and the reason for the failure.
|
|
|
|
<p>
|
|
It's a bit tricky to implement deferred verification errors in Dalvik.
|
|
A few approaches were considered:
|
|
|
|
<ol>
|
|
<li>We could replace the invalid field access instruction with a special
|
|
instruction that generates an illegal access error, and allow class
|
|
verification to complete successfully. This type of verification must
|
|
be deferred to first class load, rather than be performed ahead of time
|
|
during DEX optimization, because some failures will depend on the current
|
|
execution environment (e.g. not all classes are available at dexopt time).
|
|
At that point the bytecode instructions are mapped read-only during
|
|
verification, so rewriting them isn't possible.
|
|
</li>
|
|
|
|
<li>We can perform the access checks when the field/method/class is
|
|
resolved. In a typical VM implementation we would do the check when the
|
|
entry is resolved in the context of the current classfile, but our DEX
|
|
files combine multiple classfiles together, merging the field/method/class
|
|
resolution results into a single large table. Once one class successfully
|
|
resolves the field, every other class in the same DEX file would be able
|
|
to access the field. This is incorrect.
|
|
</li>
|
|
|
|
<li>Perform the access checks on every field/method/class access.
|
|
This adds significant overhead. This is mitigated somewhat by the DEX
|
|
optimizer, which will convert many field/method/class accesses into a
|
|
simpler form after performing the access check. However, not all accesses
|
|
can be optimized (e.g. accesses to classes unknown at dexopt time),
|
|
and we don't currently have an optimized form of certain instructions
|
|
(notably static field operations).
|
|
</li>
|
|
</ol>
|
|
|
|
<p>
|
|
In early versions of Dalvik (as found in Android 1.6 and earlier), the verifier
|
|
simply regarded all problems as immediately fatal. This generally worked,
|
|
but in some cases the VM was rejecting classes because of bits of code
|
|
that were never used. The VerifyError itself was sometimes difficult to
|
|
decipher, because it was thrown during verification rather than at the
|
|
point where the problem was first noticed during execution.
|
|
<p>
|
|
The current version uses a variation of approach #1. The dexopt
|
|
command works the way it did before, leaving the code untouched and
|
|
flagging fully-correct classes as "pre-verified". When the VM loads a
|
|
class that didn't pass pre-verification, the verifier is invoked. If a
|
|
"deferrable" problem is detected, a modifiable copy of the instructions
|
|
in the problematic method is made. In that copy, the troubled instruction
|
|
is replaced with an "always throw" opcode, and verification continues.
|
|
|
|
<p>
|
|
In the example used earlier, an attempt to read from an inaccessible
|
|
field would result in the "field get" instruction being replaced by
|
|
"always throw IllegalAccessError on field X". Creating copies of method
|
|
bodies requires additional heap space, but since this affects very few
|
|
methods overall the memory impact should be minor.
|
|
|
|
<p>
|
|
<address>Copyright © 2008 The Android Open Source Project</address>
|
|
|
|
</body>
|
|
</html>
|