<html>
<head>
<title>Dalvik Optimization and Verification</title>
</head>

<body>
<h1>Dalvik Optimization and Verification With <i>dexopt</i></h1>

<p>
The Dalvik virtual machine was designed specifically for the Android
mobile platform. The target systems have little RAM, store data on slow
internal flash memory, and generally have the performance characteristics
of decade-old desktop systems. They also run Linux, which provides
virtual memory, processes and threads, and UID-based security mechanisms.
<p>
These features and limitations caused us to focus on certain goals:

<ul>
<li>Class data, notably bytecode, must be shared between multiple
processes to minimize total system memory usage.
<li>The overhead in launching a new app must be minimized to keep
the device responsive.
<li>Storing class data in individual files results in a lot of
redundancy, especially with respect to strings. To conserve disk
space we need to factor this out.
<li>Parsing class data fields adds unnecessary overhead during
class loading. Accessing data values (e.g. integers and strings)
directly as C types is better.
<li>Bytecode verification is necessary, but slow, so we want to verify
as much as possible outside app execution.
<li>Bytecode optimization (quickened instructions, method pruning) is
important for speed and battery life.
<li>For security reasons, processes may not edit shared code.
</ul>

<p>
The typical VM implementation uncompresses individual classes from a
compressed archive and stores them on the heap. This implies a separate
copy of each class in every process, and slows application startup because
the code must be uncompressed (or at least read off disk in many small
pieces). On the other hand, having the bytecode on the local heap makes
it easy to rewrite instructions on first use, facilitating a number of
different optimizations.
<p>
These goals led us to make some fundamental decisions:

<ul>
<li>Multiple classes are aggregated into a single "DEX" file.
<li>DEX files are mapped read-only and shared between processes.
<li>Byte ordering and word alignment are adjusted to suit the local
system.
<li>Bytecode verification is mandatory for all classes, but we want
to "pre-verify" whatever we can.
<li>Optimizations that require rewriting bytecode must be done ahead
of time.
</ul>

<p>
The consequences of these decisions are explained in the following sections.


<h2>VM Operation</h2>

<p>
Application code is delivered to the system in a <code>.jar</code>
or <code>.apk</code> file. These are really just <code>.zip</code>
archives with some metadata files added. The Dalvik DEX data file
is always called <code>classes.dex</code>.
<p>
The bytecode cannot be memory-mapped and executed directly from the zip
file, because the data is compressed and the start of the file is not
guaranteed to be word-aligned. These problems could be addressed by
storing <code>classes.dex</code> without compression and padding out the zip
file, but that would increase the size of the package sent across the
data network.
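<p>
To make the two obstacles concrete, here is a minimal sketch that inspects
a zip archive and reports whether its <code>classes.dex</code> entry is
stored uncompressed and whether its data begins on a 4-byte boundary. It
uses only Python's standard <code>zipfile</code> module. The helper names
<code>describe_dex_entry</code> and <code>make_archive</code> are
hypothetical, and the 4-byte word size is an assumption.

```python
import io
import struct
import zipfile


def describe_dex_entry(archive_bytes, name="classes.dex"):
    """Report whether a zip entry could be mapped directly from the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        info = zf.getinfo(name)
    # A zip local file header has 30 fixed bytes, followed by the entry
    # name and an optional extra field; the entry data starts after those.
    fixed = archive_bytes[info.header_offset:info.header_offset + 30]
    name_len, extra_len = struct.unpack("<HH", fixed[26:30])
    data_offset = info.header_offset + 30 + name_len + extra_len
    stored = info.compress_type == zipfile.ZIP_STORED
    aligned = data_offset % 4 == 0  # assumption: 4-byte words
    return {
        "stored": stored,
        "data_offset": data_offset,
        "word_aligned": aligned,
        "mappable": stored and aligned,
    }


def make_archive(compression):
    """Build a tiny in-memory archive holding a fake classes.dex."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("classes.dex", b"dex\n035\0" + bytes(64))
    return buf.getvalue()
```

<p>
Even an archive written with <code>zipfile.ZIP_STORED</code> usually fails
the alignment test unless padding is added, which is the trade-off
described above.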
<p>
We need to extract <code>classes.dex</code> from the zip archive before
we can use it. While we have the file available, we might as well perform
some of the other actions (realignment, optimization, verification) described
earlier. This raises a new question, however: who is responsible for doing
this, and where do we keep the output?

<h3>Preparation</h3>

<p>
There are at least three different ways to create a "prepared" DEX file,
sometimes known as "ODEX" (for Optimized DEX):
<ol>
<li>The VM does it "just in time". The output goes into a special
<code>dalvik-cache</code> directory. This works on the desktop and
engineering-only device builds where the permissions on the
<code>dalvik-cache</code> directory are not restricted. On production
devices, this is not allowed.
<li>The system installer does it when an application is first added.
It has the privileges required to write to <code>dalvik-cache</code>.
<li>The build system does it ahead of time. The relevant <code>jar</code>
/ <code>apk</code> files are present, but the <code>classes.dex</code>
is stripped out. The optimized DEX is stored next to the original
zip archive, not in <code>dalvik-cache</code>, and is part of the
system image.
</ol>
<p>
The <code>dalvik-cache</code> directory is more accurately
<code>$ANDROID_DATA/data/dalvik-cache</code>. The files inside it have
names derived from the full path of the source DEX. On the device the
directory is owned by <code>system</code> / <code>system</code>
and has 0771 permissions, and the optimized DEX files stored there are
owned by <code>system</code> and the
application's group, with 0644 permissions. DRM-locked applications will
use 0640 permissions to prevent other user applications from examining them.
The bottom line is that you can read your own DEX file and those of most
other applications, but you cannot create, modify, or remove them.
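<p>
The sketch below illustrates one way a cache file name could be derived
from the full path of the source archive, and how the two permission modes
differ for other users. The <code>@</code> substitution is an assumption
made for illustration, since this document only says the name is derived
from the full path. The function names are hypothetical.

```python
import posixpath
import stat


def cache_file_name(source_path, cache_dir, entry="classes.dex"):
    """Derive a flat cache file name from the absolute path of the source.

    Assumption: path separators are replaced with '@' so that the whole
    path fits in a single file name. The real scheme may differ.
    """
    if not posixpath.isabs(source_path):
        raise ValueError("source path must be absolute")
    flat = source_path.lstrip("/").replace("/", "@")
    return posixpath.join(cache_dir, flat + "@" + entry)


def others_can_read(mode):
    """True if users outside the owner and group may read the file."""
    return bool(mode & stat.S_IROTH)
```

<p>
With these definitions, <code>others_can_read(0o644)</code> is true and
<code>others_can_read(0o640)</code> is false, which matches the difference
between ordinary and DRM-locked applications described above.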
<p>
Preparation of the DEX file for the "just in time" and "system installer"
approaches proceeds in three steps:
<p>
First, the dalvik-cache file is created. This must be done in a process
with appropriate privileges, so for the "system installer" case this is
done within <code>installd</code>, which runs as root.
<p>
Second, the <code>classes.dex</code> entry is extracted from the zip
archive. A small amount of space is left at the start of the file for
the ODEX header.
<p>
Third, the file is memory-mapped for easy access and tweaked for use on
the current system. This includes byte-swapping and structure realigning,
but no meaningful changes to the DEX file. We also do some basic
structure checks, such as ensuring that file offsets and data indices
fall within valid ranges.
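<p>
As a rough illustration of the third step, the sketch below re-encodes a
run of 32-bit words into host byte order and rejects section offsets that
are misaligned or fall outside the file. The section table layout and the
function names are hypothetical and do not reflect the actual DEX
structures.

```python
import struct


def swap_words(data, source_order="<"):
    """Re-encode 32-bit words from source_order into host byte order."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    count = len(data) // 4
    words = struct.unpack(f"{source_order}{count}I", data)
    return struct.pack(f"={count}I", *words)


def check_ranges(file_size, sections):
    """Validate a hypothetical {name: (offset, length)} section table."""
    for name, (offset, length) in sections.items():
        if offset < 0 or length < 0:
            raise ValueError(f"negative value in section {name}")
        if offset % 4:
            raise ValueError(f"section {name} is not word-aligned")
        if offset + length > file_size:
            raise ValueError(f"section {name} extends past end of file")
    return True
```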
<p>
The build system uses a hairy process that involves starting the
emulator, forcing just-in-time optimization of all relevant DEX files,
and then extracting the results from <code>dalvik-cache</code>. The
reasons for doing this, rather than using a tool that runs on the desktop,
will become more apparent when the optimizations are explained.
<p>
Once the code is byte-swapped and aligned, we're ready to go. We append
some pre-computed data, fill in the ODEX header at the start of the file,
and start executing. (The header is filled in last, so that we don't
try to use a partial file.) If we're interested in verification and
optimization, however, we need to insert a step after the initial prep.
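<p>
The "header last" ordering can be sketched as follows. The header layout,
its size, and the magic value are invented for this example and are not
the real ODEX format. The point is only that a reader can treat a file
without a valid header as incomplete.

```python
import os
import struct

HEADER_SIZE = 40          # hypothetical size, for illustration only
MAGIC = b"PREPDEX\0"      # hypothetical 8-byte magic, not the real value


def write_prepared(path, dex_bytes, extra):
    """Write payload first, then fill in the header as the final step."""
    with open(path, "wb") as f:
        f.write(bytes(HEADER_SIZE))   # reserve space; zeros mean "incomplete"
        f.write(dex_bytes)
        f.write(extra)                # appended pre-computed data
        f.flush()
        os.fsync(f.fileno())          # payload reaches disk before the header
        header = struct.pack("<8sIII", MAGIC, HEADER_SIZE,
                             len(dex_bytes), len(extra))
        f.seek(0)
        f.write(header.ljust(HEADER_SIZE, b"\0"))


def is_complete(path):
    """A file whose header was never filled in is treated as partial."""
    with open(path, "rb") as f:
        return f.read(len(MAGIC)) == MAGIC
```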

<h3>dexopt</h3>

<p>
We want to verify and optimize all of the classes in the DEX file. The
easiest and safest way to do this is to load all of the classes into
the VM and run through them. Anything that fails to load is simply not
verified or optimized. Unfortunately, this can cause allocation of some
resources that are difficult to release (e.g. loading of native shared
libraries), so we don't want to do it in the same virtual machine that
we're running applications in.
<p>
The solution is to invoke a program called <code>dexopt</code>, which
is really just a back door into the VM. It performs an abbreviated VM
initialization, loads zero or more DEX files from the bootstrap class
path, and then sets about verifying and optimizing whatever it can from
the target DEX. On completion, the process exits, freeing all resources.
<p>
It is possible for multiple VMs to want the same DEX file at the same
time. File locking is used to ensure that dexopt is only run once.
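<p>
A minimal sketch of the locking idea, assuming an advisory
<code>flock</code> lock on the output file itself: whichever process
acquires the lock first produces the output, and later arrivals find a
non-empty file and skip the work. The function name
<code>prepare_once</code> and the "empty file means not yet built" test
are assumptions for illustration, not the VM's actual logic.

```python
import fcntl
import os


def prepare_once(cache_path, build):
    """Run build() at most once, even if several processes race.

    Returns True if this caller produced the output, False if it was
    already present.
    """
    fd = os.open(cache_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks while another process holds the exclusive lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        if os.fstat(fd).st_size == 0:   # nobody has produced the output yet
            os.write(fd, build())
            return True
        return False
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```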


<h2>Verification</h2>

<p>
The bytecode verification process involves scanning through the instructions
in every method in every class in a DEX file. The goal is to identify
illegal instruction sequences so that we don't have to check for them at
run time. Many of the computations involved are also necessary for "exact"
garbage collection. See
<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more
information.
<p>
For performance reasons, the optimizer (described in the next section)
assumes that the verifier has run successfully, and makes some potentially
unsafe assumptions. By default, Dalvik insists upon verifying all classes,
and only optimizes classes that have been verified. If you want to
disable the verifier, you can use command-line flags to do so. See also
<a href="embedded-vm-control.html">Controlling the Embedded VM</a>
for instructions on controlling these
features within the Android application framework.
<p>
Reporting of verification failures is a tricky issue. For example,
calling a package-scope method on a class in a different package is
illegal and will be caught by the verifier. We don't necessarily want
to report it during verification, though: we actually want to throw
an exception when the method call is attempted. Checking the access
flags on every method call is expensive, however. The
<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document
addresses this issue.
<p>
Classes that have been verified successfully have a flag set in the ODEX.
They will not be re-verified when loaded. The Linux access permissions
are expected to prevent tampering; if you can get around those, installing
faulty bytecode is far from the easiest line of attack. The ODEX file has
a 32-bit checksum, but that's chiefly present as a quick check for
corrupted data.
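<p>
To show what a quick corruption check of this kind looks like, the sketch
below uses Adler-32 from Python's <code>zlib</code> module. The choice of
Adler-32 is an assumption, since this document only says the checksum is
32 bits wide. A checksum like this catches accidental damage but offers no
protection against deliberate tampering, because anyone who can rewrite
the file can also rewrite the checksum.

```python
import zlib


def checksum(payload):
    """32-bit checksum of the payload (assumption: Adler-32)."""
    return zlib.adler32(payload) & 0xFFFFFFFF


def looks_intact(payload, stored_checksum):
    """Quick check for accidental corruption; not a security measure."""
    return checksum(payload) == stored_checksum
```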


<h2>Optimization</h2>

<p>
Virtual machine interpreters typically perform certain optimizations the
first time a piece of code is used. Constant pool references are replaced
with pointers to internal data structures, and operations that always succeed
or always work a certain way are replaced with simpler forms. Some of
these require information only available at runtime; others can be inferred
statically when certain assumptions are made.
<p>
The Dalvik optimizer does the following:
<ul>
<li>For virtual method calls, replace the method index with a
vtable index.
<li>For instance field get/put, replace the field index with
a byte offset. Also, merge the boolean / byte / char / short
variants into a single 32-bit form (less code in the interpreter
means more room in the CPU I-cache).
<li>Replace a handful of high-volume calls, like String.length(),
with "inline" replacements. This skips the usual method call
overhead, directly switching from the interpreter to a native
implementation.
<li>Prune empty methods. The simplest example is
<code>Object.&lt;init&gt;</code>, which does nothing, but must be
called whenever any object is allocated. The instruction is
replaced with a new version that acts as a no-op unless a debugger
is attached.
<li>Append pre-computed data. For example, the VM wants to have a
hash table for lookups on class name. Instead of computing this
when the DEX file is loaded, we can compute it now, saving heap
space and computation time in every VM where the DEX is loaded.
</ul>
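<p>
The last item in the list, pre-computing a class-name lookup table, can be
sketched as below. The table is built once at preparation time and can
then be consulted without being rebuilt. The hash function, the table
layout, and the function names are all hypothetical. The sketch
deliberately avoids Python's built-in <code>hash</code>, which varies
between processes and so could not be stored in a file.

```python
def name_hash(name):
    """Deterministic 32-bit string hash (hypothetical choice)."""
    h = 0
    for byte in name.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


def build_table(class_names):
    """Build an open-addressing table of (hash, index) pairs."""
    size = 1
    while size < 2 * len(class_names):   # keep the load factor at or below 0.5
        size *= 2
    table = [None] * size
    for index, name in enumerate(class_names):
        h = name_hash(name)
        slot = h & (size - 1)
        while table[slot] is not None:   # linear probing on collision
            slot = (slot + 1) & (size - 1)
        table[slot] = (h, index)
    return table


def lookup(table, class_names, name):
    """Return the index of name in class_names, or None if absent."""
    mask = len(table) - 1
    h = name_hash(name)
    slot = h & mask
    while table[slot] is not None:
        stored_hash, index = table[slot]
        if stored_hash == h and class_names[index] == name:
            return index
        slot = (slot + 1) & mask
    return None
```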

<p>
All of the instruction modifications involve replacing the opcode with
one not defined by the Dalvik specification. This allows us to freely
mix optimized and unoptimized instructions. The set of optimized
instructions, and their exact representation, is tied closely to the VM
version.
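<p>
A toy model of this rewriting is shown below. Instructions are represented
as <code>(opcode, operand)</code> pairs, and the "quick" opcode names and
lookup dictionaries are illustrative stand-ins rather than the VM's real
encoding. References that cannot be resolved are left in their original
form, which is why optimized and unoptimized instructions can coexist in
the same method.

```python
def quicken(code, vtable_index, field_offset):
    """Rewrite resolvable instructions into hypothetical 'quick' forms.

    vtable_index maps method index -> vtable index.
    field_offset maps field index -> byte offset.
    Anything not found in those maps is copied through unchanged.
    """
    out = []
    for opcode, operand in code:
        if opcode == "invoke-virtual" and operand in vtable_index:
            out.append(("invoke-virtual-quick", vtable_index[operand]))
        elif opcode == "iget" and operand in field_offset:
            out.append(("iget-quick", field_offset[operand]))
        else:
            out.append((opcode, operand))
    return out
```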
<p>
Most of the optimizations are obvious "wins". The use of raw indices
and offsets not only allows us to execute more quickly, but also lets us
skip the initial symbolic resolution. Pre-computation eats up
disk space, and so must be done in moderation.
<p>
There are a couple of potential sources of trouble with these
optimizations. First, vtable indices and byte offsets are subject to
change if the VM is updated. Second, if a superclass is in a different
DEX, and that other DEX is updated, we need to ensure that our optimized
indices and offsets are updated as well. A similar but more subtle
problem emerges when user-defined class loaders are employed: the class
we actually call may not be the one we expected to call.
<p>These problems are addressed with dependency lists and some limitations
on what can be optimized.


<h2>Dependencies and Limitations</h2>

<p>
The optimized DEX file includes a list of dependencies on other DEX files,
plus the CRC-32 and modification date from the originating
<code>classes.dex</code> zip file entry. The dependency list includes the
full path to the <code>dalvik-cache</code> file, and the file's SHA-1
signature. The timestamps of files on the device are unreliable and
not used. The dependency area also includes the VM version number.
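<p>
The check implied by this dependency data might look roughly like the
sketch below. The record layout and names are hypothetical, and computing
SHA-1 over the whole dependency file is an assumption. The
<code>read_file</code> parameter is a caller-supplied function that
returns a file's bytes, which keeps the sketch independent of any real
file system layout.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    cache_path: str   # full path to the dependency's dalvik-cache file
    sha1: str         # hex SHA-1 recorded when this DEX was optimized


@dataclass(frozen=True)
class DepsRecord:
    source_crc32: int   # CRC-32 of the originating classes.dex zip entry
    source_mtime: int   # modification date of that zip entry
    vm_version: int
    deps: tuple


def is_up_to_date(record, source_crc32, source_mtime, vm_version, read_file):
    """Return False if the source, the VM, or any dependency has changed."""
    recorded = (record.source_crc32, record.source_mtime, record.vm_version)
    if recorded != (source_crc32, source_mtime, vm_version):
        return False
    for dep in record.deps:
        current = hashlib.sha1(read_file(dep.cache_path)).hexdigest()
        if current != dep.sha1:
            return False
    return True
```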
<p>
An optimized DEX is dependent upon all of the DEX files in the bootstrap
class path. DEX files that are part of the bootstrap class path depend
upon the DEX files that appeared earlier. To ensure that nothing outside
the dependent DEX files is available, <code>dexopt</code> only loads the
bootstrap classes. References to classes in other DEX files fail, which
causes class loading and/or verification to fail, and classes with
external dependencies are simply not optimized.
<p>
This means that splitting code out into many separate DEX files has a
disadvantage: virtual method calls and instance field lookups between
non-boot DEX files can't be optimized. Because verification is pass/fail
with class granularity, no method in a class that has any reliance on
classes in external DEX files can be optimized. This may be a bit
heavy-handed, but it's the only way to guarantee that nothing breaks
when individual pieces are updated.
<p>
Another negative consequence: any change to a bootstrap DEX will result
in rejection of all optimized DEX files. This makes it hard to keep
system updates small.
<p>
Despite our caution, there is still a possibility that a class in a DEX
file loaded by a user-defined class loader could ask for a bootstrap class
(say, String) and be given a different class with the same name. If a
class in the DEX file being processed has the same name as a class in the
bootstrap DEX files, the class will be flagged as ambiguous and references
to it will not be resolved during verification / optimization. The class
linking code in the VM does additional checks to plug another hole;
see the verbose description in the VM sources for details (vm/oo/Class.c).
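<p>
The ambiguity rule amounts to a name collision check between the DEX being
processed and the bootstrap DEX files. A minimal sketch, with hypothetical
function names and class names written as descriptors:

```python
def find_ambiguous(target_classes, bootstrap_classes):
    """Names defined both in the target DEX and in a bootstrap DEX."""
    return set(target_classes) & set(bootstrap_classes)


def may_resolve(reference, ambiguous):
    """References to ambiguous classes are left unresolved."""
    return reference not in ambiguous
```

<p>
A target DEX that defines its own <code>Ljava/lang/String;</code> would
therefore have every reference to that name left unresolved during
verification and optimization, while references to unambiguous classes
proceed as usual.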
<p>
If one of the dependencies is updated, we need to re-verify and
re-optimize the DEX file. If we can do a just-in-time <code>dexopt</code>
invocation, this is easy. If we have to rely on the installer daemon, or
the DEX was shipped only in ODEX form, then the VM has to reject the DEX.
<p>
The output of <code>dexopt</code> is byte-swapped and struct-aligned
for the host, and contains indices and offsets that are highly VM-specific
(both version-wise and platform-wise). For this reason it's tricky to
write a version of <code>dexopt</code> that runs on the desktop but
generates output suitable for a particular device. The safest way to
invoke it is on the target device, or on an emulator for that device.


<h2>Generated DEX</h2>

<p>
Some languages and frameworks rely on the ability to generate bytecode
and execute it. The rather heavy <code>dexopt</code> verification and
optimization model doesn't work well with that.
<p>
We intend to support this in a future release, but the exact method is
to be determined. We may allow individual classes to be added or whole
DEX files; may allow Java bytecode or Dalvik bytecode in instructions;
may perform the usual set of optimizations, or use a separate interpreter
that performs on-first-use optimizations directly on the bytecode (which
won't be mapped read-only, since it's locally defined).

<address>Copyright © 2008 The Android Open Source Project</address>

</body>
</html>