This adds the -s option to not print out Cow information, providing a
simple benchmark for reading and iterating through a cow file one time.
Bug: 172026020
Test: inspect cow -s [file]
Change-Id: I6e8b80a80c71155364b8467acd556d00f1dd2a42
This introduces clusters of CowOperations to the Cow Format, improving
reader performance by allowing many operations to be read at once.
When the header's cluster_ops is a nonzero value, operations will be
clumped into packs of that number, where the last op is a special
CowClusterOp, which points to the start of the next cluster. Each
cluster is immediately followed by its data, with the footer following
at the end of the file.
cluster_ops must not be 1, as we must include a CowClusterOp in each
group.
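A minimal sketch of the layout arithmetic described above, assuming every
cluster (including the last one) reserves its final slot for the
CowClusterOp. The helper names are hypothetical and are not taken from
libsnapshot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.

// cluster_ops == 0 means "no clustering". A value of 1 is rejected because
// the only slot in each cluster would be taken by the CowClusterOp.
bool IsValidClusterOps(uint32_t cluster_ops) {
    return cluster_ops != 1;
}

// Number of clusters needed to hold num_ops regular operations when each
// cluster has cluster_ops slots and the last slot is the CowClusterOp.
uint64_t ClustersNeeded(uint64_t num_ops, uint32_t cluster_ops) {
    assert(cluster_ops >= 2);
    const uint64_t usable = cluster_ops - 1;
    return (num_ops + usable - 1) / usable;  // Round up.
}
```

With cluster_ops set to 5, each cluster carries 4 regular ops, so 10 ops
need 3 clusters under these assumptions.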
Change-Id: I2855288958d56569d4078f377ef0b8c01ca5c6e7
Bug: 172026020
Test: cow_api_test
Kernel will batch the merge operations only when
block numbers of source and cow device are contiguous.
Daemon will read the COW file and post-process
the operations so that kernel can batch merge the potential
operations.
There are two key changes done in post-processing:
1: COW file contains all the copy operations at the
beginning of the file. We cannot allow batch
merging of COPY operations as a crash in between
the overlapping copies can result in a corrupted state.
Hence, copy operations are merged one block at a time.
2: Replace and Zero operations can be batch merged.
However, reading our existing COW format as-is
will not allow kernel to batch merge. Hence,
process the operations in such a way that kernel can batch
merge them.
Furthermore, it is observed that sync() after every
merge adds overhead. Hence, we will sync()
only for copy operations. For replace and zero
operations, we will not explicitly sync. This is ok as
a crash in between replace/zero merge operations can be
recovered by redoing those operations. However for copy, we have
to make sure that sync is completed before next copy
operation is initiated.
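The grouping and sync policy above can be sketched roughly as follows.
This is a simplified model, not the daemon's code: it tracks only the
destination block number (the real check also involves the COW device
block numbers), and the type and function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified operation model.
enum class OpType { kCopy, kReplace, kZero };

struct Op {
    OpType type;
    uint64_t new_block;  // Destination block on the base device.
};

// Only copy operations need an explicit sync before the next one starts.
bool NeedsSync(OpType type) {
    return type == OpType::kCopy;
}

// Groups ops so that each copy stands alone, while runs of replace/zero
// ops with consecutive destination blocks share one group.
std::vector<std::vector<Op>> GroupForMerge(const std::vector<Op>& ops) {
    std::vector<std::vector<Op>> groups;
    for (const Op& op : ops) {
        const bool can_extend =
                !groups.empty() && op.type != OpType::kCopy &&
                groups.back().back().type != OpType::kCopy &&
                groups.back().back().new_block + 1 == op.new_block;
        if (can_extend) {
            groups.back().push_back(op);
        } else {
            groups.push_back(std::vector<Op>{op});
        }
    }
    return groups;
}
```

In this model, two copies followed by replace/zero ops on blocks 10 and
11 and a replace on block 20 yield four groups: one per copy, one for
blocks 10-11, and one for block 20.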
Merge time of a full OTA on bramble is ~60
seconds as compared to ~10+ minutes prior to this
optimization.
Note that we still have copy operations which are not
batch merged. Hence, an OTA with a significant number of
copy operations can still add overhead to merge timings.
Bug: 174112589
Test: vts_libsnapshot, cow_snapuserd_test
Full OTA on bramble.
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I1dc286067a26ea399fa5d4e8e826e5622ce3fa58
Refactor the cow_snapuserd unit test. This will
test code changes in the daemon in isolation. Currently it tests:
1: Reading the snapshot device and validating data
2: Merging and validating data.
More tests related to merge will be added later; but this
should provide a baseline unit test for daemon.
BUG: 168311203
Test: cow_snapuserd_test
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I4d77c435a9f045204a4ff3ece524c857f9fba50f
We only need to iterate exceptions_per_area times during
merge. Any additional iteration overshoots the buffer and hence triggers
the assert.
BUG: 168311203
Test: Full OTA with VABC - Verified merge complete
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: If6e8dee08802ea70a8fd40e93ad63a47f39ce96e
When init unlinks /dev/socket/snapuserd, the daemon currently
polling the socket will fail in accept4(). This works because the
process won't exit until all threads have terminated, but it's
confusing, and it creates log spam.
Instead, add an explicit "detach" message so that snapuserd will stop
listening for new connections and will immediately join on remaining
threads. This way snapuserd will gracefully exit when all threads are
complete, and we're guaranteed no new threads will be created.
This will only be used for first-stage instances of snapuserd.
Bug: 173476209
Test: full OTA with VABC, no accept4() errors
Change-Id: Ibeffa4a35043ed43a70166eeee86204e2b3d03aa
Because init needs tighter integration with how to launch snapuserd,
this code will be moved directly into init instead.
Bug: 173476209
Test: manual test
Change-Id: Ibce3bac9699620882eae44188d937d4900f022d3
This adds a compression bit to SnapshotUpdateStatus. This is so init can
tell whether it needs to launch snapuserd, without reading the
individual state of each snapshot, since this state is global.
Bug: 173476209
Test: manual test
Change-Id: I8c3bbfb0d184f70e661e7b043afc37e335d1e187
Normally, DeviceMapper::CreateDevice() handles this for us. However, it
does not work in first-stage init, because ueventd is not running.
Therefore this patch adds a way for first-stage init to set a callback
to manually regenerate and process uevents.
Additionally, even with ueventd, dm-user misc device creation needs a
WaitForFile() call, since ueventd is asynchronous.
The WaitForDevice() helper in this patch accounts for both of these
scenarios.
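A rough sketch of the two scenarios, assuming a plain poll loop with a
deadline. WaitForPathSketch and its on_missing callback are hypothetical
names for illustration; they are not the WaitForDevice() helper or the
callback added by this patch.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <unistd.h>

// Hypothetical sketch. If the path is missing, on_missing (when provided)
// is invoked once, standing in for first-stage init regenerating uevents.
// The loop then polls until the path appears or the timeout expires, which
// also covers the case where ueventd creates the node asynchronously.
bool WaitForPathSketch(const std::string& path, std::chrono::milliseconds timeout,
                       const std::function<void(const std::string&)>& on_missing = {}) {
    if (access(path.c_str(), F_OK) == 0) return true;
    if (on_missing) on_missing(path);

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (access(path.c_str(), F_OK) == 0) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```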
Bug: 173476209
Test: device boots into first-stage init after full VABC ota
Change-Id: Ib7a9bfc2a5a5095aa00b358072f9cb1743c19ab2
We change read_all() and write_all() to return signed values, so
our negative error return values will be correctly caught.
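To illustrate why the return type matters: with an unsigned return type,
a negative error value wraps to a large positive number and a check such
as `< 0` never fires. The sketch below is an assumption about the general
shape of such a helper, not the actual read_all() from this change.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch: reads up to len bytes, retrying on short reads.
// Returns the number of bytes read, or a negative errno value on failure.
// The signed ssize_t return type is what lets callers test for "< 0".
ssize_t ReadAllSketch(int fd, void* buf, size_t len) {
    char* out = static_cast<char*>(buf);
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, out + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // Interrupted, try again.
            return -errno;
        }
        if (n == 0) break;  // End of file.
        done += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(done);
}
```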
Test: TreeHugger
Change-Id: I4d03d4475bccb40ae4b84a846966139b34ede2bd
prepare_fs_for_mount() attempts to mount() the block device at the
mount_point, so it must be created beforehand, as opposed to later in
__mount() where it is done now.
Also fix slight bugs with using fstab.mount_point instead of an
alternative specified mount_point.
Test: boot
Change-Id: I758b6a6efe2dd99433fc1974b69cb6f7ce3bc0c2
Using ro.virtual_ab.compression.enabled does not work in first-stage
init, since properties aren't available. Instead, attach a compression
bit to SnapshotStatus. Although there are no plans to have per-snapshot
compression toggles, this route avoids creating a new state or indicator
file elsewhere. And SnapshotStatus is available in every place that
cares about compression.
Bug: N/A
Test: apply OTA with VABC
Change-Id: I51deb5693a08417ea7a2c524e240191748907123
Handle flush operations by dm-snap post merge and
the ABI changes from dm-user. This
is now in sync with the latest dm-user patch (patch 25).
In case of any failures observed in daemon in the IO
path, return error code back to dm-user which
will eventually fail the IO.
Bug: 168311203
Test: vts_libsnapshot_test, cow_snapuserd_test
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I4af63845f8c3e1c445f6c55374ea58b6f3454795
This provides a block device via dm-user, with all accesses backed by
in-memory storage. It's essentially the same as what I have in
selftests, with the kselftests stuff removed so it'll build in Android.
Test: mkfs.f2fs, dd, fsck.f2fs
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Change-Id: I68515d6e9001c2f6d199d394e67ebe528b382406
This fixes a number of small bugs in libsnapshot. It also refactors the
handler list a bit. Previously, it was a list of unique_ptrs. Now it is
a list of shared_ptrs to simplify ownership.
Additionally, Snapuserd is now keyed solely on the misc device name.
This allows two identical snapshots to run in the same daemon, with
different control names (a scenario that comes up in the
DaemonTransition test). As part of this change, the two-stage
initialization process has been refactored slightly. The "init" message
sets all the device paths, and the "start" message needs only the misc
name.
Both the init and start messages now validate that no duplicate handlers
exist, and that we're not overwriting any previous thread.
This cleanup also fixes a bug in DmUserHandler cleanup - if a control
device shut down raced with WaitForDelete(), the std::thread object
would delete without a call to detach() or join(). In the new
RemoveHandler(), we now correctly detach() in this scenario.
This also fixes a bug where, if a COW had no partition component (it
only resided on /data), the second-stage transition would fail because
it used the wrong device-mapper name.
Bug: N/A
Test: vts_libsnapshot_test
Change-Id: Ib4a281a3b5fe665c727c7077672e3c6b0b3abdba
This will fail in vts_libsnapshot_test because the pid isn't set (and
even if it was, we don't want to kill any running snapuserd instances).
Rather than start an instance just for the sake of killing it, use
dependency injection instead.
Bug: N/A
Test: vts_libsnapshot_test
Change-Id: I1569f927e018319cf4008673a32d5526573d8d57
This simplifies the footer and label resume system for COW files.
Previously, a footer could be missing, and we'd try to recover the file.
Any ops we could recover were handed off to the CowWriter, which then
searched up to the correct resume label.
In the new model, a reader is opened with an optional label (resume
mode), or without a label.
When resuming to a label, a footer is optional. The read is terminated
by EOF, discovery of the requested label, or the presence of a footer.
However the requested label must be found, and parsing fails if it is
not found.
When opening a COW without a label, the footer must be present, as this
signals the file was successfully flushed. Parsing fails if no footer is
found.
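The acceptance rule can be summarized in a small sketch. It assumes the
scan has already stopped at EOF, at the requested label, or at a footer,
and only models the final decision. ScanSummary and ParseSucceeds are
hypothetical names, not part of the reader API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical summary of what a scan over the COW file encountered.
struct ScanSummary {
    bool footer_found;
    std::vector<uint64_t> labels_seen;
};

// label == nullptr models opening without a label: the footer is required.
// Otherwise (resume mode) the footer is optional but the label must exist.
bool ParseSucceeds(const ScanSummary& scan, const uint64_t* label) {
    if (label == nullptr) {
        return scan.footer_found;
    }
    return std::find(scan.labels_seen.begin(), scan.labels_seen.end(), *label) !=
           scan.labels_seen.end();
}
```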
Bug: N/A
Test: cow_api_test
full OTA with VABC
Change-Id: Ie79ab5259f532dd16c72f0e42da7568c5c5c4623
When we're writing to a block device, we can't truncate. Ignore those
commands. Truncate mostly just saves time in the read phase by chopping
off unused data in the event finalize wasn't called.
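One way to express the check is sketched below, assuming the writer can
stat its output descriptor. The function names are hypothetical; the
actual change may detect block devices differently.

```cpp
#include <sys/stat.h>

// Hypothetical sketch: truncation only makes sense for non-block targets.
bool ShouldTruncate(mode_t mode) {
    return !S_ISBLK(mode);
}

// Convenience wrapper over an open descriptor. If the target cannot be
// inspected, skip truncation, since it is only an optimization.
bool ShouldTruncateFd(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) {
        return false;
    }
    return ShouldTruncate(st.st_mode);
}
```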
Bug: 172026020
Test: cow_api_test
Change-Id: I3befc71fa9597edf8243d0f9e17440db91409aea
dm-user device requires sector information which is retrieved
from snapuserd daemon once the metadata read is completed.
Split up the initialization of daemon into two APIs. Once
the metadata read is completed, send the sector information
back to libsnapshot to create the dm-user device.
On a side note, remove unused code from snapuserd_client.
Bug: 168311203
Test: vts_libsnapshot_test, cow_snapuserd_test
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I0647ebbeea3a4705599966974bfec2318a3e1e4e
Clang's static analyzer flagged the following potential null pointer
dereferences. Looks like a transitive caller of this function has
```
if (change) *change = foo;
```
...and no nullness checks between that and these unconditional
assignments to `*change`.
> system/core/fs_mgr/fs_mgr_overlayfs.cpp:1100:13: warning: Dereference
> of null pointer (loaded from variable 'change')
> [clang-analyzer-core.NullDereference]
> system/core/fs_mgr/fs_mgr_overlayfs.cpp:1167:17: warning: Dereference of
> null pointer (loaded from variable 'change')
> [clang-analyzer-core.NullDereference]
Bug: None
Test: TreeHugger
Change-Id: I656e3b0f56b18ec3ca3d1db773feb81adf163122
Handle write IO during snapshot merge. When merge
is completed, dm-snapshot-merge issues WRITE IO
to indicate the completion of merge for a specific
operation. Snapuserd daemon tracks the merge completion
process in the COW header; this is required to handle
resuming merge operation gracefully if there is a crash
during merge.
Bug: 168311203
Test: vts_libsnapshot_test, cow_snapuserd_test
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I4fc8cc23a6ec5b26b7ae3339212efdd3b9a367e8
This is required for the merge IO path. During the merge process,
the kernel scans the metadata page backwards when merge
is initiated. Since merge ordering should follow our internal
COW format, read the COW operations in reverse and populate the metadata.
Additionally, the kernel can merge successive operations if the two chunk IDs
are contiguous. This can be problematic when there is a crash
during merge; specifically when the merge operation has a dependency.
These dependencies can only happen during copy operations of the
overlapping regions. To avoid this problem, we make sure that
copy operations are merged individually.
Test: cow_snapuserd_test
Bug: 168311203
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: Ic9c52b323ce58af897cfa343bf9277c8f1f022e3
If the host system is Android Q and the guest system is S, then re-enabling DSU in
the guest system (`adb shell gsi_tool enable -s`) would create an empty
DSU active indicator file.
In this case, use the default slot name "DSU" if the DSU active indicator
file is non-existent or empty.
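The fallback amounts to the following sketch. The function name and the
way the indicator file is represented are assumptions, and the default
slot name is passed in by the caller rather than hard-coded here.

```cpp
#include <string>

// Hypothetical sketch: pick the active DSU slot from the indicator file,
// falling back to default_slot when the file is missing or empty.
std::string ActiveDsuSlotOrDefault(bool indicator_exists,
                                   const std::string& indicator_contents,
                                   const std::string& default_slot) {
    if (!indicator_exists || indicator_contents.empty()) {
        return default_slot;
    }
    return indicator_contents;
}
```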
Bug: 172541102
Bug: 168571434
Test: Boot GSI S on Pixel Q with DSU
Test: Verify reenable DSU works
Test: adb shell gsi_tool enable -s && adb reboot
Change-Id: Ie59e4378d1aa4dca085c399a75abd4e662340030
It looks like in some rare scenarios device-mapper won't create a
uevent while handling the DM_DEV_REMOVE ioctl.
Since failure to generate a uevent will result in ueventd not cleaning up
/dev paths in user space, it should be ok to just fail the DeleteDevice call
without aborting the entire process.
Test: presubmit
Bug: 171266986
Change-Id: Iece28e6e02c35ac4bf33b9be74dfe1bbbb466bd4
This will test IO operations
between dm-snapshot, dm-user and the snapuserd daemon. This test was broken
after the cow format and daemon related changes.
Test: cow_snapuserd_test
Bug: 168311203
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I3e5f435332957a2549fd029c84efe13efb98b909
DSU slotting support is added in Android R, so
android::gsi::GetActiveDsu() is guaranteed to fail if DSU guest system
is S and host is Q.
In that case, just assume the DSU slot to be "dsu".
Bug: 172541102
Bug: 168571434
Test: Boot GSI S on Q pixel with DSU
Change-Id: I96e191907ce2fedf8c22b7887e930a7c455eb593
Certain ops (label, footer) do not have actual block mappings, so ignore
these.
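A minimal sketch of the filter, using a hypothetical OpKind enum rather
than the real COW operation type constants:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical operation kinds for illustration.
enum class OpKind { kCopy, kReplace, kZero, kLabel, kFooter };

// Label and footer ops carry no block mapping, so they are skipped.
bool HasBlockMapping(OpKind kind) {
    return kind == OpKind::kCopy || kind == OpKind::kReplace || kind == OpKind::kZero;
}

// Counts only the ops that map to an actual block.
size_t CountMappedOps(const std::vector<OpKind>& ops) {
    size_t count = 0;
    for (OpKind kind : ops) {
        if (HasBlockMapping(kind)) ++count;
    }
    return count;
}
```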
Bug: NA
Test: apply OTA with VABC
Change-Id: I0b9ce27239553d8126c17767f35c2651df951ea0