This is a race between init process and bionic libc initialization of
snapuserd.
init->fork() ----------------> SecondStageMain() -> PropertyInit()
|
|
v
execveat ---> __libc_init_common() -> __system_properties_init()
(snapuserd)
When init process calls PropertyInit(), /dev/__properties__ directory
is created. When bionic libc of snapuserd daemon invokes __system_properties_init
_after_ init process PropertyInit() function is invoked, libc will
try to initialize the property by reading
/system/etc/selinux/plat_property_contexts. Since any reads on /system
has to be served by snapuserd, this specific read from libc cannot be
serviced leading to deadlock.
Reproduce the race by inducing a sleep of 1500ms just before execveat()
so that init process calls PropertyInit() before bionic libc
initialization. This leads to deadlock
immediately and with additional kernel instrumentation with debug
logs confirms the failure:
======================================================
init: Relaunched snapuserd with pid: 428
ext4_file_open: SNAPUSERD: path /system/etc/selinux/plat_property_contexts - Pid: 428 comm 8
ext4_file_read_iter: SNAPUSERD for path: /system/etc/selinux/plat_property_contexts pid: 428 comm 8
[ 25.418043][ T428] ext4_file_read_iter+0x3dc/0x3e0
[ 25.423000][ T428] vfs_read+0x2e0/0x354
[ 25.426986][ T428] ksys_read+0x7c/0xec
[ 25.430894][ T428] __arm64_sys_read+0x20/0x30
[ 25.435419][ T428] el0_svc_common.llvm.17612735770287389485+0xd0/0x1e0
[ 25.442095][ T428] do_el0_svc+0x28/0xa0
[ 25.446100][ T428] el0_svc+0x14/0x24
[ 25.449825][ T428] el0_sync_handler+0x88/0xec
[ 25.454343][ T428] el0_sync+0x1c0/0x200
=====================================================
Fix:
Before starting init second stage, we will wait
for snapuserd daemon to be up and running. We do a simple probe by
reading system partition. This read will eventually be serviced by
daemon confirming that daemon is up and running. Furthermore,
we are still in the kernel domain and sepolicy has not been enforced yet.
Thus, access to these device mapper block devices are ok even though
we may see audit logs.
Note that daemon will re-initialize the __system_property_init()
as part of WaitForSocket() call. This is subtle but important; since
bionic libc initialized had failed silently, it is important
that this re-initialization is done.
Bug: 207298357
Test: Induce the failure by explicitly delaying the call of execveat().
With fix, no issues observed.
Tested incremental OTA on pixel ~15 times.
Ignore-AOSP-First: cherry-pick from AOSP
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: I86c2de977de052bfe9dcdc002dcbd9026601d0f3
(cherry picked from commit 65f3d09445699910947fd73b35979121b2bf07fb)
Merged-In:I86c2de977de052bfe9dcdc002dcbd9026601d0f3
This CHECK prevents a release build from resuming a two-phase merge if
the merge initially failed in the first pass.
Bug: 213031779
Bug: 213253413
Bug: 193549218
Ignore-AOSP-First: cherry-pick from AOSP
Test: vts_libsnapshot_test
Test: update_engine_unittests
Change-Id: I8bf00e3016546ef7039bb0b18eb977cc3dc1066a
Merged-In: I8bf00e3016546ef7039bb0b18eb977cc3dc1066a
If a merge fails we write a new snapshot status indicating that the merge
failed. If this happens to occur during the second merge phase, we fail
to propagate the phase counter to the new status. This means the merge
is unlikely to make progress and succeed later.
Bug: 213031779
Bug: 213253413
Bug: 193549218
Ignore-AOSP-First: cherry-pick from AOSP
Test: inject transient failure into CheckMergeConsistency, apply OTA,
reboot and complete merge.
Change-Id: I31fdae6bde48e3a71b6f3fcc663541257f7ebd8f
Merged-In: I31fdae6bde48e3a71b6f3fcc663541257f7ebd8f
Judging from local experiments, it looks like device-mapper doesn't
always generate a uevent after DM_DEV_CREATE ioctl.
Test: presubmit
Bug: 193462349
Change-Id: I8a74375631b20c14a32a41dbaf38380ebc0078e6
Merged-In: I8a74375631b20c14a32a41dbaf38380ebc0078e6
Since vendor has a way to override the group cpu/schedtune setup, we
cannot assume the group will always return valid data. This CL let
get_sched_policy to fallback to cpuset if no valid data found in
cpu/schedtune cgroup. In longer term, we should find a way to cache the
group or app's process state in framework other than relying on reading
cgroup back.
Test: /data/nativetest64/libcutils_test/libcutils_test
Bug: 210066228
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I8b4396365a7fc2d93e3a22746195585c140eef3c
Using WriteStringToFile will hold kernfs_mutex which is a big lock,
so use WriteStringToFd instead. Besides, also support fd cache for it.
Bug: 206970384
Test: build pass
Change-Id: Id79f9e1095f52079393c58edb9a4d526f4cc6b5e
Merged-In: Id79f9e1095f52079393c58edb9a4d526f4cc6b5e
UserLifecycleTests test disables all Little cores in the course of the
test, which causes attempts to add a process into /dev/cpuset/restricted
cpuset cgroup to fail with ENOSPC error code, indicating that a process
is joining a cpuset cgroup with no online cpus. Current libprocessgroup
implementation will log an error on each such occurrence, which spams
the logs and makes it hard to analyze test results. Because this
situation does not happen in production environment (we do not offline
cpus), we can prevent flooding the logs by identifying this case,
logging an appropriate error one time and ignore all later similar errors.
Bug: 158766131
Test: adb shell "echo 0 > /sys/devices/system/cpu/cpu[0-3]/online"
Test: start some apps, observe libprocessgroup errors in the logcat
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia91d8839d86787569c255481bde077be51c43d93
Merged-In: Ia91d8839d86787569c255481bde077be51c43d93
1) App doesn't have cgroup access and there is no purpose of reading cgroup
for app. For system_server it should be known in foreground group. So
there is no benefit of reading group.
2) Reading cgroup in apps can also cause contention for other cgroup
operations.
3) vendor can change cgroup setting and get_sched_policy may return
incorrect information for get_sched_policy_profile_name.
Test: Boot
Bug: 210011562
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I8e8c8b346984781c56ec93c0616121f7d5c99fe5
When the scratch space is mmap'ed, the metadata buffer
will be un-aligned. This may lead to alignment fault
on 32-bit systems. Address this by temporarily copying
it to buffer.
No perf impact as this code path is not in I/O path
and the copy is a for the size of metadata buffer which
is 8k.
Bug: 206426215
Test: Full and Incremental OTA on pixel
1: Compile snapuserd as 32 bit and reproduced the bug on pixel.
2: With fix - OTA applied successfully.
3: Reboot the device when merge was in-flight as the fix is primarily
in that path.
4: Verify merge completion and data integrity post merge.
Signed-off-by: Akilesh Kailash <akailash@google.com>
Change-Id: Icd4a21d6a61f1ab36e65994c06a4d049a2ee741c
Merged-In: I63c0d862057ebf138c9d1696a942030e30598739
This change only *adds* /system_ext/etc/selinux to the debug policy
search path, and does not change any preconditions to load the debug
policy. The device still needs to be bootloader-unlocked and has the
debug ramdisk flashed to be able to use the debug policy. The only thing
changed is that now the debug policy can be loaded from /system_ext or
/debug_ramdisk when system partition is compliance testing GSI.
The debug policy in the boot ramdisk may be outdated if the system
partition is flashed with a image built from a different revision.
This happens frequently when running the compliance testing VTS, where
the device is flashed with (A) GSI and (B) device vendor image & debug
boot image, and (A) and (B) are built from different git revisions.
To address this, we install a copy of the debug policy under
/system_ext, so that the version desync between (A) & (B) wouldn't be a
problem anymore because (A) no longer relies on the debug policy file
from (B).
Bug: 188067818
Test: Flash RQ2A.201207.001 bramble-user with debug ramdisk & flash
gsi_arm64-user from master, device can boot and `adb root` works
Change-Id: I4d6235c73472e4d97619b2230292e6a0bc4b3e05
Merged-In: I4d6235c73472e4d97619b2230292e6a0bc4b3e05
(cherry picked from commit 650b29d2349253a0dd6000564ccb2c19b51352db)
While llkd helps in discovering issues in apps which leave zombies, it
creates issues for dogfooders when apps are killed. Disable it by
default.
Bug: 202411543
Test: boot and check llkd not running
Test: `setprop ro.llk.enable true` enables llkd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If93bf9e981eaa3921a9da5f3160db26c4fe17e66
Merged-In: If93bf9e981eaa3921a9da5f3160db26c4fe17e66
Create a new group for dex2oat in cpu cgroup, which is dedicated for
dex2oat processes. Also modify task profiles for this change.
Bug: 201223712
Test: dex2oat group created
Change-Id: Ic61f4b8a64d01c03549b680970805e12b9ce4fcc
Merged-In: Ic61f4b8a64d01c03549b680970805e12b9ce4fcc
If a checkpointing operation is in progress, discard any write operations
that are flagged as STORAGE_MSG_FLAG_PRE_COMMIT_CHECKPOINT. In tandem
with trusty-side changes that set the flag appropriately, this avoids
the awkward case where the checkpoint is rolled back, which potentially
leads to inconsistency between the data and the superblock.
Based on Stephen's CL/1845477 "Add helper to check checkpoint state of
mounts".
Original change: https://android-review.googlesource.com/c/platform/system/core/+/1850058
Test: m storageproxyd
Bug: 194313068
Change-Id: I0924084f7f0b20018cbb71f5153469c8a686e262
Merged-In: I0924084f7f0b20018cbb71f5153469c8a686e262
(cherry picked from commit 34404f4ab1326eb179f889201dcfd04eb6002d0e)
This reverts commit 220f604ca5.
Reason for revert: from bug 202436407 it looks like the initial revert wasn't needed.
Change-Id: I81dba47ee4bc55da2d4c5212dfc1b6200719b8be
Since we are adding SCHED_SP_SYSTEM into framework, the policy is needed
on older releases as well.
Bug: 201428466
Test: Build
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I0784820bfec84fd8ef98f2feb2caee084d0a14a2
Merged-In: I0784820bfec84fd8ef98f2feb2caee084d0a14a2