
94 Commits

Author SHA1 Message Date
Mark Salyzyn ca53d9716d Merge "llkd: bootstat: propagate detailed livelock canonical boot reason" am: 481a8125a6
am: 0033edc469

Change-Id: Id05e8ca453e6f761aaa540eadddcd0e18fe28efd
2018-04-20 14:20:20 -07:00
Mark Salyzyn afd66f2fd3 llkd: bootstat: propagate detailed livelock canonical boot reason
Report kernel_panic,sysrq,livelock,<state> reboot reason via last
dmesg (pstore console).  Add ro.llk.killtest property, which will
allow reliable ABA platforms to drop kill test and go directly
to kernel panic.  This should also allow some manual unit testing
of the canonical boot reason report.

New canonical boot reasons from llkd are:
- kernel_panic,sysrq,livelock,alarm: llkd itself locked up (Hail Mary)
- kernel_panic,sysrq,livelock,driver: uninterruptible D state
- kernel_panic,sysrq,livelock,zombie: uninterruptible Z state

Manual test assumptions:
- llkd is built by the platform and landed on system partition
- unit test is built and landed in /data/nativetest (could
  land in /data/nativetest64, adjust test correspondingly)
- llkd not enabled, ro.llk.enable and ro.llk.killtest
  are not set by platform allowing test to adjust all the
  configuration properties and start llkd.
- or, llkd is enabled (ro.llk.enable is true) and the kill test
  is disabled (ro.llk.killtest is false), as set up by the platform.
  This breaks the go/apct generic operation of the unit test for
  llk.zombie and llk.driver, because the resulting kernel panic
  would otherwise require manual intervention.  If the test moves to
  go/apct, then we will be forced to bypass these tests under this
  condition (but allow them to run if ro.llk.killtest is "off" so
  that the specific testing above/below can be run).

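for i in driver zombie; do
        # Disable the kill test so llkd goes directly to kernel panic
        adb shell su root setprop ro.llk.killtest off
        adb shell /data/nativetest/llkd_unit_test/llkd_unit_test --gtest_filter=llkd.${i}
        # The device panics and reboots; ro.* properties do not survive, so set it again
        adb wait-for-device
        adb shell su root setprop ro.llk.killtest off
        sleep 60
        # Expect kernel_panic,sysrq,livelock,<state>
        adb shell getprop sys.boot.reason
        adb shell /data/nativetest/llkd_unit_test/llkd_unit_test --gtest_filter=llkd.${i}
done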

Test: llkd_unit_test (see test assumptions)
Bug: 33808187
Bug: 72838192
Change-Id: I2b24875376ddfdbc282ba3da5c5b3567de85dbc0
2018-04-18 14:02:16 -07:00
Luis Hector Chavez 5191e9d8a3 Merge "Make bootstat container-friendly" into pi-dev
am: 4807bc214f

Change-Id: I2104e73d62ac6111314d6a63343aa28bde87053b
2018-04-16 17:47:58 -07:00
Luis Hector Chavez 03aae156b8 Make bootstat container-friendly
This change allows bootstat to read the ro.boot.boottime_offset
property, which is set on devices where Android runs in a container.
This is because the CLOCK_BOOTTIME clock is not reset when (from
Android's perspective) the device restarts, since the host OS does not
restart itself.
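A minimal sketch of the adjustment, assuming the ro.boot.boottime_offset value is a count of nanoseconds that is subtracted from CLOCK_BOOTTIME readings. The helper names (AdjustedBootTimeNs, ParseOffsetNs, RawBootTimeNs) are hypothetical and reading the property itself is left out; this is not bootstat's implementation.

```cpp
#include <cstdint>
#include <string>
#include <time.h>

// Hypothetical: container-adjusted boot time in nanoseconds.
// Assumption: offset_ns is ro.boot.boottime_offset (in ns), i.e. how far
// CLOCK_BOOTTIME had already advanced when Android itself started.
int64_t AdjustedBootTimeNs(int64_t raw_boottime_ns, int64_t offset_ns) {
  return raw_boottime_ns - offset_ns;
}

// Parse the property value; empty or malformed input means "no offset".
int64_t ParseOffsetNs(const std::string& prop) {
  if (prop.empty()) return 0;
  try {
    size_t used = 0;
    const long long value = std::stoll(prop, &used);
    return used == prop.size() ? value : 0;  // reject trailing junk
  } catch (...) {
    return 0;  // not a number, or out of range
  }
}

// Read the Linux-specific CLOCK_BOOTTIME clock in nanoseconds.
int64_t RawBootTimeNs() {
  timespec ts{};
  clock_gettime(CLOCK_BOOTTIME, &ts);
  return int64_t(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

On a device without the property, the offset parses to 0 and the raw clock value is used unchanged.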

Bug: 77273909
Test: CtsBootStatsTestCases

Change-Id: Ifb792864394be0b4686cc9d555c0afced856f4b4
Merged-In: Ifb792864394be0b4686cc9d555c0afced856f4b4
2018-04-16 11:17:42 -07:00
Luis Hector Chavez 583d34c1c6 Make bootstat container-friendly
This change allows bootstat to read the ro.boot.boottime_offset
property, which is set on devices where Android runs in a container.
This is because the CLOCK_BOOTTIME clock is not reset when (from
Android's perspective) the device restarts, since the host OS does not
restart itself.

Bug: 77273909
Test: CtsBootStatsTestCases

Change-Id: Ifb792864394be0b4686cc9d555c0afced856f4b4
2018-04-16 11:15:49 -07:00
Mark Salyzyn cfc702e548 Merge "bootstat: refine "Kernel panic - not syncing"" am: 3b353ed41c
am: cf63c66865

Change-Id: I102019c09fb52fc3963ded5f08097283b00587bd
2018-03-23 00:02:27 +00:00
Mark Salyzyn 3f48fa9cb2 bootstat: refine "Kernel panic - not syncing"
Provide some easy kernel panic subreasons mined from last kmsg,
generating a canonical boot reason (system boot reason) that
may aid triage.

Notably, report kernel_panic,hung if [khungtaskd] triggers on
a livelock condition; this forms a signal on the dashboards.

Helper function getSubreason is modified to optionally enable checking
for a single quote, resulting in a refactoring that eases maintenance of
the termination detection in the face of single bit error heuristics.

Test: boot_reason_test.sh
Bug: 63736262
Bug: 33808187
Change-Id: I7fdd1e57e7a26095738175074306f0d2d59b1d69
2018-03-22 13:57:31 -07:00
Mark Salyzyn c74992bfbb Merge changes Ibf543273,Id56946b6 am: 7c681665f0
am: f6d3c609ee

Change-Id: Ifb3bc66b8776f2f4871b409f382d0f55b83180a6
2018-03-20 23:10:09 +00:00
Mark Salyzyn 39cc3e7073 bootstat: allow kernel_panic,sysrq to propagate subreason
Allow for a daemon to write to last kmsg to propagate a detailed
subreason to kernel_panic,sysrq actions.  A minor refactor moves
common code into a helper function getSubreason for retrieval and
bit error correction operations.

A sysrq crash generally produces a kernel-provided message:
    SysRq : Trigger a crash
which is used to generate a canonical boot reason kernel_panic,sysrq.
A user daemon could write to /dev/kmsg just prior to the sysrq with
    SysRq : Trigger a crash : '<subreason>'
to change the canonical boot reason to kernel_panic,sysrq,<subreason>.
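A sketch of how the two kmsg forms above could map to a canonical boot reason. SysrqBootReason is a hypothetical helper; it does a plain substring search and leaves out the bit error correction that getSubreason performs. Because the daemon writes its tagged line just before the kernel's own line, the tagged form is searched for first.

```cpp
#include <string>

// Hypothetical: derive a canonical boot reason from the text of last kmsg.
std::string SysrqBootReason(const std::string& last_kmsg) {
  static const std::string kNeedle = "SysRq : Trigger a crash";
  static const std::string kTagged = kNeedle + " : '";
  // Prefer a daemon-supplied line of the form: SysRq : Trigger a crash : '<subreason>'
  const size_t pos = last_kmsg.rfind(kTagged);
  if (pos != std::string::npos) {
    const size_t begin = pos + kTagged.size();
    const size_t end = last_kmsg.find('\'', begin);  // closing quote
    if (end != std::string::npos && end > begin) {
      return "kernel_panic,sysrq," + last_kmsg.substr(begin, end - begin);
    }
  }
  // Fall back to the plain kernel-provided message.
  if (last_kmsg.rfind(kNeedle) != std::string::npos) return "kernel_panic,sysrq";
  return "";  // no sysrq crash recorded
}
```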

Administrative: add pending kBootReasonMap entries already present in TRON.

Test: manual echo into /dev/kmsg and /proc/sysrq-trigger and check
Test: boot_reason_test.sh
Bug: 33808187
Bug: 63736262
Change-Id: Ibf5432737e5a3449ebe40a8c6cf2d3e912ed6bbc
2018-03-20 13:11:44 -07:00
Mark Salyzyn 08475ad935 Merge changes If35518c0,I04bede0b am: bda0a3e637
am: bbdddfa030

Change-Id: I28e587ea4e0818fcdecea35f78d71c5e7b617605
2018-03-17 00:16:17 +00:00
Mark Salyzyn a25c4e6292 Merge "bootstat: add support for regex in aliasReasons and powerkeys" am: f4be06450a
am: 4134f4c701

Change-Id: Iaf9c670ed74aa395455252bfda9a2b3933654d11
2018-03-17 00:15:01 +00:00
Mark Salyzyn 281932803d bootstat: add support for suppressing needle match on output
For aliasReasons, allow one to optionally suppress the needle in the
output member using a <bang> (!) character prefix.
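A sketch of the <bang> convention. The Alias struct, ApplyAlias, and the default behaviour of appending the matched needle as a subreason are assumptions made for illustration; only the '!' prefix comes from the change description.

```cpp
#include <string>
#include <vector>

// Hypothetical alias entry: `to` is the output, `from` the needle to find.
struct Alias {
  std::string to;
  std::string from;
};

// Assumption: by default the matched needle is appended to the output as a
// subreason; a leading '!' on `to` suppresses the needle in the output.
std::string ApplyAlias(const std::vector<Alias>& table, const std::string& reason) {
  for (const auto& alias : table) {
    if (reason.find(alias.from) == std::string::npos) continue;
    if (!alias.to.empty() && alias.to[0] == '!') return alias.to.substr(1);
    return alias.to + "," + alias.from;
  }
  return reason;  // no alias matched
}
```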

Test: boot_reason_test.sh
Bug: 63736262
Bug: 74595769
Change-Id: If35518c08cf909c6c78a16275e9d8dfd0ff839a9
2018-03-16 15:47:27 -07:00
Mark Salyzyn 8aa36c67f8 bootstat: usb and rtc refinement
For all known cases, if usb is present in the bootloader reason,
then it is actually reporting a cold,charger canonical boot reason.
This signifies that the device was powered down, and was woken
up by the charger being connected.

For all known cases, if rtc is present in the bootloader reason,
then it is actually reporting a cold,rtc canonical boot reason.
This signifies that the device was powered down, and was woken
up by the rtc clock.
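The refinement reduces to a pair of substring checks. RefineColdBoot is a hypothetical helper; real bootloader reason strings and bootstat's matching rules may differ.

```cpp
#include <string>

// Hypothetical: a bootloader reason mentioning usb or rtc is reported as a
// cold boot woken by the charger or by the rtc clock, respectively.
std::string RefineColdBoot(const std::string& bootloader_reason) {
  if (bootloader_reason.find("usb") != std::string::npos) return "cold,charger";
  if (bootloader_reason.find("rtc") != std::string::npos) return "cold,rtc";
  return bootloader_reason;  // leave other reasons untouched
}
```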

Test: boot_reason_test.sh
Bug: 74595769
Bug: 63736262
Change-Id: I04bede0b7ccaa1b859943f7def93521a8f7b25c6
2018-03-16 15:47:27 -07:00
Mark Salyzyn d9dde08660 Merge "bootstat: alias underline to space in bit error rate handling" am: 40858e95b6
am: ee3c8e64dc

Change-Id: I9f25ad71f8abfbd39c9a0055e2c96c9024224eca
2018-03-16 21:48:01 +00:00
Mark Salyzyn 7ad1c9e388 Merge "bootstat: comment out duplicate entries in kBootReasonMap" am: 1408ceec4f
am: 07e4e9b5fc

Change-Id: If9f7c72d286b926253602f141245ccf9385e23d4
2018-03-16 21:44:12 +00:00
Mark Salyzyn 057ba5845c Merge "bootstat: 3.18 kernel does not report "SysRq : Trigger a crash"" am: 5e2b2681ef
am: a2b4750771

Change-Id: Id6dc67787ae80e7ca043258cc7dd8e498f8fe0c5
2018-03-16 21:28:23 +00:00
Mark Salyzyn 25900dd772 bootstat: add support for regex in aliasReasons and powerkeys
Add support for regex in aliasReasons for the alias member.  Use this
new feature to check powerkey|power_key|PowerKey for a single entry.
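A sketch of a single regex alias covering the three spellings, using std::regex_search. PowerkeyAlias is hypothetical, and mapping a match to cold,powerkey (one of the reserved values listed elsewhere in this log) is an assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical: one alias entry whose needle is a regex alternation.
std::string PowerkeyAlias(const std::string& bootloader_reason) {
  static const std::regex kPowerKey("powerkey|power_key|PowerKey");
  if (std::regex_search(bootloader_reason, kPowerKey)) return "cold,powerkey";
  return bootloader_reason;
}
```

The match is case sensitive, so a spelling such as Power_Key is not covered unless added to the alternation.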

Test: boot_reason_test.sh
Bug: 63736262
Change-Id: Ia6add99b9e33f3197643dbaab88dde20aa726f90
2018-03-16 11:41:17 -07:00
Mark Salyzyn 1e7d1c77fa bootstat: alias underline to space in bit error rate handling
When we are matching existing known boot reasons, we should try with
compliant underlines first, then again with underlines replaced with
spaces.  Replace references to Ber with BitError for maintenance
clarity.  Replace helper functions with C++11 lambdas.
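A sketch of the two-pass match, written with a C++11 lambda. ContainsReason is a hypothetical name, not bootstat's helper.

```cpp
#include <algorithm>
#include <string>

// Hypothetical: try the compliant underscore form first, then retry with
// underscores replaced by spaces.
bool ContainsReason(const std::string& haystack, const std::string& reason) {
  auto found = [&haystack](const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
  };
  if (found(reason)) return true;  // first pass: underscores as written
  std::string spaced = reason;
  std::replace(spaced.begin(), spaced.end(), '_', ' ');
  return found(spaced);  // second pass: underscores as spaces
}
```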

Test: boot_reason_test.sh
Bug: 63736262
Change-Id: I91b865013513526a55a85523080c7127f198968c
2018-03-16 10:37:03 -07:00
Mark Salyzyn 2b820536e6 bootstat: comment out duplicate entries in kBootReasonMap
Two entries can be reused. The third "unknown" entry is not really
a duplicate, since kUnknownBootReason is not checked.  Before any
duplicate entry is reused in the future,
analysis/uma/configs/clearcut/TRON/histograms.xml should be updated.

Test: boot_reason_test.sh
Change-Id: If2071a18160dc2c93e851fecc6b8c11fc76c9845
2018-03-16 10:36:53 -07:00
Mark Salyzyn 853bb80e58 bootstat: 3.18 kernel does not report "SysRq : Trigger a crash"
Use an alternate means to determine that the sysrq crash was
requested.  Also, to be CTS compliant, the kernel_panic subreason
must be in lower case.

Test: boot_reason_test.sh
Bug: 74595769
Bug: 63736262
Change-Id: Ica06960ce62d220a909006e365951376d672b7e6
2018-03-16 09:39:17 -07:00
Tej Singh fe3e762b6d Fix performance degradation from BootSequence atom
Adding the boot sequence reported atom in ag/3518079 caused the duration
of bootstat to increase, as seen in b/72864061. I narrowed the cause
down to calling BootReasonStrToReason. However, this function is also
called in ReportBootReason, so I created another function that does the
parsing and sets the system boot reason property, and made
RecordBootReason and statsd logging read that property.

Bug: 72864061
Test: rebooted phone, verified boot events were received in adb shell
logcat -b stats and verified adb shell bootstat -p printed correct
values. Ran timing tests as well on walleye with 20 boots: before this
change, the average was ~150-160ms. After, it was ~80ms.

Change-Id: I92dbc9880328835647be7d9d50c7861b42f27bdb
2018-02-12 21:42:29 +00:00
TreeHugger Robot 94cde4e86c Merge "Atom: BootSequenceReported" 2018-01-31 04:24:54 +00:00
Tej Singh 4eacd38366 Atom: BootSequenceReported
Logs information about boot time and reason to statsd.
Specifically: bootloader boot reason, system boot reason, bootloader
boot time, total boot time, time that the boot finished, and time since
last boot.

Test: booted the phone and verified "adb logcat -b stats" received the
event

Change-Id: I769df9a09263ed3667f7085c81b3d072e868cbda
2018-01-26 16:14:13 -08:00
James Hawkins 79a4ee260b bootstat: Add more boot reasons.
Bug: none
Test: none
Change-Id: I483e4e2c6ea4219ac21b64a24bfac58c2919a40b
2018-01-26 14:31:04 -08:00
Treehugger Robot b91021a2a8 Merge "bootstat: Add more boot reasons." 2018-01-09 00:03:22 +00:00
James Hawkins b607daec94 bootstat: Add more boot reasons.
These appear to be winding down...

Bug: none
Test: none
Change-Id: I00bc84d8e50b1563f97f812f722aa6bdbd6265b2
2018-01-05 14:42:55 -08:00
Ivan Lozano 9ddf491492 Merge "Fix sanitizer errors in bootstat.cpp." 2017-12-21 06:42:55 +00:00
Ivan Lozano 44d3cacfce Fix sanitizer errors in bootstat.cpp.
Integer overflow sanitized builds are throwing an error on the while
loop decrement in the rfind function. This refactors the loop to prevent
decrementing the value on the final iteration.
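The general pattern can be sketched as a backward search that tests for zero before decrementing its unsigned index, so the index never wraps around. SafeRfind is an illustrative stand-in, not the function in bootstat.cpp.

```cpp
#include <cstddef>
#include <string>

// Hypothetical backward search with an unsigned index that never underflows.
size_t SafeRfind(const std::string& haystack, const std::string& needle) {
  if (needle.size() > haystack.size()) return std::string::npos;
  size_t pos = haystack.size() - needle.size();
  while (true) {
    if (haystack.compare(pos, needle.size(), needle) == 0) return pos;
    if (pos == 0) break;  // stop here instead of letting --pos wrap around
    --pos;
  }
  return std::string::npos;
}
```

It is meant to return the same positions as std::string::rfind(needle), including for an empty needle.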

Test: Compiled and device boots without runtime error.
Bug: 30969751
Change-Id: Ice4532cce933062b3c14adf2d9749cfdea4ad84c
Merged-In: Ice4532cce933062b3c14adf2d9749cfdea4ad84c
2017-12-20 14:59:19 -08:00
Elliott Hughes 579e682628 Add std::string StartsWith*/EndsWith* overloads.
We should have done this from the beginning. Thanks to Windows, we're not
going to be able to switch libbase over to std::string_view any time soon.

Bug: N/A
Test: ran tests
Change-Id: Iff2f56986e39de53f3ac484415378af17dacf26b
2017-12-20 09:42:22 -08:00
James Hawkins e2c2724aee bootstat: Add tons of new boot reasons.
Bug: none
Test: none
Change-Id: I57e89a20f23d3b3cff2fa6667e25f1c37cb2870c
2017-12-18 13:40:27 -08:00
James Hawkins 00433a2305 bootstat: Add more boot reasons.
Bug: none
Test: none
Change-Id: I4cc6e2e39b6c5954387006d48f2119211fa37dd5
2017-12-04 14:20:21 -08:00
James Hawkins f4444f058c bootstat: Add four more boot reasons.
Bug: none
Test: none
Change-Id: Ifd62e3a8bb6142e89fcde819eab1caba3d7933b7
2017-11-30 15:01:40 -08:00
James Hawkins 5f85f8354b bootstat: Three more boot reasons.
Bug: none
Test: none
Change-Id: I37b6ed6ce73f403aa5f9db2c0234e9dc4c0f7c01
2017-11-29 14:30:06 -08:00
James Hawkins 74b1758cad bootstat: Add a few more bootreasons.
Bug: none
Test: none
Change-Id: I9b482e642a9eec26670118a265046194f3cc2a7a
2017-11-28 15:20:44 -08:00
James Hawkins 8ac79bc820 bootstat: Add another boot reason.
Bug: None
Test: None
Change-Id: I865a12e9c8833c4a345284181aa1812be3eddadd
2017-10-31 10:07:34 -07:00
Mark Salyzyn 293cb3b217 bootstat: handle a bad bit error rate issue with pstore
Create a private rfind that allows a fuzzy match based on a bit error
rate (BER) of 1 every 8 bits.  last kmsg is affected by pstore ramoops
backing that suffers from data corruption.  Add some additional
validation based on possible data corruption scenarios, as a noisy
match means higher chance of noisy data.
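A minimal sketch of such a fuzzy rfind, assuming "1 every 8 bits" means each byte of the needle may differ from the log text in at most one flipped bit. FuzzyRfind and WithinOneBit are hypothetical names, and the extra validation described above is not shown.

```cpp
#include <cstddef>
#include <string>

// True if a and b differ in at most one bit (XOR is zero or a power of two).
bool WithinOneBit(char a, char b) {
  const unsigned diff = static_cast<unsigned char>(a ^ b);
  return (diff & (diff - 1)) == 0;
}

// Hypothetical: last position where every needle byte matches within one bit.
size_t FuzzyRfind(const std::string& haystack, const std::string& needle) {
  if (needle.empty() || needle.size() > haystack.size()) return std::string::npos;
  size_t pos = haystack.size() - needle.size();
  while (true) {
    bool match = true;
    for (size_t i = 0; i < needle.size() && match; ++i) {
      match = WithinOneBit(haystack[pos + i], needle[i]);
    }
    if (match) return pos;
    if (pos == 0) break;  // avoid unsigned wrap-around
    --pos;
  }
  return std::string::npos;
}
```

For example, a corrupted "RysRq" ('R' is one bit away from 'S') still matches "SysRq", while "PysRq" ('P' is two bits away) does not.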

Noisy data can notably affect battery level detection, but does not
typically result in false positives.  Battery level, or failure, is
the responsibility of the BatteryStats service, providing a positive
signal and strong device-independent algorithm.  The checking done in
bootstat is likely to be deprecated in favour of an API request to
BatteryStats once their algorithms deal with surprise outages due to
aging.

The kernel logging heuristic and BER fixup handily deal with a
prevalent issue where some bootloaders fail to properly notify us
of panics.  This is where the gains from this improvement are noticed.

Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I93b4210f12fb47c5c036f4d6eb4cafeee4896d35
2017-10-26 14:17:14 -07:00
Mark Salyzyn 747c0e6216 bootstat: better validation of battery level (shutdown,battery)
Replace simple strtoull with loop that ensures no leading zeros.
Restrict size of value buffer being checked as allocation was
going to end of retrieved buffer, which can cause unnecessary
memory pressure during boot.
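A sketch of the digit-by-digit check, under two assumptions: the value is at most three characters long (mirroring the restricted buffer) and must fall between 0 and 100. ParseBatteryLevel is a hypothetical helper.

```cpp
#include <string>

// Hypothetical: parse a battery percentage, rejecting leading zeros ("07"),
// empty input, non-digits, over-long input, and values above 100.
bool ParseBatteryLevel(const std::string& s, int* level) {
  if (s.empty() || s.size() > 3) return false;    // bounded, no long scans
  if (s.size() > 1 && s[0] == '0') return false;  // leading zero
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + (c - '0');
  }
  if (value > 100) return false;
  *level = value;
  return true;
}
```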

Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Ifdc1d4fd3a73794c001577024ce7cbfde9c25028
2017-10-26 14:17:14 -07:00
Mark Salyzyn dafced93a5 bootstat: Do not allow unknown boot reasons to land in first field.
If we sniff an unknown boot reason from last kmsg, make sure it
has a "reboot," prefix.
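A sketch of the rule. The set of accepted first fields is an assumption drawn from reasons that appear in this log, not bootstat's actual list, and SanitizeSniffedReason is hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical: demote an unrecognized sniffed reason to a "reboot," subreason
// so that it never lands in the first field.
std::string SanitizeSniffedReason(const std::string& reason) {
  static const std::set<std::string> kKnownFirstFields = {
      "reboot", "shutdown", "kernel_panic", "cold", "warm", "hard"};
  const std::string first = reason.substr(0, reason.find(','));
  if (kKnownFirstFields.count(first)) return reason;
  return "reboot," + reason;
}
```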

Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Ia1c401b8899d1f0c56bd4f5d8d2d19b7fc889a30
2017-10-26 14:14:40 -07:00
Mark Salyzyn 88d692c09e bootstat: move boot reason validation transformation policy into subroutine.
Allow for future policy adjustments.

SideEffects: None
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I571fb7dafc6b80c75d2809a3da3f9b96784cef06
2017-10-26 14:11:04 -07:00
Mark Salyzyn 72a8ea3d3c bootstat: test: fix Its_Just_So_Hard_reboot
Add the test injection to known list, and deal with an error
propagation issue.

Test: system/core/bootstat/boot_reason_test.sh Its_Just_So_Hard_reboot
Bug: 63736262
Change-Id: I4799956978a8884c69c830fcedd7febd143651fd
2017-10-26 14:10:31 -07:00
Mark Salyzyn c89f9da060 bootstat: add shutdown,thermal,battery
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I30d6b5308fd8616eae31ad13e925d66c9fc07367
2017-10-25 15:45:49 -07:00
Mark Salyzyn cabbe4ff35 bootstat: add reboot,2sec
Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: Ie2e5872adb14e0979ff87eb4d2f57ab2849facd4
2017-10-25 15:45:49 -07:00
Treehugger Robot a03c935e27 Merge "bootstat: Add more boot reasons found by logging in the wild." 2017-10-19 19:41:34 +00:00
James Hawkins 34073b583a bootstat: Add more boot reasons found by logging in the wild.
Bug: none
Test: none
Change-Id: I7e388326be44aeb3e6fdb79d2e173dc17eb8bc1f
2017-10-19 10:44:22 -07:00
Josh Gao 52315ae329 bootstat: remove extraneous android_logcat_destroy.
android_logcat_pclose already does that.

Bug: http://b/67960705
Test: mma
Change-Id: I03d06b10807e8a313c9654c2e1db36bfb59e3f99
2017-10-18 16:57:33 -07:00
Mark Salyzyn 444e3a9f74 Merge "bootstat: Add some reserved sys.boot.reason values to enumeration" 2017-10-11 20:32:00 +00:00
Mark Salyzyn 243fa29804 bootstat: Add some reserved sys.boot.reason values to enumeration
Add cold,powerkey, warm,s3_wakeup and hard,hw_reset so that
sys.boot.reason values can also be enumerated.  Also add
some reserved speculative entries associated with forced
suspend to RAM and DISK; shutdown,suspend and shutdown,hibernate
respectively.

Test: system/core/bootstat/boot_reason_test.sh
Bug: 67636061
Change-Id: Ic43523748e6006aaca882f8eec7c1f0c08431bd8
2017-10-11 10:56:42 -07:00
James Hawkins 25f7122512 bootstat: Special case logging for empty boot reason.
Empty boot reason is mostly unexpected but may take up the bulk of
unknown reported boot reason values.

Bug: none
Test: none
Change-Id: I9978658c2b052d5cf5d28299861b0d1125ba9fc0
2017-10-10 16:37:05 -07:00
Mark Salyzyn 6290982ee3 init+bootstat: reduce last reboot reason to canonical alias
This makes the last reboot reason easier to parse.  It also ensures that
the last boot reason matches the content that is typically returned by the
bootloader or in turn landed in the canonical system boot reason.

Simplify parsing in bootstat.  Adjust and fix boot_reason_test.sh for
new reality.  Allow boot reason tests battery and kernel_panic to pass
if device does not support pstore (empty before and after the test).
If device somehow landed in fastboot mode while waiting for the
display, issue a fastboot reboot to move the test along.  Some cleanup
and standardization changes to the test script.

Test: system/core/bootstat/boot_reason_test.sh
Bug: 63736262
Change-Id: I97d5467c0b4a6d65df3525f1a2d0051db813d5ad
2017-10-10 09:37:59 -07:00
Mark Salyzyn 0af71a5327 bootstat: system_boot_reason does not handle reboot_<reason> case
Heuristic would translate reboot_<reason> to reboot._<reason>. Add
checking/squashing for leading comma (,) _or_ underscore (_).
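A sketch of the squashing step. JoinRebootReason is hypothetical and assumes the text following the reboot keyword has already been isolated.

```cpp
#include <string>

// Hypothetical: build "reboot,<reason>", squashing one leading ',' or '_'
// so that "reboot_<reason>" does not keep a stray separator.
std::string JoinRebootReason(std::string rest) {
  if (!rest.empty() && (rest[0] == ',' || rest[0] == '_')) rest.erase(0, 1);
  return rest.empty() ? "reboot" : "reboot," + rest;
}
```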

Test: compile
Bug: 63736262
Change-Id: I7ef4930ec8c064a8151183ac08d292931a8584c8
2017-10-05 14:42:08 -07:00