From 00b2ce7005bfab7fb4d01671867964a10916b2aa Mon Sep 17 00:00:00 2001
From: Mark Salyzyn
Date: Tue, 7 Aug 2018 10:16:47 -0700
Subject: [PATCH] llkd: Add cma_alloc stack symbol checking

Add ro.llk.stack to list a set of symbols that should rarely happen
but if persistent in multiple checks, indicates a live lock condition.
At ro.llk.stack.timeout_ms the process is sent a kill, if it remains,
then panic the kernel.

There is no ABA detection in the paths, the condition for the stack
symbol being present instantaneously must be its rarity of being
caught. If a livelock occurs in the path of the symbol, then it is
possible more than one path could be stuck in the state, but the best
candidate symbols are found underneath a lock resulting in only one
process being the culprit, and the best aim. There may be processes
that induce a look of persistence, if so the symbol is not a candidate
for checking.

Adding cma_alloc to the default list. It is not behind a lock, so
multiple references can happen. The hope is the first one to spin or
wait gets the kill, but there is the possibility that both will get
the kill. It is unknown if this will escalate to a kernel panic at
this time. It is also suspect that a RT task could cause this by
starving the background worker, and llkd could suffer a similar fate
as it is SCHED_BATCH policy.

Test: compile
Bug: 33808187
Bug: 111910505
Bug: 80502612
Change-Id: I49c9f0646d627869144c5c1ca32272515ed60f7b
---
 llkd/README.md      | 2 +-
 llkd/include/llkd.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/llkd/README.md b/llkd/README.md
index 9a2468295..e5be8502e 100644
--- a/llkd/README.md
+++ b/llkd/README.md
@@ -127,7 +127,7 @@ Only active on userdebug and eng builds.
 default 2 minutes samples of threads for D or Z.
 
 #### ro.llk.stack
-default __get_user_pages, comma separated list of kernel symbols.
+default cma_alloc,__get_user_pages, comma separated list of kernel symbols.
 The string "*false*" is the equivalent to an *empty* list.
 Look for kernel stack symbols that if ever persistently present can
 indicate a subsystem is locked up.
diff --git a/llkd/include/llkd.h b/llkd/include/llkd.h
index 915ee90cf..c724fce10 100644
--- a/llkd/include/llkd.h
+++ b/llkd/include/llkd.h
@@ -48,7 +48,7 @@ unsigned llkCheckMilliseconds(void);
 /* LLK_CHECK_MS_DEFAULT = actual timeout_ms / LLK_CHECKS_PER_TIMEOUT_DEFAULT */
 #define LLK_CHECKS_PER_TIMEOUT_DEFAULT 5
 #define LLK_CHECK_STACK_PROPERTY "ro.llk.stack"
-#define LLK_CHECK_STACK_DEFAULT "__get_user_pages"
+#define LLK_CHECK_STACK_DEFAULT "cma_alloc,__get_user_pages"
 #define LLK_BLACKLIST_PROCESS_PROPERTY "ro.llk.blacklist.process"
 #define LLK_BLACKLIST_PROCESS_DEFAULT \
     "0,1,2,init,[kthreadd],[khungtaskd],lmkd,lmkd.llkd,llkd,watchdogd,[watchdogd],[watchdogd/0]"
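
Reviewer note: the sketch below is not part of the patch and is not llkd's code. It only illustrates the behavior the commit message and README describe: a comma separated symbol list where "false" means an empty list, and a symbol that must be seen in several consecutive checks before a thread is flagged. The names `ParseStackSymbols` and `StackWatch`, the `checks_to_timeout` parameter, and the assumed stack line shape (` symbol+0x...`) are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical helper: split a value such as "cma_alloc,__get_user_pages"
// into a set of symbols. "false" and "" both yield an empty set.
std::unordered_set<std::string> ParseStackSymbols(const std::string& value) {
    std::unordered_set<std::string> symbols;
    if (value == "false") return symbols;
    std::string::size_type start = 0;
    while (start <= value.size()) {
        std::string::size_type end = value.find(',', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) symbols.insert(value.substr(start, end - start));
        start = end + 1;
    }
    return symbols;
}

// Hypothetical tracker: counts consecutive checks in which a thread's
// kernel stack text contained a watched symbol. Like the patch, it has no
// ABA detection: it only knows what each sample showed, not what happened
// between samples.
class StackWatch {
  public:
    StackWatch(std::unordered_set<std::string> symbols, unsigned checks_to_timeout)
        : symbols_(std::move(symbols)), checks_to_timeout_(checks_to_timeout) {
        assert(checks_to_timeout_ > 0);
    }

    // Returns true once `tid` has matched in `checks_to_timeout` consecutive
    // checks. A check without a match resets the count for that thread.
    bool Observe(int tid, const std::string& stack_text) {
        if (!Matches(stack_text)) {
            hits_.erase(tid);
            return false;
        }
        return ++hits_[tid] >= checks_to_timeout_;
    }

  private:
    // Assumption: each frame looks like "[<0>] cma_alloc+0x1c/0x2b0", so a
    // symbol is preceded by a space and followed by '+'.
    bool Matches(const std::string& stack_text) const {
        for (const auto& symbol : symbols_) {
            if (stack_text.find(" " + symbol + "+") != std::string::npos) return true;
        }
        return false;
    }

    std::unordered_set<std::string> symbols_;
    unsigned checks_to_timeout_;
    std::unordered_map<int, unsigned> hits_;
};
```

With `checks_to_timeout` set to 3, a thread whose sampled stack shows `cma_alloc` three checks in a row is flagged on the third check; a single sample without the symbol starts the count over. Because `cma_alloc` is not behind a lock, more than one thread can reach the threshold in the same check, which matches the commit message's caveat that more than one process may get the kill.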