Go to file
Balasubramani Vivekanandan b9023b91dd tick: broadcast-hrtimer: Fix a race in bc_set_next
When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.

The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.

In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().

In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings

Here is a depiction of the race condition

CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                  and subscribe to
                                                  tick broadcast)
---------------------                             ---------------------

__run_hrtimer()                                   tick_broadcast_enter()

  bc_handler()                                      __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                      //tick_broadcast_lock acquired
                                                   raw_spin_lock(&tick_broadcast_lock);

                                                   tick_broadcast_set_event()

                                                     clockevents_program_event()

                                                       dev->next_event = expires;

                                                       bc_set_next()

                                                         hrtimer_try_to_cancel()
                                                         //returns -1 since the timer
                                                         callback is active. Exits without
                                                         restarting the timer
  cpu_base->running = NULL;

The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.

Fixes: 5d1638acb9 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
2019-09-27 14:45:55 +02:00
Documentation Merge branch 'akpm' (patches from Andrew) 2019-09-26 10:29:42 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch s390 updates for the 5.4 merge window #2 2019-09-26 11:30:16 -07:00
block for-5.4/post-2019-09-24 2019-09-24 16:31:50 -07:00
certs Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" 2019-07-10 18:43:43 -07:00
crypto - crypto and DM crypt advances that allow the crypto API to reclaim 2019-09-21 10:40:37 -07:00
drivers s390 updates for the 5.4 merge window #2 2019-09-26 11:30:16 -07:00
fs NFS Client Updates for Linux 5.3 2019-09-26 12:20:14 -07:00
include NFS Client Updates for Linux 5.3 2019-09-26 12:20:14 -07:00
init mm: use CPU_BITS_NONE to initialize init_mm.cpu_bitmask 2019-09-24 15:54:10 -07:00
ipc ipc/sem.c: convert to use built-in RCU list checking 2019-09-25 17:51:41 -07:00
kernel tick: broadcast-hrtimer: Fix a race in bc_set_next 2019-09-27 14:45:55 +02:00
lib lib: untag user pointers in strn*_user 2019-09-25 17:51:41 -07:00
mm Fix hardened usercopy under CONFIG_DEBUG_VIRTUAL 2019-09-26 12:27:33 -07:00
net NFS Client Updates for Linux 5.3 2019-09-26 12:20:14 -07:00
samples rpmsg updates for v5.4 2019-09-22 10:58:15 -07:00
scripts checkpatch: check for nested (un)?likely() calls 2019-09-25 17:51:42 -07:00
security KEYS: trusted: correctly initialize digests and fix locking issue 2019-09-25 02:43:53 +03:00
sound sound fixes for 5.4-rc1 2019-09-24 16:46:16 -07:00
tools Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-26 15:38:07 -07:00
usr kbuild: clean compressed initramfs image 2019-09-24 15:54:07 -07:00
virt * s390: ioctl hardening, selftests 2019-09-18 09:49:13 -07:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2019-09-18 12:34:53 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS CREDITS: Update email address 2019-09-13 17:21:38 +03:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS tpmdd fixes for Linux v5.4 2019-09-25 09:40:24 -07:00
Makefile Modules updates for v5.4 2019-09-22 10:34:46 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.