43347d56c8
Live patching consistency model is of LEAVE_PATCHED_SET and SWITCH_THREAD. This means that all tasks in the system have to be marked one by one as safe to call a new patched function. Safe means when a task is not (sleeping) in a set of patched functions. That is, no patched function is on the task's stack. Another clearly safe place is the boundary between kernel and userspace. The patching waits for all tasks to get outside of the patched set or to cross the boundary. The transition is completed afterwards. The problem is that a task can block the transition for quite a long time, if not forever. It could sleep in a set of patched functions, for example. Luckily we can force the task to leave the set by sending it a fake signal, that is a signal with no data in signal pending structures (no handler, no sign of proper signal delivered). Suspend/freezer use this to freeze the tasks as well. The task gets TIF_SIGPENDING set and is woken up (if it has been sleeping in the kernel before) or kicked by rescheduling IPI (if it was running on other CPU). This causes the task to go to kernel/userspace boundary where the signal would be handled and the task would be marked as safe in terms of live patching. There are tasks which are not affected by this technique though. The fake signal is not sent to kthreads. They should be handled differently. They can be woken up so they leave the patched set and their TIF_PATCH_PENDING can be cleared thanks to stack checking. For the sake of completeness, if the task is in TASK_RUNNING state but not currently running on some CPU it doesn't get the IPI, but it would eventually handle the signal anyway. Second, if the task runs in the kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not handled on return from the interrupt. It would be handled on return to the userspace in the future when the fake signal is sent again. Stack checking deals with these cases in a better way. If the task was sleeping in a syscall it would be woken by our fake signal, it would check if TIF_SIGPENDING is set (by calling signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with ERESTART* return values are restarted in case of the fake signal (see do_signal()). EINTR is propagated back to the userspace program. This could disturb the program, but... * each process dealing with signals should react accordingly to EINTR return values. * syscalls returning EINTR happen to be quite common situation in the system even if no fake signal is sent. * freezer sends the fake signal and does not deal with EINTR anyhow. Thus EINTR values are returned when the system is resumed. The very safe marking is done in architectures' "entry" on syscall and interrupt/exception exit paths, and in a stack checking functions of livepatch. TIF_PATCH_PENDING is cleared and the next recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also call klp_update_patch_state() before do_signal(), so that recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING immediately and thus prevent a double call of do_signal(). Note that the fake signal is not sent to stopped/traced tasks. Such task prevents the patching to finish till it continues again (is not traced anymore). Last, sending the fake signal is not automatic. It is done only when admin requests it by writing 1 to signal sysfs attribute in livepatch sysfs directory. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: x86@kernel.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
---|---|---|
.. | ||
obsolete | ||
removed | ||
stable | ||
testing | ||
README |
README
This directory attempts to document the ABI between the Linux kernel and userspace, and the relative stability of these interfaces. Due to the everchanging nature of Linux, and the differing maturity levels, these interfaces should be used by userspace programs in different ways. We have four different levels of ABI stability, as shown by the four different subdirectories in this location. Interfaces may change levels of stability according to the rules described below. The different levels of stability are: stable/ This directory documents the interfaces that the developer has defined to be stable. Userspace programs are free to use these interfaces with no restrictions, and backward compatibility for them will be guaranteed for at least 2 years. Most interfaces (like syscalls) are expected to never change and always be available. testing/ This directory documents interfaces that are felt to be stable, as the main development of this interface has been completed. The interface can be changed to add new features, but the current interface will not break by doing this, unless grave errors or security problems are found in them. Userspace programs can start to rely on these interfaces, but they must be aware of changes that can occur before these interfaces move to be marked stable. Programs that use these interfaces are strongly encouraged to add their name to the description of these interfaces, so that the kernel developers can easily notify them if any changes occur (see the description of the layout of the files below for details on how to do this.) obsolete/ This directory documents interfaces that are still remaining in the kernel, but are marked to be removed at some later point in time. The description of the interface will document the reason why it is obsolete and when it can be expected to be removed. removed/ This directory contains a list of the old interfaces that have been removed from the kernel. Every file in these directories will contain the following information: What: Short description of the interface Date: Date created KernelVersion: Kernel version this feature first showed up in. Contact: Primary contact for this interface (may be a mailing list) Description: Long description of the interface and how to use it. Users: All users of this interface who wish to be notified when it changes. This is very important for interfaces in the "testing" stage, so that kernel developers can work with userspace developers to ensure that things do not break in ways that are unacceptable. It is also important to get feedback for these interfaces to make sure they are working in a proper way and do not need to be changed further. How things move between levels: Interfaces in stable may move to obsolete, as long as the proper notification is given. Interfaces may be removed from obsolete and the kernel as long as the documented amount of time has gone by. Interfaces in the testing state can move to the stable state when the developers feel they are finished. They cannot be removed from the kernel tree without going through the obsolete state first. It's up to the developer to place their interfaces in the category they wish for it to start out in. Notable bits of non-ABI, which should not under any circumstances be considered stable: - Kconfig. Userspace should not rely on the presence or absence of any particular Kconfig symbol, in /proc/config.gz, in the copy of .config commonly installed to /boot, or in any invocation of the kernel build process. - Kernel-internal symbols. Do not rely on the presence, absence, location, or type of any kernel symbol, either in System.map files or the kernel binary itself. See Documentation/process/stable-api-nonsense.rst.