print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value. However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).
The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix. This could be micro-optimized but it seems
better to have simpler, more readable code here.
The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer. If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:
# dmesg
klogctl: Bad address
When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:
syslog(0xa, 0, 0) = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)
As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e ("printk: flush continuation lines
immediately to console") in 3.5-rc5.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The boot_delay parameter affects all printk(), even if the log level
prevents visible output from the call. It results in delays greater than
the user intended without purpose.
This patch changes the behaviour of boot_delay to only delay output.
Signed-off-by: Andrew Cooks <acooks@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of BUG_ON(in_interrupt()), since that doesn't check for all
the newfangled stuff like preempt.
Note that this is valid since the console_sem is essentially used like
a real mutex with only two twists:
- we allow trylock from hardirq context
- across suspend/resume we lock the logical console_lock, but drop the
semaphore protecting the locking state.
Now that doesn't guarantee that no one is playing tricks in
single-thread atomic contexts at suspend/resume/boot time, but
- I couldn't find anything suspicious with some grepping,
- might_sleep shouldn't die,
- and I think the upside of catching more potential issues is worth
the risk of getting a might_sleep backtrace that would have been
save (and then dealing with that fallout).
Cc: Dave Airlie <airlied@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dave Airlie recently discovered a locking bug in the fbcon layer,
where a timer_del_sync (for the blinking cursor) deadlocks with the
timer itself, since both (want to) hold the console_lock:
https://lkml.org/lkml/2012/8/21/36
Unfortunately the console_lock isn't a plain mutex and hence has no
lockdep support. Which resulted in a few days wasted of tracking down
this bug (complicated by the fact that printk doesn't show anything
when the console is locked) instead of noticing the bug much earlier
with the lockdep splat.
Hence I've figured I need to fix that for the next deadlock involving
console_lock - and with kms/drm growing ever more complex locking
that'll eventually happen.
Now the console_lock has rather funky semantics, so after a quick irc
discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
the original idead of switching to a real mutex (since it won't work)
and instead opted to annotate the console_lock with lockdep
information manually.
There are a few special cases:
- The console_lock state is protected by the console_sem, and usually
grabbed/dropped at _lock/_unlock time. But the suspend/resume code
drops the semaphore without dropping the console_lock (see
suspend_console/resume_console). But since the same thread that did
the suspend will do the resume, we don't need to fix up anything.
- In the printk code there's a special trylock, only used to kick off
the logbuffer printk'ing in console_unlock. But all that happens
while lockdep is disable (since printk does a few other evil
tricks). So no issue there, either.
- The console_lock can also be acquired form irq context (but only
with a trylock). lockdep already handles that.
This all leaves us with annotating the normal console_lock, _unlock
and _trylock functions.
And yes, it works - simply unloading a drm kms driver resulted in
lockdep complaining about the deadlock in fbcon_deinit:
======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc2+ #552 Not tainted
-------------------------------------------------------
kms-reload/3577 is trying to acquire lock:
((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
but task is already holding lock:
(console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (console_lock){+.+.+.}:
[<ffffffff81087440>] lock_acquire+0x95/0x105
[<ffffffff81040190>] console_lock+0x59/0x5b
[<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
[<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
[<ffffffff810584a2>] worker_thread+0x1a7/0x24b
[<ffffffff8105ca29>] kthread+0x7f/0x87
[<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
-> #0 ((&info->queue)){+.+...}:
[<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
[<ffffffff81087440>] lock_acquire+0x95/0x105
[<ffffffff81058cab>] wait_on_work+0x3b/0xa7
[<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
[<ffffffff81058e33>] cancel_work_sync+0xb/0xd
[<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
[<ffffffff81264793>] bind_con_driver+0x145/0x263
[<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
[<ffffffff8126540c>] store_bind+0x1ad/0x1c1
[<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
[<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
[<ffffffff811145b2>] vfs_write+0x9b/0xfd
[<ffffffff811147b7>] sys_write+0x3e/0x6b
[<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(console_lock);
lock((&info->queue));
lock(console_lock);
lock((&info->queue));
*** DEADLOCK ***
v2: Mark the lockdep_map static, noticed by Jani Nikula.
Cc: Dave Airlie <airlied@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The console_cpu_notify() function runs with interrupts disabled in the
CPU_DYING case. It therefore cannot block, for example, as will happen
when it calls console_lock(). Therefore, remove the CPU_DYING leg of
the switch statement to avoid this problem.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While tracking down a weird buffer overflow issue in a program that
looked to be sane, I started double checking the length returned by
syslog(SYSLOG_ACTION_READ_ALL, ...) to make sure it wasn't overflowing
the buffer.
Sure enough, it was. I saw this in strace:
11339 syslog(SYSLOG_ACTION_READ_ALL, "<5>[244017.708129] REISERFS (dev"..., 8192) = 8279
It turns out that the loops that calculate how much space the entries
will take when they're copied don't include the newlines and prefixes
that will be included in the final output since prev flags is passed as
zero.
This patch properly accounts for it and fixes the overflow.
CC: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vprintk_emit() prefix parsing should only be done for internal kernel
messages. This allows existing behavior to be kept in all cases.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current form of a KERN_<LEVEL> is "<.>".
Add printk_get_level and printk_skip_level functions to handle these
formats.
These functions centralize tests of KERN_<LEVEL> so a future modification
can change the KERN_<LEVEL> style and shorten the number of bytes consumed
by these headers.
[akpm@linux-foundation.org: fix build error and warning]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's the big driver core pull request for 3.6-rc1.
Unlike 3.5, this kernel should be a lot tamer, with the printk changes now
settled down. All we have here is some extcon driver updates, w1 driver
updates, a few printk cleanups that weren't needed for 3.5, but are good to
have now, and some other minor fixes/changes in the driver core.
All of these have been in the linux-next releases for a while now.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAlARgIUACgkQMUfUDdst+ynDHgCfRNwIB9L+zZvjcKE5e1BhDbUl
wVUAn398DFgbJ1+PjGkd1EMR2uVTh7Ou
=MIFu
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core changes from Greg Kroah-Hartman:
"Here's the big driver core pull request for 3.6-rc1.
Unlike 3.5, this kernel should be a lot tamer, with the printk changes
now settled down. All we have here is some extcon driver updates, w1
driver updates, a few printk cleanups that weren't needed for 3.5, but
are good to have now, and some other minor fixes/changes in the driver
core.
All of these have been in the linux-next releases for a while now.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
printk: Export struct log size and member offsets through vmcoreinfo
Drivers: hv: Change the hex constant to a decimal constant
driver core: don't trigger uevent after failure
extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device
sysfs: fail dentry revalidation after namespace change fix
sysfs: fail dentry revalidation after namespace change
extcon: spelling of detach in function doc
extcon: arizona: Stop microphone detection if we give up on it
extcon: arizona: Update cable reporting calls and split headset
PM / Runtime: Do not increment device usage counts before probing
kmsg - do not flush partial lines when the console is busy
kmsg - export "continuation record" flag to /dev/kmsg
kmsg - avoid warning for CONFIG_PRINTK=n compilations
kmsg - properly print over-long continuation lines
driver-core: Use kobj_to_dev instead of re-implementing it
driver-core: Move kobj_to_dev from genhd.h to device.h
driver core: Move deferred devices to the end of dpm_list before probing
driver core: move uevent call to driver_register
driver core: fix shutdown races with probe/remove(v3)
Extcon: Arizona: Add driver for Wolfson Arizona class devices
...
If used from KDB, the locked variants are prone to deadlocks (suppose we
got to the debugger w/ the logbuf lock held).
So, we have to implement a few routines that grab no logbuf lock.
Yet we don't need these functions in modules, so we don't export them.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function is no longer needed, so remove it.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are tools like makedumpfile and vmcore-dmesg which can extract
kernel log buffer from vmcore. Since we introduced structured logging,
that functionality is broken. Now user space tools need to know about
"struct log" and offsets of various fields to be able to parse struct
log data and extract text message or dictonary.
This patch exports some of the fields.
Currently I am not exporting log "level" info as that is a bitfield and
offsetof() bitfields can't be calculated. But if people start asking for
log level info in the output then we probably either need to seprate
out "level" or use bit shift operations for flags and level.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fragments of continuation lines are flushed to the console immediately. In
case the console is locked, the fragment must be queued up in the cont
buffer.
If the the console is busy and the continuation line is complete, but no part
of it was written to the console up to this point, we can just store the
entire line as a regular record and free the buffer earlier.
If the console is busy and earlier messages are already queued up, we
should not flush the fragments of continuation lines, but store them after
the queued up messages, to ensure the proper ordering.
This keeps the console output better readable in case printk()s race against
each other, or we receive over-long continuation lines we need to flush.
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In some cases we are forced to store individual records for a continuation
line print.
Export a flag to allow the external re-construction of the line. The flag
allows us to apply a similar logic externally which is used internally when
the console, /proc/kmsg or the syslog() output is printed.
$ cat /dev/kmsg
4,165,0,-;Free swap = 0kB
4,166,0,-;Total swap = 0kB
6,167,0,c;[
4,168,0,+;0
4,169,0,+;1
4,170,0,+;2
4,171,0,+;3
4,172,0,+;]
6,173,0,-;[0 1 2 3 ]
6,174,0,-;Console: colour VGA+ 80x25
6,175,0,-;console [tty0] enabled
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reserve PREFIX_MAX bytes in the LOG_LINE_MAX line when buffering a
continuation line, to be able to properly prefix the LOG_LINE_MAX
line with the syslog prefix and timestamp when printing it.
Reported-By: Dave Jones <davej@redhat.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In (the unlikely) case our continuation merge buffer is busy, we unfortunately
can not merge further continuation printk()s into a single record and have to
store them separately, which leads to split-up output of these lines when they
are printed.
Add some flags about newlines and prefix existence to these records and try to
reconstruct the full line again, when the separated records are printed.
Reported-By: Michael Neuling <mikey@neuling.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Restore support for partial reads of any size on /proc/kmsg, in case the
supplied read buffer is smaller than the record size.
Some people seem to think is is ia good idea to run:
$ dd if=/proc/kmsg bs=1 of=...
as a klog bridge.
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44211
Reported-by: Jukka Ollila <jiiksteri@gmail.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We suppress printing kmsg records to the console, which are already printed
immediately while we have received their fragments.
Newly registered boot consoles print the entire kmsg buffer during
registration. Clear the console-suppress flag after we skipped the record
during its first storage, so any later print will see these records as usual.
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The /proc/kmsg read() interface is internally simply wired up to a sequence
of syslog() syscalls, which might are racy between their checks and actions,
regarding concurrency.
In the (very uncommon) case of concurrent readers of /dev/kmsg, relying on
usual O_NONBLOCK behavior, the recently introduced mutex might block an
O_NONBLOCK reader in read(), when poll() returns for it, but another process
has already read the data in the meantime. We've seen that while running
artificial test setups and tools that "fight" about /proc/kmsg data.
This restores the original /proc/kmsg behavior, where in case of concurrent
read()s, poll() might wake up but the read() syscall will just return 0 to
the caller, while another process has "stolen" the data.
This is in the general case not the expected behavior, but it is the exact
same one, that can easily be triggered with a 3.4 kernel, and some tools
might just rely on it.
The mutex is not needed, the original integrity issue which introduced it,
is in the meantime covered by:
"fill buffer with more than a single message for SYSLOG_ACTION_READ"
116e90b23f
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the recent split of facility and level into separate variables,
we miss the facility value (always 0 for kernel-originated messages)
in the syslog prefix.
On Tue, Jul 3, 2012 at 12:45 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> Static checkers complain about the impossible condition here.
>
> In 084681d14e ('printk: flush continuation lines immediately to
> console'), we changed msg->level from being a u16 to being an unsigned
> 3 bit bitfield.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Non-printable characters in the log data are hex-escaped to ensure safe
post processing. We need to escape a backslash we find in the data, to be
able to distinguish it from a backslash we add for the escaping.
Also escape the non-printable character 127.
Thanks to Miloslav Trmac for the heads up.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In function devkmsg_read/writev/llseek/poll/open()..., the function
raw_spin_lock/unlock is used, there is potential deadlock case happening.
CPU1: thread1 doing the cat /dev/kmsg:
raw_spin_lock(&logbuf_lock);
while (user->seq == log_next_seq) {
when thread1 run here, at this time one interrupt is coming on CPU1 and running
based on this thread,if the interrupt handle called the printk which need the
logbuf_lock spin also, it will cause deadlock.
So we should use raw_spin_lock/unlock_irq here.
Acked-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix kernel-doc warnings in printk.c: use correct parameter name.
Warning(kernel/printk.c:2429): No description found for parameter 'buf'
Warning(kernel/printk.c:2429): Excess function parameter 'line' description in 'kmsg_dump_get_buffer'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In reviewing Kay's fix up patch: "printk: Have printk() never buffer its
data", I found two if statements that could be combined and optimized.
Put together the two 'cont.len && cont.owner == current' if statements
into a single one, and check if we need to call cont_add(). This also
removes the unneeded double cont_flush() calls.
Link: http://lkml.kernel.org/r/1340869133.876.10.camel@mop
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Continuation lines are buffered internally, intended to merge the
chunked printk()s into a single record, and to isolate potentially
racy continuation users from usual terminated line users.
This though, has the effect that partial lines are not printed to
the console in the moment they are emitted. In case the kernel
crashes in the meantime, the potentially interesting printed
information would never reach the consoles.
Here we share the continuation buffer with the console copy logic,
and partial lines are always immediately flushed to the available
consoles. They are still buffered internally to improve the
readability and integrity of the messages and minimize the amount
of needed record headers to store.
Signed-off-by: Kay Sievers <kay@vrfy.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The recent changes to the printk buffer management resulted in
SYSLOG_ACTION_READ to only return a single message, whereas previously
the buffer would get filled as much as possible. As, when too small to
fit everything, filling it to the last byte would be pretty ugly with
the new code, the patch arranges for as many messages as possible to
get returned in a single invocation. User space tools in at least all
SLES versions depend on the old behavior.
This at once addresses the issue attempted to get fixed with commit
b56a39ac26 ("printk: return -EINVAL if
the message len is bigger than the buf size"), and since that commit
widened the possibility for losing a message altogether, the patch
here assumes that this other commit would get reverted first
(otherwise the patch here won't apply).
Furthermore, this patch also addresses the problem dealt with in
commit 4a77a5a06e ("printk: use mutex
lock to stop syslog_seq from going wild"), so I'd recommend reverting
that one too (albeit there's no direct collision between the two).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kay Sievers <kay@vrfy.org>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit b56a39ac26.
A better patch from Jan will follow this to resolve the issue.
Acked-by: Kay Sievers <kay@vrfy.org>
Cc: Fengguang Wu <wfg@linux.intel.com>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 7ff9554bb5 (printk: convert
byte-buffer to variable-length record buffer) introduced a regression
by accidentally removing a "break" statement from inside the big
switch in printk's do_syslog(). The symptom of this bug is that the
"dmesg -C" command doesn't only clear the kernel's log buffer; it also
disables console logging.
This patch (as1561) fixes the regression by adding the missing
"break".
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Just like what devkmsg_read() does, return -EINVAL if the message len is
bigger than the buf size, or it will trigger a segfault error.
Acked-by: Kay Sievers <kay@vrfy.org>
Acked-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although syslog_seq and log_next_seq stuff are protected by logbuf_lock
spin log, it's not enough. Say we have two processes A and B, and let
syslog_seq = N, while log_next_seq = N + 1, and the two processes both
come to syslog_print at almost the same time. And No matter which
process get the spin lock first, it will increase syslog_seq by one,
then release spin lock; thus later, another process increase syslog_seq
by one again. In this case, syslog_seq is bigger than syslog_next_seq.
And latter, it would make:
wait_event_interruptiable(log_wait, syslog != log_next_seq)
don't wait any more even there is no new write comes. Thus it introduce
a infinite loop reading.
I can easily see this kind of issue by the following steps:
# cat /proc/kmsg # at meantime, I don't kill rsyslog
# So they are the two processes.
# xinit # I added drm.debug=6 in the kernel parameter line,
# so that it will produce lots of message and let that
# issue happen
It's 100% reproducable on my side. And my disk will be filled up by
/var/log/messages in a quite short time.
So, introduce a mutex_lock to stop syslog_seq from going wild just like
what devkmsg_read() does. It does fix this issue as expected.
v2: use mutex_lock_interruptiable() instead (comments from Kay)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-By: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Provide an iterator to receive the log buffer content, and convert all
kmsg_dump() users to it.
The structured data in the kmsg buffer now contains binary data, which
should no longer be copied verbatim to the kmsg_dump() users.
The iterator should provide reliable access to the buffer data, and also
supports proper log line-aware chunking of data while iterating.
Signed-off-by: Kay Sievers <kay@vrfy.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Reported-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Tested-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 7ff9554bb5, printk: convert
byte-buffer to variable-length record buffer, causes systems using
EABI to crash very early in the boot cycle. The first entry in struct
log is a u64, which for EABI must be 8 byte aligned.
Make use of __alignof__() so the compiler to decide the alignment, but
allow it to be overridden using CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS,
for systems which can perform unaligned access and want to save
a few bytes of space.
Tested on Orion5x and Kirkwood.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arrange the continuation printk() buffering to be fully separated from the
ordinary full line users.
Limit the exposure to races and wrong printk() line merges to users of
continuation only. Ordinary full line users racing against continuation
users will no longer affect each other.
Multiple continuation users from different threads, racing against each
other will not wrongly be merged into a single line, but printed as
separate lines.
Test output of a kernel module which starts two separate threads which
race against each other, one of them printing a single full terminated
line:
printk("(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)\n");
The other one printing the line, every character separate in a
continuation loop:
printk("(C");
for (i = 0; i < 58; i++)
printk(KERN_CONT "C");
printk(KERN_CONT "C)\n");
Behavior of single and non-thread-aware printk() buffer:
# modprobe printk-race
printk test init
(CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
CC(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
C(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
(CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
New behavior with separate and thread-aware continuation buffer:
# modprobe printk-race
printk test init
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA)
(CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
(CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC)
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a stub for prepend_timestamp() when CONFIG_PRINTK is not
enabled. Fixes this build error:
kernel/printk.c:1770:3: error: implicit declaration of function 'prepend_timestamp'
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__log_buf must be aligned, because a 64-bit value is written directly
to it as part of struct log. Alignment of the log entries is typically
handled by log_store(), but this only triggers for subsequent entries,
not the very first (or wrapped) entries.
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The output of the timestamps got lost with the conversion of the
kmsg buffer to records; restore the old behavior.
Document, that CONFIG_PRINTK_TIME now only controls the output of
the timestamps in the syslog() system call and on the console, and
not the recording of the timestamps.
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This prevents the merging of printk() continuation lines of different
threads, in the case they race against each other.
It should properly isolate "atomic" single-line printk() users from
continuation users, to make sure the single-line users will never be
merged with the racy continuation ones.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Tue, May 8, 2012 at 10:02 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> kernel/built-in.o: In function `devkmsg_read':
> printk.c:(.text+0x27e8): undefined reference to `__udivdi3'
> Most probably the "msg->ts_nsec / 1000" since
> ts_nsec is a u64 and this is a 32 bit build ...
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get faster overwritten than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.
[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0
Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Record-based stream instead of the traditional byte stream
buffer. All records carry a 64 bit timestamp, the syslog facility
and priority in the record header.
- Records consume almost the same amount, sometimes less memory than
the traditional byte stream buffer (if printk_time is enabled). The record
header is 16 bytes long, plus some padding bytes at the end if needed.
The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
the timestamp and a newline.
- Buffer management is based on message sequence numbers. When records
need to be discarded, the reading heads move on to the next full
record. Unlike the byte-stream buffer, no old logged lines get
truncated or partly overwritten by new ones. Sequence numbers also
allow consumers of the log stream to get notified if any message in
the stream they are about to read gets discarded during the time
of reading.
- Better buffered IO support for KERN_CONT continuation lines, when printk()
is called multiple times for a single line. The use of KERN_CONT is now
mandatory to use continuation; a few places in the kernel need trivial fixes
here. The buffering could possibly be extended to per-cpu variables to allow
better thread-safety for multiple printk() invocations for a single line.
- Full-featured syslog facility value support. Different facilities
can tag their messages. All userspace-injected messages enforce a
facility value > 0 now, to be able to reliably distinguish them from
the kernel-generated messages. Independent subsystems like a
baseband processor running its own firmware, or a kernel-related
userspace process can use their own unique facility values. Multiple
independent log streams can co-exist that way in the same
buffer. All share the same global sequence number counter to ensure
proper ordering (and interleaving) and to allow the consumers of the
log to reliably correlate the events from different facilities.
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull scheduler changes for v3.4 from Ingo Molnar
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
printk: Make it compile with !CONFIG_PRINTK
sched/x86: Fix overflow in cyc2ns_offset
sched: Fix nohz load accounting -- again!
sched: Update yield() docs
printk/sched: Introduce special printk_sched() for those awkward moments
sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
sched: Cleanup cpu_active madness
sched: Fix load-balance wreckage
sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
sched: Ditch per cgroup task lists for load-balancing
sched: Rename load-balancing fields
sched: Move load-balancing arguments into helper struct
sched/rt: Do not submit new work when PI-blocked
sched/rt: Prevent idle task boosting
sched/wait: Add __wake_up_all_locked() API
sched/rt: Document scheduler related skip-resched-check sites
sched/rt: Use schedule_preempt_disabled()
sched/rt: Add schedule_preempt_disabled()
sched/rt: Do not throttle when PI boosting
sched/rt: Keep period timer ticking when rt throttling is active
...
Commit 3ccf3e8306 ("printk/sched: Introduce special
printk_sched() for those awkward moments") overlooked
an #ifdef, so move code around to respect these directives.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Link: http://lkml.kernel.org/r/1331811337.18960.179.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There's a few awkward printk()s inside of scheduler guts that people
prefer to keep but really are rather deadlock prone. Fudge around it
by storing the text in a per-cpu buffer and poll it using the existing
printk_tick() handler.
This will drop output when its more frequent than once a tick, however
only the affinity thing could possible go that fast and for that just
one should suffice to notify the admin he's done something silly..
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-wua3lmkt3dg8nfts66o6brne@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since commit 04c6862c05 ("kmsg_dump: add kmsg_dump() calls to the
reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
run on normal paths including poweroff and reboot.
This is less than ideal given pstore implementations that can only
represent single backtraces, since a reboot may overwrite a stored oops
before it's been picked up by userspace. In addition, some pstore
backends may have low performance and provide a significant delay in
reboot as a result.
This patch adds a printk.always_kmsg_dump kernel parameter (which can also
be changed from userspace). Without it, the code will only be run on
failure paths rather than on normal paths. The option can be enabled in
environments where there's a desire to attempt to audit whether or not a
reboot was cleanly requested or not.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>