From 4bcc595ccd80decb4245096e3d1258989c50ed41 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 8 Oct 2016 20:32:40 -0700 Subject: [PATCH 1/4] printk: reinstate KERN_CONT for printing continuation lines Long long ago the kernel log buffer was a buffered stream of bytes, very much like stdio in user space. It supported log levels by scanning the stream and noticing the log level markers at the beginning of each line, but if you wanted to print a partial line in multiple chunks, you just did multiple printk() calls, and it just automatically worked. Except when it didn't, and you had very confusing output when different lines got all mixed up with each other. Then you got fragment lines mixing with each other, or with non-fragment lines, because it was traditionally impossible to tell whether a printk() call was a continuation or not. To at least help clarify the issue of continuation lines, we added a KERN_CONT marker back in 2007 to mark continuation lines: 474925277671 ("printk: add KERN_CONT annotation"). That continuation marker was initially an empty string, and didn't actuall make any semantic difference. But it at least made it possible to annotate the source code, and have check-patch notice that a printk() didn't need or want a log level marker, because it was a continuation of a previous line. To avoid the ambiguity between a continuation line that had that KERN_CONT marker, and a printk with no level information at all, we then in 2009 made KERN_CONT be a real log level marker which meant that we could now reliably tell the difference between the two cases. 5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines") and we could take advantage of that to make sure we didn't mix up continuation lines with lines that just didn't have any loglevel at all. Then, in 2012, the kernel log buffer was changed to be a "record" based log, where each line was a record that has a loglevel and a timestamp. You can see the beginning of that conversion in commits e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface") 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer") with a number of follow-up commits to fix some painful fallout from that conversion. Over all, it took a couple of months to sort out most of it. But the upside was that you could have concurrent readers (and writers) of the kernel log and not have lines with mixed output in them. And one particular pain-point for the record-based kernel logging was exactly the fragmentary lines that are generated in smaller chunks. In order to still log them as one recrod, the continuation lines need to be attached to the previous record properly. However the explicit continuation record marker that is actually useful for this exact case was actually removed in aroundm the same time by commit 61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT") due to the incorrect belief that KERN_CONT wasn't meaningful. The ambiguity between "is this a continuation line" or "is this a plain printk with no log level information" was reintroduced, and in fact became an even bigger pain point because there was now the whole record-level merging of kernel messages going on. This patch reinstates the KERN_CONT as a real non-empty string marker, so that the ambiguity is fixed once again. But it's not a plain revert of that original removal: in the four years since we made KERN_CONT an empty string again, not only has the format of the log level markers changed, we've also had some usage changes in this area. For example, some ACPI code seems to use KERN_CONT _together_ with a log level, and now uses both the KERN_CONT marker and (for example) a KERN_INFO marker to show that it's an informational continuation of a line. Which is actually not a bad idea - if the continuation line cannot be attached to its predecessor, without the log level information we don't know what log level to assign to it (and we traditionally just assigned it the default loglevel). So having both a log level and the KERN_CONT marker is not necessarily a bad idea, but it does mean that we need to actually iterate over potentially multiple markers, rather than just a single one. Also, since KERN_CONT was still conceptually needed, and encouraged, but didn't actually _do_ anything, we've also had the reverse problem: rather than having too many annotations it has too few, and there is bit rot with code that no longer marks the continuation lines with the KERN_CONT marker. So this patch not only re-instates the non-empty KERN_CONT marker, it also fixes up the cases of bit-rot I noticed in my own logs. There are probably other cases where KERN_CONT will be needed to be added, either because it is new code that never dealt with the need for KERN_CONT, or old code that has bitrotted without anybody noticing. That said, we should strive to avoid the need for KERN_CONT. It does result in real problems for logging, and should generally not be seen as a good feature. If we some day can get rid of the feature entirely, because nobody does any fragmented printk calls, that would be lovely. But until that point, let's at mark the code that relies on the hacky multi-fragment kernel printk's. Not only does it avoid the ambiguity, it also annotates code as "maybe this would be good to fix some day". (That said, particularly during single-threaded bootup, the downsides of KERN_CONT are very limited. Things get much hairier when you have multiple threads going on and user level reading and writing logs too). Signed-off-by: Linus Torvalds --- drivers/tty/vt/vt.c | 6 +++--- include/linux/kern_levels.h | 2 +- include/linux/printk.h | 1 + kernel/printk/printk.c | 29 ++++++++++++----------------- security/selinux/ss/policydb.c | 4 ++-- 5 files changed, 19 insertions(+), 23 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 2705ca960e92..145a75864e44 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -3182,11 +3182,11 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last, pr_info("Console: switching "); if (!deflt) - printk("consoles %d-%d ", first+1, last+1); + printk(KERN_CONT "consoles %d-%d ", first+1, last+1); if (j >= 0) { struct vc_data *vc = vc_cons[j].d; - printk("to %s %s %dx%d\n", + printk(KERN_CONT "to %s %s %dx%d\n", vc->vc_can_do_color ? "colour" : "mono", desc, vc->vc_cols, vc->vc_rows); @@ -3195,7 +3195,7 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last, update_screen(vc); } } else - printk("to %s\n", desc); + printk(KERN_CONT "to %s\n", desc); retval = 0; err: diff --git a/include/linux/kern_levels.h b/include/linux/kern_levels.h index c2ce155d83cc..f282d4e87258 100644 --- a/include/linux/kern_levels.h +++ b/include/linux/kern_levels.h @@ -20,7 +20,7 @@ * line that had no enclosing \n). Only to be used by core/arch code * during early bootup (a continued line is not SMP-safe otherwise). */ -#define KERN_CONT "" +#define KERN_CONT KERN_SOH "c" /* integer equivalents of KERN_ */ #define LOGLEVEL_SCHED -2 /* Deferred messages from sched code diff --git a/include/linux/printk.h b/include/linux/printk.h index 696a56be7d3e..eac1af8502bb 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -16,6 +16,7 @@ static inline int printk_get_level(const char *buffer) switch (buffer[1]) { case '0' ... '7': case 'd': /* KERN_DEFAULT */ + case 'c': /* KERN_CONT */ return buffer[1]; } } diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index eea6dbc2d8cf..7b449802089a 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -655,11 +655,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size, * better readable output. 'c' in the record flags mark the first * fragment of a line, '+' the following. */ - if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT)) - cont = 'c'; - else if ((msg->flags & LOG_CONT) || - ((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX))) - cont = '+'; + if (msg->flags & LOG_CONT) + cont = (prev_flags & LOG_CONT) ? '+' : 'c'; return scnprintf(buf, size, "%u,%llu,%llu,%c;", (msg->facility << 3) | msg->level, seq, ts_usec, cont); @@ -1819,10 +1816,9 @@ asmlinkage int vprintk_emit(int facility, int level, /* strip kernel syslog prefix and extract log level or control flags */ if (facility == 0) { - int kern_level = printk_get_level(text); + int kern_level; - if (kern_level) { - const char *end_of_header = printk_skip_level(text); + while ((kern_level = printk_get_level(text)) != 0) { switch (kern_level) { case '0' ... '7': if (level == LOGLEVEL_DEFAULT) @@ -1830,14 +1826,13 @@ asmlinkage int vprintk_emit(int facility, int level, /* fallthrough */ case 'd': /* KERN_DEFAULT */ lflags |= LOG_PREFIX; + break; + case 'c': /* KERN_CONT */ + lflags |= LOG_CONT; } - /* - * No need to check length here because vscnprintf - * put '\0' at the end of the string. Only valid and - * newly printed level is detected. - */ - text_len -= end_of_header - text; - text = (char *)end_of_header; + + text_len -= 2; + text += 2; } } @@ -1852,7 +1847,7 @@ asmlinkage int vprintk_emit(int facility, int level, * Flush the conflicting buffer. An earlier newline was missing, * or another task also prints continuation lines. */ - if (cont.len && (lflags & LOG_PREFIX || cont.owner != current)) + if (cont.len && (!(lflags & LOG_CONT) || cont.owner != current)) cont_flush(LOG_NEWLINE); /* buffer line if possible, otherwise store it right away */ @@ -1874,7 +1869,7 @@ asmlinkage int vprintk_emit(int facility, int level, * a newline, flush and append the newline. */ if (cont.len) { - if (cont.owner == current && !(lflags & LOG_PREFIX)) + if (cont.owner == current && (lflags & LOG_CONT)) stored = cont_add(facility, level, text, text_len); cont_flush(LOG_NEWLINE); diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c index 992a31530825..bb9720b96c30 100644 --- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -527,9 +527,9 @@ static int policydb_index(struct policydb *p) printk(KERN_DEBUG "SELinux: %d users, %d roles, %d types, %d bools", p->p_users.nprim, p->p_roles.nprim, p->p_types.nprim, p->p_bools.nprim); if (p->mls_enabled) - printk(", %d sens, %d cats", p->p_levels.nprim, + printk(KERN_CONT ", %d sens, %d cats", p->p_levels.nprim, p->p_cats.nprim); - printk("\n"); + printk(KERN_CONT "\n"); printk(KERN_DEBUG "SELinux: %d classes, %d rules\n", p->p_classes.nprim, p->te_avtab.nel); From c362c7ff84634390b44cc1ae7808519596de162d Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 8 Oct 2016 22:02:09 -0700 Subject: [PATCH 2/4] printk: split out core logging code into helper function The code that actually decides how to log the message (whether to put it directly into the record log, whether to append it to an existing buffered log, or whether to start a new buffered log) is fairly non-obvious code in the middle of the vprintk_emit() function. Splitting that code up into a helper function makes it easier to understand, but perhaps more importantly also allows for the code to just return early out of the helper function once it has made the decision about where the new log content goes. Signed-off-by: Linus Torvalds --- kernel/printk/printk.c | 78 +++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 7b449802089a..090e201244d8 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1730,6 +1730,44 @@ static size_t cont_print_text(char *text, size_t size) return textlen; } +static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len) +{ + if (!(lflags & LOG_NEWLINE)) { + /* + * Flush the conflicting buffer. An earlier newline was missing, + * or another task also prints continuation lines. + */ + if (cont.len && (!(lflags & LOG_CONT) || cont.owner != current)) + cont_flush(LOG_NEWLINE); + + /* buffer line if possible, otherwise store it right away */ + if (cont_add(facility, level, text, text_len)) + return text_len; + + return log_store(facility, level, lflags | LOG_CONT, 0, dict, dictlen, text, text_len); + } + + /* + * If an earlier newline was missing and it was the same task, + * either merge it with the current buffer and flush, or if + * there was a race with interrupts (prefix == true) then just + * flush it out and store this line separately. + * If the preceding printk was from a different task and missed + * a newline, flush and append the newline. + */ + if (cont.len) { + bool stored = false; + + if (cont.owner == current && (lflags & LOG_CONT)) + stored = cont_add(facility, level, text, text_len); + cont_flush(LOG_NEWLINE); + if (stored) + return text_len; + } + + return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len); +} + asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) @@ -1842,45 +1880,7 @@ asmlinkage int vprintk_emit(int facility, int level, if (dict) lflags |= LOG_PREFIX|LOG_NEWLINE; - if (!(lflags & LOG_NEWLINE)) { - /* - * Flush the conflicting buffer. An earlier newline was missing, - * or another task also prints continuation lines. - */ - if (cont.len && (!(lflags & LOG_CONT) || cont.owner != current)) - cont_flush(LOG_NEWLINE); - - /* buffer line if possible, otherwise store it right away */ - if (cont_add(facility, level, text, text_len)) - printed_len += text_len; - else - printed_len += log_store(facility, level, - lflags | LOG_CONT, 0, - dict, dictlen, text, text_len); - } else { - bool stored = false; - - /* - * If an earlier newline was missing and it was the same task, - * either merge it with the current buffer and flush, or if - * there was a race with interrupts (prefix == true) then just - * flush it out and store this line separately. - * If the preceding printk was from a different task and missed - * a newline, flush and append the newline. - */ - if (cont.len) { - if (cont.owner == current && (lflags & LOG_CONT)) - stored = cont_add(facility, level, text, - text_len); - cont_flush(LOG_NEWLINE); - } - - if (stored) - printed_len += text_len; - else - printed_len += log_store(facility, level, lflags, 0, - dict, dictlen, text, text_len); - } + printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len); logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); From 5e467652ffefb84b1159d5d8fda665c48e5fd840 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 9 Oct 2016 11:53:00 -0700 Subject: [PATCH 3/4] printk: re-organize log_output() to be more legible Avoid some duplicate logic now that we can return early, and update the comments for the new LOG_CONT world order. This also stops the continuation flushing from just using random record flags for the flushing action, instead taking the flags from the proper original line and updating them as we add continuations to it. Signed-off-by: Linus Torvalds --- kernel/printk/printk.c | 62 ++++++++++++++++++------------------------ 1 file changed, 27 insertions(+), 35 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 090e201244d8..f6fe60d725dd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1640,35 +1640,33 @@ static struct cont { bool flushed:1; /* buffer sealed and committed */ } cont; -static void cont_flush(enum log_flags flags) +static void cont_flush(void) { if (cont.flushed) return; if (cont.len == 0) return; - if (cont.cons) { /* * If a fragment of this line was directly flushed to the * console; wait for the console to pick up the rest of the * line. LOG_NOCONS suppresses a duplicated output. */ - log_store(cont.facility, cont.level, flags | LOG_NOCONS, + log_store(cont.facility, cont.level, cont.flags | LOG_NOCONS, cont.ts_nsec, NULL, 0, cont.buf, cont.len); - cont.flags = flags; cont.flushed = true; } else { /* * If no fragment of this line ever reached the console, * just submit it to the store and free the buffer. */ - log_store(cont.facility, cont.level, flags, 0, + log_store(cont.facility, cont.level, cont.flags, 0, NULL, 0, cont.buf, cont.len); cont.len = 0; } } -static bool cont_add(int facility, int level, const char *text, size_t len) +static bool cont_add(int facility, int level, enum log_flags flags, const char *text, size_t len) { if (cont.len && cont.flushed) return false; @@ -1679,7 +1677,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len) * the line gets too long, split it up in separate records. */ if (nr_ext_console_drivers || cont.len + len > sizeof(cont.buf)) { - cont_flush(LOG_CONT); + cont_flush(); return false; } @@ -1688,7 +1686,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len) cont.level = level; cont.owner = current; cont.ts_nsec = local_clock(); - cont.flags = 0; + cont.flags = flags; cont.cons = 0; cont.flushed = false; } @@ -1696,8 +1694,15 @@ static bool cont_add(int facility, int level, const char *text, size_t len) memcpy(cont.buf + cont.len, text, len); cont.len += len; + // The original flags come from the first line, + // but later continuations can add a newline. + if (flags & LOG_NEWLINE) { + cont.flags |= LOG_NEWLINE; + cont_flush(); + } + if (cont.len > (sizeof(cont.buf) * 80) / 100) - cont_flush(LOG_CONT); + cont_flush(); return true; } @@ -1732,39 +1737,26 @@ static size_t cont_print_text(char *text, size_t size) static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len) { - if (!(lflags & LOG_NEWLINE)) { - /* - * Flush the conflicting buffer. An earlier newline was missing, - * or another task also prints continuation lines. - */ - if (cont.len && (!(lflags & LOG_CONT) || cont.owner != current)) - cont_flush(LOG_NEWLINE); - - /* buffer line if possible, otherwise store it right away */ - if (cont_add(facility, level, text, text_len)) - return text_len; - - return log_store(facility, level, lflags | LOG_CONT, 0, dict, dictlen, text, text_len); - } - /* - * If an earlier newline was missing and it was the same task, - * either merge it with the current buffer and flush, or if - * there was a race with interrupts (prefix == true) then just - * flush it out and store this line separately. - * If the preceding printk was from a different task and missed - * a newline, flush and append the newline. + * If an earlier line was buffered, and we're a continuation + * write from the same process, try to add it to the buffer. */ if (cont.len) { - bool stored = false; + if (cont.owner == current && (lflags & LOG_CONT)) { + if (cont_add(facility, level, lflags, text, text_len)) + return text_len; + } + /* Otherwise, make sure it's flushed */ + cont_flush(); + } - if (cont.owner == current && (lflags & LOG_CONT)) - stored = cont_add(facility, level, text, text_len); - cont_flush(LOG_NEWLINE); - if (stored) + /* If it doesn't end in a newline, try to buffer the current line */ + if (!(lflags & LOG_NEWLINE)) { + if (cont_add(facility, level, lflags, text, text_len)) return text_len; } + /* Store it in the record log */ return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len); } From bfd8d3f23b51018388be0411ccbc2d56277fe294 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 9 Oct 2016 12:16:57 -0700 Subject: [PATCH 4/4] printk: make reading the kernel log flush pending lines That will mean that any possible subsequent continuation will now be broken up onto a line of its own (since reading the log has finalized the beginning og the line), but if user space has activated system logging (or if there's a kernel message dump going on) that is the right thing to do. And now that we actually get the continuation flags _right_ for this all, the user space logger that is reading the kernel messages can actually see the continuation marker. Not that anybody seems to really bother with it (or care), but in theory user space can do its own message stitching. Signed-off-by: Linus Torvalds --- kernel/printk/printk.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index f6fe60d725dd..170d3a8040f4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -783,6 +783,8 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from) return ret; } +static void cont_flush(void); + static ssize_t devkmsg_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { @@ -798,6 +800,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf, if (ret) return ret; raw_spin_lock_irq(&logbuf_lock); + cont_flush(); while (user->seq == log_next_seq) { if (file->f_flags & O_NONBLOCK) { ret = -EAGAIN; @@ -860,6 +863,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) return -ESPIPE; raw_spin_lock_irq(&logbuf_lock); + cont_flush(); switch (whence) { case SEEK_SET: /* the first record */ @@ -898,6 +902,7 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait) poll_wait(file, &log_wait, wait); raw_spin_lock_irq(&logbuf_lock); + cont_flush(); if (user->seq < log_next_seq) { /* return error when data has vanished underneath us */ if (user->seq < log_first_seq) @@ -1284,6 +1289,7 @@ static int syslog_print(char __user *buf, int size) size_t skip; raw_spin_lock_irq(&logbuf_lock); + cont_flush(); if (syslog_seq < log_first_seq) { /* messages are gone, move to first one */ syslog_seq = log_first_seq; @@ -1343,6 +1349,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear) return -ENOMEM; raw_spin_lock_irq(&logbuf_lock); + cont_flush(); if (buf) { u64 next_seq; u64 seq; @@ -1504,6 +1511,7 @@ int do_syslog(int type, char __user *buf, int len, int source) /* Number of chars in the log buffer */ case SYSLOG_ACTION_SIZE_UNREAD: raw_spin_lock_irq(&logbuf_lock); + cont_flush(); if (syslog_seq < log_first_seq) { /* messages are gone, move to first one */ syslog_seq = log_first_seq; @@ -3016,6 +3024,7 @@ void kmsg_dump(enum kmsg_dump_reason reason) dumper->active = true; raw_spin_lock_irqsave(&logbuf_lock, flags); + cont_flush(); dumper->cur_seq = clear_seq; dumper->cur_idx = clear_idx; dumper->next_seq = log_next_seq; @@ -3106,6 +3115,7 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog, bool ret; raw_spin_lock_irqsave(&logbuf_lock, flags); + cont_flush(); ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len); raw_spin_unlock_irqrestore(&logbuf_lock, flags); @@ -3148,6 +3158,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog, goto out; raw_spin_lock_irqsave(&logbuf_lock, flags); + cont_flush(); if (dumper->cur_seq < log_first_seq) { /* messages are gone, move to first available one */ dumper->cur_seq = log_first_seq;