Commit Graph

35332 Commits

Author SHA1 Message Date
Jakub Kicinski c358f95205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/dev.c
  b552766c87 ("can: dev: prevent potential information leak in can_fill_info()")
  3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")
  0a042c6ec9 ("can: dev: move netlink related code into seperate file")

  Code move.

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  57ac4a31c4 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
  214baf2287 ("net/mlx5e: Support HTB offload")

  Adjacent code changes

net/switchdev/switchdev.c
  20776b465c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
  ffb68fc58e ("net: switchdev: remove the transaction structure from port object notifiers")
  bae33f2b5a ("net: switchdev: remove the transaction structure from port attributes")

  Transaction parameter gets dropped otherwise keep the fix.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 17:09:31 -08:00
Thomas Gleixner 34b1a1ce14 futex: Handle faults correctly for PI futexes
fixup_pi_state_owner() tries to ensure that the state of the rtmutex,
pi_state and the user space value related to the PI futex are consistent
before returning to user space. In case that the user space value update
faults and the fault cannot be resolved by faulting the page in via
fault_in_user_writeable() the function returns with -EFAULT and leaves
the rtmutex and pi_state owner state inconsistent.

A subsequent futex_unlock_pi() operates on the inconsistent pi_state and
releases the rtmutex despite not owning it which can corrupt the RB tree of
the rtmutex and cause a subsequent kernel stack use after free.

It was suggested to loop forever in fixup_pi_state_owner() if the fault
cannot be resolved, but that results in runaway tasks which is especially
undesired when the problem happens due to a programming error and not due
to malice.

As the user space value cannot be fixed up, the proper solution is to make
the rtmutex and the pi_state consistent so both have the same owner. This
leaves the user space value out of sync. Any subsequent operation on the
futex will fail because the 10th rule of PI futexes (pi_state owner and
user space value are consistent) has been violated.

As a consequence this removes the inept attempts of 'fixing' the situation
in case that the current task owns the rtmutex when returning with an
unresolvable fault by unlocking the rtmutex which left pi_state::owner and
rtmutex::owner out of sync in a different and only slightly less dangerous
way.

Fixes: 1b7558e457 ("futexes: fix fault handling in futex_lock_pi")
Reported-by: gzobqq@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:11:00 +01:00
Thomas Gleixner f2dac39d93 futex: Simplify fixup_pi_state_owner()
Too many gotos already and an upcoming fix would make it even more
unreadable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:10:59 +01:00
Thomas Gleixner 6ccc84f917 futex: Use pi_state_update_owner() in put_pi_state()
No point in open coding it. This way it gains the extra sanity checks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:10:59 +01:00
Thomas Gleixner 2156ac1934 rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
Nothing uses the argument. Remove it as preparation to use
pi_state_update_owner().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:10:58 +01:00
Thomas Gleixner c5cade200a futex: Provide and use pi_state_update_owner()
Updating pi_state::owner is done at several places with the same
code. Provide a function for it and use that at the obvious places.

This is also a preparation for a bug fix to avoid yet another copy of the
same code or alternatively introducing a completely unpenetratable mess of
gotos.

Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:10:58 +01:00
Thomas Gleixner 04b79c5520 futex: Replace pointless printk in fixup_owner()
If that unexpected case of inconsistent arguments ever happens then the
futex state is left completely inconsistent and the printk is not really
helpful. Replace it with a warning and make the state consistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:10:58 +01:00
Thomas Gleixner 12bb3f7f1b futex: Ensure the correct return value from futex_lock_pi()
In case that futex_lock_pi() was aborted by a signal or a timeout and the
task returned without acquiring the rtmutex, but is the designated owner of
the futex due to a concurrent futex_unlock_pi() fixup_owner() is invoked to
establish consistent state. In that case it invokes fixup_pi_state_owner()
which in turn tries to acquire the rtmutex again. If that succeeds then it
does not propagate this success to fixup_owner() and futex_lock_pi()
returns -EINTR or -ETIMEOUT despite having the futex locked.

Return success from fixup_pi_state_owner() in all cases where the current
task owns the rtmutex and therefore the futex and propagate it correctly
through fixup_owner(). Fixup the other callsite which does not expect a
positive return value.

Fixes: c1e2f0eaf0 ("futex: Avoid violating the 10th rule of futex")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
2021-01-26 15:10:58 +01:00
Linus Torvalds 007ad27d7b Printk urgent fixup for 5.11-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmAOyaUACgkQUqAMR0iA
 lPJmKw//cBHluHINQnKgGMfbdjyJilIm2Atd5jm95ysmblXjgAzbB8AJqnKFqhmH
 hC+YiTCa7t84tw2Vh2NpTnosRFjdq8jtjjFpGlqLuLoREprSSRRmKyOob15XIm5d
 5aio1Vh9ks+VoWwHUFfc6zyJuTNGfwRPWgIxA9KdQvt4C8+M3IiQjT3UcdXYzWOB
 Of74rBLZzlImRasL9SFW06oLxZN02CSsA3I49D37WezHFRJ0LlptVPAOVbmTsPXa
 WnX7Q8xOe70WsFH23so7+4H4mieSpkBqNz/+WHFBAekohae0BW5nnK84Nl9+uuUh
 3FF28fXZW3xyGd5JmmBD70tVCV/nUKTSOrkc6Eo0dcrwpwxq+R1wPAojgSUgru8L
 1em015/kRC1DRejCr1mdT3muySQF9gQ7/nEa+7cpo4bqmUb36TNN1pS9OjJMg4/7
 4ZgoKB2lQ1UfueDqiPMgDz/JJ91jRcUCkB86MotyaOGO7FI8oaRSuSp8rvorQ13n
 5LEHV8sX1ENIcu9jTcqN4oxEBg8dXVkxHwLFqWB283Rugk21L0f1IYbUbRPH4RF1
 L5AVoaWBxA6ftXDI/OywszrWtr93sDqcmdebmqT0CTXOiHtTqt2cad8InjWHDh4h
 wtIdV22ak7glMLR0DdMAPKjggjzHsXJLMy9SsowEKVzDN52jVcs=
 =pBad
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fix from Petr Mladek:
 "The fix of a potential buffer overflow in 5.11-rc5 introduced another
  one. The trailing '\0' might be written up to the message "len" past
  the buffer. Fortunately, it is not that easy to hit.

  Most readers use 1kB buffers for a single message. Typical messages
  fit into the temporary buffer with enough reserve.

  Also readers do not rely on the '\0'. It is related to the previous
  fix. Some readers required the space for the trailing '\0'. We decided
  to write it there to avoid such regressions in the future.

  The most realistic victims are dumpers using kmsg_dump_get_buffer().
  They are filling the entire buffer with as many messages as possible.
  They are typically used when handling panic()"

* tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: fix string termination for record_print_text()
2021-01-25 10:19:40 -08:00
Petr Mladek 61bb17da44 Merge branch 'printk-rework' into for-linus 2021-01-25 14:29:35 +01:00
John Ogness 08d60e5999 printk: fix string termination for record_print_text()
Commit f0e386ee0c ("printk: fix buffer overflow potential for
print_text()") added string termination in record_print_text().
However it used the wrong base pointer for adding the terminator.
This led to a 0-byte being written somewhere beyond the buffer.

Use the correct base pointer when adding the terminator.

Fixes: f0e386ee0c ("printk: fix buffer overflow potential for print_text()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
2021-01-25 10:37:08 +01:00
Linus Torvalds e68061375f - Fix a kernel panic in mips-cpu due to invalid irq domain hierarchy.
- Fix to not lose IPIs on bcm2836.
 
  - Fix for a bogus marking of ITS devices as shared due to unitialized
    stack variable.
 
  - Clear a phantom interrupt on qcom-pdc to unblock suspend.
 
  - Small cleanups, warning and build fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmANZ4kACgkQEsHwGGHe
 VUohthAAtkwq9kXmn1H4hQuay8Zly7L5D7dn6EUrVl+347cZi+DZmjYRv0Mu/5Ad
 L7KhLHDJ6XqGW4qsPxGkYJPgaWiXyo+C0OQXLdwFcrT0kOvjbOYoFCAarRkqIdQb
 5jr6J6t1MWb5Ktb8xTebUXv5kn91rvYj8ArcdlaVB8Kdp6jB8Irq9AROjZMK3Seh
 AHC9jWnf3tGPUXRhNk0rpBeDzdkI6G82KjO2LdZMT2SC9WLX2vH710wBeY8+Um42
 5rseckupLl7KJb02WpK0IDi/6bidWc9K3NsOh9QqjQme4O9a0459q7f6SoR8/PYi
 FWwB95NwGWahnVk26qfLwiV/Sd3RKM72uQxICu6XaI0+asrbOflIuK1NbEAbpf+S
 ZgtVqV3c8i09ij3NsJIWsZ5vjwtFyRUU26JgTxzWKzRf4FftSW0pHWr/VvMflXhK
 l0E68kvB4KmV7Bd5r3gavctWVFeMrZhTGAHxiiG91oISUpp+X634CCa4NyMh+Ryr
 6V+LhIADlzNOuyzHsG8xpsUmtSwutcsHSFQtqUkcXpG/1o2OpSPmhYQRySD4lz+4
 dpo3CFtvrBUWRAHo2bsaLorqU0LlBFc+h9sTUlYMLgyRKkOQtde1UN1lTIu7z0Ll
 GFqAYmttyjSlDKo/5MeBQeWp4Oq8d+Ckz9xkNIL4qJV0LLTUvMs=
 =Z32e
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix a kernel panic in mips-cpu due to invalid irq domain hierarchy.

 - Fix to not lose IPIs on bcm2836.

 - Fix for a bogus marking of ITS devices as shared due to unitialized
   stack variable.

 - Clear a phantom interrupt on qcom-pdc to unblock suspend.

 - Small cleanups, warning and build fixes.

* tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Export irq_check_status_bit()
  irqchip/mips-cpu: Set IPI domain parent chip
  irqchip/pruss: Simplify the TI_PRUSS_INTC Kconfig
  irqchip/loongson-liointc: Fix build warnings
  driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
  irqchip/bcm2836: Fix IPI acknowledgement after conversion to handle_percpu_devid_irq
  irqchip/irq-sl28cpld: Convert comma to semicolon
  genirq/msi: Initialize msi_alloc_info before calling msi_domain_prepare_irqs()
2021-01-24 10:24:20 -08:00
Linus Torvalds 24c56ee06c - Correct the marking of kthreads which are supposed to run on a specific,
single CPU vs such which are affine to only one CPU, mark per-cpu workqueue
  threads as such and make sure that marking "survives" CPU hotplug. Fix CPU
  hotplug issues with such kthreads.
 
  - A fix to not push away tasks on CPUs coming online.
 
  - Have workqueue CPU hotplug code use cpu_possible_mask when breaking affinity
    on CPU offlining so that pending workers can finish on newly arrived onlined
    CPUs too.
 
  - Dump tasks which haven't vacated a CPU which is currently being unplugged.
 
  - Register a special scale invariance callback which gets called on resume
  from RAM to read out APERF/MPERF after resume and thus make the schedutil
  scaling governor more precise.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmANYCAACgkQEsHwGGHe
 VUo+OBAAjfqkijDlXiGX6lrT5gRx5NZICpeMgbWa7J13XHT1ysD/b0fMGFIUyF6k
 aszDLTl8U/S1/qGAYlzTSPAFcdZ+ENiFqQ48ozMk4jZC3p0quHTjs/PdiSG6kYBi
 +e4smht+bSyLKxsG8hN0kJ+mLEd+uIQ13kP4YkxPgWbJ9WNP/U6HHGBo0rBchtSe
 Kn6bdd8CfwmC6rSazp7kdQoFoWeQaoMI1ODX3VphK1GtL1wq8WSICzRhpg3caeyG
 3lCIddoNW9mCA9Nkc6R6HeV3uW9JGkPAjnmtTIEHDbg9pib7xNT978ieTQuqNDCi
 DlAHDGumzoaiVJZhD/1fj/RXMJr2YUHxtrXWNsXpiKJ9g8Tn+WC0UW/4+Mx2L/km
 0RSoXJlMs1fGopS2I/fObZ6RPhmg4D+gJsMCdaHQzX4NgxZAGhNNPxMckZ0IM8A0
 2NNXSHUZHVTHeJEW0E/glOcpWb5hG+vDwiBMNEWfTwYpTfrw2EEOZaKniZE7WlSL
 4ItM9rkLGl1KToJzAH4A0oUtSy3vtSCo8B1noGlc09Lj+oCIBlr81z9+C79a2oxG
 qE7Xd4X7y7Qs3JeCbRZWQa7/2Kf1v4XnjELrJJeCZC85r0ZqJDwRX8w7lkmW2XPU
 m4J2prr/DDZSqrRh23/xC1fsU+vcBKSfKUFKAH4Lg2VIaUfSUEk=
 =2DAF
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Correct the marking of kthreads which are supposed to run on a
   specific, single CPU vs such which are affine to only one CPU, mark
   per-cpu workqueue threads as such and make sure that marking
   "survives" CPU hotplug. Fix CPU hotplug issues with such kthreads.

 - A fix to not push away tasks on CPUs coming online.

 - Have workqueue CPU hotplug code use cpu_possible_mask when breaking
   affinity on CPU offlining so that pending workers can finish on newly
   arrived onlined CPUs too.

 - Dump tasks which haven't vacated a CPU which is currently being
   unplugged.

 - Register a special scale invariance callback which gets called on
   resume from RAM to read out APERF/MPERF after resume and thus make
   the schedutil scaling governor more precise.

* tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Relax the set_cpus_allowed_ptr() semantics
  sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
  sched: Prepare to use balance_push in ttwu()
  workqueue: Restrict affinity change to rescuer
  workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
  kthread: Extract KTHREAD_IS_PER_CPU
  sched: Don't run cpu-online with balance_push() enabled
  workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
  sched/core: Print out straggler tasks in sched_cpu_dying()
  x86: PM: Register syscore_ops for scale invariance
2021-01-24 10:09:20 -08:00
Linus Torvalds 025929f468 - Fix an integer overflow in the NTP RTC synchronization which lead to latter
happening every 2 seconds instead of the intended every 11 minutes.
 
  - Get rid of now unused get_seconds().
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmANVSEACgkQEsHwGGHe
 VUr6pQ/+MLoKCld4bR4rXbzD2TSKaIaHt7BuYCOcE4Z3PyxpfYFdX0iJVn8J277X
 VAvbHOEJLFijGylpCD5vcMnK018GaW5DTvPgb7ijOEgdzHzqOoC/a30WMJWtnBVo
 dKrtEgbQPgB8eSPuQah5RSWRd6lrywJp79WNbB4k7yqsTVvQLQmbKtjc656jCvCI
 PKpc7zT/UZcChue6zZUtSXnLPBOM1aog0/7pXbViZc2F8Lv3ulIZy27o9sXSetu2
 bsdZzexKEY/Thqx1LiEOt5PhQm5gKSBsBWuS+wPIyI0V0E5eb0rnAlTLyx7hE3yL
 9R9axXPhGH5gRM05ikYaF2k4VscImgGLfT18blIb4jWSZ/C+HCtlyfJmcx5VgG4Z
 F4ZC5esiUD9vM++cdrw8Xeu/QmP+yHBrcspKshjK/NcaEnTm7DyhtyGbFzdEglti
 NpnNyhrX6EX1AY9RoVpLS3delytbdYABK2BXUBdVpLb034BG5f9p2Z6D1hsmzqE+
 rOLxTS0d34enfFglwxCjUkwrO/weEvdpCeqcnm0TyLtq6BGoGGbkKcf510MQ1CB+
 UqZxS7WIeYF1IIrm4YVD/ONiH6O1hn/QK6AmHFivgW5fKxvZXIJG850+kYX1JNca
 LUfF43xi6OfgGdllUWaPsi6fRlkvP+YVd7y37d9pSC7QStdysLQ=
 =fFOL
 -----END PGP SIGNATURE-----

Merge tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Borislav Petkov:

 - Fix an integer overflow in the NTP RTC synchronization which led to
   the latter happening every 2 seconds instead of the intended every 11
   minutes.

 - Get rid of now unused get_seconds().

* tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ntp: Fix RTC synchronization on 32-bit platforms
  timekeeping: Remove unused get_seconds()
2021-01-24 09:58:38 -08:00
Linus Torvalds 17b6c49da3 - Add a new Intel model number for Alder Lake
- Differentiate which aspects of the FPU state get saved/restored when the FPU
    is used in-kernel and fix a boot crash on K7 due to early MXCSR access before
    CR4.OSFXSR is even set.
 
  - A couple of noinstr annotation fixes
 
  - Correct die ID setting on AMD for users of topology information which need
    the correct die ID
 
  - A SEV-ES fix to handle string port IO to/from kernel memory properly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmANUr0ACgkQEsHwGGHe
 VUos4hAAlBik/z+y+DaZGJyxtpST2YQaEbwbW3UMqyLsdVnLTTRnKzC1T+fEfD2Q
 SxtCPYH5iuPbCgOOoQboWt6Aa53JlX9bRBZ/87Ub/ELJ9NgMxMQFXAiaDZAAY6Zy
 L2B13KpoGOifPjrGDgksnafyqYv1CYesiArfOffHgvC3/0j7ONdda2SRDQ697TBw
 FSV/WfUjCo0+JdXRRaP6YH5t9MxFerHxVH38xTDFwXikS9CVyddosLo5EP2wAQvi
 5+160i2jB25vyMEsFBr5wE0xDpWLUdClVpzHXXPG2i0P+NHATiBcreTMPzeYOUXu
 Hfc/y4ukOVDoMGlHLNKHq89alI87soMJIEjm2sAG1ZIypKyMJw7YUXQNRR3TcP0U
 c7/C3W1mCWD1+8nLtlIMM0Z20DacQOf9YWko95+uh08+S52KpTOgnx+mpoZjK1PQ
 Wv9HxPJKycrgRNhfverN5FSiOEW/DdvqNfVHTjuuzNLyKdM1NoZ/YTIyABk4RfFq
 USUnC5rk4GqvCYdaLTEKkAJvLCmRKgVYd75Rc4/pPKILS6kv82vpj3BjClBaH0h1
 yrvpafvXzOhwKP/J5q0vm57NJdqPZwuW4Ah+74tptmQL4rga84U4FOs3JpNJq0uu
 1mj6xSFD8ZyI11BSkYbZAHTy1eNERze+azftCSPq/6EifYvqnsE=
 =3rZM
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a new Intel model number for Alder Lake

 - Differentiate which aspects of the FPU state get saved/restored when
   the FPU is used in-kernel and fix a boot crash on K7 due to early
   MXCSR access before CR4.OSFXSR is even set.

 - A couple of noinstr annotation fixes

 - Correct die ID setting on AMD for users of topology information which
   need the correct die ID

 - A SEV-ES fix to handle string port IO to/from kernel memory properly

* tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add another Alder Lake CPU to the Intel family
  x86/mmx: Use KFPU_387 for MMX string operations
  x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  x86/topology: Make __max_die_per_package available unconditionally
  x86: __always_inline __{rd,wr}msr()
  x86/mce: Remove explicit/superfluous tracing
  locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
  locking/lockdep: Cure noinstr fail
  x86/sev: Fix nonistr violation
  x86/entry: Fix noinstr fail
  x86/cpu/amd: Set __max_die_per_package on AMD
  x86/sev-es: Handle string port IO to kernel memory properly
2021-01-24 09:46:05 -08:00
Linus Torvalds c509ce2378 for-linus-2021-01-24
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYA1opwAKCRCRxhvAZXjc
 osnpAP4wjExvtwgh1eA7IgBPtAFzL1EPK2lrv7WM6yuMJNh23wEAxU+quoNrBT7U
 R5UQvmXi2SwxjeGXR/BTLq/HU9rSJA4=
 =6YJX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull misc fixes from Christian Brauner:

 - Jann reported sparse complaints because of a missing __user
   annotation in a helper we added way back when we added
   pidfd_send_signal() to avoid compat syscall handling. Fix it.

 - Yanfei replaces a reference in a comment to the _do_fork() helper I
   removed a while ago with a reference to the new kernel_clone()
   replacement

 - Alexander Guril added a simple coding style fix

* tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  kthread: remove comments about old _do_fork() helper
  Kernel: fork.c: Fix coding style: Do not use {} around single-line statements
  signal: Add missing __user annotation to copy_siginfo_from_user_any
2021-01-24 09:35:28 -08:00
Peter Zijlstra 741ba80f6f sched: Relax the set_cpus_allowed_ptr() semantics
Now that we have KTHREAD_IS_PER_CPU to denote the critical per-cpu
tasks to retain during CPU offline, we can relax the warning in
set_cpus_allowed_ptr(). Any spurious kthread that wants to get on at
the last minute will get pushed off before it can run.

While during CPU online there is no harm, and actual benefit, to
allowing kthreads back on early, it simplifies hotplug code and fixes
a number of outstanding races.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.240724591@infradead.org
2021-01-22 15:09:44 +01:00
Peter Zijlstra 5ba2ffba13 sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
Prior to commit 1cf12e08bc ("sched/hotplug: Consolidate task
migration on CPU unplug") we'd leave any task on the dying CPU and
break affinity and force them off at the very end.

This scheme had to change in order to enable migrate_disable(). One
cannot wait for migrate_disable() to complete while stuck in
stop_machine(). Furthermore, since we need at the very least: idle,
hotplug and stop threads at any point before stop_machine, we can't
break affinity and/or push those away.

Under the assumption that all per-cpu kthreads are sanely handled by
CPU hotplug, the new code no long breaks affinity or migrates any of
them (which then includes the critical ones above).

However, there's an important difference between per-cpu kthreads and
kthreads that happen to have a single CPU affinity which is lost. The
latter class very much relies on the forced affinity breaking and
migration semantics previously provided.

Use the new kthread_is_per_cpu() infrastructure to tighten
is_per_cpu_kthread() and fix the hot-unplug problems stemming from the
change.

Fixes: 1cf12e08bc ("sched/hotplug: Consolidate task migration on CPU unplug")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.102416009@infradead.org
2021-01-22 15:09:44 +01:00
Peter Zijlstra 975707f227 sched: Prepare to use balance_push in ttwu()
In preparation of using the balance_push state in ttwu() we need it to
provide a reliable and consistent state.

The immediate problem is that rq->balance_callback gets cleared every
schedule() and then re-set in the balance_push_callback() itself. This
is not a reliable signal, so add a variable that stays set during the
entire time.

Also move setting it before the synchronize_rcu() in
sched_cpu_deactivate(), such that we get guaranteed visibility to
ttwu(), which is a preempt-disable region.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.966069627@infradead.org
2021-01-22 15:09:43 +01:00
Peter Zijlstra 640f17c824 workqueue: Restrict affinity change to rescuer
create_worker() will already set the right affinity using
kthread_bind_mask(), this means only the rescuer will need to change
it's affinity.

Howveer, while in cpu-hot-unplug a regular task is not allowed to run
on online&&!active as it would be pushed away quite agressively. We
need KTHREAD_IS_PER_CPU to survive in that environment.

Therefore set the affinity after getting that magic flag.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org
2021-01-22 15:09:43 +01:00
Peter Zijlstra 5c25b5ff89 workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
Mark the per-cpu workqueue workers as KTHREAD_IS_PER_CPU.

Workqueues have unfortunate semantics in that per-cpu workers are not
default flushed and parked during hotplug, however a subset does
manual flush on hotplug and hard relies on them for correctness.

Therefore play silly games..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.693465814@infradead.org
2021-01-22 15:09:42 +01:00
Peter Zijlstra ac687e6e8c kthread: Extract KTHREAD_IS_PER_CPU
There is a need to distinguish geniune per-cpu kthreads from kthreads
that happen to have a single CPU affinity.

Geniune per-cpu kthreads are kthreads that are CPU affine for
correctness, these will obviously have PF_KTHREAD set, but must also
have PF_NO_SETAFFINITY set, lest userspace modify their affinity and
ruins things.

However, these two things are not sufficient, PF_NO_SETAFFINITY is
also set on other tasks that have their affinities controlled through
other means, like for instance workqueues.

Therefore another bit is needed; it turns out kthread_create_per_cpu()
already has such a bit: KTHREAD_IS_PER_CPU, which is used to make
kthread_park()/kthread_unpark() work correctly.

Expose this flag and remove the implicit setting of it from
kthread_create_on_cpu(); the io_uring usage of it seems dubious at
best.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org
2021-01-22 15:09:42 +01:00
Peter Zijlstra 22f667c97a sched: Don't run cpu-online with balance_push() enabled
We don't need to push away tasks when we come online, mark the push
complete right before the CPU dies.

XXX hotplug state machine has trouble with rollback here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.415606087@infradead.org
2021-01-22 15:09:42 +01:00
Lai Jiangshan 547a77d02f workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
The scheduler won't break affinity for us any more, and we should
"emulate" the same behavior when the scheduler breaks affinity for
us.  The behavior is "changing the cpumask to cpu_possible_mask".

And there might be some other CPUs online later while the worker is
still running with the pending work items.  The worker should be allowed
to use the later online CPUs as before and process the work items ASAP.
If we use cpu_active_mask here, we can't achieve this goal but
using cpu_possible_mask can.

Fixes: 06249738a4 ("workqueue: Manually break affinity on hotplug")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210111152638.2417-4-jiangshanlai@gmail.com
2021-01-22 15:09:41 +01:00
Valentin Schneider 36c6e17bf1 sched/core: Print out straggler tasks in sched_cpu_dying()
Since commit

  1cf12e08bc ("sched/hotplug: Consolidate task migration on CPU unplug")

tasks are expected to move themselves out of a out-going CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.

Currently, some percpu kthreads (workqueues being a notable exemple) do not
cooperate nicely and can end up on an out-going CPU at the time
sched_cpu_dying() is invoked.

Print the dying rq's tasks to shed some light on the stragglers.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210113183141.11974-1-valentin.schneider@arm.com
2021-01-22 15:09:41 +01:00
Linus Torvalds 2561bbbe2e Printk fixes for 5.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmAJmRwACgkQUqAMR0iA
 lPLxAxAAgBEj8XqRiQh51mDyGCFkHR0NJ2WqSRa50HHhw/mCipwl40jwTsOol2xs
 REodcmswBBvZVApqtWlN+U24j3VaU0gyjpt9ndfkkG9c5PeKlpryGnlABd4Zw3SP
 m8NHkO2rQtdAgVm6AaNd17mYXtZV9a820SNhiDrqxylNDHR1DJW31MoQTpyY4SqE
 zcFyOFabA/zAc869IEnbpiZ8AO6n1lQCJa0C+D+mI70jgy7YEmD0eGkxRTpjoMDS
 ms/iTjelxfWHsuNzQ+85q/931hdz53/Ri3a+HHjDVZSL/e/yxRyr9sOM0XLPVQDy
 2xoHbqTZo4YCYaWSj0ePuW9Sl2yzxuuDtovNEUyssjP+22YV9en9qiHgViZqW9HN
 saruWYZytnpGp4YMFChgen4rxd7jbdxZZzaYOJdJWy2aRn3MfDXrg7aO97MhU4HS
 TeHGee430VYrUc28Nzhnd+wGyAtPDVlGSnZdT+AIo+Uv4a4iK8ULv/58mTM2DvrJ
 Nzln/pwSBf7ddarO8pvdAggmsBAxyBEQ8NUPbeSwiZ2BEIRnO+hT7mDLZ/6TNY83
 jeBDZP7hxo5MfDMhUPkoTdxlZxETnHFrihopChcHjtw+Gz1mb27n1iFactWaLirg
 J+E+/dKbdmxtcKyT3NoqGvcEj08pk/ImtuCPK11b23CzUFTkLrk=
 =l1hI
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Fix line counting and buffer size calculation. Both regressions
   caused that a reader buffer might not get filled as much as possible.

 - Restore non-documented behavior of printk() reader API and make it
   official.

   It did not fill the last byte of the provided buffer before 5.10. Two
   architectures, powerpc and um, used it to add the trailing '\0'.
   There might theoretically be more callers depending on this behavior
   in userspace.

* tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: fix buffer overflow potential for print_text()
  printk: fix kmsg_dump_get_buffer length calulations
  printk: ringbuffer: fix line counting
2021-01-21 11:37:22 -08:00
Petr Mladek 535b6a122c Merge branch 'printk-rework' into for-linus 2021-01-21 16:06:21 +01:00
Jakub Kicinski 0fe2f273ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/can/dev.c
  commit 03f16c5075 ("can: dev: can_restart: fix use after free bug")
  commit 3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")

  Code move.

drivers/net/dsa/b53/b53_common.c
 commit 8e4052c32d ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
 commit b7a9e0da2d ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")

 Field rename.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20 12:16:11 -08:00
Linus Torvalds 75439bc439 Networking fixes for 5.11-rc5, including fixes from bpf, wireless,
and can trees.
 
 Current release - regressions:
 
  - nfc: nci: fix the wrong NCI_CORE_INIT parameters
 
 Current release - new code bugs:
 
  - bpf: allow empty module BTFs
 
 Previous releases - regressions:
 
  - bpf: fix signed_{sub,add32}_overflows type handling
 
  - tcp: do not mess with cloned skbs in tcp_add_backlog()
 
  - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach
 
  - bpf: don't leak memory in bpf getsockopt when optlen == 0
 
  - tcp: fix potential use-after-free due to double kfree()
 
  - mac80211: fix encryption issues with WEP
 
  - devlink: use right genl user_ptr when handling port param get/set
 
  - ipv6: set multicast flag on the multicast route
 
  - tcp: fix TCP_USER_TIMEOUT with zero window
 
 Previous releases - always broken:
 
  - bpf: local storage helpers should check nullness of owner ptr passed
 
  - mac80211: fix incorrect strlen of .write in debugfs
 
  - cls_flower: call nla_ok() before nla_next()
 
  - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmAIa+UACgkQMUZtbf5S
 IruZTQ/+O263ZyI0C5S1uCbHPCsAyjZyxECWDNfQ3tRzTfvldoRRP4YbC1ekSoXu
 8Y9GKDDLMI2pYkNlCqfMhrFaop8sudosntOZDSeRm/2TkkQFnkM/bxAlz++7Rnwx
 vHu1Xo2t2bKJxooSw8gLJ5iZNTbkw/M5iA3qR9kP+BG1yDP7By4P/Y4ziFphffad
 gPlfLQaU8nRVuDBYYrGIX0GoMg05IH1zt2/MxvN4ReXuex/9tq2TrU8jxHiwT2ja
 K1DHR+g2VVZf55TWrL9Yw8V5Rr+F7bxf6i+yer9hWWhENXgoTv6QkndAnTFOcoat
 VQh44GzoNoL1dAHD8kyUOOxJCyjItJJe58Evcwjnls4o+5BC2aDNQADwrSyz3sHe
 l9iNMSMEylymu7Xu+cJw2kjOq/BK6TdjaGSxwm1M2ErPehf36eJuc4FkaJz3RO55
 nkYMfm0+5rYWSsR5CTTJp8r2urCAT4SSx1iLoZknUXE6qa5AcMSNhIjGbw6pUp4q
 RDBtAKqiV0l37vdUag4Z+QgjPA0cH9E4aMQKYmD9dop20Zuzp4ug38qR32aEFC6q
 Qfb0VBMKgwu6OWjuWARbwYktVQNcoelKiGnsGnORJ5S9cyc1N4HeKEnb5Hw8ky5q
 4FBpNMfx3Ief14iNkh65KrzA+uyZBjqEG+joTSzn+9R7Lof60QA=
 =KyY7
 -----END PGP SIGNATURE-----

Merge tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
  can trees.

  Current release - regressions:

   - nfc: nci: fix the wrong NCI_CORE_INIT parameters

  Current release - new code bugs:

   - bpf: allow empty module BTFs

  Previous releases - regressions:

   - bpf: fix signed_{sub,add32}_overflows type handling

   - tcp: do not mess with cloned skbs in tcp_add_backlog()

   - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach

   - bpf: don't leak memory in bpf getsockopt when optlen == 0

   - tcp: fix potential use-after-free due to double kfree()

   - mac80211: fix encryption issues with WEP

   - devlink: use right genl user_ptr when handling port param get/set

   - ipv6: set multicast flag on the multicast route

   - tcp: fix TCP_USER_TIMEOUT with zero window

  Previous releases - always broken:

   - bpf: local storage helpers should check nullness of owner ptr passed

   - mac80211: fix incorrect strlen of .write in debugfs

   - cls_flower: call nla_ok() before nla_next()

   - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"

* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net: systemport: free dev before on error path
  net: usb: cdc_ncm: don't spew notifications
  net: mscc: ocelot: Fix multicast to the CPU port
  tcp: Fix potential use-after-free due to double kfree()
  bpf: Fix signed_{sub,add32}_overflows type handling
  can: peak_usb: fix use after free bugs
  can: vxcan: vxcan_xmit: fix use after free bug
  can: dev: can_restart: fix use after free bug
  tcp: fix TCP socket rehash stats mis-accounting
  net: dsa: b53: fix an off by one in checking "vlan->vid"
  tcp: do not mess with cloned skbs in tcp_add_backlog()
  selftests: net: fib_tests: remove duplicate log test
  net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
  sh_eth: Fix power down vs. is_opened flag ordering
  net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
  netfilter: rpfilter: mask ecn bits before fib lookup
  udp: mask TOS bits in udp_v4_early_demux()
  xsk: Clear pool even for inactive queues
  bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
  sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
  ...
2021-01-20 11:52:21 -08:00
Daniel Borkmann bc895e8b2a bpf: Fix signed_{sub,add32}_overflows type handling
Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via copy/paste issue, also
given prior to 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
the signature of signed_sub_overflows() had s64 a and s64 b as its input args
whereas now they are truncated to s32. Thus restore proper types. Also, the case
of signed_add32_overflows() is not consistent to signed_sub32_overflows(). Both
have s32 as inputs, therefore align the former.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <sa516203@mail.ustc.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20 17:19:40 +01:00
Linus Torvalds 45dfb8a565 task_work-2021-01-19
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAHH+IQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppOQD/4zMSRTkLa/goG15WzxC73HVp3zbMLy6R/x
 NvwYwjaWmUDV/H93CBt7NVqypItU4YPw40tR6L5W2qFP5apGZms3d62gVIZC7sp4
 5X2HMPezJJIP94DgG1jpWD3D0JQhVn6glaoBLiN5SBdXE2W9N8Z+g3NafKr10ici
 UXbKfED2NvRSIifyPDI18d6bmwiwdvpRh5+NqCy+37A4+FG+Q16R0KX4PAT3fhvW
 k2ajqCDnQkYy1S9an/Buak6i9RU/f3ASvXdoh5pqLJppR/aEWlVQV5m88hFG4I0K
 3SWvBTSR5LEh/BGPka5oSYEEe/a8OZUhcV43JYRi9hdQPn1wqG+VPpZ0QxgCzDg/
 UaXrSGMAPBwKUQMnncvBdTp2gc9JzXANK8gtTgDoQxHQG5YmtGTiveu+IIAHCIpb
 GW2ATHMOPsk6rsTYSiLTHJsb62J6jXpOXBBBH00l8LDfdk3p1+cEyP551SVumtQl
 sq/0C4o1UGTDHR+alzPwrOlEyz1B+YWxGCSilauCO1ww6yiCJ55ECve7T0hFH1BP
 3ftWWdkSDwl6xelAHoKyDKs7A1D0Au8iPV/zfSQBQ3bw5LWagIPOE+9nION+IBX9
 cHlUyUx1q5uwOenURxMk2yvlf+2dhOxRVnYZsYwqS83So6RpDG6bEKdd5ukN+vLx
 /83ywtUfLg==
 =LAMf
 -----END PGP SIGNATURE-----

Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block

Pull task_work fix from Jens Axboe:
 "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
  task_work run we had in get_signal().

  This caused a regression for some setups, since we're relying on eg
  ____fput() being run to close and release, for example, a pipe and
  wake the other end.

  For 5.11, I prefer the simple solution of just reinstating the
  unconditional run, even if it conceptually doesn't make much sense -
  if you need that kind of guarantee, you should be using TWA_SIGNAL
  instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
  ensure that other potential gotchas/assumptions for task_work don't
  regress for 5.11.

  We're looking into further simplifying the task_work notifications for
  5.12 which would resolve that too"

* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
  task_work: unconditionally run task_work from get_signal()
2021-01-19 13:26:05 -08:00
Mircea Cirjaliu 301a33d518 bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
I assume this was obtained by copy/paste. Point it to bpf_map_peek_elem()
instead of bpf_map_pop_elem(). In practice it may have been less likely
hit when under JIT given shielded via 84430d4232 ("bpf, verifier: avoid
retpoline for map push/pop/peek operation").

Fixes: f1a2e44a3a ("bpf: add queue and stack maps")
Signed-off-by: Mircea Cirjaliu <mcirjaliu@bitdefender.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mauricio Vasquez <mauriciovasquezbernal@gmail.com>
Link: https://lore.kernel.org/bpf/AM7PR02MB6082663DFDCCE8DA7A6DD6B1BBA30@AM7PR02MB6082.eurprd02.prod.outlook.com
2021-01-19 22:04:08 +01:00
John Ogness f0e386ee0c printk: fix buffer overflow potential for print_text()
Before the commit 896fbe20b4 ("printk: use the lockless
ringbuffer"), msg_print_text() would only write up to size-1 bytes
into the provided buffer. Some callers expect this behavior and
append a terminator to returned string. In particular:

arch/powerpc/xmon/xmon.c:dump_log_buf()
arch/um/kernel/kmsg_dump.c:kmsg_dumper_stdout()

msg_print_text() has been replaced by record_print_text(), which
currently fills the full size of the buffer. This causes a
buffer overflow for the above callers.

Change record_print_text() so that it will only use size-1 bytes
for text data. Also, for paranoia sakes, add a terminator after
the text data.

And finally, document this behavior so that it is clear that only
size-1 bytes are used and a terminator is added.

Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210114170412.4819-1-john.ogness@linutronix.de
2021-01-19 11:42:14 +01:00
Jakub Kicinski 2d9116be76 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-01-16

1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
   that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.

2) Add support for using kernel module global variables (__ksym externs in BPF
   programs) retrieved via module's BTF, from Andrii Nakryiko.

3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
   stored in mmap2 event for perf, from Jiri Olsa.

4) Various fixes for cross-building BPF sefltests out-of-tree which then will
   unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.

5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.

6) Clean up driver's XDP buffer init and split into two helpers to init per-
   descriptor and non-changing fields during processing, from Lorenzo Bianconi.

7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
  perf: Add build id data in mmap2 event
  bpf: Add size arg to build_id_parse function
  bpf: Move stack_map_get_build_id into lib
  bpf: Document new atomic instructions
  bpf: Add tests for new BPF atomic operations
  bpf: Add bitwise atomic instructions
  bpf: Pull out a macro for interpreting atomic ALU operations
  bpf: Add instructions for atomic_[cmp]xchg
  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
  bpf: Move BPF_STX reserved field check into BPF_STX verifier code
  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
  bpf: x86: Factor out a lookup table for some ALU opcodes
  bpf: x86: Factor out emission of REX byte
  bpf: x86: Factor out emission of ModR/M for *(reg + off)
  tools/bpftool: Add -Wall when building BPF programs
  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
  selftests/bpf: Install btf_dump test cases
  selftests/bpf: Fix installation of urandom_read
  selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
  selftests/bpf: Fix out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 17:57:26 -08:00
Jakub Kicinski e23a8d0021 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-01-16

1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in
   link creation's error path causing a refcount underflow, from Jiri Olsa.

2) Fix BTF validation errors for the case where kernel modules don't declare
   any new types and end up with an empty BTF, from Andrii Nakryiko.

3) Fix BPF local storage helpers to first check their {task,inode} owners for
   being NULL before access, from KP Singh.

4) Fix a memory leak in BPF setsockopt handling for the case where optlen is
   zero and thus temporary optval buffer should be freed, from Stanislav Fomichev.

5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for
   raw_tracepoint caused by too big ctx_size_in, from Song Liu.

6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL}
   registers were spilled to stack but not recognized, from Gilad Reti.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  MAINTAINERS: Update my email address
  selftests/bpf: Add verifier test for PTR_TO_MEM spill
  bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
  bpf: Reject too big ctx_size_in for raw_tp test run
  libbpf: Allow loading empty BTFs
  bpf: Allow empty module BTFs
  bpf: Don't leak memory in bpf getsockopt when optlen == 0
  bpf: Update local storage test to check handling of null ptrs
  bpf: Fix typo in bpf_inode_storage.c
  bpf: Local storage helpers should check nullness of owner ptr passed
  bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
====================

Link: https://lore.kernel.org/r/20210116002025.15706-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 16:34:59 -08:00
John Ogness 89ccf18f03 printk: fix kmsg_dump_get_buffer length calulations
kmsg_dump_get_buffer() uses @syslog to determine if the syslog
prefix should be written to the buffer. However, when calculating
the maximum number of records that can fit into the buffer, it
always counts the bytes from the syslog prefix.

Use @syslog when calculating the maximum number of records that can
fit into the buffer.

Fixes: e2ae715d66 ("kmsg - kmsg_dump() use iterator to receive log buffer content")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113164413.1599-1-john.ogness@linutronix.de
2021-01-15 11:32:52 +01:00
John Ogness 668af87f99 printk: ringbuffer: fix line counting
Counting text lines in a record simply involves counting the number
of newline characters (+1). However, it is searching the full data
block for newline characters, even though the text data can be (and
often is) a subset of that area. Since the extra area in the data
block was never initialized, the result is that extra newlines may
be seen and counted.

Restrict newline searching to the text data length.

Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113144234.6545-1-john.ogness@linutronix.de
2021-01-15 11:30:03 +01:00
Jiri Olsa 88a16a1309 perf: Add build id data in mmap2 event
Adding support to carry build id data in mmap2 event.

The build id data replaces maj/min/ino/ino_generation
fields, which are also used to identify map's binary,
so it's ok to replace them with build id data:

  union {
          struct {
                  u32       maj;
                  u32       min;
                  u64       ino;
                  u64       ino_generation;
          };
          struct {
                  u8        build_id_size;
                  u8        __reserved_1;
                  u16       __reserved_2;
                  u8        build_id[20];
          };
  };

Replaced maj/min/ino/ino_generation fields give us size
of 24 bytes. We use 20 bytes for build id data, 1 byte
for size and rest is unused.

There's new misc bit for mmap2 to signal there's build
id data in it:

  #define PERF_RECORD_MISC_MMAP_BUILD_ID   (1 << 14)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-4-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Jiri Olsa 921f88fc89 bpf: Add size arg to build_id_parse function
It's possible to have other build id types (other than default SHA1).
Currently there's also ld support for MD5 build id.

Adding size argument to build_id_parse function, that returns (if defined)
size of the parsed build id, so we can recognize the build id type.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-3-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Jiri Olsa bd7525dacd bpf: Move stack_map_get_build_id into lib
Moving stack_map_get_build_id into lib with
declaration in linux/buildid.h header:

  int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);

This function returns build id for given struct vm_area_struct.
There is no functional change to stack_map_get_build_id function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-2-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Brendan Jackman 981f94c3e9 bpf: Add bitwise atomic instructions
This adds instructions for

atomic[64]_[fetch_]and
atomic[64]_[fetch_]or
atomic[64]_[fetch_]xor

All these operations are isomorphic enough to implement with the same
verifier, interpreter, and x86 JIT code, hence being a single commit.

The main interesting thing here is that x86 doesn't directly support
the fetch_ version these operations, so we need to generate a CMPXCHG
loop in the JIT. This requires the use of two temporary registers,
IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman 462910670e bpf: Pull out a macro for interpreting atomic ALU operations
Since the atomic operations that are added in subsequent commits are
all isomorphic with BPF_ADD, pull out a macro to avoid the
interpreter becoming dominated by lines of atomic-related code.

Note that this sacrificies interpreter performance (combining
STX_ATOMIC_W and STX_ATOMIC_DW into single switch case means that we
need an extra conditional branch to differentiate them) in favour of
compact and (relatively!) simple C code.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-9-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman 5ffa25502b bpf: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH woulud be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.

There are two significant design decisions made for the CMPXCHG
instruction:

 - To solve the issue that this operation fundamentally has 3
   operands, but we only have two register fields. Therefore the
   operand we compare against (the kernel's API calls it 'old') is
   hard-coded to be R0. x86 has similar design (and A64 doesn't
   have this problem).

   A potential alternative might be to encode the other operand's
   register number in the immediate field.

 - The kernel's atomic_cmpxchg returns the old value, while the C11
   userspace APIs return a boolean indicating the comparison
   result. Which should BPF do? A64 returns the old value. x86 returns
   the old value in the hard-coded register (and also sets a
   flag). That means return-old-value is easier to JIT, so that's
   what we use.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman 5ca419f286 bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman c5bcb5eb4d bpf: Move BPF_STX reserved field check into BPF_STX verifier code
I can't find a reason why this code is in resolve_pseudo_ldimm64;
since I'll be modifying it in a subsequent commit, tidy it up.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-6-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman 91c960b005 bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.

In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.

This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.

All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Gilad Reti 744ea4e388 bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
Add support for pointer to mem register spilling, to allow the verifier
to track pointers to valid memory addresses. Such pointers are returned
for example by a successful call of the bpf_ringbuf_reserve helper.

The patch was partially contributed by CyberArk Software, Inc.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-1-gilad.reti@gmail.com
2021-01-13 19:47:43 -08:00
Thomas Gleixner ce09ccc502 genirq: Export irq_check_status_bit()
One of the users can be built modular:

  ERROR: modpost: "irq_check_status_bit" [drivers/perf/arm_spe_pmu.ko] undefined!

Fixes: fdd0296304 ("genirq: Move status flag checks to core")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201227192049.GA195845@roeck-us.net
2021-01-13 15:48:05 +01:00
Andrii Nakryiko 541c3bad8d bpf: Support BPF ksym variables in kernel modules
Add support for directly accessing kernel module variables from BPF programs
using special ldimm64 instructions. This functionality builds upon vmlinux
ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
specifying kernel module BTF's FD in insn[1].imm field.

During BPF program load time, verifier will resolve FD to BTF object and will
take reference on BTF object itself and, for module BTFs, corresponding module
as well, to make sure it won't be unloaded from under running BPF program. The
mechanism used is similar to how bpf_prog keeps track of used bpf_maps.

One interesting change is also in how per-CPU variable is determined. The
logic is to find .data..percpu data section in provided BTF, but both vmlinux
and module each have their own .data..percpu entries in BTF. So for module's
case, the search for DATASEC record needs to look at only module's added BTF
types. This is implemented with custom search function.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-6-andrii@kernel.org
2021-01-12 17:24:30 -08:00
Brendan Jackman 28a8add641 bpf: Fix a verifier message for alloc size helper arg
The error message here is misleading, the argument will be rejected unless
it is a known constant.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210112123913.2016804-1-jackmanb@google.com
2021-01-12 21:41:17 +01:00