Commit Graph

983542 Commits

Author SHA1 Message Date
Linus Torvalds a527a2b32d Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro:
 "Several assorted fixes.

  I still think that audit ->d_name race is better fixed this way for
  the benefit of backports, with any possibly fancier variants done on
  top of it"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dump_common_audit_data(): fix racy accesses to ->d_name
  iov_iter: fix the uaccess area in copy_compat_iovec_from_user
  umount(2): move the flag validity checks first
2021-01-17 12:16:47 -08:00
Linus Torvalds feb889fb40 mm: don't put pinned pages into the swap cache
So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.

However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page.  That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.

Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache.  But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.

Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).

But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.

It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors.  If you do it wrong, you end up with an incorrectly
shared page.

As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic.  Re-using the page requires a hard black-and-white rule with
no room for ambiguity.

In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.

Fixes: 09854ba94c ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-17 12:08:04 -08:00
Dexuan Cui fff7b5e6ee x86/hyperv: Initialize clockevents after LAPIC is initialized
With commit 4df4cb9e99, the Hyper-V direct-mode STIMER is actually
initialized before LAPIC is initialized: see

  apic_intr_mode_init()

    x86_platform.apic_post_init()
      hyperv_init()
        hv_stimer_alloc()

    apic_bsp_setup()
      setup_local_APIC()

setup_local_APIC() temporarily disables LAPIC, initializes it and
re-eanble it.  The direct-mode STIMER depends on LAPIC, and when it's
registered, it can be programmed immediately and the timer can fire
very soon:

  hv_stimer_init
    clockevents_config_and_register
      clockevents_register_device
        tick_check_new_device
          tick_setup_device
            tick_setup_periodic(), tick_setup_oneshot()
              clockevents_program_event

When the timer fires in the hypervisor, if the LAPIC is in the
disabled state, new versions of Hyper-V ignore the event and don't inject
the timer interrupt into the VM, and hence the VM hangs when it boots.

Note: when the VM starts/reboots, the LAPIC is pre-enabled by the
firmware, so the window of LAPIC being temporarily disabled is pretty
small, and the issue can only happen once out of 100~200 reboots for
a 40-vCPU VM on one dev host, and on another host the issue doesn't
reproduce after 2000 reboots.

The issue is more noticeable for kdump/kexec, because the LAPIC is
disabled by the first kernel, and stays disabled until the kdump/kexec
kernel enables it. This is especially an issue to a Generation-2 VM
(for which Hyper-V doesn't emulate the PIT timer) when CONFIG_HZ=1000
(rather than CONFIG_HZ=250) is used.

Fix the issue by moving hv_stimer_alloc() to a later place where the
LAPIC timer is initialized.

Fixes: 4df4cb9e99 ("x86/hyperv: Initialize clockevents earlier in CPU onlining")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by:  Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210116223136.13892-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-01-17 15:20:50 +00:00
Mike Rapoport 32c2bc8f2d ia64: fix build failure caused by memory model changes
The change of ia64's default memory model to SPARSEMEM causes defconfig
build to fail:

  CC      kernel/async.o
In file included from include/linux/numa.h:25,
                 from include/linux/async.h:13,
                 from kernel/async.c:47:
arch/ia64/include/asm/sparsemem.h:14:40: warning: "PAGE_SHIFT" is not defined, evaluates to 0 [-Wundef]
   14 | #if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
      |                                        ^~~~~~~~~~
In file included from include/linux/gfp.h:6,
                 from include/linux/xarray.h:14,
                 from include/linux/radix-tree.h:19,
                 from include/linux/idr.h:15,
                 from include/linux/kernfs.h:13,
                 from include/linux/sysfs.h:16,
                 from include/linux/kobject.h:20,
                 from include/linux/energy_model.h:7,
                 from include/linux/device.h:16,
                 from include/linux/async.h:14,
                 from kernel/async.c:47:
include/linux/mmzone.h:1156:2: error: #error Allocator MAX_ORDER exceeds SECTION_SIZE
 1156 | #error Allocator MAX_ORDER exceeds SECTION_SIZE
      |  ^~~~~

The error cause is the missing definition of PAGE_SHIFT in the calculation
of SECTION_SIZE_BITS.

Add include of <asm/page.h> to arch/ia64/include/asm/sparsemem.h to solve
the problem.

Fixes: 214496cb18 ("ia64: make SPARSEMEM default and disable DISCONTIGMEM")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
2021-01-17 13:31:09 +02:00
Jakub Kicinski 213b97b125 Merge branch 'net-fix-the-features-flag-in-sctp_gso_segment'
Xin Long says:

====================
net: fix the features flag in sctp_gso_segment

Patch 1/2 is to improve the code in skb_segment(), and it is needed
by Patch 2/2.
====================

Link: https://lore.kernel.org/r/cover.1610703289.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 19:06:01 -08:00
Xin Long 1fef8544bf sctp: remove the NETIF_F_SG flag before calling skb_segment
It makes more sense to clear NETIF_F_SG instead of set it when
calling skb_segment() in sctp_gso_segment(), since SCTP GSO is
using head_skb's fraglist, of which all frags are linear skbs.

This will make SCTP GSO code more understandable.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 19:05:59 -08:00
Xin Long dbd50f238d net: move the hsize check to the else block in skb_segment
After commit 89319d3801 ("net: Add frag_list support to skb_segment"),
it goes to process frag_list when !hsize in skb_segment(). However, when
using skb frag_list, sg normally should not be set. In this case, hsize
will be set with len right before !hsize check, then it won't go to
frag_list processing code.

So the right thing to do is move the hsize check to the else block, so
that it won't affect the !hsize check for frag_list processing.

v1->v2:
  - change to do "hsize <= 0" check instead of "!hsize", and also move
    "hsize < 0" into else block, to save some cycles, as Alex suggested.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 19:05:59 -08:00
Alexander Lobakin 66c556025d skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
Commit 3226b158e6 ("net: avoid 32 x truesize under-estimation for
tiny skbs") ensured that skbs with data size lower than 1025 bytes
will be kmalloc'ed to avoid excessive page cache fragmentation and
memory consumption.
However, the fix adressed only __napi_alloc_skb() (primarily for
virtio_net and napi_get_frags()), but the issue can still be achieved
through __netdev_alloc_skb(), which is still used by several drivers.
Drivers often allocate a tiny skb for headers and place the rest of
the frame to frags (so-called copybreak).
Mirror the condition to __netdev_alloc_skb() to handle this case too.

Since v1 [0]:
 - fix "Fixes:" tag;
 - refine commit message (mention copybreak usecase).

[0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me

Fixes: a1c7fff7e1 ("net: netdev_alloc_skb() use build_skb()")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 19:02:04 -08:00
Xu Wang 20efd2c79a net: mscc: ocelot: Remove unneeded semicolon
fix semicolon.cocci warnings:
drivers/net/ethernet/mscc/ocelot_net.c:460:2-3: Unneeded semicolon

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Link: https://lore.kernel.org/r/20210115095544.33164-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 19:01:10 -08:00
Raju Rangoju b660bccbc3 cxgb4: enable interrupt based Tx completions for T5
Enable interrupt based Tx completions to improve latency for T5.
The consumer index (CIDX) will now come via interrupts so that Tx
SKBs can be freed up sooner in Rx path. Also, enforce CIDX flush
threshold override (CIDXFTHRESHO) to improve latency for slow
traffic. This ensures that the interrupt is generated immediately
whenever hardware catches up with driver (i.e. CIDX == PIDX is
reached), which is often the case for slow traffic.

Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20210115102059.6846-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:32:08 -08:00
Jakub Kicinski c761b2df9d Merge branch 'rid-w-1-warnings-in-ethernet'
Lee Jones says:

====================
Rid W=1 warnings in Ethernet

This set is part of a larger effort attempting to clean-up W=1
kernel builds, which are currently overwhelmingly riddled with
niggly little warnings.

v2:
 - Squashed IBM patches
 - Fixed real issue in SMSC
 - Added Andrew's Reviewed-by tags on remainder
====================

Link: https://lore.kernel.org/r/20210115200905.3470941-1-lee.jones@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:33 -08:00
Lee Jones e242d59899 net: ethernet: toshiba: spider_net: Document a whole bunch of function parameters
Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/toshiba/spider_net.c:263: warning: Function parameter or member 'hwdescr' not described in 'spider_net_get_descr_status'
 drivers/net/ethernet/toshiba/spider_net.c:263: warning: Excess function parameter 'descr' description in 'spider_net_get_descr_status'
 drivers/net/ethernet/toshiba/spider_net.c:554: warning: Function parameter or member 'netdev' not described in 'spider_net_get_multicast_hash'
 drivers/net/ethernet/toshiba/spider_net.c:902: warning: Function parameter or member 't' not described in 'spider_net_cleanup_tx_ring'
 drivers/net/ethernet/toshiba/spider_net.c:902: warning: Excess function parameter 'card' description in 'spider_net_cleanup_tx_ring'
 drivers/net/ethernet/toshiba/spider_net.c:1074: warning: Function parameter or member 'card' not described in 'spider_net_resync_head_ptr'
 drivers/net/ethernet/toshiba/spider_net.c🔢 warning: Function parameter or member 'napi' not described in 'spider_net_poll'
 drivers/net/ethernet/toshiba/spider_net.c🔢 warning: Excess function parameter 'netdev' description in 'spider_net_poll'
 drivers/net/ethernet/toshiba/spider_net.c:1278: warning: Function parameter or member 'p' not described in 'spider_net_set_mac'
 drivers/net/ethernet/toshiba/spider_net.c:1278: warning: Excess function parameter 'ptr' description in 'spider_net_set_mac'
 drivers/net/ethernet/toshiba/spider_net.c:1350: warning: Function parameter or member 'error_reg1' not described in 'spider_net_handle_error_irq'
 drivers/net/ethernet/toshiba/spider_net.c:1350: warning: Function parameter or member 'error_reg2' not described in 'spider_net_handle_error_irq'
 drivers/net/ethernet/toshiba/spider_net.c:1968: warning: Function parameter or member 't' not described in 'spider_net_link_phy'
 drivers/net/ethernet/toshiba/spider_net.c:1968: warning: Excess function parameter 'data' description in 'spider_net_link_phy'
 drivers/net/ethernet/toshiba/spider_net.c:2149: warning: Function parameter or member 'work' not described in 'spider_net_tx_timeout_task'
 drivers/net/ethernet/toshiba/spider_net.c:2149: warning: Excess function parameter 'data' description in 'spider_net_tx_timeout_task'
 drivers/net/ethernet/toshiba/spider_net.c:2182: warning: Function parameter or member 'txqueue' not described in 'spider_net_tx_timeout'

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:28 -08:00
Lee Jones b510363214 net: ethernet: toshiba: ps3_gelic_net: Fix some kernel-doc misdemeanours
Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'irq' not described in 'gelic_card_interrupt'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'ptr' not described in 'gelic_card_interrupt'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1407: warning: Function parameter or member 'txqueue' not described in 'gelic_net_tx_timeout'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1439: warning: Function parameter or member 'napi' not described in 'gelic_ether_setup_netdev_ops'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1639: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_probe'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1795: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_remove'

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:23 -08:00
Lee Jones 807086021b net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
Fixes the following W=1 kernel build warning(s):

 from drivers/net/ethernet/ibm/ibmvnic.c:35:
 inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'skb' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_len' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_field' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'scrq_arr' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'num_entries' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'adapter' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'rwi' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'reset_state' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'adapter' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'rwi' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'reset_state' not described in 'do_reset'

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:17 -08:00
Lee Jones e49e4647f3 net: ethernet: ti: am65-cpts: Document am65_cpts_rx_enable()'s 'en' parameter
Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/ti/am65-cpts.c:736: warning: Function parameter or member 'en' not described in 'am65_cpts_rx_enable'
 drivers/net/ethernet/ti/am65-cpts.c:736: warning: Excess function parameter 'skb' description in 'am65_cpts_rx_enable'

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:14 -08:00
Lee Jones 935888cda8 net: ethernet: ti: am65-cpsw-qos: Demote non-conformant function header
Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/ti/am65-cpsw-qos.c:364: warning: Function parameter or member 'ndev' not described in 'am65_cpsw_timer_set'
 drivers/net/ethernet/ti/am65-cpsw-qos.c:364: warning: Function parameter or member 'est_new' not described in 'am65_cpsw_timer_set'

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:10 -08:00
Lee Jones 090c7ae8e0 net: xen-netback: xenbus: Demote nonconformant kernel-doc headers
Fixes the following W=1 kernel build warning(s):

 drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 'dev' not described in 'frontend_changed'
 drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 'frontend_state' not described in 'frontend_changed'
 drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 'dev' not described in 'netback_probe'
 drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 'id' not described in 'netback_probe'

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:04 -08:00
Lee Jones 7d2a92445e net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'dev' not described in 'try_toggle_control_gpio'
 drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'desc' not described in 'try_toggle_control_gpio'
 drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'name' not described in 'try_toggle_control_gpio'
 drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'index' not described in 'try_toggle_control_gpio'
 drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'value' not described in 'try_toggle_control_gpio'
 drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'nsdelay' not described in 'try_toggle_control_gpio'

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:12:57 -08:00
Linus Torvalds 0da0a8a0a0 SCSI fixes on 20210116
Nine minor fixes, 7 in drivers and 2 in the core SCSI disk driver (sd)
 which should be harmless involving removing an unused variable and
 quietening a spurious warning.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYANKJiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVSyAP9a4xdK
 9A4seh2/LW3GwPsoQUJQINe4yok/lGVXSQk3XQD/QkkYbzeqVGN4BHK1LQX2R9Sw
 wy+E4ENjdOY+9p+KMxo=
 =m0jh
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Nine minor fixes, seven in drivers and two in the core SCSI disk
  driver (sd) which should be harmless involving removing an unused
  variable and quietening a spurious warning"

Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Remove obsolete variable in sd_remove()
  scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
  scsi: scsi_debug: Fix memleak in scsi_debug_init()
  scsi: mpt3sas: Fix spelling mistake in Kconfig "compatiblity" -> "compatibility"
  scsi: qedi: Correct max length of CHAP secret
  scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
  scsi: ufs: Relocate flush of exceptional event
  scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
  scsi: ufs: Fix possible power drain during system suspend
2021-01-16 12:25:40 -08:00
Al Viro d36a1dd9f7 dump_common_audit_data(): fix racy accesses to ->d_name
We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us.  And it's possible for
old long name to be freed after rename, leading to UAF here.

Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-16 15:11:35 -05:00
Linus Torvalds 54c6247d06 block-5.11-2021-01-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmADKbkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgrWD/9gmAaxW3cF5K9UPcfT9VRz50PRfa5cGNwu
 eQ3ETIkqQtmizibj9BXBYwPWn6FK6gSeI8LjP0SXOUxkG1O23Hwwb6ieQ2tRg+CX
 Wmb+S7EUwpkHBWWmulfsX9PXQsyG2D4oz8Obxwp0jvI3XzuEeMQ5gYTpO+0YTxi1
 VgdeOZfV5Xben0kvpwlUfowdOD6CK8LSPJ4KjuynkvgtfKuipsCmJqMqRZqFMu3f
 F4p+ngVYuqEcKddh+h5pFifUy2bo076eYe4kE6vGlmdySjaVyHiLJcBmkMDxb/cp
 daWTaBGCsvZfP0uwLgATD0P0UqJn44Vx15FlN50uSlrfaEiglO6aBuWwbQpuPpq/
 FF/hhUscs9cmZhYGO8TNnOZHfzNmYmPx9dH/D+Fo6mqCaGvsOyGGUeKuG86hfLlM
 AEVGFiVdlQNdLq7f2HHFKfWL1XvHbvGrgQ5d4dnJrD9a/1TgXTiluleC1+vvLfoP
 cRTATNtfMTcc88QF+L1BxtHXkUsf6dyqPt1AalmKqwHDv9d1C/B0aZt6a/JFTXp4
 utsxDi+WOb4HVT4/ojl35HML6CWATqVgHH1rSG2Q8UPWvLX+O8QalRIBhtNaRt9k
 hr3joKlmeyCjJdrgE+07xlwfiqTmmhmTkGjJw39KyQTiOyyEPey+HWwJEhO1QaFR
 hgoQvLA4mw==
 =CYyj
 -----END PGP SIGNATURE-----

Merge tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just an nvme pull request via Christoph:

   - don't initialize hwmon for discover controllers (Sagi Grimberg)

   - fix iov_iter handling in nvme-tcp (Sagi Grimberg)

   - fix a preempt warning in nvme-tcp (Sagi Grimberg)

   - fix a possible NULL pointer dereference in nvme (Israel Rukshin)"

* tag 'block-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
  nvme: don't intialize hwmon for discovery controllers
  nvme-tcp: fix possible data corruption with bio merges
  nvme-tcp: Fix warning with CONFIG_DEBUG_PREEMPT
  nvmet-rdma: Fix NULL deref when setting pi_enable and traddr INADDR_ANY
2021-01-16 11:39:58 -08:00
Linus Torvalds 11c0239ae2 io_uring-5.11-2021-01-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmADKaIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnjVD/402IBhG40EoLMdcztTbOF7S5JZLbECbWam
 HPN99D0YyjRQNwsd7CPzeO+7W4zO4P0T/NcnXLF1TmPnkRynDNsT4oygcLrJoH8n
 xZoAXi6ykM/4eE+4NSaNfWF/f3DtGepsIHMpSNNyzE+qxz4I36/EPTu7M13ga1A4
 H6pAbvU1DrDZ+z62rhWcb+4ouUV7QDS02UTNo9cW4ueeMNLyW/Zxp9lBzNuAkbzF
 gRlsxkhXLgcqtA36EdRWKs9fOwcdTyiMGzS5DRRmmE/O/a3//yl0ENFTPmWd6+jf
 daZcAYW6BdB+0va1sYGbECl9zOkMlhcm/hNFnl3ZmFIW5a4QiRy2WLoWuiNwVSYM
 nGCz0l5P2pP32tt0fQlSQh8KdI+py+c7MhJ1IaRhJi8X4t1MHoOR3vlilw0UvrdQ
 zpm8H+hMmL3g9/6pZLwxBvd31N1Y/A8ghh6qTdQ2EwdPyjImR3Xv/wYUqdbfvBo5
 VcB6ccoFwuP5KzQcyI2j2WRcx7AtHZuiVGvoqkA8QL9Onq3GWoY8tnkf/sP8RxlQ
 wL91wVkT46UMkCLeY7QCGdx4nnmSahCEa8SQEYxibnh++9jh+c6Y5FtZc3YNjnED
 GRFT9vJmqNmrvKCLo78iTf/F8vJllGXQRKDeMuU9DVj3ylhZOBSEhpmMIZwdXNQa
 CqMCqia4Xg==
 =XL14
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "We still have a pending fix for a cancelation issue, but it's still
  being investigated. In the meantime:

   - Dead mm handling fix (Pavel)

   - SQPOLL setup error handling (Pavel)

   - Flush timeout sequence fix (Marcelo)

   - Missing finish_wait() for one exit case"

* tag 'io_uring-5.11-2021-01-16' of git://git.kernel.dk/linux-block:
  io_uring: ensure finish_wait() is always called in __io_uring_task_cancel()
  io_uring: flush timeouts that should already have expired
  io_uring: do sqo disable on install_fd error
  io_uring: fix null-deref in io_disable_sqo_submit
  io_uring: don't take files/mm for a dead task
  io_uring: drop mm and files after task_work_run
2021-01-16 11:12:02 -08:00
Linus Torvalds acda701bf1 RISC-V Fixes for 5.11-rc4
There are a few more fixes than a normal rc4, largely due to the bubble
 introduced by the holiday break:
 
 * A fix to return -ENOSYS for syscall number -1, which previously
   returned an uninitialized value.
 * A fix to time_init() to ensure of_clk_init() has been called, without
   which clock drivers may not be initialized.
 * A fix to the sifive,uart0 driver to properly display the baud rate.  A
   fix to initialize MPIE that allows interrupts to be processed during
   system calls.
 * A fix to avoid erronously begin tracing IRQs when interrupts are
   disabled, which at least triggers suprious lockdep failures.
 * A workaround for a warning related to calling smp_processor_id() while
   preemptible.  The warning itself is suprious on currently availiable
   systems.
 * A fix to properly include the generic time VDSO calls.  A fix to our
   kasan address mapping.  A fix to the HiFive Unleashed device tree,
   which allows the Ethernet PHY to be properly initialized by Linux (as
   opposed to relying on the bootloader).
 * A defconfig update to include SiFive's GPIO driver, which is present
   on the HiFive Unleashed and necessary to initialize the PHY.
 * A fix to avoid allocating memory while initializing reserved memory.
 * A fix to avoid allocating the last 4K of memory, as pointers there
   alias with syscall errors.
 
 There are also two cleanups that should have no functional effect but do
 fix build warnings:
 
 * A cleanup to drop a duplicated definition of PAGE_KERNEL_EXEC.
 * A cleanup to properly declare the asm register SP shim.
 * A cleanup to the rv32 memory size Kconfig entry, to reflect the actual
   size of memory availiable.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmACfN8THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWkQD/40g0dX0dX7fkGHk/zbwlkJffA/VCF7
 +dtqfJSjpRkfPCBEOCJxfCYdHM0adrH2FACwN2JhQmOzOpbccO6Xhr/9jgqms43x
 WB5J0WNKrIT2ZGpB6Hp5htD2lo2OuCLNtdnMK0I3NJMBhJHUKb4HGMwFXKBg+RK1
 LtOQMKEJSgkwT6Mhe5K8+tz5VLu6XQUJMCAzxoeFoTDi513HDDcAoOcvfKt0VujD
 1g2HDtIhYlIYUiGliBcFh88slycWh8Im41Hi2Dh5Nfe9PaiexskpL1+g+CCnZUpa
 yIi72AOirjNxxvfPfmAcTXRx4lHB/YQnB9EUnI6W16D7XniXedC7StR+zhzRjkuy
 lePMKJE6x8mTMF5zv2GLP1FalBeJ2SFwLtboT7zfRlhI+IucMqCaSsE06RHg5/52
 YL9ScVi0xS423f3ctGNUrlSKQKdV1s5D7pm6Qf2WZlqUvwKTspOvkzDn8hNpINCe
 0D3EvPDWVoRSWx4u0Ao/Ig2+xwumtun1IPmF7WHqW+aSxVTs0g0XiEiA8ubaU9bD
 JpqfCL9ftzD5S85igHp8+a3zuFmK+sgR/7JCdp7TqBoHmLoQjg8c5L56m1cZy8Mn
 Wu6E1X0n0IyDDjDqWbGTkfkKC6CvbUy6wRW3LdEnhufon3ldiPMTB66gLs4qfcqF
 Yq5WkrJyGK98Ww==
 =guLu
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "There are a few more fixes than a normal rc4, largely due to the
  bubble introduced by the holiday break:

   - return -ENOSYS for syscall number -1, which previously returned an
     uninitialized value.

   - ensure of_clk_init() has been called in time_init(), without which
     clock drivers may not be initialized.

   - fix sifive,uart0 driver to properly display the baud rate. A fix to
     initialize MPIE that allows interrupts to be processed during
     system calls.

   - avoid erronously begin tracing IRQs when interrupts are disabled,
     which at least triggers suprious lockdep failures.

   - workaround for a warning related to calling smp_processor_id()
     while preemptible. The warning itself is suprious on currently
     availiable systems.

   - properly include the generic time VDSO calls. A fix to our kasan
     address mapping. A fix to the HiFive Unleashed device tree, which
     allows the Ethernet PHY to be properly initialized by Linux (as
     opposed to relying on the bootloader).

   - defconfig update to include SiFive's GPIO driver, which is present
     on the HiFive Unleashed and necessary to initialize the PHY.

   - avoid allocating memory while initializing reserved memory.

   - avoid allocating the last 4K of memory, as pointers there alias
     with syscall errors.

  There are also two cleanups that should have no functional effect but
  do fix build warnings:

   - drop a duplicated definition of PAGE_KERNEL_EXEC.

   - properly declare the asm register SP shim.

   - cleanup the rv32 memory size Kconfig entry, to reflect the actual
     size of memory availiable"

* tag 'riscv-for-linus-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Fix maximum allowed phsyical memory for RV32
  RISC-V: Set current memblock limit
  RISC-V: Do not allocate memblock while iterating reserved memblocks
  riscv: stacktrace: Move register keyword to beginning of declaration
  riscv: defconfig: enable gpio support for HiFive Unleashed
  dts: phy: add GPIO number and active state used for phy reset
  dts: phy: fix missing mdio device and probe failure of vsc8541-01 device
  riscv: Fix KASAN memory mapping.
  riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
  riscv: cacheinfo: Fix using smp_processor_id() in preemptible
  riscv: Trace irq on only interrupt is enabled
  riscv: Drop a duplicated PAGE_KERNEL_EXEC
  riscv: Enable interrupts during syscalls with M-Mode
  riscv: Fix sifive serial driver
  riscv: Fix kernel time_init()
  riscv: return -ENOSYS for syscall -1
2021-01-16 11:00:08 -08:00
Linus Torvalds 9348b73c2e mm: don't play games with pinned pages in clear_page_refs
Turning a pinned page read-only breaks the pinning after COW.  Don't do it.

The whole "track page soft dirty" state doesn't work with pinned pages
anyway, since the page might be dirtied by the pinning entity without
ever being noticed in the page tables.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-16 10:51:26 -08:00
Linus Torvalds 29a951dfb3 mm: fix clear_refs_write locking
Turning page table entries read-only requires the mmap_sem held for
writing.

So stop doing the odd games with turning things from read locks to write
locks and back.  Just get the write lock.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-16 10:46:39 -08:00
Atish Patra e557793799
RISC-V: Fix maximum allowed phsyical memory for RV32
Linux kernel can only map 1GB of address space for RV32 as the page offset
is set to 0xC0000000. The current description in the Kconfig is confusing
as it indicates that RV32 can support 2GB of physical memory. That is
simply not true for current kernel. In future, a 2GB split support can be
added to allow 2GB physical address space.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-15 21:35:48 -08:00
Atish Patra abb8e86b26
RISC-V: Set current memblock limit
Currently, linux kernel can not use last 4k bytes of addressable space
because IS_ERR_VALUE macro treats those as an error. This will be an issue
for RV32 as any memblock allocator potentially allocate chunk of memory
from the end of DRAM (2GB) leading bad address error even though the
address was technically valid.

Fix this issue by limiting the memblock if available memory spans the
entire address space.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-15 21:35:47 -08:00
Atish Patra 797f0375dd
RISC-V: Do not allocate memblock while iterating reserved memblocks
Currently, resource tree allocates memory blocks while iterating on the
list. It leads to following kernel warning because memblock allocation
also invokes memory block reservation API.

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:795
__insert_resource+0x8e/0xd0
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
5.10.0-00022-ge20097fb37e2-dirty #549
[    0.000000] epc: c00125c2 ra : c001262c sp : c1c01f50
[    0.000000]  gp : c1d456e0 tp : c1c0a980 t0 : ffffcf20
[    0.000000]  t1 : 00000000 t2 : 00000000 s0 : c1c01f60
[    0.000000]  s1 : ffffcf00 a0 : ffffff00 a1 : c1c0c0c4
[    0.000000]  a2 : 80c12b15 a3 : 80402000 a4 : 80402000
[    0.000000]  a5 : c1c0c0c4 a6 : 80c12b15 a7 : f5faf600
[    0.000000]  s2 : c1c0c0c4 s3 : c1c0e000 s4 : c1009a80
[    0.000000]  s5 : c1c0c000 s6 : c1d48000 s7 : c1613b4c
[    0.000000]  s8 : 00000fff s9 : 80000200 s10: c1613b40
[    0.000000]  s11: 00000000 t3 : c1d4a000 t4 : ffffffff

This is also unnecessary as we can pre-compute the total memblocks required
for each memory region and allocate it before the loop. It save precious
boot time not going through memblock allocation code every time.

Fixes: 00ab027a3b ("RISC-V: Add kernel image sections to the resource tree")

Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-15 21:35:46 -08:00
Pravin B Shelar 9ab7e76aef GTP: add support for flow based tunneling API
Following patch add support for flow based tunneling API
to send and recv GTP tunnel packet over tunnel metadata API.
This would allow this device integration with OVS or eBPF using
flow based tunneling APIs.

Signed-off-by: Pravin B Shelar <pbshelar@fb.com>
Link: https://lore.kernel.org/r/20210110070021.26822-1-pbshelar@fb.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:31:49 -08:00
Yejune Deng f4d133d86a tcp_cubic: use memset and offsetof init
In bictcp_reset(), use memset and offsetof instead of = 0.

Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Link: https://lore.kernel.org/r/1610597696-128610-1-git-send-email-yejune.deng@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:22:16 -08:00
Menglong Dong b69df26082 net: tap: check vlan with eth_type_vlan() method
Replace some checks for ETH_P_8021Q and ETH_P_8021AD in
drivers/net/tap.c with eth_type_vlan.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20210115023238.4681-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:13:49 -08:00
Geliang Tang 32d91b4af3 nfc: netlink: use &w->w in nfc_genl_rcv_nl_event
Use the struct member w of the struct urelease_work directly instead of
casting it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Link: https://lore.kernel.org/r/f0ed86d6d54ac0834bd2e161d172bf7bb5647cf7.1610683862.git.geliangtang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:08:28 -08:00
Jakub Kicinski 58f9f9b555 Merge branch 'configuring-congestion-watermarks-on-ocelot-switch-using-devlink-sb'
Vladimir Oltean says:

====================
Configuring congestion watermarks on ocelot switch using devlink-sb

In some applications, it is important to create resource reservations in
the Ethernet switches, to prevent background traffic, or deliberate
attacks, from inducing denial of service into the high-priority traffic.

These patches give the user some knobs to turn. The ocelot switches
support per-port and per-port-tc reservations, on ingress and on egress.
The resources that are monitored are packet buffers (in cells of 60
bytes each) and frame references.

The frames that exceed the reservations can optionally consume from
sharing watermarks which are not per-port but global across the switch.
There are 10 sharing watermarks, 8 of them are per traffic class and 2
are per drop priority.

I am configuring the hardware using the best of my knowledge, and mostly
through trial and error. Same goes for devlink-sb integration. Feedback
is welcome.
====================

Link: https://lore.kernel.org/r/20210115021120.3055988-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:03:41 -08:00
Vladimir Oltean f59fd9cab7 net: mscc: ocelot: configure watermarks using devlink-sb
Using devlink-sb, we can configure 12/16 (the important 75%) of the
switch's controlling watermarks for congestion drops, and we can monitor
50% of the watermark occupancies (we can monitor the reservation
watermarks, but not the sharing watermarks, which are exposed as pool
sizes).

The following definitions can be made:

SB_BUF=0 # The devlink-sb for frame buffers
SB_REF=1 # The devlink-sb for frame references
POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances
           # have one of these.
POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances
           # have one of these.

Editing the hardware watermarks is done in the following way:
BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING
REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING
BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR
REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR

Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly
by modifying the corresponding pool size. By default, the pool size has
maximum size, so this can be skipped.

devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \
	size 129840 thtype static

Since by default there is no buffer reservation, the above command has
maxed out BUF_COL_SHR_I(dp=0).

Configuring the per-port reservation watermark (P_RSRV) is done in the
following way:

devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \
	pool $POOL_ING th 1000

The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this
command, the sharing watermarks are internally reconfigured with 1000
bytes less, i.e. from 129840 bytes to 128840 bytes.

Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in
the following way:

for tc in {0..7}; do
	devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \
		type ingress pool $POOL_ING \
		th 3000
done

The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes.
The sharing watermarks are again reconfigured with 24000 bytes less.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:35 -08:00
Vladimir Oltean a4ae997adc net: mscc: ocelot: initialize watermarks to sane defaults
This is meant to be a gentle introduction into the world of watermarks
on ocelot. The code is placed in ocelot_devlink.c because it will be
integrated with devlink, even if it isn't right now.

My first step was intended to be to replicate the default configuration
of the congestion watermarks programatically, since they are now going
to be tuned by the user.

But after studying and understanding through trial and error how they
work, I now believe that the configuration used out of reset does not do
justice to the word "reservation", since the sum of all reservations
exceeds the total amount of resources (otherwise said, all reservations
cannot be fulfilled at the same time, which means that, contrary to the
reference manual, they don't guarantee anything).

As an example, here's a dump of the reservation watermarks for frame
buffers, for port 0 (for brevity, the ports 1-6 were omitted, but they
have the same configuration):

BUF_Q_RSRV_I(port 0, prio 0) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 1) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 2) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 3) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 4) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 5) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 6) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 7) = max 3000 bytes

Otherwise said, every port-tc has an ingress reservation of 3000 bytes,
and there are 7 ports in VSC9959 Felix (6 user ports and 1 CPU port).
Concentrating only on the ingress reservations, there are, in total,
8 [traffic classes] x 7 [ports] x 3000 [bytes] = 168,000 bytes of memory
reserved on ingress.
But, surprise, Felix only has 128 KB of packet buffer in total...
A similar thing happens with Seville, which has a larger packet buffer,
but also more ports, and the default configuration is also overcommitted.

This patch disables the (apparently) bogus reservations and moves all
resources to the shared area. This way, real reservations can be set up
by the user, using devlink-sb.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean 6c30384eb1 net: mscc: ocelot: register devlink ports
Add devlink integration into the mscc_ocelot switchdev driver. All
physical ports (i.e. the unused ones as well) except the CPU port module
at ocelot->num_phys_ports are registered with devlink, and that requires
keeping the devlink_port structure outside struct ocelot_port_private,
since the latter has a 1:1 mapping with a struct net_device (which does
not exist for unused ports).

Since we use devlink_port_type_eth_set to link the devlink port to the
net_device, we can as well remove the .ndo_get_phys_port_name and
.ndo_get_port_parent_id implementations, since devlink takes care of
retrieving the port name and number automatically, once
.ndo_get_devlink_port is implemented.

Note that the felix DSA driver is already integrated with devlink by
default, since that is a thing that the DSA core takes care of. This is
the reason why these devlink stubs were put in ocelot_net.c and not in
the common library. It is also the reason why ocelot::devlink is a
pointer and not a full structure embedded inside struct ocelot: because
the mscc_ocelot driver allocates that by itself (as the container of
struct ocelot, in fact), but in the case of felix, it is DSA who
allocates the devlink, and felix just propagates the pointer towards
struct ocelot.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean c6c65d47dd net: mscc: ocelot: delete unused ocelot_set_cpu_port prototype
This is a leftover of commit 69df578c5f ("net: mscc: ocelot: eliminate
confusion between CPU and NPI port") which renamed that function.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean 70d39a6e62 net: mscc: ocelot: export NUM_TC constant from felix to common switch lib
We should be moving anything that isn't DSA-specific or SoC-specific out
of the felix DSA driver, and into the common mscc_ocelot switch library.

The number of traffic classes is one of the aspects that is common
between all ocelot switches, so it belongs in the library.

This patch also makes seville use 8 TX queues, and therefore enables
prioritization via the QOS_CLASS field in the NPI injection header.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean d19741b0f5 net: dsa: felix: perform teardown in reverse order of setup
In general it is desirable that cleanup is the reverse process of setup.
In this case I am not seeing any particular issue, but with the
introduction of devlink-sb for felix, a non-obvious decision had to be
made as to where to put its cleanup method. When there's a convention in
place, that decision becomes obvious.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean a7096915e4 net: dsa: felix: reindent struct dsa_switch_ops
The devlink function pointer names are super long, and they would break
the alignment. So reindent the existing ops now by adding one tab.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean 2a6ef76303 net: dsa: add ops for devlink-sb
Switches that care about QoS might have hardware support for reserving
buffer pools for individual ports or traffic classes, and configuring
their sizes and thresholds. Through devlink-sb (shared buffers), this is
all configurable, as well as their occupancy being viewable.

Add the plumbing in DSA for these operations.

Individual drivers still need to call devlink_sb_register() with the
shared buffers they want to expose. A helper was not created in DSA for
this purpose (unlike, say, dsa_devlink_params_register), since in my
opinion it does not bring any benefit over plainly calling
devlink_sb_register() directly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean 703b762190 net: mscc: ocelot: add ops for decoding watermark threshold and occupancy
We'll need to read back the watermark thresholds and occupancy from
hardware (for devlink-sb integration), not only to write them as we did
so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct
ocelot_ops, similar to wm_enc, and implement them for the 3 supported
mscc_ocelot switches.

Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT
register, because that doesn't scale with the number of switches that
mscc_ocelot supports now. They have different bit widths for the
watermarks, and we need function pointers to abstract that difference
away.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:34 -08:00
Vladimir Oltean f6fe01d6fa net: mscc: ocelot: auto-detect packet buffer size and number of frame references
Instead of reading these values from the reference manual and writing
them down into the driver, it appears that the hardware gives us the
option of detecting them dynamically.

The number of frame references corresponds to what the reference manual
notes, however it seems that the frame buffers are reported as slightly
less than the books would indicate. On VSC9959 (Felix), the books say it
should have 128KB of packet buffer, but the registers indicate only
129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from
the documentation of all these devices is incorrect (taken from an older
generation). This was confirmed by Younes Leroul from Microchip support.

Not having anything better to do with these values at the moment* (this
will change soon), let's just print them.

*The frame buffer size is, in fact, used to calculate the tail dropping
watermarks.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 20:02:33 -08:00
Christoph Hellwig a959a9782f iov_iter: fix the uaccess area in copy_compat_iovec_from_user
sizeof needs to be called on the compat pointer, not the native one.

Fixes: 89cd35c58b ("iov_iter: transparently handle compat iovecs in import_iovec")
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-15 22:51:42 -05:00
Eric Dumazet bcd0cf19ef net_sched: avoid shift-out-of-bounds in tcindex_set_parms()
tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT
attribute is not silly.

UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29
shift exponent 255 is too large for 32-bit type 'int'
CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 valid_perfect_hash net/sched/cls_tcindex.c:260 [inline]
 tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425
 tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546
 tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127
 rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:672
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2390
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 18:11:10 -08:00
Eric Dumazet dd5e073381 net_sched: gen_estimator: support large ewma log
syzbot report reminded us that very big ewma_log were supported in the past,
even if they made litle sense.

tc qdisc replace dev xxx root est 1sec 131072sec ...

While fixing the bug, also add boundary checks for ewma_log, in line
with range supported by iproute2.

UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83
 call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417
 expire_timers kernel/time/timer.c:1462 [inline]
 __run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731
 __run_timers kernel/time/timer.c:1712 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744
 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:226 [inline]
 __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432
 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516

Fixes: 1c0d32fde5 ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20210114181929.1717985-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 18:11:06 -08:00
Eric Dumazet e4bedf48aa net_sched: reject silly cell_log in qdisc_get_rtab()
iproute2 probably never goes beyond 8 for the cell exponent,
but stick to the max shift exponent for signed 32bit.

UBSAN reported:
UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
shift exponent 130 is too large for 32-bit type 'int'
CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x183/0x22e lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:148 [inline]
 __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg net/socket.c:672 [inline]
 ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
 ___sys_sendmsg net/socket.c:2399 [inline]
 __sys_sendmsg+0x319/0x400 net/socket.c:2432
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 18:06:45 -08:00
Linus Torvalds 1d94330a43 - Fix DM-raid's raid1 discard limits so discards work.
- Select missing Kconfig dependencies for DM integrity and zoned
   targets.
 
 - 4 fixes for DM crypt target's support to optionally bypass
   kcryptd workqueues.
 
 - Fix DM snapshot merge supports missing data flushes before
   committing metadata.
 
 - Fix DM integrity data device flushing when external metadata is
   used.
 
 - Fix DM integrity's maximum number of supported constructor arguments
   that user can request when creating an integrity device.
 
 - Eliminate DM core ioctl logging noise when an ioctl is issued
   without required CAP_SYS_RAWIO permission.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmACJwsTHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWr93B/sFpKbCjRxZhGdOz8u2UIV9jGoSzAaZ
 3qdSTruCp1HMTsz0J/+bFBTMQKKap+8u8kzhCOLAqtZgEz13P0D3NyDUGiCxj07O
 vPZJQ3COrClknq5byKws/aOKcnbr25odk6Zq+gFmk47pdNFBMkzNMf0IbdVJsj19
 d0tnTRkIiycC8hnXOr/tHufnmERrw9nZpq3cLEHT6vaXc4FxcYXbo4UEMIwKjgcB
 AnsRrqpqKyayG+G/SEs01JM9eYznWRPqf3L03wTWfieAsFjDoIp/ek8iXWjEwLtZ
 BnhxFwodWvycFb9ITaxlFgfyiBSgU1f3tsIYjU4MHCSFI6u5b83y57Dl
 =FnV7
 -----END PGP SIGNATURE-----

Merge tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM-raid's raid1 discard limits so discards work.

 - Select missing Kconfig dependencies for DM integrity and zoned
   targets.

 - Four fixes for DM crypt target's support to optionally bypass kcryptd
   workqueues.

 - Fix DM snapshot merge supports missing data flushes before committing
   metadata.

 - Fix DM integrity data device flushing when external metadata is used.

 - Fix DM integrity's maximum number of supported constructor arguments
   that user can request when creating an integrity device.

 - Eliminate DM core ioctl logging noise when an ioctl is issued without
   required CAP_SYS_RAWIO permission.

* tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm crypt: defer decryption to a tasklet if interrupts disabled
  dm integrity: fix the maximum number of arguments
  dm crypt: do not call bio_endio() from the dm-crypt tasklet
  dm integrity: fix flush with external metadata device
  dm: eliminate potential source of excessive kernel log noise
  dm snapshot: flush merged data before committing metadata
  dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
  dm crypt: do not wait for backlogged crypto request completion in softirq
  dm zoned: select CONFIG_CRC32
  dm integrity: select CRYPTO_SKCIPHER
  dm raid: fix discard limits for raid1
2021-01-15 18:01:17 -08:00
Jakub Kicinski 2d9116be76 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-01-16

1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
   that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.

2) Add support for using kernel module global variables (__ksym externs in BPF
   programs) retrieved via module's BTF, from Andrii Nakryiko.

3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
   stored in mmap2 event for perf, from Jiri Olsa.

4) Various fixes for cross-building BPF sefltests out-of-tree which then will
   unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.

5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.

6) Clean up driver's XDP buffer init and split into two helpers to init per-
   descriptor and non-changing fields during processing, from Lorenzo Bianconi.

7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
  perf: Add build id data in mmap2 event
  bpf: Add size arg to build_id_parse function
  bpf: Move stack_map_get_build_id into lib
  bpf: Document new atomic instructions
  bpf: Add tests for new BPF atomic operations
  bpf: Add bitwise atomic instructions
  bpf: Pull out a macro for interpreting atomic ALU operations
  bpf: Add instructions for atomic_[cmp]xchg
  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
  bpf: Move BPF_STX reserved field check into BPF_STX verifier code
  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
  bpf: x86: Factor out a lookup table for some ALU opcodes
  bpf: x86: Factor out emission of REX byte
  bpf: x86: Factor out emission of ModR/M for *(reg + off)
  tools/bpftool: Add -Wall when building BPF programs
  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
  selftests/bpf: Install btf_dump test cases
  selftests/bpf: Fix installation of urandom_read
  selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
  selftests/bpf: Fix out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 17:57:26 -08:00
Tom Rix 3ada665b8f net: ks8851: remove definition of DEBUG
Defining DEBUG should only be done in development.
So remove DEBUG.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20210115153128.131026-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 17:53:29 -08:00