Commit Graph

1262 Commits

Author SHA1 Message Date
Will Deacon 4109823037 arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes
Commit fb396bb459 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
removed TLB invalidation from get_clear_flush() [now get_clear_contig()]
on the basis that the core TLB invalidation code is aware of hugetlb
mappings backed by contiguous page-table entries and will cover the
correct virtual address range.

However, this change also resulted in the TLB invalidation being removed
from the "break" step in the break-before-make (BBM) sequence used
internally by huge_ptep_set_{access_flags,wrprotect}(), therefore
making the BBM sequence unsafe irrespective of later invalidation.

Although the architecture is desperately unclear about how exactly
contiguous ptes should be updated in a live page-table, restore TLB
invalidation to our BBM sequence under the assumption that BBM is the
right thing to be doing in the first place.

Fixes: fb396bb459 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220629095349.25748-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-07-01 18:29:26 +01:00
Will Deacon c50f11c619 arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers

When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:

  (1) If the DMA transfer does not write to every byte in the buffer,
      then the unwritten bytes will contain stale data once the transfer
      has completed.

  (2) If the buffer has a virtual alias in userspace, then stale data
      may be visible via this alias during the period between performing
      the cache invalidation and the DMA writes landing in memory.

Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-06-17 19:06:06 +01:00
Oleksandr Tyshchenko 9bf22421dc arm/xen: Introduce xen_setup_dma_ops()
This patch introduces new helper and places it in new header.
The helper's purpose is to assign any Xen specific DMA ops in
a single place. For now, we deal with xen-swiotlb DMA ops only.
The one of the subsequent commits in current series will add
xen-grant DMA ops case.

Also re-use the xen_swiotlb_detect() check on Arm32.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
[For arm64]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1654197833-25362-2-git-send-email-olekstysh@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06 08:54:33 +02:00
Baolin Wang e68b823ab0 arm64/hugetlb: Fix building errors in huge_ptep_clear_flush()
Fix the arm64 build error which was caused by commit ae07562909 ("mm:
change huge_ptep_clear_flush() to return the original pte") interacting
with commit fb396bb459 ("arm64/hugetlb: Drop TLB flush from
get_clear_flush()"):

  arch/arm64/mm/hugetlbpage.c: In function ‘huge_ptep_clear_flush’:
  arch/arm64/mm/hugetlbpage.c:515:9: error: implicit declaration of function ‘get_clear_flush’; did you mean ‘ptep_clear_flush’? [-Werror=implicit-function-declaration]
    515 |  return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
        |         ^~~~~~~~~~~~~~~
        |         ptep_clear_flush

Due to the new get_clear_contig() has dropped TLB flush, we should add
an explicit TLB flush in huge_ptep_clear_flush() to keep original
semantics when changing to use new get_clear_contig().

Fixes: fb396bb459 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()").
Fixes: ae07562909 ("mm: change huge_ptep_clear_flush() to return the original pte")
Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27 10:56:35 -07:00
Linus Torvalds 98931dd95f Yang Shi has improved the behaviour of khugepaged collapsing of readonly
file-backed transparent hugepages.
 
 Johannes Weiner has arranged for zswap memory use to be tracked and
 managed on a per-cgroup basis.
 
 Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
 enablement of the recent huge page vmemmap optimization feature.
 
 Baolin Wang contributes a series to fix some issues around hugetlb
 pagetable invalidation.
 
 Zhenwei Pi has fixed some interactions between hwpoisoned pages and
 virtualization.
 
 Tong Tiangen has enabled the use of the presently x86-only
 page_table_check debugging feature on arm64 and riscv.
 
 David Vernet has done some fixup work on the memcg selftests.
 
 Peter Xu has taught userfaultfd to handle write protection faults against
 shmem- and hugetlbfs-backed files.
 
 More DAMON development from SeongJae Park - adding online tuning of the
 feature and support for monitoring of fixed virtual address ranges.  Also
 easier discovery of which monitoring operations are available.
 
 Nadav Amit has done some optimization of TLB flushing during mprotect().
 
 Neil Brown continues to labor away at improving our swap-over-NFS support.
 
 David Hildenbrand has some fixes to anon page COWing versus
 get_user_pages().
 
 Peng Liu fixed some errors in the core hugetlb code.
 
 Joao Martins has reduced the amount of memory consumed by device-dax's
 compound devmaps.
 
 Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
 
 Muchun Song has found and fixed some errors in the TLB flushing of
 transparent hugepages.
 
 Roman Gushchin has done more work on the memcg selftests.
 
 And, of course, many smaller fixes and cleanups.  Notably, the customary
 million cleanup serieses from Miaohe Lin.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
 jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
 PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
 =nFe6
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
2022-05-26 12:32:41 -07:00
Linus Torvalds 3f306ea2e1 dma-mapping updates for Linux 5.19
- don't over-decrypt memory (Robin Murphy)
  - takes min align mask into account for the swiotlb max mapping size
    (Tianyu Lan)
  - use GFP_ATOMIC in dma-debug (Mikulas Patocka)
  - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)
  - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)
  - cleanup swiotlb initialization and share more code with swiotlb-xen
    (me, Stefano Stabellini)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmKObTQLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYObmA//dIcDB/q4iFGD+WJh4MhM+asx0ZsdF2OJz42WEhgT
 Z9duOrgcneEQundCamqJP9rNTs980LHDA8uWQC5rZEc9vxuRVOdS7bSgYRUwWh6B
 r0ZjOsvQCn+ChoZML8uyk4rfmEINq+EvJuec3G5fgecZOhPuJS2i2uzzv5cHwqgP
 ChC0fwyZlkfdECXgvZXbEoCJLfTgGNlziN6Ai8dirSoqgEQUoCsY89/M7OiEBvV2
 R4XUWD7OvQERfB4t6xLuUHyzf9PAuWB+OiblRVNeAmK3lMjxVrc3k4kIowgklnzD
 8hfmphAa9Zou3zdfi6Gd4fiQRHRVOwKVp1rtqUmJ+lPSiwyMzu64z9ld2+2qac0h
 V4sSr/yJkhxnBT4/0MkTChvhnRobisackpUzNRpiM4ck7cNVb7eAvkISsbH+pWI9
 aEexPhbyskjlV+GOyM4QL4ygG0dpXY0HSyoh6uaSVsaXMycnWIsJCPidXxV1HGV0
 q2/RLHuHwYxia8cYCF01/DQvwOKSjwbU0zModxtRezGD5GYh2C0a+SrA1aX+qiTu
 yGJCs2UHtSQstAt78tTVp499YeDeL/oGSQkPAu8zyRkSczzF+CncGTuXyoJbAWyK
 otcgERWljgZ4scxjfu1uacfoVhKQ7nOu7hiJokL0U80FESAennLC3ZlocvB9h/ff
 HNA=
 =n2rk
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - don't over-decrypt memory (Robin Murphy)

 - takes min align mask into account for the swiotlb max mapping size
   (Tianyu Lan)

 - use GFP_ATOMIC in dma-debug (Mikulas Patocka)

 - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me)

 - don't fail on highmem CMA pages in dma_direct_alloc_pages (me)

 - cleanup swiotlb initialization and share more code with swiotlb-xen
   (me, Stefano Stabellini)

* tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits)
  dma-direct: don't over-decrypt memory
  swiotlb: max mapping size takes min align mask into account
  swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
  swiotlb: use the right nslabs value in swiotlb_init_remap
  swiotlb: don't panic when the swiotlb buffer can't be allocated
  dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC
  dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages
  swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm
  x86: remove cruft from <asm/dma-mapping.h>
  swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl
  swiotlb: merge swiotlb-xen initialization into swiotlb
  swiotlb: provide swiotlb_init variants that remap the buffer
  swiotlb: pass a gfp_mask argument to swiotlb_init_late
  swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction
  swiotlb: make the swiotlb_init interface more useful
  x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
  x86: remove the IOMMU table infrastructure
  MIPS/octeon: use swiotlb_init instead of open coding it
  arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region
  swiotlb: rename swiotlb_late_init_with_default_size
  ...
2022-05-25 19:18:36 -07:00
Linus Torvalds 143a6252e1 arm64 updates for 5.19:
- Initial support for the ARMv9 Scalable Matrix Extension (SME). SME
   takes the approach used for vectors in SVE and extends this to provide
   architectural support for matrix operations. No KVM support yet, SME
   is disabled in guests.
 
 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.
 
 - btrfs search_ioctl() fix for live-lock with sub-page faults.
 
 - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring
   coherent I/O traffic, support for Arm's CMN-650 and CMN-700
   interconnect PMUs, minor driver fixes, kerneldoc cleanup.
 
 - Kselftest updates for SME, BTI, MTE.
 
 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.
 
 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).
 
 - stacktrace cleanups.
 
 - ftrace cleanups.
 
 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmKH19IACgkQa9axLQDI
 XvEFWg//bf0p6zjeNaOJmBbyVFsXsVyYiEaLUpFPUs3oB+81s2YZ+9i1rgMrNCft
 EIDQ9+/HgScKxJxnzWf68heMdcBDbk76VJtLALExbge6owFsjByQDyfb/b3v/bLd
 ezAcGzc6G5/FlI1IP7ct4Z9MnQry4v5AG8lMNAHjnf6GlBS/tYNAqpmj8HpQfgRQ
 ZbhfZ8Ayu3TRSLWL39NHVevpmxQm/bGcpP3Q9TtjUqg0r1FQ5sK/LCqOksueIAzT
 UOgUVYWSFwTpLEqbYitVqgERQp9LiLoK5RmNYCIEydfGM7+qmgoxofSq5e2hQtH2
 SZM1XilzsZctRbBbhMit1qDBqMlr/XAy/R5FO0GauETVKTaBhgtj6mZGyeC9nU/+
 RGDljaArbrOzRwMtSuXF+Fp6uVo5spyRn1m8UT/k19lUTdrV9z6EX5Fzuc4Mnhed
 oz4iokbl/n8pDObXKauQspPA46QpxUYhrAs10B/ELc3yyp/Qj3jOfzYHKDNFCUOq
 HC9mU+YiO9g2TbYgCrrFM6Dah2E8fU6/cR0ZPMeMgWK4tKa+6JMEINYEwak9e7M+
 8lZnvu3ntxiJLN+PrPkiPyG+XBh2sux1UfvNQ+nw4Oi9xaydeX7PCbQVWmzTFmHD
 q7UPQ8220e2JNCha9pULS8cxDLxiSksce06DQrGXwnHc1Ir7T04=
 =0DjE
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - Initial support for the ARMv9 Scalable Matrix Extension (SME).

   SME takes the approach used for vectors in SVE and extends this to
   provide architectural support for matrix operations. No KVM support
   yet, SME is disabled in guests.

 - Support for crashkernel reservations above ZONE_DMA via the
   'crashkernel=X,high' command line option.

 - btrfs search_ioctl() fix for live-lock with sub-page faults.

 - arm64 perf updates: support for the Hisilicon "CPA" PMU for
   monitoring coherent I/O traffic, support for Arm's CMN-650 and
   CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.

 - Kselftest updates for SME, BTI, MTE.

 - Automatic generation of the system register macros from a 'sysreg'
   file describing the register bitfields.

 - Update the type of the function argument holding the ESR_ELx register
   value to unsigned long to match the architecture register size
   (originally 32-bit but extended since ARMv8.0).

 - stacktrace cleanups.

 - ftrace cleanups.

 - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
   avoid executable mappings in kexec/hibernate code, drop TLB flushing
   from get_clear_flush() (and rename it to get_clear_contig()),
   ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Move sve_free() into SVE code section
  arm64: Kconfig.platforms: Add comments
  arm64: Kconfig: Fix indentation and add comments
  arm64: mm: avoid writable executable mappings in kexec/hibernate code
  arm64: lds: move special code sections out of kernel exec segment
  arm64/hugetlb: Implement arm64 specific huge_ptep_get()
  arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
  arm64: kdump: Do not allocate crash low memory if not needed
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  ...
2022-05-23 21:06:11 -07:00
Catalin Marinas 0616ea3f1b Merge branch 'for-next/esr-elx-64-bit' into for-next/core
* for-next/esr-elx-64-bit:
  : Treat ESR_ELx as a 64-bit register.
  KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high
  KVM: arm64: Treat ESR_EL2 as a 64-bit register
  arm64: Treat ESR_ELx as a 64-bit register
  arm64: compat: Do not treat syscall number as ESR_ELx for a bad syscall
  arm64: Make ESR_ELx_xVC_IMM_MASK compatible with assembly
2022-05-20 18:51:54 +01:00
Catalin Marinas e003d5335c Merge branch 'for-next/sysreg-gen' into for-next/core
* for-next/sysreg-gen: (32 commits)
  : Automatic system register definition generation.
  arm64/sysreg: Generate definitions for FAR_ELx
  arm64/sysreg: Generate definitions for DACR32_EL2
  arm64/sysreg: Generate definitions for CSSELR_EL1
  arm64/sysreg: Generate definitions for CPACR_ELx
  arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
  arm64/sysreg: Generate definitions for CLIDR_EL1
  arm64/sve: Generate ZCR definitions
  arm64/sme: Generate defintions for SVCR
  arm64/sme: Generate SMPRI_EL1 definitions
  arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
  arm64/sme: Automatically generate SMIDR_EL1 defines
  arm64/sme: Automatically generate defines for SMCR
  arm64/sysreg: Support generation of RAZ fields
  arm64/sme: Remove _EL0 from name of SVCR - FIXME sysreg.h
  arm64/sme: Standardise bitfield names for SVCR
  arm64/sme: Drop SYS_ from SMIDR_EL1 defines
  arm64/fp: Rename SVE and SME LEN field name to _WIDTH
  arm64/fp: Make SVE and SME length register definition match architecture
  arm64/sysreg: fix odd line spacing
  arm64/sysreg: improve comment for regs without fields
  ...
2022-05-20 18:50:57 +01:00
Catalin Marinas 201729d53a Merge branches 'for-next/sme', 'for-next/stacktrace', 'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf/arm-cmn: Decode CAL devices properly in debugfs
  perf/arm-cmn: Fix filter_sel lookup
  perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first
  drivers/perf: hisi: Add Support for CPA PMU
  drivers/perf: hisi: Associate PMUs in SICL with CPUs online
  drivers/perf: arm_spe: Expose saturating counter to 16-bit
  perf/arm-cmn: Add CMN-700 support
  perf/arm-cmn: Refactor occupancy filter selector
  perf/arm-cmn: Add CMN-650 support
  dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700
  perf: check return value of armpmu_request_irq()
  perf: RISC-V: Remove non-kernel-doc ** comments

* for-next/sme: (30 commits)
  : Scalable Matrix Extensions support.
  arm64/sve: Move sve_free() into SVE code section
  arm64/sve: Make kernel FPU protection RT friendly
  arm64/sve: Delay freeing memory in fpsimd_flush_thread()
  arm64/sme: More sensibly define the size for the ZA register set
  arm64/sme: Fix NULL check after kzalloc
  arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding()
  arm64/sme: Provide Kconfig for SME
  KVM: arm64: Handle SME host state when running guests
  KVM: arm64: Trap SME usage in guest
  KVM: arm64: Hide SME system registers from guests
  arm64/sme: Save and restore streaming mode over EFI runtime calls
  arm64/sme: Disable streaming mode and ZA when flushing CPU state
  arm64/sme: Add ptrace support for ZA
  arm64/sme: Implement ptrace support for streaming mode SVE registers
  arm64/sme: Implement ZA signal handling
  arm64/sme: Implement streaming SVE signal handling
  arm64/sme: Disable ZA and streaming mode when handling signals
  arm64/sme: Implement traps and syscall handling for SME
  arm64/sme: Implement ZA context switching
  arm64/sme: Implement streaming SVE context switching
  ...

* for-next/stacktrace:
  : Stacktrace cleanups.
  arm64: stacktrace: align with common naming
  arm64: stacktrace: rename stackframe to unwind_state
  arm64: stacktrace: rename unwinder functions
  arm64: stacktrace: make struct stackframe private to stacktrace.c
  arm64: stacktrace: delete PCS comment
  arm64: stacktrace: remove NULL task check from unwind_frame()

* for-next/fault-in-subpage:
  : btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable().
  btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
  arm64: Add support for user sub-page fault probing
  mm: Add fault_in_subpage_writeable() to probe at sub-page granularity

* for-next/misc:
  : Miscellaneous patches.
  arm64: Kconfig.platforms: Add comments
  arm64: Kconfig: Fix indentation and add comments
  arm64: mm: avoid writable executable mappings in kexec/hibernate code
  arm64: lds: move special code sections out of kernel exec segment
  arm64/hugetlb: Implement arm64 specific huge_ptep_get()
  arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
  arm64: mm: Make arch_faults_on_old_pte() check for migratability
  arm64: mte: Clean up user tag accessors
  arm64/hugetlb: Drop TLB flush from get_clear_flush()
  arm64: Declare non global symbols as static
  arm64: mm: Cleanup useless parameters in zone_sizes_init()
  arm64: fix types in copy_highpage()
  arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE
  arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK
  arm64: document the boot requirements for MTE
  arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE

* for-next/ftrace:
  : ftrace cleanups.
  arm64/ftrace: Make function graph use ftrace directly
  ftrace: cleanup ftrace_graph_caller enable and disable

* for-next/crashkernel:
  : Support for crashkernel reservations above ZONE_DMA.
  arm64: kdump: Do not allocate crash low memory if not needed
  docs: kdump: Update the crashkernel description for arm64
  of: Support more than one crash kernel regions for kexec -s
  of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  arm64: kdump: Reimplement crashkernel=X
  arm64: Use insert_resource() to simplify code
  kdump: return -ENOENT if required cmdline option does not exist
2022-05-20 18:50:35 +01:00
Ard Biesheuvel 01142791b0 arm64: mm: avoid writable executable mappings in kexec/hibernate code
The temporary mappings of the low-level kexec and hibernate helpers are
created with both writable and executable attributes, which is not
necessary here, and generally best avoided. So use read-only, executable
attributes instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220429131347.3621090-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-17 09:32:45 +01:00
Baolin Wang bc5dfb4fd7 arm64/hugetlb: Implement arm64 specific huge_ptep_get()
Now we use huge_ptep_get() to get the pte value of a hugetlb page,
however it will only return one specific pte value for the CONT-PTE
or CONT-PMD size hugetlb on ARM64 system, which can contain several
continuous pte or pmd entries with same page table attributes. And it
will not take into account the subpages' dirty or young bits of a
CONT-PTE/PMD size hugetlb page.

So the huge_ptep_get() is inconsistent with huge_ptep_get_and_clear(),
which already takes account the dirty or young bits for any subpages
in this CONT-PTE/PMD size hugetlb [1]. Meanwhile we can miss dirty or
young flags statistics for hugetlb pages with current huge_ptep_get(),
such as the gather_hugetlb_stats() function, and CONT-PTE/PMD hugetlb
monitoring with DAMON.

Thus define an ARM64 specific huge_ptep_get() implementation as well as
enabling __HAVE_ARCH_HUGE_PTEP_GET, that will take into account any
subpages' dirty or young bits for CONT-PTE/PMD size hugetlb page, for
those functions that want to check the dirty and young flags of a hugetlb
page.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/

Suggested-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/624109a80ac4bbdf1e462dfa0b49e9f7c31a7c0d.1652496622.git.baolin.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 23:39:07 +01:00
Baolin Wang f0d9d79ec7 arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
The original huge_ptep_get() on ARM64 is just a wrapper of ptep_get(),
which will not take into account any contig-PTEs dirty and access bits.
Meanwhile we will implement a new ARM64-specific huge_ptep_get()
interface in following patch, which will take into account any contig-PTEs
dirty and access bits. To keep the same efficient logic to get the pte
value, change to use ptep_get() as a preparation.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/5113ed6e103f995e1d0f0c9fda0373b761bbcad2.1652496622.git.baolin.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 23:39:07 +01:00
Zhen Lei 8f0f104e2a arm64: kdump: Do not allocate crash low memory if not needed
When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low"
memory is not required in the following corner cases:
1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means
   that the devices can access any memory.
2. If the system memory is small, the crash high memory may be allocated
   from the DMA zones. If that happens, there's no need to allocate
   another crash low memory because there's already one.

Add condition '(crash_base >= CRASH_ADDR_LOW_MAX)' to determine whether
the 'high' memory is allocated above DMA zones. Note: when both
CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the entire physical
memory is DMA accessible, CRASH_ADDR_LOW_MAX equals 'PHYS_MASK + 1'.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220511032033.426-1-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-16 20:01:38 +01:00
Baolin Wang ae07562909 mm: change huge_ptep_clear_flush() to return the original pte
Patch series "Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating", v4.

presently, migrating a hugetlb page or unmapping a poisoned hugetlb page,
we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page, which will cause potential data consistent issue.  This patch set
will change to use hugetlb related APIs to fix this issue.

Note: Mike pointed out the huge_ptep_get() will only return the one
specific value, and it would not take into account the dirty or young bits
of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1].  This
inconsistent issue is not introduced by this patch set, and this issue
will be addressed in another thread [2].  Meanwhile the uffd for hugetlb
case [3] pointed out by Gerald also needs another patch to address.

[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/


This patch (of 3):

It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
when unmapping or migrating a hugetlb page, and will change to use
huge_ptep_clear_flush() instead in the following patches.

So this is a preparation patch, which changes the huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.

[baolin.wang@linux.alibaba.com: fix build in several more architectures]
  Link: https://lkml.kernel.org/r/0009a4cd-2826-e8be-e671-f050d4f18d5d@linux.alibaba.com
[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20220511181531.7f27a5c1@canb.auug.org.au
Link: https://lkml.kernel.org/r/cover.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20f77ddab90baa249bd24504c413189b82acde69.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/cover.1652147571.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/dcf065868cce35bceaf138613ad27f17bb7c0c19.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:55 -07:00
Anshuman Khandual fb396bb459 arm64/hugetlb: Drop TLB flush from get_clear_flush()
This drops now redundant TLB flush in get_clear_flush() which is no longer
required after recent commit 697a1d44af ("tlb: hugetlb: Add more sizes to
tlb_remove_huge_tlb_entry"). It also renames this function i.e dropping off
'_flush' and replacing it with '__contig' as appropriate.

Cc: Will Deacon <will@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20220510043930.2410985-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-11 13:53:19 +01:00
Mike Rapoport 260364d112 arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
The semantics of pfn_valid() is to check presence of the memory map for a
PFN and not whether a PFN is covered by the linear map.  The memory map
may be present for NOMAP memory regions, but they won't be mapped in the
linear mapping.  Accessing such regions via __va() when they are
memremap()'ed will cause a crash.

On v5.4.y the crash happens on qemu-arm with UEFI [1]:

<1>[    0.084476] 8<--- cut here ---
<1>[    0.084595] Unable to handle kernel paging request at virtual address dfb76000
<1>[    0.084938] pgd = (ptrval)
<1>[    0.085038] [dfb76000] *pgd=5f7fe801, *pte=00000000, *ppte=00000000

...

<4>[    0.093923] [<c0ed6ce8>] (memcpy) from [<c16a06f8>] (dmi_setup+0x60/0x418)
<4>[    0.094204] [<c16a06f8>] (dmi_setup) from [<c16a38d4>] (arm_dmi_init+0x8/0x10)
<4>[    0.094408] [<c16a38d4>] (arm_dmi_init) from [<c0302e9c>] (do_one_initcall+0x50/0x228)
<4>[    0.094619] [<c0302e9c>] (do_one_initcall) from [<c16011e4>] (kernel_init_freeable+0x15c/0x1f8)
<4>[    0.094841] [<c16011e4>] (kernel_init_freeable) from [<c0f028cc>] (kernel_init+0x8/0x10c)
<4>[    0.095057] [<c0f028cc>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)

On kernels v5.10.y and newer the same crash won't reproduce on ARM because
commit b10d6bca87 ("arch, drivers: replace for_each_membock() with
for_each_mem_range()") changed the way memory regions are registered in
the resource tree, but that merely covers up the problem.

On ARM64 memory resources registered in yet another way and there the
issue of wrong usage of pfn_valid() to ensure availability of the linear
map is also covered.

Implement arch_memremap_can_ram_remap() on ARM and ARM64 to prevent access
to NOMAP regions via the linear mapping in memremap().

Link: https://lore.kernel.org/all/Yl65zxGgFzF1Okac@sirena.org.uk
Link: https://lkml.kernel.org/r/20220426060107.7618-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>	[5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09 17:34:28 -07:00
Chen Zhou 944a45abfa arm64: kdump: Reimplement crashkernel=X
There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel in DMA zone, which
will fail when there is not enough low memory.
2. If reserving crashkernel above DMA zone, in this case, crash dump
kernel will fail to boot because there is no low memory available
for allocation.

To solve these issues, introduce crashkernel=X,[high,low].
The "crashkernel=X,high" is used to select a region above DMA zone, and
the "crashkernel=Y,low" is used to allocate specified size low memory.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Co-developed-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20220506114402.365-4-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-07 19:54:33 +01:00
Zhen Lei e6b394425c arm64: Use insert_resource() to simplify code
insert_resource() traverses the subtree layer by layer from the root node
until a proper location is found. Compared with request_resource(), the
parent node does not need to be determined in advance.

In addition, move the insertion of node 'crashk_res' into function
reserve_crashkernel() to make the associated code close together.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: John Donnelly  <john.p.donnelly@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20220506114402.365-3-thunder.leizhen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-07 19:54:33 +01:00
Kefeng Wang f41ef4c2ee arm64: mm: Cleanup useless parameters in zone_sizes_init()
Directly use max_pfn for max and no one use min, kill them.

Reviewed-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220411092455.1461-4-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-05 08:48:09 +01:00
Tong Tiangen 921d161f15 arm64: fix types in copy_highpage()
In copy_highpage() the `kto` and `kfrom` local variables are pointers to
struct page, but these are used to hold arbitrary pointers to kernel memory
. Each call to page_address() returns a void pointer to memory associated
with the relevant page, and copy_page() expects void pointers to this
memory.

This inconsistency was introduced in commit 2563776b41 ("arm64: mte:
Tags-aware copy_{user_,}highpage() implementations") and while this
doesn't appear to be harmful in practice it is clearly wrong.

Correct this by making `kto` and `kfrom` void pointers.

Fixes: 2563776b41 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220420030418.3189040-3-tongtiangen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 20:00:13 +01:00
Mark Brown bc249e37b9 arm64/mte: Make TCF field values and naming more standard
In preparation for automatic generation of the defines for system registers
make the values used for the enumeration in SCTLR_ELx.TCF suitable for use
with the newly defined SYS_FIELD_PREP_ENUM helper, removing the shift from
the define and using the helper to generate it on use instead. Since we
only ever interact with this field in EL1 and in preparation for generation
of the defines also rename from SCTLR_ELx to SCTLR_EL1. SCTLR_EL2 is not
quite the same as SCTLR_EL1 so the conversion does not share the field
definitions.

There should be no functional change from this patch.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220503170233.507788-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 15:30:27 +01:00
Alexandru Elisei 8d56e5c5a9 arm64: Treat ESR_ELx as a 64-bit register
In the initial release of the ARM Architecture Reference Manual for
ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This
changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture,
when they became 64-bit registers, with bits [63:32] defined as RES0. In
version G.a, a new field was added to ESR_ELx, ISS2, which covers bits
[36:32].  This field is used when the Armv8.7 extension FEAT_LS64 is
implemented.

As a result of the evolution of the register width, Linux stores it as
both a 64-bit value and a 32-bit value, which hasn't affected correctness
so far as Linux only uses the lower 32 bits of the register.

Make the register type consistent and always treat it as 64-bit wide. The
register is redefined as an "unsigned long", which is an unsigned
double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1,
page 14). The type was chosen because "unsigned int" is the most frequent
type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx
in exception handling, is also declared as "unsigned long". The 64-bit type
also makes adding support for architectural features that use fields above
bit 31 easier in the future.

The KVM hypervisor will receive a similar update in a subsequent patch.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29 19:26:27 +01:00
Muchun Song 47010c040d mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*
The word of "free" is not expressive enough to express the feature of
optimizing vmemmap pages associated with each HugeTLB, rename this keywork
to "optimize".  In this patch , cheanup configs to make code more
expressive.

Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:15 -07:00
Muchun Song f10f1442c3 mm: hugetlb_vmemmap: cleanup hugetlb_free_vmemmap_enabled*
The word of "free" is not expressive enough to express the feature of
optimizing vmemmap pages associated with each HugeTLB, rename this keywork
to "optimize".  In this patch , cheanup the static key and
hugetlb_free_vmemmap_enabled() to make code more expressive.

Link: https://lkml.kernel.org/r/20220404074652.68024-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:15 -07:00
Anshuman Khandual b3aca728fb arm64/mm: enable ARCH_HAS_VM_GET_PAGE_PROT
This defines and exports a platform specific custom vm_get_page_prot() via
subscribing ARCH_HAS_VM_GET_PAGE_PROT. It localizes arch_vm_get_page_prot()
and moves it near vm_get_page_prot().

Link: https://lkml.kernel.org/r/20220414062125.609297-4-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:13 -07:00
Muchun Song 1e63ac088f arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64
The feature of minimizing overhead of struct page associated with each
HugeTLB page aims to free its vmemmap pages (used as struct page) to save
memory, where is ~14GB/16GB per 1TB HugeTLB pages (2MB/1GB type).  In
short, when a HugeTLB page is allocated or freed, the vmemmap array
representing the range associated with the page will need to be remapped. 
When a page is allocated, vmemmap pages are freed after remapping.  When a
page is freed, previously discarded vmemmap pages must be allocated before
remapping.  More implementations and details can be found here [1].

The infrastructure of freeing vmemmap pages associated with each HugeTLB
page is already there, we can easily enable HUGETLB_PAGE_FREE_VMEMMAP for
arm64, the only thing to be fixed is flush_dcache_page() .

flush_dcache_page() need to be adapted to operate on the head page's flags
since the tail vmemmap pages are mapped with read-only after the feature
is enabled (clear operation is not permitted).

There was some discussions about this in the thread [2], but there was no
conclusion in the end.  And I copied the concern proposed by Anshuman to
here and explain why those concern is superfluous.  It is safe to enable
it for x86_64 as well as arm64.

1st concern:
'''
But what happens when a hot remove section's vmemmap area (which is
being teared down) is nearby another vmemmap area which is either created
or being destroyed for HugeTLB alloc/free purpose. As you mentioned
HugeTLB pages inside the hot remove section might be safe. But what about
other HugeTLB areas whose vmemmap area shares page table entries with
vmemmap entries for a section being hot removed ? Massive HugeTLB alloc
/use/free test cycle using memory just adjacent to a memory hotplug area,
which is always added and removed periodically, should be able to expose
this problem.
'''

Answer: At the time memory is removed, all HugeTLB pages either have been
migrated away or dissolved.  So there is no race between memory hot remove
and free_huge_page_vmemmap().  Therefore, HugeTLB pages inside the hot
remove section is safe.  Let's talk your question "what about other
HugeTLB areas whose vmemmap area shares page table entries with vmemmap
entries for a section being hot removed ?", the question is not
established.  The minimal granularity size of hotplug memory 128MB (on
arm64, 4k base page), any HugeTLB smaller than 128MB is within a section,
then, there is no share PTE page tables between HugeTLB in this section
and ones in other sections and a HugeTLB page could not cross two
sections.  In this case, the section cannot be freed.  Any HugeTLB bigger
than 128MB (section size) whose vmemmap pages is an integer multiple of
2MB (PMD-mapped).  As long as:

  1) HugeTLBs are naturally aligned, power-of-two sizes
  2) The HugeTLB size >= the section size
  3) The HugeTLB size >= the vmemmap leaf mapping size

Then a HugeTLB will not share any leaf page table entries with *anything
else*, but will share intermediate entries.  In this case, at the time
memory is removed, all HugeTLB pages either have been migrated away or
dissolved.  So there is also no race between memory hot remove and
free_huge_page_vmemmap().

2nd concern:
'''
differently, not sure if ptdump would require any synchronization.

Dumping an wrong value is probably okay but crashing because a page table
entry is being freed after ptdump acquired the pointer is bad. On arm64,
ptdump() is protected against hotremove via [get|put]_online_mems().
'''

Answer: The ptdump should be fine since vmemmap_remap_free() only
exchanges PTEs or splits the PMD entry (which means allocating a PTE page
table).  Both operations do not free any page tables (PTE), so ptdump
cannot run into a UAF on any page tables.  The worst case is just dumping
an wrong value.

[1] https://lore.kernel.org/all/20210510030027.56044-1-songmuchun@bytedance.com/
[2] https://lore.kernel.org/all/20210518091826.36937-1-songmuchun@bytedance.com/

[songmuchun@bytedance.com: restructure the code comment inside flush_dcache_page()]
  Link: https://lkml.kernel.org/r/20220414072646.21910-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220331065640.5777-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Tested-by: Barry Song <baohua@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-28 23:16:03 -07:00
Christoph Hellwig c6af2aa9ff swiotlb: make the swiotlb_init interface more useful
Pass a boolean flag to indicate if swiotlb needs to be enabled based on
the addressing needs, and replace the verbose argument with a set of
flags, including one to force enable bounce buffering.

Note that this patch removes the possibility to force xen-swiotlb use
with the swiotlb=force parameter on the command line on x86 (arm and
arm64 never supported that), but this interface will be restored shortly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-18 07:21:11 +02:00
Julia Lawall dd671f16b1 arm64: fix typos in comments
Various spelling mistakes in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220318103729.157574-10-Julia.Lawall@inria.fr
[will: Squashed in 20220318103729.157574-28-Julia.Lawall@inria.fr]
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-04 10:32:50 +01:00
Andrey Konovalov 36c4a73bf8 kasan, arm64: don't tag executable vmalloc allocations
Besides asking vmalloc memory to be executable via the prot argument of
__vmalloc_node_range() (see the previous patch), the kernel can skip that
bit and instead mark memory as executable via set_memory_x().

Once tag-based KASAN modes start tagging vmalloc allocations, executing
code from such allocations will lead to the PC register getting a tag,
which is not tolerated by the kernel.

Generic kernel code typically allocates memory via module_alloc() if it
intends to mark memory as executable.  (On arm64 module_alloc() uses
__vmalloc_node_range() without setting the executable bit).

Thus, reset pointer tags of pointers returned from module_alloc().

However, on arm64 there's an exception: the eBPF subsystem.  Instead of
using module_alloc(), it uses vmalloc() (via bpf_jit_alloc_exec()) to
allocate its JIT region.

Thus, reset pointer tags of pointers returned from bpf_jit_alloc_exec().

Resetting tags for these pointers results in untagged pointers being
passed to set_memory_x().  This causes conflicts in arithmetic checks in
change_memory_common(), as vm_struct->addr pointer returned by
find_vm_area() is tagged.

Reset pointer tag of find_vm_area(addr)->addr in change_memory_common().

Link: https://lkml.kernel.org/r/b7b2595423340cd7d76b770e5d519acf3b72f0ab.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:48 -07:00
Linus Torvalds 52deda9551 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "Various misc subsystems, before getting into the post-linux-next
  material.

  41 patches.

  Subsystems affected by this patch series: procfs, misc, core-kernel,
  lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
  taskstats, panic, kcov, resource, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
  Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
  kernel/resource: fix kfree() of bootmem memory again
  kcov: properly handle subsequent mmap calls
  kcov: split ioctl handling into locked and unlocked parts
  panic: move panic_print before kmsg dumpers
  panic: add option to dump all CPUs backtraces in panic_print
  docs: sysctl/kernel: add missing bit to panic_print
  taskstats: remove unneeded dead assignment
  kasan: no need to unset panic_on_warn in end_report()
  ubsan: no need to unset panic_on_warn in ubsan_epilogue()
  panic: unset panic_on_warn inside panic()
  docs: kdump: add scp example to write out the dump file
  docs: kdump: update description about sysfs file system support
  arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
  kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
  cgroup: use irqsave in cgroup_rstat_flush_locked().
  fat: use pointer to simple type in put_user()
  minix: fix bug when opening a file with O_DIRECT
  ...
2022-03-24 14:14:07 -07:00
Jisheng Zhang d339f1584f arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a
check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and
increase compile coverage.

Link: https://lkml.kernel.org/r/20211206160514.2000-5-jszhang@kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23 19:00:34 -07:00
Linus Torvalds 9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds 3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
Anshuman Khandual 16785bd774 mm: merge pte_mkhuge() call into arch_make_huge_pte()
Each call into pte_mkhuge() is invariably followed by
arch_make_huge_pte().  Instead arch_make_huge_pte() can accommodate
pte_mkhuge() at the beginning.  This updates generic fallback stub for
arch_make_huge_pte() and available platforms definitions.  This makes huge
pte creation much cleaner and easier to follow.

Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:04 -07:00
Linus Torvalds 356a1adca8 arm64 updates for 5.18
- Support for including MTE tags in ELF coredumps
 
 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions
 
 - Improvements to kselftests for MTE and fpsimd
 
 - Symbol aliasing and linker script cleanups
 
 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs
 
 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously
 
 - Support for the latest pointer authentication algorithm ("QARMA3")
 
 - Support for the DDR PMU present in the Marvell CN10K platform
 
 - Support for the CPU PMU present in the Apple M1 platform
 
 - Use the RNDR instruction for arch_get_random_{int,long}()
 
 - Update our copy of the Arm optimised string routines for str{n}cmp()
 
 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions
 
 - Workaround for Marvell GICv3 erratum #38545
 
 - Clarification to our Documentation (booting reqs. and MTE prctl())
 
 - Miscellanous cleanups and minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmIvta8QHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNAIhB/oDSva5FryAFExVuIB+mqRkbZO9kj6fy/5J
 ctN9LEVO2GI/U1TVAUWop1lXmP8Kbq5UCZOAuY8sz7dAZs7NRUWkwTrXVhaTpi6L
 oxCfu5Afu76d/TGgivNz+G7/ewIJRFj5zCPmHezLF9iiWPUkcAsP0XCp4a0iOjU4
 04O4d7TL/ap9ujEes+U0oEXHnyDTPrVB2OVE316FKD1fgztcjVJ2U+TxX5O4xitT
 PPIfeQCjQBq1B2OC1cptE3wpP+YEr9OZJbx+Ieweidy1CSInEy0nZ13tLoUnGPGU
 KPhsvO9daUCbhbd5IDRBuXmTi/sHU4NIB8LNEVzT1mUPnU8pCizv
 =ziGg
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:

 - Support for including MTE tags in ELF coredumps

 - Instruction encoder updates, including fixes to 64-bit immediate
   generation and support for the LSE atomic instructions

 - Improvements to kselftests for MTE and fpsimd

 - Symbol aliasing and linker script cleanups

 - Reduce instruction cache maintenance performed for user mappings
   created using contiguous PTEs

 - Support for the new "asymmetric" MTE mode, where stores are checked
   asynchronously but loads are checked synchronously

 - Support for the latest pointer authentication algorithm ("QARMA3")

 - Support for the DDR PMU present in the Marvell CN10K platform

 - Support for the CPU PMU present in the Apple M1 platform

 - Use the RNDR instruction for arch_get_random_{int,long}()

 - Update our copy of the Arm optimised string routines for str{n}cmp()

 - Fix signal frame generation for CPUs which have foolishly elected to
   avoid building in support for the fpsimd instructions

 - Workaround for Marvell GICv3 erratum #38545

 - Clarification to our Documentation (booting reqs. and MTE prctl())

 - Miscellanous cleanups and minor fixes

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
  docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred
  arm64/mte: Remove asymmetric mode from the prctl() interface
  arm64: Add cavium_erratum_23154_cpus missing sentinel
  perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  Documentation: vmcoreinfo: Fix htmldocs warning
  kasan: fix a missing header include of static_keys.h
  drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
  drivers/perf: arm_pmu: Handle 47 bit counters
  arm64: perf: Consistently make all event numbers as 16-bits
  arm64: perf: Expose some Armv9 common events under sysfs
  perf/marvell: cn10k DDR perf event core ownership
  perf/marvell: cn10k DDR perfmon event overflow handling
  perf/marvell: CN10k DDR performance monitor support
  dt-bindings: perf: marvell: cn10k ddr performance monitor
  arm64: clean up tools Makefile
  perf/arm-cmn: Update watchpoint format
  perf/arm-cmn: Hide XP PUB events for CMN-600
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  ...
2022-03-21 10:46:39 -07:00
Will Deacon 641d804157 Merge branch 'for-next/spectre-bhb' into for-next/core
Merge in the latest Spectre mess to fix up conflicts with what was
already queued for 5.18 when the embargo finally lifted.

* for-next/spectre-bhb: (21 commits)
  arm64: Do not include __READ_ONCE() block in assembly files
  arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
  arm64: Use the clearbhb instruction in mitigations
  KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
  arm64: Mitigate spectre style branch history side channels
  arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
  arm64: Add percpu vectors for EL1
  arm64: entry: Add macro for reading symbol addresses from the trampoline
  arm64: entry: Add vectors that have the bhb mitigation sequences
  arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
  arm64: entry: Allow the trampoline text to occupy multiple pages
  arm64: entry: Make the kpti trampoline's kpti sequence optional
  arm64: entry: Move trampoline macros out of ifdef'd section
  arm64: entry: Don't assume tramp_vectors is the start of the vectors
  arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
  arm64: entry: Move the trampoline data page before the text page
  arm64: entry: Free up another register on kpti's tramp_exit path
  arm64: entry: Make the trampoline cleanup optional
  KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
  arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
  ...
2022-03-14 19:08:31 +00:00
Will Deacon 20fd2ed10f Merge branch 'for-next/mm' into for-next/core
* for-next/mm:
  Documentation: vmcoreinfo: Fix htmldocs warning
  arm64/mm: Drop use_1G_block()
  arm64: avoid flushing icache multiple times on contiguous HugeTLB
  arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges
  arm64/hugetlb: Define __hugetlb_valid_size()
  arm64/mm: avoid fixmap race condition when create pud mapping
  arm64/mm: Consolidate TCR_EL1 fields
2022-03-14 19:01:18 +00:00
Will Deacon b3ea0eafa9 Merge branch 'for-next/misc' into for-next/core
* for-next/misc:
  arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
  arm64: clean up tools Makefile
  arm64: drop unused includes of <linux/personality.h>
  arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
  arm64: prevent instrumentation of bp hardening callbacks
  arm64: cpufeature: Remove cpu_has_fwb() check
  arm64: atomics: remove redundant static branch
  arm64: entry: Save some nops when CONFIG_ARM64_PSEUDO_NMI is not set
2022-03-14 19:01:12 +00:00
Will Deacon 563c463595 Merge branch 'for-next/linkage' into for-next/core
* for-next/linkage:
  arm64: module: remove (NOLOAD) from linker script
  linkage: remove SYM_FUNC_{START,END}_ALIAS()
  x86: clean up symbol aliasing
  arm64: clean up symbol aliasing
  linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
2022-03-14 19:01:05 +00:00
Linus Torvalds e7e19defa5 - Fix compilation of eBPF object files that indirectly include
mte-kasan.h.
 
 - Fix test for execute-only permissions with EPAN (Enhanced Privileged
   Access Never, ARMv8.7 feature).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmIoyFYACgkQa9axLQDI
 XvGvuw/+OFBDYhvIY8C845RTzmpjrukTusy7GQcin5XpplBzxr2z6AnGuxN+Fvez
 UJZdzJLocwZRNiNqzdbIC0ycMmtEPKn/QZzGFpmsFs42wQOlztrQx7PjdOCnn0HR
 Mtcd0BTHRAogkPKqfvkuiUqCrkorzQ4ka+EN7TavzxMEfegzqBsZk5r9eE7xgGvc
 KLPmz9pFB3K3dFfUhfneHdWrPwERrCjk8ygT3Ia9Sg3UcyT7jzNGOtXBAOLgVuXY
 w/0z32H1TIBbmIVgakXHE0XqXmh5Z53zPO6T2wsOJNEVbHTnLbq1aRcbw2K5dvWc
 hoSZWharQ72yWn8VHu8w3zropNHiSdCSYBIK3jeVzh4edxCvuRmPuTk2g9oDoSUp
 zVHVA8v5GeGHZdJ2Jk5mPK/mRlwN/GbRg4lhhUhkglx9mWaAdE9j8ouGQPSXFjbr
 J3rsVxqYb2948IHz5WOlXJc2baVf9MVS49yZI03cFWyBl1FMTYMDcDkQc0EtM7J2
 Z/VMc6r+22vW/IFKmyCqxJbQh+BnO5X5HS6+1r08uoMYvyynV+ua7MO7qaVI+6cX
 zFbSfkGkyGCOdJGng7BrlmVABeO0VQqb3rsL1OEiYqOm45ekiwM99HiodxaUkC0K
 mlbDxslBf8ei2XzaPz1bg8T9gov19PmJ38NaYmUDWy59mW/ryOM=
 =qWQy
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Fix compilation of eBPF object files that indirectly include
   mte-kasan.h.

 - Fix test for execute-only permissions with EPAN (Enhanced Privileged
   Access Never, ARMv8.7 feature).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kasan: fix include error in MTE functions
  arm64: Ensure execute-only permissions are not allowed without EPAN
2022-03-09 12:59:21 -08:00
Will Deacon 770093459b arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition
Commit 031495635b ("arm64: Do not defer reserve_crashkernel() for
platforms with no DMA memory zones") introduced different definitions
for 'arm64_dma_phys_limit' depending on CONFIG_ZONE_DMA{,32} based on
a late suggestion from Pasha. Sadly, this results in a build error when
passing W=1:

  | arch/arm64/mm/init.c:90:19: error: conflicting type qualifiers for 'arm64_dma_phys_limit'

Drop the 'const' for now and use '__ro_after_init' consistently.

Link: https://lore.kernel.org/r/202203090241.aj7paWeX-lkp@intel.com
Link: https://lore.kernel.org/r/CA+CK2bDbbx=8R=UthkMesWOST8eJMtOGJdfMRTFSwVmo0Vn0EA@mail.gmail.com
Fixes: 031495635b ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones")
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-09 12:21:37 +00:00
Vijay Balakrishna 031495635b arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones
The following patches resulted in deferring crash kernel reservation to
mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU),
in particular Raspberry Pi 4.

commit 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32")
commit 8424ecdde7 ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges")
commit 0a30c53573 ("arm64: mm: Move reserve_crashkernel() into mem_init()")
commit 2687275a58 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required")

Above changes introduced boot slowdown due to linear map creation for
all the memory banks with NO_BLOCK_MAPPINGS, see discussion[1].  The proposed
changes restore crash kernel reservation to earlier behavior thus avoids
slow boot, particularly for platforms with IOMMU (no DMA memory zones).

Tested changes to confirm no ~150ms boot slowdown on our SoC with IOMMU
and 8GB memory.  Also tested with ZONE_DMA and/or ZONE_DMA32 configs to confirm
no regression to deferring scheme of crash kernel memory reservation.
In both cases successfully collected kernel crash dump.

[1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/

Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com
[will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build]
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-08 10:22:33 +00:00
Catalin Marinas 6e2edd6371 arm64: Ensure execute-only permissions are not allowed without EPAN
Commit 18107f8a2d ("arm64: Support execute-only permissions with
Enhanced PAN") re-introduced execute-only permissions when EPAN is
available. When EPAN is not available, arch_filter_pgprot() is supposed
to change a PAGE_EXECONLY permission into PAGE_READONLY_EXEC. However,
if BTI or MTE are present, such check does not detect the execute-only
pgprot in the presence of PTE_GP (BTI) or MT_NORMAL_TAGGED (MTE),
allowing the user to request PROT_EXEC with PROT_BTI or PROT_MTE.

Remove the arch_filter_pgprot() function, change the default VM_EXEC
permissions to PAGE_READONLY_EXEC and update the protection_map[] array
at core_initcall() if EPAN is detected.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 18107f8a2d ("arm64: Support execute-only permissions with Enhanced PAN")
Cc: <stable@vger.kernel.org> # 5.13.x
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
2022-03-08 10:03:51 +00:00
Anshuman Khandual 1310222c27 arm64/mm: Drop use_1G_block()
pud_sect_supported() already checks for PUD level block mapping support i.e
on ARM64_4K_PAGES config. Hence pud_sect_supported(), along with some other
required alignment checks can help completely drop use_1G_block().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1644988012-25455-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-07 21:44:22 +00:00
Muchun Song cf5a501d98 arm64: avoid flushing icache multiple times on contiguous HugeTLB
When a contiguous HugeTLB page is mapped, set_pte_at() will be called
CONT_PTES/CONT_PMDS times.  Therefore, __sync_icache_dcache() will
flush cache multiple times if the page is executable (to ensure
the I-D cache coherency).  However, the first flushing cache already
covers subsequent cache flush operations.  So only flusing cache
for the head page if it is a HugeTLB page to avoid redundant cache
flushing.  In the next patch, it is also depends on this change
since the tail vmemmap pages of HugeTLB is mapped with read-only
meanning only head page struct can be modified.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220302084624.33340-1-songmuchun@bytedance.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-03-07 21:42:34 +00:00
Christoph Hellwig dc90f0846d mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.

Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03 12:47:33 -05:00
Anshuman Khandual a8a733b201 arm64/hugetlb: Define __hugetlb_valid_size()
arch_hugetlb_valid_size() can be just factored out to create another helper
to be used in arch_hugetlb_migration_supported() as well. This just defines
__hugetlb_valid_size() for that purpose.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1645073557-6150-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 22:11:54 +00:00
Mark Rutland 0f61f6be1f arm64: clean up symbol aliasing
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those
to simplify and more consistently define function aliases across
arch/arm64.

Aliases are now defined in terms of a canonical function name. For
position-independent functions I've made the __pi_<func> name the
canonical name, and defined other alises in terms of this.

The SYM_FUNC_{START,END}_PI(func) macros obscure the __pi_<func> name,
and make this hard to seatch for. The SYM_FUNC_START_WEAK_PI() macro
also obscures the fact that the __pi_<func> fymbol is global and the
<func> symbol is weak. For clarity, I have removed these macros and used
SYM_FUNC_{START,END}() directly with the __pi_<func> name.

For example:

	SYM_FUNC_START_WEAK_PI(func)
	... asm insns ...
	SYM_FUNC_END_PI(func)
	EXPORT_SYMBOL(func)

... becomes:

	SYM_FUNC_START(__pi_func)
	... asm insns ...
	SYM_FUNC_END(__pi_func)

	SYM_FUNC_ALIAS_WEAK(func, __pi_func)
	EXPORT_SYMBOL(func)

For clarity, where there are multiple annotations such as
EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For
example, where a function has a name and an alias which are both
exported, this is organised as:

	SYM_FUNC_START(func)
	... asm insns ...
	SYM_FUNC_END(func)
	EXPORT_SYMBOL(func)

	SYM_FUNC_ALIAS(alias, func)
	EXPORT_SYMBOL(alias)

For consistency with the other string functions, I've defined strrchr as
a position-independent function, as it can safely be used as such even
though we have no users today.

As we no longer use SYM_FUNC_{START,END}_ALIAS(), our local copies are
removed. The common versions will be removed by a subsequent patch.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22 16:21:34 +00:00
Catalin Marinas ab1e435ca7 arm64: mte: Define the number of bytes for storing the tags in a page
Rather than explicitly calculating the number of bytes for a compact tag
storage format corresponding to a page, just add a MTE_PAGE_TAG_STORAGE
macro. With the current MTE implementation of 4 bits per tag, we store
2 tags in a byte.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Luis Machado <luis.machado@linaro.org>
Link: https://lore.kernel.org/r/20220131165456.2160675-4-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-15 22:53:29 +00:00