mirror of https://gitee.com/openkylin/linux.git
6149 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Linus Torvalds | 63e30271b0 |
PCI changes for the v4.6 merge window:
Enumeration Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas) Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas Resource management Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas) Don't assign or reassign immutable resources (Bjorn Helgaas) Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas) Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas) Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas) ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas) ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas) Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas) rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) Virtualization Wait for up to 1000ms after FLR reset (Alex Williamson) Support SR-IOV on any function type (Kelly Zytaruk) Add ACS quirk for all Cavium devices (Manish Jaggi) AER Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas) Restore pci_ops pointer while calling original pci_ops (David Daney) Fix aer_inject error codes (Jean Delvare) Use dev_warn() in aer_inject (Jean Delvare) Log actual error causes in aer_inject (Jean Delvare) Log aer_inject error injections (Jean Delvare) VPD Prevent VPD access for buggy devices (Babu Moger) Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas) Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas) Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas) Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas) Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas) Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas) Update VPD definitions (Hannes Reinecke) Allow access to VPD attributes with size 0 (Hannes Reinecke) Determine actual VPD size on first access (Hannes Reinecke) Generic host bridge driver Move structure definitions to separate header file (David Daney) Add pci_host_common_probe(), based on gen_pci_probe() (David Daney) Expose pci_host_common_probe() for use by other drivers (David Daney) Altera host bridge driver Fix altera_pcie_link_is_up() (Ley Foon Tan) Cavium ThunderX host bridge driver Add PCIe host driver for ThunderX processors (David Daney) Add driver for ThunderX-pass{1,2} on-chip devices (David Daney) Freescale i.MX6 host bridge driver Add DT bindings to configure PHY Tx driver settings (Justin Waters) Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach) Move PHY reset into imx6_pcie_establish_link() (Lucas Stach) Remove broken Gen2 workaround (Lucas Stach) Move link up check into imx6_pcie_wait_for_link() (Lucas Stach) Freescale Layerscape host bridge driver Add "fsl,ls2085a-pcie" compatible ID (Yang Shi) Intel VMD host bridge driver Attach VMD resources to parent domain's resource tree (Jon Derrick) Set bus resource start to 0 (Keith Busch) Microsoft Hyper-V host bridge driver Add fwnode_handle to x86 pci_sysdata (Jake Oshins) Look up IRQ domain by fwnode_handle (Jake Oshins) Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins) NVIDIA Tegra host bridge driver Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding) Implement ->{add,remove}_bus() callbacks (Thierry Reding) Remove unused struct tegra_pcie.num_ports field (Thierry Reding) Track bus -> CPU mapping (Thierry Reding) Remove misleading PHYS_OFFSET (Thierry Reding) Renesas R-Car host bridge driver Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman) Synopsys DesignWare host bridge driver ARC: Add PCI support (Joao Pinto) Add generic dw_pcie_wait_for_link() (Joao Pinto) Add default link up check if sub-driver doesn't override (Joao Pinto) Add driver for prototyping kits based on ARC SDP (Joao Pinto) TI Keystone host bridge driver Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin) Xilinx AXI host bridge driver Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada) Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada) Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada) Update Zynq binding with Microblaze node (Bharat Kumar Gogada) microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada) Xilinx NWL host bridge driver Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada) Miscellaneous Check device_attach() return value always (Bjorn Helgaas) Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas) Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas) ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas) Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas) Remove includes of asm/pci-bridge.h (Bjorn Helgaas) Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas) unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas) Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler) Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas) Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa) frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig) Move pci_dma_* helpers to common code (Christoph Hellwig) Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus) Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson) Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW6XgMAAoJEFmIoMA60/r8Yq4P/1nNwwZPikU+9Z8k0HyGPll6 vqXBOYj/wlbAxJTzH2weaoyUamFrwvsKaO3Vap3xHkAeTFPD/Dp0TipCCNMrZ82Z j1y83JJpenkRyX6ifLARCNYpOtvnvgzSrO9x7Sb2Xfqb64dPb7+jGAfOpGNzhKsO n1nj/L7RGx8Q6fNFGf8ANMXKTsdkdL+1pdwegjUXmD5WdOT+oW8DmqVbhyfSKwl0 E8r4Ml2lIg7Qd5Wu5iKMIBsR0+5HEyrwV7ch92wXChwKfoRwG70qnn7FGdc0y5ZB XvJuj8UD5UeMxEUeoRa9SwU6wWQT3Q9e6BzMS+P+43z36SPYjMfy/Xffv054z/bY rQomLjuGxNLESpmfNK5JfKxWoe2YNXjHQIDWMrAHyNlwdKJbYiwPcxnZJhvOa/eB p0QYcGS7O43STjibG9PZhzeq8tuSJRshxi0W6iB9QlqO8qs8nJQxIO+sZj/vl4yz lSnswWcV9062KITl8Fe9xDw244/RTz1xSVCdldlSoDhJyeMOjRvzS8raUMyyVmbA YULsI3l2iCl+fwDm/T21o7hJG966oYdAmgEv7lc7BWfgEAMg//LZXvMzVvrPFB2D R77u/0idtOciVJrmnO/x9DnQO2hzro9SLmVH6m0+0YU4wSSpZfGn98PCrtkatOAU c8zT9dJgyJVE3Z7cnPJ4 =otsF -----END PGP SIGNATURE----- Merge tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for v4.6: Enumeration: - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas) - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas Resource management: - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas) - Don't assign or reassign immutable resources (Bjorn Helgaas) - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas) - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas) - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas) - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas) - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas) - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas) - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) Virtualization: - Wait for up to 1000ms after FLR reset (Alex Williamson) - Support SR-IOV on any function type (Kelly Zytaruk) - Add ACS quirk for all Cavium devices (Manish Jaggi) AER: - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas) - Restore pci_ops pointer while calling original pci_ops (David Daney) - Fix aer_inject error codes (Jean Delvare) - Use dev_warn() in aer_inject (Jean Delvare) - Log actual error causes in aer_inject (Jean Delvare) - Log aer_inject error injections (Jean Delvare) VPD: - Prevent VPD access for buggy devices (Babu Moger) - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas) - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas) - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas) - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas) - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas) - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas) - Update VPD definitions (Hannes Reinecke) - Allow access to VPD attributes with size 0 (Hannes Reinecke) - Determine actual VPD size on first access (Hannes Reinecke) Generic host bridge driver: - Move structure definitions to separate header file (David Daney) - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney) - Expose pci_host_common_probe() for use by other drivers (David Daney) Altera host bridge driver: - Fix altera_pcie_link_is_up() (Ley Foon Tan) Cavium ThunderX host bridge driver: - Add PCIe host driver for ThunderX processors (David Daney) - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney) Freescale i.MX6 host bridge driver: - Add DT bindings to configure PHY Tx driver settings (Justin Waters) - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach) - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach) - Remove broken Gen2 workaround (Lucas Stach) - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach) Freescale Layerscape host bridge driver: - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi) Intel VMD host bridge driver: - Attach VMD resources to parent domain's resource tree (Jon Derrick) - Set bus resource start to 0 (Keith Busch) Microsoft Hyper-V host bridge driver: - Add fwnode_handle to x86 pci_sysdata (Jake Oshins) - Look up IRQ domain by fwnode_handle (Jake Oshins) - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins) NVIDIA Tegra host bridge driver: - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding) - Implement ->{add,remove}_bus() callbacks (Thierry Reding) - Remove unused struct tegra_pcie.num_ports field (Thierry Reding) - Track bus -> CPU mapping (Thierry Reding) - Remove misleading PHYS_OFFSET (Thierry Reding) Renesas R-Car host bridge driver: - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman) Synopsys DesignWare host bridge driver: - ARC: Add PCI support (Joao Pinto) - Add generic dw_pcie_wait_for_link() (Joao Pinto) - Add default link up check if sub-driver doesn't override (Joao Pinto) - Add driver for prototyping kits based on ARC SDP (Joao Pinto) TI Keystone host bridge driver: - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin) Xilinx AXI host bridge driver: - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada) - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada) - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada) - Update Zynq binding with Microblaze node (Bharat Kumar Gogada) - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada) Xilinx NWL host bridge driver: - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada) Miscellaneous: - Check device_attach() return value always (Bjorn Helgaas) - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas) - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas) - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas) - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas) - Remove includes of asm/pci-bridge.h (Bjorn Helgaas) - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas) - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas) - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler) - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas) - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa) - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig) - Move pci_dma_* helpers to common code (Christoph Hellwig) - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus) - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson) - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)" * tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits) PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition PCI: designware: Add driver for prototyping kits based on ARC SDP PCI: designware: Add default link up check if sub-driver doesn't override PCI: designware: Add generic dw_pcie_wait_for_link() PCI: Cleanup pci/pcie/Kconfig whitespace PCI: Simplify pci_create_attr() control flow PCI: Don't leak memory if sysfs_create_bin_file() fails PCI: Simplify sysfs ROM cleanup PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource MIPS: Loongson 3: Use temporary struct resource * to avoid repetition ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource ia64/PCI: Use ioremap() instead of open-coded equivalent ia64/PCI: Use temporary struct resource * to avoid repetition PCI: Clean up pci_map_rom() whitespace PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices PCI: thunder: Add PCIe host driver for ThunderX processors PCI: generic: Expose pci_host_common_probe() for use by other drivers PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe() ... |
|
Linus Torvalds | 277edbabf6 |
Power management and ACPI material for v4.6-rc1, part 1
- Redesign of cpufreq governors and the intel_pstate driver to make them use callbacks invoked by the scheduler to trigger CPU frequency evaluation instead of using per-CPU deferrable timers for that purpose (Rafael Wysocki). - Reorganization and cleanup of cpufreq governor code to make it more straightforward and fix some concurrency problems in it (Rafael Wysocki, Viresh Kumar). - Cleanup and improvements of locking in the cpufreq core (Viresh Kumar). - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh Kumar, Eric Biggers). - intel_pstate driver updates including fixes, optimizations and a modification to make it enable enable hardware-coordinated P-state selection (HWP) by default if supported by the processor (Philippe Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe Franciosi). - Operating Performance Points (OPP) framework updates to improve its handling of voltage regulators and device clocks and updates of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter). - Updates of the powernv cpufreq driver to fix initialization and cleanup problems in it and correct its worker thread handling with respect to CPU offline, new powernv_throttle tracepoint (Shilpasri Bhat). - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki). - ACPICA updates including one fix for a regression introduced by previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box, Colin Ian King). - Support for installing ACPI tables from initrd (Lv Zheng). - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin Chaugule). - Support for _HID(ACPI0010) devices (ACPI processor containers) and ACPI processor driver cleanups (Sudeep Holla). - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory, Aleksey Makarov). - Modification of the ACPI PCI IRQ management code to make it treat 255 in the Interrupt Line register as "not connected" on x86 (as per the specification) and avoid attempts to use that value as a valid interrupt vector (Chen Fan). - ACPI APEI fixes related to resource leaks (Josh Hunt). - Removal of modularity from a few ACPI drivers (BGRT, GHES, intel_pmic_crc) that cannot be built as modules in practice (Paul Gortmaker). - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid resource type (Harb Abdulhamid). - New device ID (future AMD I2C controller) in the ACPI driver for AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu). - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin). - cpuidle menu governor optimization to avoid a square root computation in it (Rasmus Villemoes). - Fix for potential use-after-free in the generic device properties framework (Heikki Krogerus). - Updates of the generic power domains (genpd) framework including support for multiple power states of a domain, fixes and debugfs output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart, Geert Uytterhoeven). - Intel RAPL power capping driver updates to reduce IPI overhead in it (Jacob Pan). - System suspend/hibernation code cleanups (Eric Biggers, Saurabh Sengar). - Year 2038 fix for the process freezer (Abhilash Jindal). - turbostat utility updates including new features (decoding of more registers and CPUID fields, sub-second intervals support, GFX MHz and RC6 printout, --out command line option), fixes (syscall jitter detection and workaround, reductioin of the number of syscalls made, fixes related to Xeon x200 processors, compiler warning fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJW50NXAAoJEILEb/54YlRxvr8QAIktC9+ft0y5AmU46hDcBWcK QutyWJL9X9BS6DWBJZA2qclDYFmhMfi5Fza1se0gQ9TnLB/KrBwHWLsiYoTsb1k+ nPKf214aPk+qAhkVuyB4leNWML9Qz9n9jwku/EYxWWpgtbSRf3+0ioIKZeWWc/8V JvuaOu4O+g/tkmL7QTrnGWBwhIIssAAV85QPsHkx+g68MrCj4UMMzm7z9G21SPXX bmP8yIHsczX/XnRsY0W2NSno7Vdk6ImHpDJ26IAZg28WRNPWICHgGYHvB0TTWMvb tts+yqfF7/7QLRjT/M8k9CzDBDE/DnVqoZ0fNJ+aYr7hNKF32mtAN+jH9ZB9dl/P fEFapJkPxnWyzAoVoB9Dz0rkcZkYMlbxlLWzUGpaPq0JflUUTzLk0ApSjmMn4HRO UddwCDdyHTaYThp3gn6GbOb0pIP0SdOVbI1M2QV2x/4PLcT2Ft8Np1+1RFWOeinZ Bdl9AE890big0808mqbBzw/buETwr9FjHtCdDPXpP0vJpkBLu3nIYRNb0LCt39es mWMp6dFhGgvGj3D3ahTuV3GI8hdpDkh9SObexa11RCjkTKrXcwEmFxHxLeFXwKYq alG278bo6cSChRMziS1lis+W/3tsJRN4TXUSv1PPzJHrFgptQVFRStU9ngBKP+pN WB+itPc4Fw0YHOrAFsrx =cfty -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "This time the majority of changes go into cpufreq and they are significant. First off, the way CPU frequency updates are triggered is different now. Instead of having to set up and manage a deferrable timer for each CPU in the system to evaluate and possibly change its frequency periodically, cpufreq governors set up callbacks to be invoked by the scheduler on a regular basis (basically on utilization updates). The "old" governors, "ondemand" and "conservative", still do all of their work in process context (although that is triggered by the scheduler now), but intel_pstate does it all in the callback invoked by the scheduler with no need for any additional asynchronous processing. Of course, this eliminates the overhead related to the management of all those timers, but also it allows the cpufreq governor code to be simplified quite a bit. On top of that, the common code and data structures used by the "ondemand" and "conservative" governors are cleaned up and made more straightforward and some long-standing and quite annoying problems are addressed. In particular, the handling of governor sysfs attributes is modified and the related locking becomes more fine grained which allows some concurrency problems to be avoided (particularly deadlocks with the core cpufreq code). In principle, the new mechanism for triggering frequency updates allows utilization information to be passed from the scheduler to cpufreq. Although the current code doesn't make use of it, in the works is a new cpufreq governor that will make decisions based on the scheduler's utilization data. That should allow the scheduler and cpufreq to work more closely together in the long run. In addition to the core and governor changes, cpufreq drivers are updated too. Fixes and optimizations go into intel_pstate, the cpufreq-dt driver is updated on top of some modification in the Operating Performance Points (OPP) framework and there are fixes and other updates in the powernv cpufreq driver. Apart from the cpufreq updates there is some new ACPICA material, including a fix for a problem introduced by previous ACPICA updates, and some less significant changes in the ACPI code, like CPPC code optimizations, ACPI processor driver cleanups and support for loading ACPI tables from initrd. Also updated are the generic power domains framework, the Intel RAPL power capping driver and the turbostat utility and we have a bunch of traditional assorted fixes and cleanups. Specifics: - Redesign of cpufreq governors and the intel_pstate driver to make them use callbacks invoked by the scheduler to trigger CPU frequency evaluation instead of using per-CPU deferrable timers for that purpose (Rafael Wysocki). - Reorganization and cleanup of cpufreq governor code to make it more straightforward and fix some concurrency problems in it (Rafael Wysocki, Viresh Kumar). - Cleanup and improvements of locking in the cpufreq core (Viresh Kumar). - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh Kumar, Eric Biggers). - intel_pstate driver updates including fixes, optimizations and a modification to make it enable enable hardware-coordinated P-state selection (HWP) by default if supported by the processor (Philippe Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe Franciosi). - Operating Performance Points (OPP) framework updates to improve its handling of voltage regulators and device clocks and updates of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter). - Updates of the powernv cpufreq driver to fix initialization and cleanup problems in it and correct its worker thread handling with respect to CPU offline, new powernv_throttle tracepoint (Shilpasri Bhat). - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki). - ACPICA updates including one fix for a regression introduced by previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box, Colin Ian King). - Support for installing ACPI tables from initrd (Lv Zheng). - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin Chaugule). - Support for _HID(ACPI0010) devices (ACPI processor containers) and ACPI processor driver cleanups (Sudeep Holla). - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory, Aleksey Makarov). - Modification of the ACPI PCI IRQ management code to make it treat 255 in the Interrupt Line register as "not connected" on x86 (as per the specification) and avoid attempts to use that value as a valid interrupt vector (Chen Fan). - ACPI APEI fixes related to resource leaks (Josh Hunt). - Removal of modularity from a few ACPI drivers (BGRT, GHES, intel_pmic_crc) that cannot be built as modules in practice (Paul Gortmaker). - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid resource type (Harb Abdulhamid). - New device ID (future AMD I2C controller) in the ACPI driver for AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu). - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin). - cpuidle menu governor optimization to avoid a square root computation in it (Rasmus Villemoes). - Fix for potential use-after-free in the generic device properties framework (Heikki Krogerus). - Updates of the generic power domains (genpd) framework including support for multiple power states of a domain, fixes and debugfs output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart, Geert Uytterhoeven). - Intel RAPL power capping driver updates to reduce IPI overhead in it (Jacob Pan). - System suspend/hibernation code cleanups (Eric Biggers, Saurabh Sengar). - Year 2038 fix for the process freezer (Abhilash Jindal). - turbostat utility updates including new features (decoding of more registers and CPUID fields, sub-second intervals support, GFX MHz and RC6 printout, --out command line option), fixes (syscall jitter detection and workaround, reductioin of the number of syscalls made, fixes related to Xeon x200 processors, compiler warning fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)" * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits) tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals ACPI / APEI: ERST: Fixed leaked resources in erst_init ACPI / APEI: Fix leaked resources intel_pstate: Do not skip samples partially intel_pstate: Remove freq calculation from intel_pstate_calc_busy() intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() ... |
|
Linus Torvalds | 10dc374766 |
One of the largest releases for KVM... Hardly any generic improvement,
but lots of architecture-specific changes. * ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. * PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). * s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. * x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory---currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0 yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII= =OUZZ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ... |
|
Tony Luck | cbf8b5a2b6 |
x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
Returning a 'bool' was very unpopular. Doubly so because the
code was just wrong (returning zero for true, one for false;
great for shell programming, not so good for C).
Change return type to "int". Keep zero as the success indicator
because it matches other similar code and people may be more
comfortable writing:
if (memcpy_mcsafe(to, from, count)) {
printk("Sad panda, copy failed\n");
...
}
Make the failure return value -EFAULT for now.
Reported by: Mika Penttilä <mika.penttila@nextfour.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mika.penttila@nextfour.com
Fixes:
|
|
Linus Torvalds | df2e37c814 |
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner: "The 4.6 pile of irq updates contains: - Support for IPI irqdomains to support proper integration of IPIs to and from coprocessors. The first user of this new facility is MIPS. The relevant MIPS patches come with the core to avoid merge ordering issues and have been acked by Ralf. - A new command line option to set the default interrupt affinity mask at boot time. - Support for some more new ARM and MIPS interrupt controllers: tango, alpine-msix and bcm6345-l1 - Two small cleanups for x86/apic which we merged into irq/core to avoid yet another branch in x86 with two tiny commits. - The usual set of updates, cleanups in drivers/irqchip. Mostly in the area of ARM-GIC, arada-37-xp and atmel chips. Nothing outstanding here" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits) irqchip/irq-alpine-msi: Release the correct domain on error irqchip/mxs: Fix error check of of_io_request_and_map() irqchip/sunxi-nmi: Fix error check of of_io_request_and_map() genirq: Export IRQ functions for module use irqchip/gic/realview: Support more RealView DCC variants Documentation/bindings: Document the Alpine MSIX driver irqchip: Add the Alpine MSIX interrupt controller irqchip/gic-v3: Always return IRQ_SET_MASK_OK_DONE in gic_set_affinity irqchip/gic-v3-its: Mark its_init() and its children as __init irqchip/gic-v3: Remove gic_root_node variable from the ITS code irqchip/gic-v3: ACPI: Add redistributor support via GICC structures irqchip/gic-v3: Add ACPI support for GICv3/4 initialization irqchip/gic-v3: Refactor gic_of_init() for GICv3 driver x86/apic: Deinline _flat_send_IPI_mask, save ~150 bytes x86/apic: Deinline __default_send_IPI_*, save ~200 bytes dt-bindings: interrupt-controller: Add SoC-specific compatible string to Marvell ODMI irqchip/mips-gic: Add new DT property to reserve IPIs MIPS: Delete smp-gic.c MIPS: Make smp CMP, CPS and MT use the new generic IPI functions MIPS: Add generic SMP IPI support ... |
|
Linus Torvalds | 8a284c062e |
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner: "The timer department delivers this time: - Support for cross clock domain timestamps in the core code plus a first user. That allows more precise timestamping for PTP and later for audio and other peripherals. The ptp/e1000e patches have been acked by the relevant maintainers and are carried in the timer tree to avoid merge ordering issues. - Support for unregistering the current clocksource watchdog. That lifts a limitation for switching clocksources which has been there from day 1 - The usual pile of fixes and updates to the core and the drivers. Nothing outstanding and exciting" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) time/timekeeping: Work around false positive GCC warning e1000e: Adds hardware supported cross timestamp on e1000e nic ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping x86/tsc: Always Running Timer (ART) correlated clocksource hrtimer: Revert CLOCK_MONOTONIC_RAW support time: Add history to cross timestamp interface supporting slower devices time: Add driver cross timestamp interface for higher precision time synchronization time: Remove duplicated code in ktime_get_raw_and_real() time: Add timekeeping snapshot code capturing system time and counter time: Add cycles to nanoseconds translation jiffies: Use CLOCKSOURCE_MASK instead of constant clocksource: Introduce clocksource_freq2mult() clockevents/drivers/exynos_mct: Implement ->set_state_oneshot_stopped() clockevents/drivers/arm_global_timer: Implement ->set_state_oneshot_stopped() clockevents/drivers/arm_arch_timer: Implement ->set_state_oneshot_stopped() clocksource/drivers/arm_global_timer: Register delay timer clocksource/drivers/lpc32xx: Support timer-based ARM delay clocksource/drivers/lpc32xx: Support periodic mode clocksource/drivers/lpc32xx: Don't use the prescaler counter for clockevents clocksource/drivers/rockchip: Add err handle for rk_timer_init ... |
|
Linus Torvalds | 8ab84ef699 |
Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core platform updates from Ingo Molnar: "Intel Quark and Geode SoC platform updates" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/intel/quark: Drop IMR lock bit support x86/platform/intel/mid: Remove dead code x86/platform: Make platform/geode/net5501.c explicitly non-modular x86/platform: Make platform/geode/alix.c explicitly non-modular x86/platform: Make platform/geode/geos.c explicitly non-modular x86/platform: Make platform/intel-quark/imr_selftest.c explicitly non-modular x86/platform: Make platform/intel-quark/imr.c explicitly non-modular |
|
Linus Torvalds | 13c76ad872 |
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - Enable full ASLR randomization for 32-bit programs (Hector Marco-Gisbert) - Add initial minimal INVPCI support, to flush global mappings (Andy Lutomirski) - Add KASAN enhancements (Andrey Ryabinin) - Fix mmiotrace for huge pages (Karol Herbst) - ... misc cleanups and small enhancements" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/32: Enable full randomization on i386 and X86_32 x86/mm/kmmio: Fix mmiotrace for hugepages x86/mm: Avoid premature success when changing page attributes x86/mm/ptdump: Remove paravirt_enabled() x86/mm: Fix INVPCID asm constraint x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache() x86/mm: If INVPCID is available, use it to flush global mappings x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID x86/mm: Add INVPCID helpers x86/kasan: Write protect kasan zero shadow x86/kasan: Clear kasan_zero_page after TLB flush x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug() x86/mm/numa: Clean up numa_clear_kernel_node_hotplug() x86/mm: Make kmap_prot into a #define x86/mm/32: Set NX in __supported_pte_mask before enabling paging x86/mm: Streamline and restore probe_memory_block_size() |
|
Linus Torvalds | 9cf8d6360c |
Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Ingo Molnar: "The biggest change in this cycle was the separation of the microcode loading mechanism from the initrd code plus the support of built-in microcode images. There were also lots cleanups and general restructuring (by Borislav Petkov)" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/microcode/intel: Drop orig_sum from ext signature checksum x86/microcode/intel: Improve microcode sanity-checking error messages x86/microcode/intel: Merge two consecutive if-statements x86/microcode/intel: Get rid of DWSIZE x86/microcode/intel: Change checksum variables to u32 x86/microcode: Use kmemdup() rather than duplicating its implementation x86/microcode: Remove unnecessary paravirt_enabled check x86/microcode: Document builtin microcode loading method x86/microcode/AMD: Issue microcode updated message later x86/microcode/intel: Cleanup get_matching_model_microcode() x86/microcode/intel: Remove unused arg of get_matching_model_microcode() x86/microcode/intel: Rename mc_saved_in_initrd x86/microcode/intel: Use *wrmsrl variants x86/microcode/intel: Cleanup apply_microcode_intel() x86/microcode/intel: Move the BUG_ON up and turn it into WARN_ON x86/microcode/intel: Rename mc_intel variable to mc x86/microcode/intel: Rename mc_saved_count to num_saved x86/microcode/intel: Rename local variables of type struct mc_saved_data x86/microcode/AMD: Drop redundant printk prefix x86/microcode: Issue update message only once ... |
|
Linus Torvalds | ecc026bff6 |
Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar: "The biggest change in terms of impact is the changing of the FPU context switch model to 'eagerfpu' for all CPU types, via: commit 58122bf1d856: "x86/fpu: Default eagerfpu=on on all CPUs" This makes all FPU saves and restores synchronous and makes the FPU code a lot more obvious to read. In the next cycle, if this change is problem free, we'll remove the old lazy FPU restore code altogether. This change flushed out some old bugs, which should all be fixed by now, BYMMV" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Default eagerfpu=on on all CPUs x86/fpu: Speed up lazy FPU restores slightly x86/fpu: Fold fpu_copy() into fpu__copy() x86/fpu: Fix FNSAVE usage in eagerfpu mode x86/fpu: Fix math emulation in eager fpu mode |
|
Linus Torvalds | ba33ea811e |
Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar: "This is another big update. Main changes are: - lots of x86 system call (and other traps/exceptions) entry code enhancements. In particular the complex parts of the 64-bit entry code have been migrated to C code as well, and a number of dusty corners have been refreshed. (Andy Lutomirski) - vDSO special mapping robustification and general cleanups (Andy Lutomirski) - cpufeature refactoring, cleanups and speedups (Borislav Petkov) - lots of other changes ..." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/cpufeature: Enable new AVX-512 features x86/entry/traps: Show unhandled signal for i386 in do_trap() x86/entry: Call enter_from_user_mode() with IRQs off x86/entry/32: Change INT80 to be an interrupt gate x86/entry: Improve system call entry comments x86/entry: Remove TIF_SINGLESTEP entry work x86/entry/32: Add and check a stack canary for the SYSENTER stack x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed x86/entry: Vastly simplify SYSENTER TF (single-step) handling x86/entry/traps: Clear DR6 early in do_debug() and improve the comment x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions x86/entry/32: Restore FLAGS on SYSEXIT x86/entry/32: Filter NT and speed up AC filtering in SYSENTER x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test selftests/x86: In syscall_nt, test NT|TF as well x86/asm-offsets: Remove PARAVIRT_enabled x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled uprobes: __create_xol_area() must nullify xol_mapping.fault x86/cpufeature: Create a new synthetic cpu capability for machine check recovery ... |
|
Bjorn Helgaas | cfeb8139a1 |
Merge branch 'pci/host-hv' into next
* pci/host-hv: PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs PCI: Look up IRQ domain by fwnode_handle PCI: Add fwnode_handle to x86 pci_sysdata |
|
Linus Torvalds | d88bfe1d68 |
Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call |
|
Linus Torvalds | e71c2c1eeb |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "Main kernel side changes: - Big reorganization of the x86 perf support code. The old code grew organically deep inside arch/x86/kernel/cpu/perf* and its naming became somewhat messy. The new location is under arch/x86/events/, using the following cleaner hierarchy of source code files: perf/x86: Move perf_event.c .................. => x86/events/core.c perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch] perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch] perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch] perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c (Borislav Petkov) - Update various x86 PMU constraint and hw support details (Stephane Eranian) - Optimize kprobes for BPF execution (Martin KaFai Lau) - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas Gleixner) - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner) - Various fixes and smaller cleanups. There are lots of perf tooling updates as well. A few highlights: perf report/top: - Hierarchy histogram mode for 'perf top' and 'perf report', showing multiple levels, one per --sort entry: (Namhyung Kim) On a mostly idle system: # perf top --hierarchy -s comm,dso Then expand some levels and use 'P' to take a snapshot: # cat perf.hist.0 - 92.32% perf 58.20% perf 22.29% libc-2.22.so 5.97% [kernel] 4.18% libelf-0.165.so 1.69% [unknown] - 4.71% qemu-system-x86 3.10% [kernel] 1.60% qemu-system-x86_64 (deleted) + 2.97% swapper # - Add 'L' hotkey to dynamicly set the percent threshold for histogram entries and callchains, i.e. dynamicly do what the --percent-limit command line option to 'top' and 'report' does. (Namhyung Kim) perf mem: - Allow specifying events via -e in 'perf mem record', also listing what events can be specified via 'perf mem record -e list' (Jiri Olsa) perf record: - Add 'perf record' --all-user/--all-kernel options, so that one can tell that all the events in the command line should be restricted to the user or kernel levels (Jiri Olsa), i.e.: perf record -e cycles:u,instructions:u is equivalent to: perf record --all-user -e cycles,instructions - Make 'perf record' collect CPU cache info in the perf.data file header: $ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ] $ perf report --header-only -I | tail -10 | head -8 # CPU cache info: # L1 Data 32K [0-1] # L1 Instruction 32K [0-1] # L1 Data 32K [2-3] # L1 Instruction 32K [2-3] # L2 Unified 256K [0-1] # L2 Unified 256K [2-3] # L3 Unified 4096K [0-3] Will be used in 'perf c2c' and eventually in 'perf diff' to allow, for instance running the same workload in multiple machines and then when using 'diff' show the hardware difference. (Jiri Olsa) - Improved support for Java, using the JVMTI agent library to do jitdumps that then will be inserted in synthesized PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized ELF files stored in ~/.debug and keyed with build-ids, to allow symbol resolution and even annotation with source line info, see the changeset comments to see how to use it (Stephane Eranian) perf script/trace: - Decode data_src values (e.g. perf.data files generated by 'perf mem record') in 'perf script': (Jiri Olsa) # perf script perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Improve support to 'data_src', 'weight' and 'addr' fields in 'perf script' (Jiri Olsa) - Handle empty print fmts in 'perf script -s' i.e. when running python or perl scripts (Taeung Song) perf stat: - 'perf stat' now shows shadow metrics (insn per cycle, etc) in interval mode too. E.g: # perf stat -I 1000 -e instructions,cycles sleep 1 # time counts unit events 1.000215928 519,620 instructions # 0.69 insn per cycle 1.000215928 752,003 cycles <SNIP> - Port 'perf kvm stat' to PowerPC (Hemant Kumar) - Implement CSV metrics output in 'perf stat' (Andi Kleen) perf BPF support: - Support converting data from bpf events in 'perf data' (Wang Nan) - Print bpf-output events in 'perf script': (Wang Nan). # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000 # perf script usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms]) BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a 0008: 42 50 46 20 65 76 65 6e BPF even 0010: 74 21 00 00 t!.. BPF string: "Raise a BPF event!" # - Add API to set values of map entries in a BPF object, be it individual map slots or ranges (Wang Nan) - Introduce support for the 'bpf-output' event (Wang Nan) - Add glue to read perf events in a BPF program (Wang Nan) - Improve support for bpf-output events in 'perf trace' (Wang Nan) ... and tons of other changes as well - see the shortlog and git log for details!" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits) perf stat: Add --metric-only support for -A perf stat: Implement --metric-only mode perf stat: Document CSV format in manpage perf hists browser: Check sort keys before hot key actions perf hists browser: Allow thread filtering for comm sort key perf tools: Add sort__has_comm variable perf tools: Recalc total periods using top-level entries in hierarchy perf tools: Remove nr_sort_keys field perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry() perf tools: Remove hist_entry->fmt field perf tools: Fix command line filters in hierarchy mode perf tools: Add more sort entry check functions perf tools: Fix hist_entry__filter() for hierarchy perf jitdump: Build only on supported archs tools lib traceevent: Add '~' operation within arg_num_eval() perf tools: Omit unnecessary cast in perf_pmu__parse_scale perf tools: Pass perf_hpp_list all the way through setup_sort_list perf tools: Fix perf script python database export crash perf jitdump: DWARF is also needed perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes ... |
|
Linus Torvalds | d09e356ad0 |
Merge branch 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull read-only kernel memory updates from Ingo Molnar: "This tree adds two (security related) enhancements to the kernel's handling of read-only kernel memory: - extend read-only kernel memory to a new class of formerly writable kernel data: 'post-init read-only memory' via the __ro_after_init attribute, and mark the ARM and x86 vDSO as such read-only memory. This kind of attribute can be used for data that requires a once per bootup initialization sequence, but is otherwise never modified after that point. This feature was based on the work by PaX Team and Brad Spengler. (by Kees Cook, the ARM vDSO bits by David Brown.) - make CONFIG_DEBUG_RODATA always enabled on x86 and remove the Kconfig option. This simplifies the kernel and also signals that read-only memory is the default model and a first-class citizen. (Kees Cook)" * 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM/vdso: Mark the vDSO code read-only after init x86/vdso: Mark the vDSO code read-only after init lkdtm: Verify that '__ro_after_init' works correctly arch: Introduce post-init read-only memory x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings asm-generic: Consolidate mark_rodata_ro() |
|
Linus Torvalds | fbed0bc091 |
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar: "Various updates: - Futex scalability improvements: remove page lock use for shared futex get_futex_key(), which speeds up 'perf bench futex hash' benchmarks by over 40% on a 60-core Westmere. This makes anon-mem shared futexes perform close to private futexes. (Mel Gorman) - lockdep hash collision detection and fix (Alfredo Alvarez Fernandez) - lockdep testing enhancements (Alfredo Alvarez Fernandez) - robustify lockdep init by using hlists (Andrew Morton, Andrey Ryabinin) - mutex and csd_lock micro-optimizations (Davidlohr Bueso) - small x86 barriers tweaks (Michael S Tsirkin) - qspinlock updates (Waiman Long)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait() locking/csd_lock: Explicitly inline csd_lock*() helpers futex: Replace barrier() in unqueue_me() with READ_ONCE() locking/lockdep: Detect chain_key collisions locking/lockdep: Prevent chain_key collisions tools/lib/lockdep: Fix link creation warning tools/lib/lockdep: Add tests for AA and ABBA locking tools/lib/lockdep: Add userspace version of READ_ONCE() tools/lib/lockdep: Fix the build on recent kernels locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h locking/mutex: Allow next waiter lockless wakeup locking/pvqspinlock: Enable slowpath locking count tracking locking/qspinlock: Use smp_cond_acquire() in pending code locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock() locking/mcs: Fix mcs_spin_lock() ordering futex: Remove requirement for lock_page() in get_futex_key() futex: Rename barrier references in ordering guarantees locking/atomics: Update comment about READ_ONCE() and structures locking/lockdep: Eliminate lockdep_init() locking/lockdep: Convert hash tables to hlists ... |
|
Rafael J. Wysocki | 0d571b62dd |
Merge branch 'pm-tools'
* pm-tools: tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals tools/power turbostat: Decode MSR_MISC_PWR_MGMT tools/power turbostat: decode HWP registers x86 msr-index: Simplify syntax for HWP fields tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency tools/power turbostat: decode more CPUID fields |
|
Alexander Duyck | 1e94082963 |
ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short
This patch updates csum_ipv6_magic so that it correctly recognizes that protocol is a unsigned 8 bit value. This will allow us to better understand what limitations may or may not be present in how we handle the data. For example there are a number of places that call htonl on the protocol value. This is likely not necessary and can be replaced with a multiplication by ntohl(1) which will be converted to a shift by the compiler. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Alexander Duyck | 01cfbad79a |
ipv4: Update parameters for csum_tcpudp_magic to their original types
This patch updates all instances of csum_tcpudp_magic and csum_tcpudp_nofold to reflect the types that are usually used as the source inputs. For example the protocol field is populated based on nexthdr which is actually an unsigned 8 bit value. The length is usually populated based on skb->len which is an unsigned integer. This addresses an issue in which the IPv6 function csum_ipv6_magic was generating a checksum using the full 32b of skb->len while csum_tcpudp_magic was only using the lower 16 bits. As a result we could run into issues when attempting to adjust the checksum as there was no protocol agnostic way to update it. With this change the value is still truncated as many architectures use "(len + proto) << 8", however this truncation only occurs for values greater than 16776960 in length and as such is unlikely to occur as we stop the inner headers at ~64K in size. I did have to make a few minor changes in the arm, mn10300, nios2, and score versions of the function in order to support these changes as they were either using things such as an OR to combine the protocol and length, or were using ntohs to convert the length which would have truncated the value. I also updated a few spots in terms of whitespace and type differences for the addresses. Most of this was just to make sure all of the definitions were in sync going forward. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Rafael J. Wysocki | 3fdb74649b |
Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools
Pull turbostat updates for 4.6 from Len Brown. * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals tools/power turbostat: Decode MSR_MISC_PWR_MGMT tools/power turbostat: decode HWP registers x86 msr-index: Simplify syntax for HWP fields tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency tools/power turbostat: decode more CPUID fields |
|
Fenghua Yu | d050049442 |
x86/cpufeature: Enable new AVX-512 features
A few new AVX-512 instruction groups/features are added in cpufeatures.h for enuermation: AVX512DQ, AVX512BW, and AVX512VL. Clear the flags in fpu__xstate_clear_all_cpu_caps(). The specification for latest AVX-512 including the features can be found at: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up the flags and enable them in KVM. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua.yu@intel.com [ Added more detailed feature descriptions. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 0d47638f80 |
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
Kirill Shutemov pointed this out to me.
The tip tree currently has commit:
|
|
Andy Lutomirski | 9999c8c01f |
x86/entry: Call enter_from_user_mode() with IRQs off
Now that slow-path syscalls always enter C before enabling interrupts, it's straightforward to call enter_from_user_mode() before enabling interrupts rather than doing it as part of entry tracing. With this change, we should finally be able to retire exception_enter(). This will also enable optimizations based on knowing that we never change context tracking state with interrupts on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Yu-cheng Yu | a65050c6f1 |
x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
Leonid Shatz noticed that the SDM interpretation of the following
recent commit:
|
|
Andy Lutomirski | 392a62549f |
x86/entry: Remove TIF_SINGLESTEP entry work
Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall entry code. Remove it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Andy Lutomirski | 2a41aa4feb |
x86/entry/32: Add and check a stack canary for the SYSENTER stack
The first instruction of the SYSENTER entry runs on its own tiny stack. That stack can be used if a #DB or NMI is delivered before the SYSENTER prologue switches to a real stack. We have code in place to prevent us from overflowing the tiny stack. For added paranoia, add a canary to the stack and check it in do_debug() -- that way, if something goes wrong with the #DB logic, we'll eventually notice. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Andy Lutomirski | 6dcc94149d |
x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
The SYSENTER stack is only used on 32-bit kernels. Remove it on 64-bit kernels. ( We may end up using it down the road on 64-bit kernels. If so, we'll re-enable it for CONFIG_IA32_EMULATION. ) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9dbd18429f9ff61a76b6eda97a9ea20510b9f6ba.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Andy Lutomirski | f2b375756c |
x86/entry: Vastly simplify SYSENTER TF (single-step) handling
Due to a blatant design error, SYSENTER doesn't clear TF (single-step). As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue. That's all we need to do. Don't get too excited -- our handling is still buggy on 32-bit kernels. There's nothing wrong with the SYSENTER code itself, but the #DB prologue has a clever fixup for traps on the very first instruction of entry_SYSENTER_32, and the fixup doesn't work quite correctly. The next two patches will fix that. [1] We could probably prevent it by forcing BTF on at all times and making sure we clear TF before any branches in the SYSENTER code. Needless to say, this is a bad idea. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dan Williams | 59e6473980 |
libnvdimm, pmem: clear poison on write
If a write is directed at a known bad block perform the following: 1/ write the data 2/ send a clear poison command 3/ invalidate the poison out of the cache hierarchy Cc: <x86@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
|
Paolo Bonzini | 5a5fbdc0e3 |
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
It is now equal to use_eager_fpu(), which simply tests a cpufeature bit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Paolo Bonzini | ab92f30875 |
KVM/ARM updates for 4.6
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - Various optimizations to the vgic save/restore code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW36xjAAoJECPQ0LrRPXpDGQkQAMDppzcTOixT3e8VPdHAX09a Z5PO0gyTMVV7Jyz5Ul3pedPJA2GSK9mxOCwqvIFbdxLAR6ZB00juO5FrTHkSdI91 1XLPj4bKoMWcVvhL/g5A4Glp/pVMW1k/9Yq8zZAtYlsLRlqG5rLOutSadcqHcYaJ cTD/pFf7b2oPtkTPyoFml75KgHBT/8uvAvFDOWA66Id2z6T11+PsBT/6XnGDiwKg tpGTNzx3kPIKIzOAOHqVW6UBxFOeabebXLT8wUz3VwNn/UbG6gkumMNApMAyF2q1 zU0nAh8+7Ek6Dr4OFWE6BfW6sgg/l7i1lA8XoAmqG7ZTrSptCc59fvaZJxPruG+Q dMsU6QgR77JJjbZTinf9a1jReZ/liZrx2gZXedVKdILrjmDSq0UnGcxjUOEDZOGy 2/dbrlJhv+LhpcJtuPpxPCfoqbW5L0ynzmuYuXRdRz3lTHiOWIRx5gugrhO+wH4D 4gvZhbw3XCiYfpYHYhl8A1EH5kanKgdXDocz9yIm7mZm89gngufF/HkeXS3ZU25T yThyBGulGjqN4FCdgf1HolkTfFjnfSx4qJovJ58eHga+HNLXRkTecZZcbFy2OOHv 8Bx0PIlwj4RgSaRLWQUudAhdhKS2g22DKDDljxFwhkMPNghvqkYMJCRDKLu6GBXQ 4YsLKM+TaShHFjSpx+ao =rpvb -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.6 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - Various optimizations to the vgic save/restore code Conflicts: include/uapi/linux/kvm.h |
|
Linus Walleij | 0bae2f1732 | Merge branch 'ib-mfd-regulator-gpio-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into devel | |
David S. Miller | 810813c47a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Tony Luck | 92b0729c34 |
x86/mm, x86/mce: Add memcpy_mcsafe()
Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. Prime use case for this is to copy from large arrays of non-volatile memory used as storage. We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify. Return type is a "bool". True means that we copied OK, false means that it didn't. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@gmail.com> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | 14ddde78c7 |
Merge branch 'linus' into x86/fpu, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Andy Lutomirski | 58a5aac533 |
x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
x86_64 has very clean espfix handling on paravirt: espfix64 is set up in native_iret, so paravirt systems that override iret bypass espfix64 automatically. This is robust and straightforward. x86_32 is messier. espfix is set up before the IRET paravirt patch point, so it can't be directly conditionalized on whether we use native_iret. We also can't easily move it into native_iret without regressing performance due to a bizarre consideration. Specifically, on 64-bit kernels, the logic is: if (regs->ss & 0x4) setup_espfix; On 32-bit kernels, the logic is: if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 && (regs->flags & X86_EFLAGS_VM) == 0) setup_espfix; The performance of setup_espfix itself is essentially irrelevant, but the comparison happens on every IRET so its performance matters. On x86_64, there's no need for any registers except flags to implement the comparison, so we fold the whole thing into native_iret. On x86_32, we don't do that because we need a free register to implement the comparison efficiently. We therefore do espfix setup before restoring registers on x86_32. This patch gets rid of the explicit paravirt_enabled check by introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE to skip espfix on paravirt systems where iret != native_iret. This is also messy, but it's at least in line with other things we do. This improves espfix performance by removing a branch, but no one cares. More importantly, it removes a paravirt_enabled user, which is good because paravirt_enabled is ill-defined and is going away. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Paolo Bonzini | 6bb69c9b69 |
KVM: MMU: simplify last_pte_bitmap
Branch-free code is fun and everybody knows how much Avi loves it, but last_pte_bitmap takes it a bit to the extreme. Since the code is simply doing a range check, like (level == 1 || ((gpte & PT_PAGE_SIZE_MASK) && level < N) we can make it branch-free without storing the entire truth table; it is enough to cache N. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Denys Vlasenko | 1a8aa8acab |
x86/apic: Deinline __default_send_IPI_*, save ~200 bytes
__default_send_IPI_shortcut: 49 bytes, 2 callsites __default_send_IPI_dest_field: 108 bytes, 7 callsites text data bss dec hex filename 96184086 20860488 36122624 153167198 921255e vmlinux_before 96183823 20860520 36122624 153166967 9212477 vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Borislav Petkov <bp@alien.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1457287876-6001-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Stephane Eranian | 5690ae28e4 |
perf/x86/intel: Add definition for PT PMI bit
This patch adds a definition for GLOBAL_OVFL_STATUS bit 55 which is used with the Processor Trace (PT) feature. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1457034642-21837-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Aravind Gopalakrishnan | ea2ca36b65 |
x86/mce/AMD: Document some functionality
In an attempt to aid in understanding of what the threshold_block structure holds, provide comments to describe the members here. Also, trim comments around threshold_restart_bank() and update copyright info. No functional change is introduced. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Shorten comments. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-6-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Aravind Gopalakrishnan | 2cd3b5f903 |
x86/mce: Clarify comments regarding deferred error
Deferred errors indicate errors that hardware could not fix. But it still does not cause any interruption to program flow. So it does not generate any #MC and UC bit in MCx_STATUS is not set. Correct comment. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-5-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Aravind Gopalakrishnan | 8dd1e17a55 |
x86/mce/AMD: Fix logic to obtain block address
In upcoming processors, the BLKPTR field is no longer used to indicate the MSR number of the additional register. Insted, it simply indicates the prescence of additional MSRs. Fix the logic here to gather MSR address from MSR_AMD64_SMCA_MCx_MISC() for newer processors and fall back to existing logic for older processors. [ Drop nextaddr_out label; style cleanups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-4-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Aravind Gopalakrishnan | be0aec23bf |
x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Aravind Gopalakrishnan | adc53f2e0a |
x86/mce: Move MCx_CONFIG MSR definitions
Those MSRs are used only by the MCE code so move them there. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1456785179-14378-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | a1a8ba2d4a |
Merge branch 'linus' into ras/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Borislav Petkov | c041462217 |
x86/microcode/intel: Get rid of DWSIZE
sizeof(u32) is perfectly clear as it is. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1457345404-28884-3-git-send-email-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
|
Christoph Hellwig | bc4b024a8b |
PCI: Move pci_dma_* helpers to common code
For a long time all architectures implement the pci_dma_* functions using the generic DMA API, and they all use the same header to do so. Move this header, pci-dma-compat.h, to include/linux and include it from the generic pci.h instead of having each arch duplicate this include. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> |
|
Ingo Molnar | ec87e1cf7d |
Linux 4.5-rc7
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW3LO0AAoJEHm+PkMAQRiGhewIAIVHA1+qSSXEHTFeuLRuYpiz +ptQUIjPJdakWm/XqOnwSG8SWUuD4XL6ysfNmLSZIdqXYBAPpAuwT1UA2FZhz0dN soZxMNleAvzHWRDFLqwjVdOVlTxS6CTTdEQNzi+3R0ZCADllsRcuj/GBIY+M8cr6 LvxK8BnhDU+Au3gZQjaujTMO7fKG6gOq4wKz/U7RIG37A6rwW577kEfLg4ZgFwt9 RVjsky5mrX9+4l3QFtox9ZC383P/0VZ6+vXwN2QH1/joDK4EvA8pCwsGTyjRJiqi fArHbS+mHyAtbPWJmDbVlQ5dkZJAqRgtWBydjQYoC16S4Bwdce2/FbhBiTgEQAo= =sqln -----END PGP SIGNATURE----- Merge tag 'v4.5-rc7' into x86/asm, to pick up SMAP fix Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Jiri Kosina | 335e073faa |
klp: remove CONFIG_LIVEPATCH dependency from klp headers
There is no need for livepatch.h (generic and arch-specific) to depend on CONFIG_LIVEPATCH. Remove that superfluous dependency. Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Miroslav Benes | b24b78a113 |
klp: remove superfluous errors in asm/livepatch.h
There is an #error in asm/livepatch.h for both x86 and s390 in !CONFIG_LIVEPATCH cases. It does not make much sense as pointed out by Michael Ellerman. One can happily include asm/livepatch.h with CONFIG_LIVEPATCH. Remove it as useless. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Christopher S. Hall | f9677e0f83 |
x86/tsc: Always Running Timer (ART) correlated clocksource
On modern Intel systems TSC is derived from the new Always Running Timer (ART). ART can be captured simultaneous to the capture of audio and network device clocks, allowing a correlation between timebases to be constructed. Upon capture, the driver converts the captured ART value to the appropriate system clock using the correlated clocksource mechanism. On systems that support ART a new CPUID leaf (0x15) returns parameters “m” and “n” such that: TSC_value = (ART_value * m) / n + k [n >= 1] [k is an offset that can adjusted by a privileged agent. The IA32_TSC_ADJUST MSR is an example of an interface to adjust k. See 17.14.4 of the Intel SDM for more details] Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: kevin.b.stanton@intel.com Cc: kevin.j.clarke@intel.com Cc: hpa@zytor.com Cc: jeffrey.t.kirsher@intel.com Cc: netdev@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com> [jstultz: Tweaked to fix build issue, also reworked math for 64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build fixes] Signed-off-by: John Stultz <john.stultz@linaro.org> |
|
Xiao Guangrong | 13d268ca2c |
KVM: MMU: apply page track notifier
Register the notifier to receive write track event so that we can update our shadow page table It makes kvm_mmu_pte_write() be the callback of the notifier, no function is changed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Xiao Guangrong | 0eb05bf290 |
KVM: page track: add notifier support
Notifier list is introduced so that any node wants to receive the track event can register to the list Two APIs are introduced here: - kvm_page_track_register_notifier(): register the notifier to receive track event - kvm_page_track_unregister_notifier(): stop receiving track event by unregister the notifier The callback, node->track_write() is called when a write access on the write tracked page happens Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Xiao Guangrong | e5691a81e8 |
KVM: MMU: clear write-flooding on the fast path of tracked page
If the page fault is caused by write access on write tracked page, the real shadow page walking is skipped, we lost the chance to clear write flooding for the page structure current vcpu is using Fix it by locklessly waking shadow page table to clear write flooding on the shadow page structure out of mmu-lock. So that we change the count to atomic_t Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Xiao Guangrong | 3d0c27ad6e |
KVM: MMU: let page fault handler be aware tracked page
The page fault caused by write access on the write tracked page can not be fixed, it always need to be emulated. page_fault_handle_page_track() is the fast path we introduce here to skip holding mmu-lock and shadow page table walking However, if the page table is not present, it is worth making the page table entry present and readonly to make the read access happy mmu_need_write_protect() need to be cooked to avoid page becoming writable when making page table present or sync/prefetch shadow page table entries Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Xiao Guangrong | f29d4d7810 |
KVM: page track: introduce kvm_slot_page_track_{add,remove}_page
These two functions are the user APIs: - kvm_slot_page_track_add_page(): add the page to the tracking pool after that later specified access on that page will be tracked - kvm_slot_page_track_remove_page(): remove the page from the tracking pool, the specified access on the page is not tracked after the last user is gone Both of these are called under the protection both of mmu-lock and kvm->srcu or kvm->slots_lock Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Xiao Guangrong | 21ebbedadd |
KVM: page track: add the framework of guest page tracking
The array, gfn_track[mode][gfn], is introduced in memory slot for every guest page, this is the tracking count for the gust page on different modes. If the page is tracked then the count is increased, the page is not tracked after the count reaches zero We use 'unsigned short' as the tracking count which should be enough as shadow page table only can use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there is enough room for other trackers Two callbacks, kvm_page_track_create_memslot() and kvm_page_track_free_memslot() are implemented in this patch, they are internally used to initialize and reclaim the memory of the array Currently, only write track mode is supported Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Xiao Guangrong | 92f94f1e9e |
KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
kvm_lpage_info->write_count is used to detect if the large page mapping for the gfn on the specified level is allowed, rename it to disallow_lpage to reflect its purpose, also we rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code more clearer Later we will extend this mechanism for page tracking: if the gfn is tracked then large mapping for that gfn on any level is not allowed. The new name is more straightforward Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
Ingo Molnar | 39a1142dbb |
Linux 4.5-rc6
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg= =tvkh -----END PGP SIGNATURE----- Merge tag 'v4.5-rc6' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Thomas Gleixner | 1f12e32f4c |
x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of CPU packages to be within bounds. Create a tracking facility, which - calculates the number of possible packages depending on nr_cpu_ids after boot - makes sure that the package id is within the number of possible packages. If the apic id is outside we map it to a logical package id if there is enough space available. Provide interfaces for drivers to query the mapping and do translations from physcial to logical ids. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | 0a7348925f |
Linux 4.5-rc6
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg= =tvkh -----END PGP SIGNATURE----- Merge tag 'v4.5-rc6' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Linus Torvalds | a9f8094aae |
PCI updates for v4.5:
Enumeration Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas) Marvell MVEBU host bridge driver Restrict build to 32-bit ARM (Thierry Reding) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW0gP1AAoJEFmIoMA60/r84qkQAJXOFW20cie2yepQXIk7f5aN M2/+iFte8YHf4ZFgZWA/oS+mZAp1OqctSTjWg1KTPZsPHAiB6DkL7WOV6fK+uXr9 fX8D7Ec2eLgeIFl78iSQaAht4kfmfz8f5LlU6Oi9kvQOt+35gp4lP834HClx7Jep XT2qZy/zUQy8GylTzRqueMBpXBCnBQR8iyaD8j4rmklQB3yLXaEMTs7HzwJKBmhM ZDnH1xrV5cWYb7niSCBkq4IomCmezJZCvxcDjh/Z8gjDKbVl7TLYOdU8Jh4wNO++ ng0J8WDSKQJ9Hfv6H+5dgPzoqgrIrWb/Oz5GXd8i6cqv00szG5S/w8nHcO8LPSJv dJxxfTlz4KRxdv/sqOVW4cDFUmScODMkDMh+hAeEVYKl9ty5fQ4O2iNwNehzrdNj FRrgN1980amYN2n09NZNF863dvVN+DMJ4Ll2VT01rOIUH3bwt4cO6rVWrEUlEKCn DiSvJlXHm5nLLCQpkkGKAeq5hYl25DFtYVwLopIbUSHFXCASHPtQewDvgzfn9zYi M7J8bDa/uTscSqJsGsb4/gHLEblCfju7Pj2gEHoiK4XtbCuuamFA3nsA7lzcAG9j W5pVDQTqctdgHq/UMLKIeoBJ592fhYzKipY8vELOKwkieDR9F3g3u8nWt4ZAUIXE /oS5F1eWMkDMvdjyZO4C =yQ41 -----END PGP SIGNATURE----- Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Enumeration: Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas) Marvell MVEBU host bridge driver: Restrict build to 32-bit ARM (Thierry Reding)" * tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: mvebu: Restrict build to 32-bit ARM Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()" Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled" |
|
Bjorn Helgaas | 6c777e8799 |
Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
|
|
Nicolai Stange | d89abe2a1f |
arch/x86/irq: Purge useless handler declarations from hw_irq.h
arch/x86/include/asm/hw_irq.h contains declarations for the C-level handlers called into directly from the IDT-referenced assembly stubs. These declarations are never used as they are referenced from assembly only. Furthermore, these declarations got their attributes wrong: there is no '__irqentry' (parameter passing via stack) attached to them. Also, the list of declarations isn't complete: none of the tracing-capable variants is declared, for example. Purge the handler declarations. Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> |
|
Ingo Molnar | 319e305ca4 |
Merge branch 'ras/core' into core/objtool, to pick up the new exception table format
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | c0853867a1 |
Merge branch 'x86/debug' into core/objtool, to pick up frame pointer fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ard Biesheuvel | 48fcb2d021 |
efi: stub: use high allocation for converted command line
Before we can move the command line processing before the allocation of the kernel, which is required for detecting the 'nokaslr' option which controls that allocation, move the converted command line higher up in memory, to prevent it from interfering with the kernel itself. Since x86 needs the address to fit in 32 bits, use UINT_MAX as the upper bound there. Otherwise, use ULONG_MAX (i.e., no limit) Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
|
Adam Buchbinder | 6a6256f9e0 |
x86: Fix misspellings in comments
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Josh Poimboeuf | 821eae7d14 |
sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()
If __preempt_schedule() or __preempt_schedule_notrace() is referenced at the beginning of a function, gcc can insert the asm inline "call ___preempt_schedule[_notrace]" instruction before setting up a stack frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and can result in bad stack traces. Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an output operand for the inline asm statements. Specifically this fixes the following stacktool warnings: stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: call without frame pointer save/setup stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame pointer save/setup stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame pointer save/setup stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame pointer save/setup stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame pointer save/setup stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame pointer save/setup stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame pointer save/setup stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without frame pointer save/setup stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without frame pointer save/setup stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call without frame pointer save/setup stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without frame pointer save/setup stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without frame pointer save/setup stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame pointer save/setup stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame pointer save/setup stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call without frame pointer save/setup So it only adds a stack frame to 15 call sites out of ~5000 calls to ___preempt_schedule[_notrace](). All the others already had stack frames. Oddly, this change actually seems to make things faster in a lot of cases. For many smaller functions it causes the stack frame creation to get moved out of the common path and into the unlikely path. For example, here's the original cyc2ns_read_end(): ffffffff8101f8c0 <cyc2ns_read_end>: ffffffff8101f8c0: 55 push %rbp ffffffff8101f8c1: 48 89 e5 mov %rsp,%rbp ffffffff8101f8c4: 83 6f 10 01 subl $0x1,0x10(%rdi) ffffffff8101f8c8: 75 08 jne ffffffff8101f8d2 <cyc2ns_read_end+0x12> ffffffff8101f8ca: 65 48 89 3d e6 5a ff mov %rdi,%gs:0x7eff5ae6(%rip) # 153b8 <cyc2ns+0x38> ffffffff8101f8d1: 7e ffffffff8101f8d2: 65 ff 0d 77 c4 fe 7e decl %gs:0x7efec477(%rip) # bd50 <__preempt_count> ffffffff8101f8d9: 74 02 je ffffffff8101f8dd <cyc2ns_read_end+0x1d> ffffffff8101f8db: 5d pop %rbp ffffffff8101f8dc: c3 retq ffffffff8101f8dd: e8 1e 37 fe ff callq ffffffff81003000 <___preempt_schedule> ffffffff8101f8e2: 5d pop %rbp ffffffff8101f8e3: c3 retq ffffffff8101f8e4: 66 66 66 2e 0f 1f 84 data16 data16 nopw %cs:0x0(%rax,%rax,1) ffffffff8101f8eb: 00 00 00 00 00 And here's the same function with the patch: ffffffff8101f8c0 <cyc2ns_read_end>: ffffffff8101f8c0: 83 6f 10 01 subl $0x1,0x10(%rdi) ffffffff8101f8c4: 75 08 jne ffffffff8101f8ce <cyc2ns_read_end+0xe> ffffffff8101f8c6: 65 48 89 3d ea 5a ff mov %rdi,%gs:0x7eff5aea(%rip) # 153b8 <cyc2ns+0x38> ffffffff8101f8cd: 7e ffffffff8101f8ce: 65 ff 0d 7b c4 fe 7e decl %gs:0x7efec47b(%rip) # bd50 <__preempt_count> ffffffff8101f8d5: 74 01 je ffffffff8101f8d8 <cyc2ns_read_end+0x18> ffffffff8101f8d7: c3 retq ffffffff8101f8d8: 55 push %rbp ffffffff8101f8d9: 48 89 e5 mov %rsp,%rbp ffffffff8101f8dc: e8 1f 37 fe ff callq ffffffff81003000 <___preempt_schedule> ffffffff8101f8e1: 5d pop %rbp ffffffff8101f8e2: c3 retq ffffffff8101f8e3: 66 66 66 66 2e 0f 1f data16 data16 data16 nopw %cs:0x0(%rax,%rax,1) ffffffff8101f8ea: 84 00 00 00 00 00 Notice that it moved the frame pointer setup code to the unlikely ___preempt_schedule() call path. Going through a sampling of the differences in the asm, that's the most common change I see. Otherwise it has no real effect on callers which already have stack frames (though it does result in the reordering of some 'mov's). Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/20160218174158.GA28230@treble.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Josh Poimboeuf | 16df4ff860 |
x86/locking: Create stack frame in PV unlock
The assembly PV_UNLOCK function is a callable non-leaf function which doesn't honor CONFIG_FRAME_POINTER, which can result in bad stack traces. Create a stack frame when CONFIG_FRAME_POINTER is enabled. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hpe.com> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/6685a72ddbbd0ad3694337cca0af4b4ea09f5f40.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Chris J Arges | f05058c4d6 |
x86/uaccess: Add stack frame output operand in get_user() inline asm
Numerous 'call without frame pointer save/setup' warnings are introduced by stacktool because of functions using the get_user() macro. Bad stack traces could occur due to lack of or misplacement of stack frame setup code. This patch forces a stack frame to be created before the inline asm code if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an output operand for the get_user() inline assembly statement. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/bc85501f221ee512670797c7f110022e64b12c81.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Josh Poimboeuf | 87b240cbe3 |
x86/paravirt: Create a stack frame in PV_CALLEE_SAVE_REGS_THUNK
A function created with the PV_CALLEE_SAVE_REGS_THUNK macro doesn't set up a new stack frame before the call instruction, which breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and can result in a bad stack trace. Also, the thunk functions aren't annotated as ELF callable functions. Create a stack frame when CONFIG_FRAME_POINTER is enabled and add the ELF function type. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/a2cad74e87c4aba7fd0f54a1af312e66a824a575.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Josh Poimboeuf | bb93eb4cd6 |
x86/paravirt: Add stack frame dependency to PVOP inline asm calls
If a PVOP call macro is inlined at the beginning of a function, gcc can insert the call instruction before setting up a stack frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and can result in a bad stack trace. Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an output operand for the PVOP inline asm statements. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/6a13e48c5a8cf2de1aa112ae2d4c0ac194096282.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Josh Poimboeuf | 0e8e2238b5 |
x86/xen: Add stack frame dependency to hypercall inline asm calls
If a hypercall is inlined at the beginning of a function, gcc can insert the call instruction before setting up a stack frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and can result in a bad stack trace. Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an output operand for the hypercall inline asm statements. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/c6face5a46713108bded9c4c103637222abc4528.1453405861.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Linus Torvalds | de9e478b9d |
x86: fix SMAP in 32-bit environments
In commit
|
|
Bryan O'Donoghue | c637fa5294 |
x86/platform/intel/quark: Drop IMR lock bit support
Isolated Memory Regions support a lock bit. The lock bit in an IMR prevents modification of the IMR until the core goes through a warm or cold reset. The lock bit feature is not useful in the context of the kernel API and is not really necessary since modification of IMRs is possible only from ring-zero anyway. This patch drops support for IMR locks bits, it simplifies the kernel API and removes an unnecessary and needlessly complex feature. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andriy.shevchenko@linux.intel.com Cc: boon.leong.ong@intel.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/1456190999-12685-3-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
David S. Miller | b633353115 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Linus Torvalds | 692b8c663c |
Xen bug fixes for 4.5-rc5
- Two scsiback fixes (resource leak and spurious warning). - Fix DMA mapping of compound pages on arm/arm64. - Fix some pciback regressions in MSI-X handling. - Fix a pcifront crash due to some uninitialize state. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWyvatAAoJEFxbo/MsZsTRBFcH+wWnv0/N+gKib3cKCI4lwmTg n8iVgf8dNWwD36M2s/OlzCAglAIt8Xr6ySNvPqTerpm7lT9yXlIVQxGXTbIGuTAA h8Kt8WiC0BNLHHlLxBuCz62KR47DvMhsr84lFURE8FmpUiulFjXmRcbrZkHIMYRS l/X+xJWO1vxwrSYho0P9n3ksTWHm488DTPvZz3ICNI2G2sndDfbT3gv3tMDaQhcX ZaQR93vtIoldqk29Ga59vaVtksbgxHZIbasY9PQ8rqOxHJpDQbPzpjocoLxAzf50 cioQVyKQ7i9vUvZ+B3TTAOhxisA2hDwNhLGQzmjgxe2TXeKdo3yjYwO6m1dDBzY= =VY/S -----END PGP SIGNATURE----- Merge tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Two scsiback fixes (resource leak and spurious warning). - Fix DMA mapping of compound pages on arm/arm64. - Fix some pciback regressions in MSI-X handling. - Fix a pcifront crash due to some uninitialize state. * tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted. xen/pcifront: Report the errors better. xen/pciback: Save the number of MSI-X entries to be copied later. xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY xen: fix potential integer overflow in queue_reply xen/arm: correctly handle DMA mapping of compound pages xen/scsiback: avoid warnings when adding multiple LUNs to a domain xen/scsiback: correct frontend counting |
|
Kees Cook | 9ccaf77cf0 |
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled. This simplifies the code and also makes it clearer that read-only mapped memory is just as fundamental a security feature in kernel-space as it is in user-space. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Brown <david.brown@linaro.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Krause <minipli@googlemail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: PaX Team <pageexec@freemail.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-hardening@lists.openwall.com Cc: linux-arch <linux-arch@vger.kernel.org> Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Kees Cook | e267d97b83 |
asm-generic: Consolidate mark_rodata_ro()
Instead of defining mark_rodata_ro() in each architecture, consolidate it. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Gross <agross@codeaurora.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ashok Kumar <ashoks@broadcom.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Brown <david.brown@linaro.org> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathias Krause <minipli@googlemail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: PaX Team <pageexec@freemail.hu> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: kernel-hardening@lists.openwall.com Cc: linux-arch <linux-arch@vger.kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-parisc@vger.kernel.org Link: http://lkml.kernel.org/r/1455748879-21872-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Sai Praneeth | 6d0cc887d5 |
x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
Now that we have EFI memory region bits that indicate which regions do not need execute permission or read/write permission in the page tables, let's use them. We also check for EFI_NX_PE_DATA and only enforce the restrictive mappings if it's present (to allow us to ignore buggy firmware that sets bits it didn't mean to and to preserve backwards compatibility). Instead of assuming that firmware would set appropriate attributes in memory descriptor like EFI_MEMORY_RO for code and EFI_MEMORY_XP for data, we can expect some firmware out there which might only set *type* in memory descriptor to be EFI_RUNTIME_SERVICES_CODE or EFI_RUNTIME_SERVICES_DATA leaving away attribute. This will lead to improper mappings of EFI runtime regions. In order to avoid it, we check attribute and type of memory descriptor to update mappings and moreover Windows works this way. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1455712566-16727-13-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | ab876728a9 |
Linux 4.5-rc5
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWyN0eAAoJEHm+PkMAQRiGqIAIAKKodaqX5ACJhTRozj3GN5iV dDHU/SJQj4nIyJecaCVAJIBa3gvAX6GyY+Jg4JKJ4TKAdR0Hd/3EwOWIR+0+BQIM 0MqmB0CRLzq42AOQtpDUdwB+OTE8jFQFQd2gFKuQYJJ61ppykCC36OWV0bTfQLSV b2esO4Ry6eoQnDMw8oT52ncUIZEvQ2DZE3L6tNDEPD/0je14GWkV1Fx1+X2jb9cB diFA2TmaEEXMHNT1NCLSQ+D7QefXV3mFl85leNlFi5QQNy7ZdSh7kvvOodMQ2uAS qa9V8Uk6LZYv5O71+Jr5Rmlqh3GxNRCMXu2tlMd2gtw8ApEvBw6XoL5YZYE13Lk= =3HMg -----END PGP SIGNATURE----- Merge tag 'v4.5-rc5' into efi/core, before queueing up new changes Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Alexei Starovoitov | 568b329a02 |
perf: generalize perf_callchain
. avoid walking the stack when there is no room left in the buffer . generalize get_perf_callchain() to be called from bpf helper Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Linus Torvalds | 705d43dbe1 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina: - regression (from 4.4) fix for ordering issue, introduced by an earlier ftrace change, that broke live patching of modules. The fix replaces the ftrace module notifier by direct call in order to make the ordering guaranteed and well-defined. The patch, from Jessica Yu, has been acked both by Steven and Rusty - error message fix from Miroslav Benes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: ftrace/module: remove ftrace module notifier livepatch: change the error message in asm/livepatch.h header files |
|
Dave Hansen | 62b5f7d013 |
mm/core, x86/mm/pkeys: Add execute-only protection keys support
Protection keys provide new page-based protection in hardware. But, they have an interesting attribute: they only affect data accesses and never affect instruction fetches. That means that if we set up some memory which is set as "access-disabled" via protection keys, we can still execute from it. This patch uses protection keys to set up mappings to do just that. If a user calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only without PROT_READ/WRITE), the kernel will notice this, and set a special protection key on the memory. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. I haven't found any userspace that does this today. With this facility in place, we expect userspace to move to use it eventually. Userspace _could_ start doing this today. Any PROT_EXEC calls get converted to PROT_READ inside the kernel, and would transparently be upgraded to "true" PROT_EXEC with this code. IOW, userspace never has to do any PROT_EXEC runtime detection. This feature provides enhanced protection against leaking executable memory contents. This helps thwart attacks which are attempting to find ROP gadgets on the fly. But, the security provided by this approach is not comprehensive. The PKRU register which controls access permissions is a normal user register writable from unprivileged userspace. An attacker who can execute the 'wrpkru' instruction can easily disable the protection provided by this feature. The protection key that is used for execute-only support is permanently dedicated at compile time. This is fine for now because there is currently no API to set a protection key other than this one. Despite there being a constant PKRU value across the entire system, we do not set it unless this feature is in use in a process. That is to preserve the PKRU XSAVE 'init state', which can lead to faster context switches. PKRU *is* a user register and the kernel is modifying it. That means that code doing: pkru = rdpkru() pkru |= 0x100; mmap(..., PROT_EXEC); wrpkru(pkru); could lose the bits in PKRU that enforce execute-only permissions. To avoid this, we suggest avoiding ever calling mmap() or mprotect() when the PKRU value is expected to be unstable. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Piotr Kwapulinski <kwapulinski.piotr@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: keescook@google.com Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210240.CB4BB5CA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 878ba03932 |
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
calc_vm_prot_bits() takes PROT_{READ,WRITE,EXECUTE} bits and turns them in to the vma->vm_flags/VM_* bits. We need to do a similar thing for protection keys. We take a protection key (4 bits) and encode it in to the 4 VM_PKEY_* bits. Note: this code is not new. It was simply a part of the mprotect_pkey() patch in the past. I broke it out for use in the execute-only support. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210237.CFB94AD5@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 8459429693 |
x86/mm/pkeys: Allow kernel to modify user pkey rights register
The Protection Key Rights for User memory (PKRU) is a 32-bit user-accessible register. It contains two bits for each protection key: one to write-disable (WD) access to memory covered by the key and another to access-disable (AD). Userspace can read/write the register with the RDPKRU and WRPKRU instructions. But, the register is saved and restored with the XSAVE family of instructions, which means we have to treat it like a floating point register. The kernel needs to write to the register if it wants to implement execute-only memory or if it implements a system call to change PKRU. To do this, we need to create a 'pkru_state' buffer, read the old contents in to it, modify it, and then tell the FPU code that there is modified data in there so it can (possibly) move the buffer back in to the registers. This uses the fpu__xfeature_set_state() function that we defined in the previous patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210236.0BE13217@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | b8b9b6ba9d |
x86/fpu: Allow setting of XSAVE state
We want to modify the Protection Key rights inside the kernel, so we need to change PKRU's contents. But, if we do a plain 'wrpkru', when we return to userspace we might do an XRSTOR and wipe out the kernel's 'wrpkru'. So, we need to go after PKRU in the xsave buffer. We do this by: 1. Ensuring that we have the XSAVE registers (fpregs) in the kernel FPU buffer (fpstate) 2. Looking up the location of a given state in the buffer 3. Filling in the stat 4. Ensuring that the hardware knows that state is present there (basically that the 'init optimization' is not in place). 5. Copying the newly-modified state back to the registers if necessary. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210235.5A3139BF@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 39a0526fb3 |
x86/mm: Factor out LDT init from context init
The arch-specific mm_context_t is a great place to put protection-key allocation state. But, we need to initialize the allocation state because pkey 0 is always "allocated". All of the runtime initialization of mm_context_t is done in *_ldt() manipulation functions. This renames the existing LDT functions like this: init_new_context() -> init_new_context_ldt() destroy_context() -> destroy_context_ldt() and makes init_new_context() and destroy_context() available for generic use. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210234.DB34FCC5@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 66d375709d |
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
The syscall-level code is passed a protection key and need to return an appropriate error code if the protection key is bogus. We will be using this in subsequent patches. Note that this also begins a series of arch-specific calls that we need to expose in otherwise arch-independent code. We create a linux/pkeys.h header where we will put *all* the stubs for these functions. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210232.774EEAAB@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | d61172b4b6 |
mm/core, x86/mm/pkeys: Differentiate instruction fetches
As discussed earlier, we attempt to enforce protection keys in software. However, the code checks all faults to ensure that they are not violating protection key permissions. It was assumed that all faults are either write faults where we check PKRU[key].WD (write disable) or read faults where we check the AD (access disable) bit. But, there is a third category of faults for protection keys: instruction faults. Instruction faults never run afoul of protection keys because they do not affect instruction fetches. So, plumb the PF_INSTR bit down in to the arch_vma_access_permitted() function where we do the protection key checks. We also add a new FAULT_FLAG_INSTRUCTION. This is because handle_mm_fault() is not passed the architecture-specific error_code where we keep PF_INSTR, so we need to encode the instruction fetch information in to the arch-generic fault flags. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 1b2ee1266e |
mm/core: Do not enforce PKEY permissions on remote mm access
We try to enforce protection keys in software the same way that we do in hardware. (See long example below). But, we only want to do this when accessing our *own* process's memory. If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then tried to PTRACE_POKE a target process which just happened to have some mprotect_pkey(pkey=6) memory, we do *not* want to deny the debugger access to that memory. PKRU is fundamentally a thread-local structure and we do not want to enforce it on access to _another_ thread's data. This gets especially tricky when we have workqueues or other delayed-work mechanisms that might run in a random process's context. We can check that we only enforce pkeys when operating on our *own* mm, but delayed work gets performed when a random user context is active. We might end up with a situation where a delayed-work gup fails when running randomly under its "own" task but succeeds when running under another process. We want to avoid that. To avoid that, we use the new GUP flag: FOLL_REMOTE and add a fault flag: FAULT_FLAG_REMOTE. They indicate that we are walking an mm which is not guranteed to be the same as current->mm and should not be subject to protection key enforcement. Thanks to Jerome Glisse for pointing out this scenario. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Dominik Vogt <vogt@linux.vnet.ibm.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Geliang Tang <geliangtang@163.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Low <jason.low2@hp.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: iommu@lists.linux-foundation.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 33a709b25a |
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
Today, for normal faults and page table walks, we check the VMA and/or PTE to ensure that it is compatible with the action. For instance, if we get a write fault on a non-writeable VMA, we SIGSEGV. We try to do the same thing for protection keys. Basically, we try to make sure that if a user does this: mprotect(ptr, size, PROT_NONE); *ptr = foo; they see the same effects with protection keys when they do this: mprotect(ptr, size, PROT_READ|PROT_WRITE); set_pkey(ptr, size, 4); wrpkru(0xffffff3f); // access disable pkey 4 *ptr = foo; The state to do that checking is in the VMA, but we also sometimes have to do it on the page tables only, like when doing a get_user_pages_fast() where we have no VMA. We add two functions and expose them to generic code: arch_pte_access_permitted(pte_flags, write) arch_vma_access_permitted(vma, write) These are, of course, backed up in x86 arch code with checks against the PTE or VMA's protection key. But, there are also cases where we do not want to respect protection keys. When we ptrace(), for instance, we do not want to apply the tracer's PKRU permissions to the PTEs from the process being traced. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Dominik Vogt <vogt@linux.vnet.ibm.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Low <jason.low2@hp.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Shachar Raindel <raindel@mellanox.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | a927cb83f3 |
x86/mm/pkeys: Add functions to fetch PKRU
This adds the raw instruction to access PKRU as well as some accessor functions that correctly handle when the CPU does not support the instruction. We don't use it here, but we will use read_pkru() in the next patch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 019132ff3d |
x86/mm/pkeys: Fill in pkey field in siginfo
This fills in the new siginfo field: si_pkey to indicate to userspace which protection key was set on the PTE that we faulted on. Note though that *ALL* protection key faults have to be generated by a valid, present PTE at some point. But this code does no PTE lookups which seeds odd. The reason is that we take advantage of the way we generate PTEs from VMAs. All PTEs under a VMA share some attributes. For instance, they are _all_ either PROT_READ *OR* PROT_NONE. They also always share a protection key, so we never have to walk the page tables; we just use the VMA. Note that _pkey is a 64-bit value. The current hardware only supports 4-bit protection keys. We do this because there is _plenty_ of space in _sigfault and it is possible that future processors would support more than 4 bits of protection keys. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 8f62c88322 |
x86/mm/pkeys: Add arch-specific VMA protection bits
Lots of things seem to do: vma->vm_page_prot = vm_get_page_prot(flags); and the ptes get created right from things we pull out of ->vm_page_prot. So it is very convenient if we can store the protection key in flags and vm_page_prot, just like the existing permission bits (_PAGE_RW/PRESENT). It greatly reduces the amount of plumbing and arch-specific hacking we have to do in generic code. This also takes the new PROT_PKEY{0,1,2,3} flags and turns *those* in to VM_ flags for vma->vm_flags. The protection key values are stored in 4 places: 1. "prot" argument to system calls 2. vma->vm_flags, filled from the mmap "prot" 3. vma->vm_page prot, filled from vma->vm_flags 4. the PTE itself. The pseudocode for these for steps are as follows: mmap(PROT_PKEY*) vma->vm_flags = ... | arch_calc_vm_prot_bits(mmap_prot); vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags); pte = pfn | vma->vm_page_prot Note that this provides a new definitions for x86: arch_vm_get_page_prot() Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Dave Hansen | 5c1d90f510 |
x86/mm/pkeys: Add PTE bits for storing protection key
Previous documentation has referred to these 4 bits as "ignored". That means that software could have made use of them. But, as far as I know, the kernel never used them. They are still ignored when protection keys is not enabled, so they could theoretically still get used for software purposes. We also implement "empty" versions so that code that references to them can be optimized away by the compiler when the config option is not enabled. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210205.81E33ED6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Tony Luck | 0f68c088c0 |
x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
The Intel Software Developer Manual describes bit 24 in the MCG_CAP MSR: MCG_SER_P (software error recovery support present) flag, bit 24 — Indicates (when set) that the processor supports software error recovery But only some models with this capability bit set will actually generate recoverable machine checks. Check the model name and set a synthetic capability bit. Provide a command line option to set this bit anyway in case the kernel doesn't recognise the model name. Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Ingo Molnar | 3a2f2ac9b9 |
Merge branch 'x86/urgent' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Tony Luck | 548acf1923 |
x86/mm: Expand the exception table logic to allow new handling options
Huge amounts of help from Andy Lutomirski and Borislav Petkov to produce this. Andy provided the inspiration to add classes to the exception table with a clever bit-squeezing trick, Boris pointed out how much cleaner it would all be if we just had a new field. Linus Torvalds blessed the expansion with: ' I'd rather not be clever in order to save just a tiny amount of space in the exception table, which isn't really criticial for anybody. ' The third field is another relative function pointer, this one to a handler that executes the actions. We start out with three handlers: 1: Legacy - just jumps the to fixup IP 2: Fault - provide the trap number in %ax to the fixup code 3: Cleaned up legacy for the uaccess error hack Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8.1455732970.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |