mirror of https://gitee.com/openkylin/linux.git
63313 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Benjamin Tissoires | 6aef704e38 |
HID: multitouch: add support of other generic collections in hid-mt
The ANTON Touch Pad is a device which can switch from a multitouch touchpad to a mouse. It thus presents several generic collections which are currently ignored by hid-multitouch. Enable them by not ignoring them in mt_input_mapping. Adding also a suffix for them depending on their application. Reported-by: Edel Maks <edelmaks@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | 3c86726cfe |
HID: make .raw_request mandatory
SET_REPORT and GET_REPORT are mandatory in the HID specification. Make the corresponding API in hid-core mandatory too, which removes the need to test against it in some various places. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | 5318251744 |
HID: core: check parameters when sending/receiving data from the device
It is better to check them soon enough before triggering any kernel panic. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | 4fa5a7f76c |
HID: core: implement generic .request()
.request() can be emulated through .raw_request() we can implement this emulation in hid-core, and make .request not mandatory for transport layer drivers. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | 7e845d46b1 |
HID: introduce helper to access hid_output_raw_report()
Add a helper to access hdev->hid_output_raw_report(). To convert the drivers, use the following snippets: for i in drivers/hid/*.c do sed -i.bak "s/[^ \t]*->hid_output_raw_report(/hid_output_raw_report(/g" $i done Then manually fix for checkpatch.pl Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | cafebc058b |
HID: remove hid_get_raw_report in struct hid_device
dev->hid_get_raw_report(X) and hid_hw_raw_request(X, HID_REQ_GET_REPORT) are strictly equivalent. Switch the hid subsystem to the hid_hw notation and remove the field .hid_get_raw_report in struct hid_device. Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | b40272e4d0 |
HID: remove hidinput_input_event handler
All the different transport drivers use now the generic event handling in hid-input. We can remove the handler definitively now. Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Benjamin Tissoires | b69d653679 |
HID: add inliners for ll_driver transport-layer callbacks
Those callbacks are not mandatory, so it's better to add inliners to use them safely. Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Frank Praznik | cd4dc0821b |
HID: Add transport-driver callbacks to the hid_ll_driver struct
Add raw_request and output_report callbacks to the hid_ll_driver struct. Signed-off-by: Frank Praznik <frank.praznik@oh.rr.com> Acked-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
|
Linus Torvalds | fe41c2c018 |
A set of device-mapper changes for 3.14.
A lot of attention was paid to improving the thin-provisioning target's handling of metadata operation failures and running out of space. A new 'error_if_no_space' feature was added to allow users to error IOs rather than queue them when either the data or metadata space is exhausted. Additional fixes/features include: - a few fixes to properly support thin metadata device resizing - a solution for reliably waiting for a DM device's embedded kobject to be released before destroying the device - old dm-snapshot is updated to use the dm-bufio interface to take advantage of readahead capabilities that improve snapshot activation - new dm-cache target tunables to control how quickly data is promoted to the cache (fast) device - improved write efficiency of cluster mirror target by combining userspace flush and mark requests -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJS4GClAAoJEMUj8QotnQNacdEH/2ES5k5itUQRY9jeI+u2zYNP vdsRTYf+97+B3jpRvpWbMt4kxT2tjaQbkxJ+iKRHy2MBLFUgq8ruH1RS/Q5VbDeg 6i6ol8mpNxhlvo/KTMxXqRcWDSxShiMfhz2lXC2bJ7M4sP/iiH85s4Pm4YQ59jpd OIX7qj36m/cV/le9YQbexJEEsaj+3genbzL26wyyvtG/rT9fWnXa7clj2gqTdToG YCEBCRf5FH9X6W/Oc50nMw5n2dt/MRmPre/MAlOjemeaosB0WJiKaswM25rnvHp0 JnhxQ2K2C5KIKAWIfwPOImdb9zWW7p1dIRLsS8nHBUQr0BF5VRkmvpnYH4qBtcc= =e7e0 -----END PGP SIGNATURE----- Merge tag 'dm-3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device-mapper changes from Mike Snitzer: "A lot of attention was paid to improving the thin-provisioning target's handling of metadata operation failures and running out of space. A new 'error_if_no_space' feature was added to allow users to error IOs rather than queue them when either the data or metadata space is exhausted. Additional fixes/features include: - a few fixes to properly support thin metadata device resizing - a solution for reliably waiting for a DM device's embedded kobject to be released before destroying the device - old dm-snapshot is updated to use the dm-bufio interface to take advantage of readahead capabilities that improve snapshot activation - new dm-cache target tunables to control how quickly data is promoted to the cache (fast) device - improved write efficiency of cluster mirror target by combining userspace flush and mark requests" * tag 'dm-3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits) dm log userspace: allow mark requests to piggyback on flush requests dm space map metadata: fix bug in resizing of thin metadata dm cache: add policy name to status output dm thin: fix pool feature parsing dm sysfs: fix a module unload race dm snapshot: use dm-bufio prefetch dm snapshot: use dm-bufio dm snapshot: prepare for switch to using dm-bufio dm snapshot: use GFP_KERNEL when initializing exceptions dm cache: add block sizes and total cache blocks to status output dm btree: add dm_btree_find_lowest_key dm space map metadata: fix extending the space map dm space map common: make sure new space is used during extend dm: wait until embedded kobject is released before destroying a device dm: remove pointless kobject comparison in dm_get_from_kobject dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() dm cache policy mq: introduce three promotion threshold tunables dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD dm thin: fix set_pool_mode exposed pool operation races dm thin: eliminate the no_free_space flag ... |
|
Linus Torvalds | 194e57fd18 |
SCSI for-linus on 20140122
This patch set is a lot of driver updates for qla4xxx, bfa, hpsa, qla2xxx. It also removes the aic7xxx_old driver (which has been deprecated for nearly a decade) and adds support for deadlines in error handling. Signed-off-by: James Bottomley <JBottomley@Parallels.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJS4AfxAAoJEDeqqVYsXL0M8DAIAKRtc9+hy2zsratyvHxxZlz4 6tdvT/4dLTXbIS7EFHmdVb09G5791Pj2jx2PAza/zFDSzgMyL8reXjXpDtCQBvsG U9CKAGAfJi/vD3QSez2qh0SyvIfCHjatlU3IsM/g+5iFcIOO9+LMJuRlw3qtVqOZ iQQBATH8cuVW7k1YRHXS5xAO8cVCEYBPuuz0UreVnZijTV5lsRFaObaq8r7S7yVM 4F5eSb5lcgZZLDMDMB0E2KHq12QoLiLQy/ahCqAL1od41v+P90BHC4cIu8jkdB0l nUYp8vHkmsM82L3xxK9HglcZvJ94fkz6n1gwULl2dTNSYldUnXc/xmocbt0KMf8= =yko3 -----END PGP SIGNATURE----- Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This patch set is a lot of driver updates for qla4xxx, bfa, hpsa, qla2xxx. It also removes the aic7xxx_old driver (which has been deprecated for nearly a decade) and adds support for deadlines in error handling" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (75 commits) [SCSI] hpsa: allow SCSI mid layer to handle unit attention [SCSI] hpsa: do not require board "not ready" status after hard reset [SCSI] hpsa: enable unit attention reporting [SCSI] hpsa: rename scsi prefetch field [SCSI] hpsa: use workqueue instead of kernel thread for lockup detection [SCSI] ipr: increase dump size in ipr driver [SCSI] mac_scsi: Fix crash on out of memory [SCSI] st: fix enlarge_buffer [SCSI] qla1280: Annotate timer on stack so object debug does not complain [SCSI] qla4xxx: Update driver version to 5.04.00-k3 [SCSI] qla4xxx: Recreate chap data list during get chap operation [SCSI] qla4xxx: Add support for ISCSI_PARAM_LOCAL_IPADDR sysfs attr [SCSI] libiscsi: Add local_ipaddr parameter in iscsi_conn struct [SCSI] scsi_transport_iscsi: Export ISCSI_PARAM_LOCAL_IPADDR attr for iscsi_connection [SCSI] qla4xxx: Add host statistics support [SCSI] scsi_transport_iscsi: Add host statistics support [SCSI] qla4xxx: Added support for Diagnostics MBOX command [SCSI] bfa: Driver version upgrade to 3.2.23.0 [SCSI] bfa: change FC_ELS_TOV to 20sec [SCSI] bfa: Observed auto D-port mode instead of manual ... |
|
Linus Torvalds | e1ba84597c |
PCI changes for the v3.14 merge window:
Resource management - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas) - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu) - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas) - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas) - Enforce bus address limits in resource allocation (Yinghai Lu) - Allocate 64-bit BARs above 4G when possible (Yinghai Lu) - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu) PCI device hotplug - Major rescan/remove locking update (Rafael J. Wysocki) - Make ioapic builtin only (not modular) (Yinghai Lu) - Fix release/free issues (Yinghai Lu) - Clean up pciehp (Bjorn Helgaas) - Announce pciehp slot info during enumeration (Bjorn Helgaas) MSI - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev) - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev) - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev) - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman) - Drop "irq" param from *_restore_msi_irqs() (DuanZhenzhong) SR-IOV - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao) Virtualization - Add support for save/restore of extended capabilities (Alex Williamson) - Add Virtual Channel to save/restore support (Alex Williamson) - Never treat a VF as a multifunction device (Alex Williamson) - Add pci_try_reset_function(), et al (Alex Williamson) AER - Ignore non-PCIe error sources (Betty Dall) - Support ACPI HEST error sources for domains other than 0 (Betty Dall) - Consolidate HEST error source parsers (Bjorn Helgaas) - Add a TLP header print helper (Borislav Petkov) Freescale i.MX6 - Remove unnecessary code (Fabio Estevam) - Make reset-gpio optional (Marek Vasut) - Report "link up" only after link training completes (Marek Vasut) - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut) - Fix PCIe startup code (Richard Zhu) Marvell MVEBU - Remove duplicate of_clk_get_by_name() call (Andrew Lunn) - Drop writes to bridge Secondary Status register (Jason Gunthorpe) - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe) - Support a bridge with no IO port window (Jason Gunthorpe) - Use max_t() instead of max(resource_size_t,) (Jingoo Han) - Remove redundant of_match_ptr (Sachin Kamat) - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni) NVIDIA Tegra - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower) Renesas R-Car - Add runtime PM support (Valentine Barshak) - Fix rcar_pci_probe() return value check (Wei Yongjun) Synopsys DesignWare - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen) - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen) - Fix missing MSI IRQs (Harro Haan) - Add dw_pcie prefix before cfg_read/write (Pratyush Anand) - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand) - Whitespace cleanup (Jingoo Han) EISA - Call put_device() if device_register() fails (Levente Kurusa) - Revert EISA initialization breakage ((Bjorn Helgaas) Miscellaneous - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger) - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas) - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas) - Use dev_is_pci() to identify PCI devices (Yijing Wang) - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches) - Update documentation 00-INDEX (Erik Ekman) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJS3ujEAAoJEFmIoMA60/r8A4EQAK9AZSUSVNWvlKdC1PrBfT3w 7fVILx5A4KWsOU8eoFwCPQLrgvUtMltg16yN2tbCjqpKEdrVc36biMO9bwhnXSyZ KopHKMWnn0sza/z2H8mcGy+0azGdWcIjcErX/a8WeS6zyWBjm+yzckrHNVpPu4Ca SpCBhfgBMjKyIZyLtP6juFSH34S2DfQex4oUSyPC+gjqPy5wW/xw/kBxZfOXl+yU P9pQT+geMIc31pETMdG9wd/TT+47YAui4ieSggoVxfVrphCXv6S8mOMCMuQc2bAy MHy9uFm1jbvKZZIYrzJ+9HFiiU/6MNiOO3Ygua52xuSp1Zrcjwi2CLD9/QBXbDVs pTKU5JIO7q43llkQUpIXTwBvEApSZRhuqzXegsMAYIg4AWmbfm/2fXkfWlQThYMp J48blAJZ4t0vhMr9usgwbtdBe8F5euExOxpwH0QMCMABbuu8/B3TLm39+LTcIbsw Efgm3N9iUTyiV5fe9Rr62nflhyqXjTevPl4wbZZe4OOdm0MXZY+/BzuNJhg3wyY8 QKz2J3FB6OR7BCLHCp80l50s5+Ih4F5kmOXwFKjT1D1MFRaNaPDmp9BY6TitU6hg zj55gP4c8x6n3alakbf972Yhgs/4oi1va8cZL+pCYWb8nPO5ldaMiT7QBBLUreQV BtDtC7u/AFWJ5e73+jVO =La1R -----END PGP SIGNATURE----- Merge tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.14 merge window: Resource management - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas) - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu) - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas) - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas) - Enforce bus address limits in resource allocation (Yinghai Lu) - Allocate 64-bit BARs above 4G when possible (Yinghai Lu) - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu) PCI device hotplug - Major rescan/remove locking update (Rafael J. Wysocki) - Make ioapic builtin only (not modular) (Yinghai Lu) - Fix release/free issues (Yinghai Lu) - Clean up pciehp (Bjorn Helgaas) - Announce pciehp slot info during enumeration (Bjorn Helgaas) MSI - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev) - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev) - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev) - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman) - Drop "irq" param from *_restore_msi_irqs() (DuanZhenzhong) SR-IOV - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao) Virtualization - Add support for save/restore of extended capabilities (Alex Williamson) - Add Virtual Channel to save/restore support (Alex Williamson) - Never treat a VF as a multifunction device (Alex Williamson) - Add pci_try_reset_function(), et al (Alex Williamson) AER - Ignore non-PCIe error sources (Betty Dall) - Support ACPI HEST error sources for domains other than 0 (Betty Dall) - Consolidate HEST error source parsers (Bjorn Helgaas) - Add a TLP header print helper (Borislav Petkov) Freescale i.MX6 - Remove unnecessary code (Fabio Estevam) - Make reset-gpio optional (Marek Vasut) - Report "link up" only after link training completes (Marek Vasut) - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut) - Fix PCIe startup code (Richard Zhu) Marvell MVEBU - Remove duplicate of_clk_get_by_name() call (Andrew Lunn) - Drop writes to bridge Secondary Status register (Jason Gunthorpe) - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe) - Support a bridge with no IO port window (Jason Gunthorpe) - Use max_t() instead of max(resource_size_t,) (Jingoo Han) - Remove redundant of_match_ptr (Sachin Kamat) - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni) NVIDIA Tegra - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower) Renesas R-Car - Add runtime PM support (Valentine Barshak) - Fix rcar_pci_probe() return value check (Wei Yongjun) Synopsys DesignWare - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen) - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen) - Fix missing MSI IRQs (Harro Haan) - Add dw_pcie prefix before cfg_read/write (Pratyush Anand) - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand) - Whitespace cleanup (Jingoo Han) EISA - Call put_device() if device_register() fails (Levente Kurusa) - Revert EISA initialization breakage ((Bjorn Helgaas) Miscellaneous - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger) - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas) - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas) - Use dev_is_pci() to identify PCI devices (Yijing Wang) - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches) - Update documentation 00-INDEX (Erik Ekman)" * tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (119 commits) Revert "EISA: Initialize device before its resources" Revert "EISA: Log device resources in dmesg" vfio-pci: Use pci "try" reset interface PCI: Check parent kobject in pci_destroy_dev() xen/pcifront: Use global PCI rescan-remove locking powerpc/eeh: Use global PCI rescan-remove locking PCI: Fix pci_check_and_unmask_intx() comment typos PCI: Add pci_try_reset_function(), pci_try_reset_slot(), pci_try_reset_bus() MPT / PCI: Use pci_stop_and_remove_bus_device_locked() platform / x86: Use global PCI rescan-remove locking PCI: hotplug: Use global PCI rescan-remove locking pcmcia: Use global PCI rescan-remove locking ACPI / hotplug / PCI: Use global PCI rescan-remove locking ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug PCI: Add global pci_lock_rescan_remove() PCI: Cleanup pci.h whitespace PCI: Reorder so actual code comes before stubs PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0 ACPICA: Add helper macros to extract bus/segment numbers from HEST table. PCI: Make local functions static ... |
|
Linus Torvalds | 60eaa0190f |
This pull request has a new feature to ftrace, namely the trace event
triggers by Tom Zanussi. A trigger is a way to enable an action when an event is hit. The actions are: o trace on/off - enable or disable tracing o snapshot - save the current trace buffer in the snapshot o stacktrace - dump the current stack trace to the ringbuffer o enable/disable events - enable or disable another event Namhyung Kim added updates to the tracing uprobes code. Having the uprobes add support for fetch methods. The rest are various bug fixes with the new code, and minor ones for the old code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQEcBAABAgAGBQJS3Z9fAAoJEKQekfcNnQGuFf0H/0CteaN+BJjpif6Tnxia15Sp pcftzU0lgqfNzsfitmbjiVTgXWqCghoZo8UI9tQZvBZ9wmDIxeXQR73uoBgVlSCQ ovyBO/R8r+lq+7EsDCwntZvrLbcdn6s/jzoruRvt7r35ghK5pH81DNR1BOzTQBhW x+361Xtc13aok7N7JN8KR96VDUP9f8KU6PWqJ5lgS2Zl+wbVw6b0p8OV8IMCHczP MdYrx8y4Jv4QWW7rMShAAVBe9qJQ56JWiWA17ysa4kY8BkKQ7QtlEFr+r1YY0nX5 67brXiL8u0NFzRx5y2VRpGc25BbImnVBFpoLQ5Itluq9OdZE3aOQubzXlY70R6g= =Hkho -----END PGP SIGNATURE----- Merge tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a new feature to ftrace, namely the trace event triggers by Tom Zanussi. A trigger is a way to enable an action when an event is hit. The actions are: o trace on/off - enable or disable tracing o snapshot - save the current trace buffer in the snapshot o stacktrace - dump the current stack trace to the ringbuffer o enable/disable events - enable or disable another event Namhyung Kim added updates to the tracing uprobes code. Having the uprobes add support for fetch methods. The rest are various bug fixes with the new code, and minor ones for the old code" * tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits) tracing: Fix buggered tee(2) on tracing_pipe tracing: Have trace buffer point back to trace_array ftrace: Fix synchronization location disabling and freeing ftrace_ops ftrace: Have function graph only trace based on global_ops filters ftrace: Synchronize setting function_trace_op with ftrace_trace_function tracing: Show available event triggers when no trigger is set tracing: Consolidate event trigger code tracing: Fix counter for traceon/off event triggers tracing: Remove double-underscore naming in syscall trigger invocations tracing/kprobes: Add trace event trigger invocations tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT tracing/uprobes: Add @+file_offset fetch method uprobes: Allocate ->utask before handler_chain() for tracing handlers tracing/uprobes: Add support for full argument access methods tracing/uprobes: Fetch args before reserving a ring buffer tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg() tracing/probes: Implement 'memory' fetch method for uprobes tracing/probes: Add fetch{,_size} member into deref fetch method tracing/probes: Move 'symbol' fetch method to kprobes tracing/probes: Implement 'stack' fetch method for uprobes ... |
|
Dongmao Zhang | 5066a4df1f |
dm log userspace: allow mark requests to piggyback on flush requests
In the cluster evironment, cluster write has poor performance because userspace_flush() has to contact a userspace program (cmirrord) for clear/mark/flush requests. But both mark and flush requests require cmirrord to communicate the message to all the cluster nodes for each flush call. This behaviour is really slow. To address this we now merge mark and flush requests together to reduce the kernel-userspace-kernel time. We allow a new directive, "integrated_flush" that can be used to instruct the kernel log code to combine flush and mark requests when directed by userspace. If not directed by userspace (due to an older version of the userspace code perhaps), the kernel will function as it did previously - preserving backwards compatibility. Additionally, flush requests are performed lazily when only clear requests exist. Signed-off-by: Dongmao Zhang <dmzhang@suse.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> |
|
Linus Torvalds | df32e43a54 |
Merge branch 'akpm' (incoming from Andrew)
Merge first patch-bomb from Andrew Morton: - a couple of misc things - inotify/fsnotify work from Jan - ocfs2 updates (partial) - about half of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm/migrate: remove unused function, fail_migrate_page() mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages mm/migrate: correct failure handling if !hugepage_migration_support() mm/migrate: add comment about permanent failure path mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure mm: compaction: reset scanner positions immediately when they meet mm: compaction: do not mark unmovable pageblocks as skipped in async compaction mm: compaction: detect when scanners meet in isolate_freepages mm: compaction: reset cached scanner pfn's before reading them mm: compaction: encapsulate defer reset logic mm: compaction: trace compaction begin and end memcg, oom: lock mem_cgroup_print_oom_info sched: add tracepoints related to NUMA task migration mm: numa: do not automatically migrate KSM pages mm: numa: trace tasks that fail migration due to rate limiting mm: numa: limit scope of lock for NUMA migrate rate limiting mm: numa: make NUMA-migrate related functions static lib/show_mem.c: show num_poisoned_pages when oom mm/hwpoison: add '#' to hwpoison_inject mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter ... |
|
Linus Torvalds | fbd918a202 |
Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo: "Support for some new embedded controllers. A couple late (<= a week) fixes have stable cc'd and one patch ("SATA: MV: Add support for the optional PHYs") got committed yesterday because otherwise the resulting kernel would fail boot on an embedded board due to interdependent changes in its platform tree. Other than that, nothing too noteworthy" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: SATA: MV: Add support for the optional PHYs sata-highbank: Remove unnecessary ahci_platform.h include libata: disable LPM for some WD SATA-I devices ARM: mvebu: update the SATA compatible string for Armada 370/XP ata: sata_mv: fix disk hotplug for Armada 370/XP SoCs ata: sata_mv: introduce compatible string "marvell, armada-370-sata" ata: pata_samsung_cf: Remove unused macros ata: pata_samsung_cf: Use devm_ioremap_resource() ata: pata_samsung_cf: Merge pata_samsung_cf.h into pata_samsung_cf.c ata: pata_samsung_cf: Move plat/regs-ata.h to drivers/ata drivers: ata: Mark the function as static in libahci.c drivers: ata: Mark the function ahci_init_interrupts() as static in ahci.c ahci: imx: fix the error handling in imx_ahci_probe() ahci: imx: ahci_imx_softreset() can be static ahci: imx: Add i.MX53 support ahci: imx: Pull out the clock enable/disable calls libata, dt: Document sata_rcar bindings sata_rcar: Add R-Car Gen2 SATA PHY support ahci: mcp89: enter AHCI mode under Apple BIOS emulation ata: libata-eh: Remove unnecessary snprintf arithmetic |
|
Linus Torvalds | f075e0f699 |
Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo: "The bulk of changes are cleanups and preparations for the upcoming kernfs conversion. - cgroup_event mechanism which is and will be used only by memcg is moved to memcg. - pidlist handling is updated so that it can be served by seq_file. Also, the list is not sorted if sane_behavior. cgroup documentation explicitly states that the file is not sorted but it has been for quite some time. - All cgroup file handling now happens on top of seq_file. This is to prepare for kernfs conversion. In addition, all operations are restructured so that they map 1-1 to kernfs operations. - Other cleanups and low-pri fixes" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits) cgroup: trivial style updates cgroup: remove stray references to css_id doc: cgroups: Fix typo in doc/cgroups cgroup: fix fail path in cgroup_load_subsys() cgroup: fix missing unlock on error in cgroup_load_subsys() cgroup: remove for_each_root_subsys() cgroup: implement for_each_css() cgroup: factor out cgroup_subsys_state creation into create_css() cgroup: combine css handling loops in cgroup_create() cgroup: reorder operations in cgroup_create() cgroup: make for_each_subsys() useable under cgroup_root_mutex cgroup: css iterations and css_from_dir() are safe under cgroup_mutex cgroup: unify pidlist and other file handling cgroup: replace cftype->read_seq_string() with cftype->seq_show() cgroup: attach cgroup_open_file to all cgroup files cgroup: generalize cgroup_pidlist_open_file cgroup: unify read path so that seq_file is always used cgroup: unify cgroup_write_X64() and cgroup_write_string() cgroup: remove cftype->read(), ->read_map() and ->write() hugetlb_cgroup: convert away from cftype->read() ... |
|
Linus Torvalds | 2182c815f3 |
The main topics this time are allocation, in the form of Bob's
improvements when searching resource groups and several updates to quotas which should increase scalability. The quota changes follow on from those in the last merge window, and there will likely be further work to come in this area in due course. There are also a few patches which help to improve efficiency of adding entries into directories, and clean up some of that code. One on-disk change is included this time, which is to write some additional information which should be useful to fsck and also potentially for debugging. Other than that, its just a few small random bug fixes and clean ups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJS3RZUAAoJEMrg3m4a/8jS/C4QAKAoGbGH8UxG2fNRUWK94UBL lxxn0nUaQ7bwKpIbysEO6HHg2rqcfjRSdO/H6kfD51Fgg39RfIfvXF9woJw/5Is6 1xcn3faBl3dQhIq81+lSKnEJnJbQ49dbIcgxt1FGqPI8ZY6st/CBvnvqh0TRAcvN 7t0TYSPxmA/7BnwynstTbpK2l9uUisau8z12QzihJ6lRiR7QlfFSUq6Xs/ICSmkE LE2dC6gXMIIDyMpPX5Fg9NQNEAPKOzMwduUaHnv2vpuh1jkk3ObHg4mUkYtdoUZx Hbs1Ey5oqRiMs03I5wm1ayZg+6i/pT7HTM5vbrunTa21AMmJ0NTFdZ4ihVZqZwu4 ynpXY0OT/U7CpFyx6FSKtlq0kzwx5A51mL9bpSlLU9pggJY6NHQdmSJ9fdLW9cdS Wwi2Br/0rBvIhm0ejy8kxK904DhqCGsH+RXbfOc8J4VBE4R/z5eMmBG9UqMm7c+7 hO6FhpJo9QsNMe19A8IANzG2gT+PcpgrXs2GTS1QFGhFgn08+k3RLDT/SrGRxU6r MaCM9rms0SgMr2LX6MEPZdo3LJgBTNr1rKDEIzQ7BUXs3FHJ7U0swuHOTDXFVm51 gQanPYmVvFV7aiom3ExUhegDefBu1jMnY26SVxxLAbWN1Ek2Ikg6rO6wx3ckXvYo SKY3qAewE5pNpHe7u6SS =/tv0 -----END PGP SIGNATURE----- Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull GFS2 updates from Steven Whitehouse: "The main topics this time are allocation, in the form of Bob's improvements when searching resource groups and several updates to quotas which should increase scalability. The quota changes follow on from those in the last merge window, and there will likely be further work to come in this area in due course. There are also a few patches which help to improve efficiency of adding entries into directories, and clean up some of that code. One on-disk change is included this time, which is to write some additional information which should be useful to fsck and also potentially for debugging. Other than that, its just a few small random bug fixes and clean ups" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits) GFS2: revert "GFS2: d_splice_alias() can't return error" GFS2: Small cleanup GFS2: Don't use ENOBUFS when ENOMEM is the correct error code GFS2: Fix kbuild test robot reported warning GFS2: Move quota bitmap operations under their own lock GFS2: Clean up quota slot allocation GFS2: Only run logd and quota when mounted read/write GFS2: Use RCU/hlist_bl based hash for quotas GFS2: No need to invalidate pages for a dio read GFS2: Add initialization for address space in super block GFS2: Add hints to directory leaf blocks GFS2: For exhash conversion, only one block is needed GFS2: Increase i_writecount during gfs2_setattr_chown GFS2: Remember directory insert point GFS2: Consolidate transaction blocks calculation for dir add GFS2: Add directory addition info structure GFS2: Use only a single address space for rgrps GFS2: Use range based functions for rgrp sync/invalidation GFS2: Remove test which is always true GFS2: Remove gfs2_quota_change_host structure ... |
|
Joonsoo Kim | 78d5506e82 |
mm/migrate: remove unused function, fail_migrate_page()
fail_migrate_page() isn't used anywhere, so remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 59c82b70dc |
mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
Some part of putback_lru_pages() and putback_movable_pages() is duplicated, so it could confuse us what we should use. We can remove putback_lru_pages() since it is not really needed now. This makes us undestand and maintain the code more easily. And comment on putback_movable_pages() is stale now, so fix it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Vlastimil Babka | de6c60a6c1 |
mm: compaction: encapsulate defer reset logic
Currently there are several functions to manipulate the deferred compaction state variables. The remaining case where the variables are touched directly is when a successful allocation occurs in direct compaction, or is expected to be successful in the future by kswapd. Here, the lowest order that is expected to fail is updated, and in the case of successful allocation, the deferred status and counter is reset completely. Create a new function compaction_defer_reset() to encapsulate this functionality and make it easier to understand the code. No functional change. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Mel Gorman | 0eb927c0ab |
mm: compaction: trace compaction begin and end
The broad goal of the series is to improve allocation success rates for huge pages through memory compaction, while trying not to increase the compaction overhead. The original objective was to reintroduce capturing of high-order pages freed by the compaction, before they are split by concurrent activity. However, several bugs and opportunities for simple improvements were found in the current implementation, mostly through extra tracepoints (which are however too ugly for now to be considered for sending). The patches mostly deal with two mechanisms that reduce compaction overhead, which is caching the progress of migrate and free scanners, and marking pageblocks where isolation failed to be skipped during further scans. Patch 1 (from mgorman) adds tracepoints that allow calculate time spent in compaction and potentially debug scanner pfn values. Patch 2 encapsulates the some functionality for handling deferred compactions for better maintainability, without a functional change type is not determined without being actually needed. Patch 3 fixes a bug where cached scanner pfn's are sometimes reset only after they have been read to initialize a compaction run. Patch 4 fixes a bug where scanners meeting is sometimes not properly detected and can lead to multiple compaction attempts quitting early without doing any work. Patch 5 improves the chances of sync compaction to process pageblocks that async compaction has skipped due to being !MIGRATE_MOVABLE. Patch 6 improves the chances of sync direct compaction to actually do anything when called after async compaction fails during allocation slowpath. The impact of patches were validated using mmtests's stress-highalloc benchmark with mmtests's stress-highalloc benchmark on a x86_64 machine with 4GB memory. Due to instability of the results (mostly related to the bugs fixed by patches 2 and 3), 10 iterations were performed, taking min,mean,max values for success rates and mean values for time and vmstat-based metrics. First, the default GFP_HIGHUSER_MOVABLE allocations were tested with the patches stacked on top of v3.13-rc2. Patch 2 is OK to serve as baseline due to no functional changes in 1 and 2. Comments below. stress-highalloc 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp Success 1 Min 9.00 ( 0.00%) 10.00 (-11.11%) 43.00 (-377.78%) 43.00 (-377.78%) 33.00 (-266.67%) Success 1 Mean 27.50 ( 0.00%) 25.30 ( 8.00%) 45.50 (-65.45%) 45.90 (-66.91%) 46.30 (-68.36%) Success 1 Max 36.00 ( 0.00%) 36.00 ( 0.00%) 47.00 (-30.56%) 48.00 (-33.33%) 52.00 (-44.44%) Success 2 Min 10.00 ( 0.00%) 8.00 ( 20.00%) 46.00 (-360.00%) 45.00 (-350.00%) 35.00 (-250.00%) Success 2 Mean 26.40 ( 0.00%) 23.50 ( 10.98%) 47.30 (-79.17%) 47.60 (-80.30%) 48.10 (-82.20%) Success 2 Max 34.00 ( 0.00%) 33.00 ( 2.94%) 48.00 (-41.18%) 50.00 (-47.06%) 54.00 (-58.82%) Success 3 Min 65.00 ( 0.00%) 63.00 ( 3.08%) 85.00 (-30.77%) 84.00 (-29.23%) 85.00 (-30.77%) Success 3 Mean 76.70 ( 0.00%) 70.50 ( 8.08%) 86.20 (-12.39%) 85.50 (-11.47%) 86.00 (-12.13%) Success 3 Max 87.00 ( 0.00%) 86.00 ( 1.15%) 88.00 ( -1.15%) 87.00 ( 0.00%) 87.00 ( 0.00%) 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp User 6437.72 6459.76 5960.32 5974.55 6019.67 System 1049.65 1049.09 1029.32 1031.47 1032.31 Elapsed 1856.77 1874.48 1949.97 1994.22 1983.15 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp Minor Faults 253952267 254581900 250030122 250507333 250157829 Major Faults 420 407 506 530 530 Swap Ins 4 9 9 6 6 Swap Outs 398 375 345 346 333 Direct pages scanned 197538 189017 298574 287019 299063 Kswapd pages scanned 1809843 1801308 1846674 1873184 1861089 Kswapd pages reclaimed 1806972 1798684 1844219 1870509 1858622 Direct pages reclaimed 197227 188829 298380 286822 298835 Kswapd efficiency 99% 99% 99% 99% 99% Kswapd velocity 953.382 970.449 952.243 934.569 922.286 Direct efficiency 99% 99% 99% 99% 99% Direct velocity 104.058 101.832 153.961 143.200 148.205 Percentage direct scans 9% 9% 13% 13% 13% Zone normal velocity 347.289 359.676 348.063 339.933 332.983 Zone dma32 velocity 710.151 712.605 758.140 737.835 737.507 Zone dma velocity 0.000 0.000 0.000 0.000 0.000 Page writes by reclaim 557.600 429.000 353.600 426.400 381.800 Page writes file 159 53 7 79 48 Page writes anon 398 375 345 346 333 Page reclaim immediate 825 644 411 575 420 Sector Reads 2781750 2769780 2878547 2939128 2910483 Sector Writes 12080843 12083351 12012892 12002132 12010745 Page rescued immediate 0 0 0 0 0 Slabs scanned |
|
Mel Gorman | 286549dcaf |
sched: add tracepoints related to NUMA task migration
This patch adds three tracepoints o trace_sched_move_numa when a task is moved to a node o trace_sched_swap_numa when a task is swapped with another task o trace_sched_stick_numa when a numa-related migration fails The tracepoints allow the NUMA scheduler activity to be monitored and the following high-level metrics can be calculated o NUMA migrated stuck nr trace_sched_stick_numa o NUMA migrated idle nr trace_sched_move_numa o NUMA migrated swapped nr trace_sched_swap_numa o NUMA local swapped trace_sched_swap_numa src_nid == dst_nid (should never happen) o NUMA remote swapped trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped) o NUMA group swapped trace_sched_swap_numa src_ngid == dst_ngid Maybe a small number of these are acceptable but a high number would be a major surprise. It would be even worse if bounces are frequent. o NUMA avg task migs. Average number of migrations for tasks o NUMA stddev task mig Self-explanatory o NUMA max task migs. Maximum number of migrations for a single task In general the intent of the tracepoints is to help diagnose problems where automatic NUMA balancing appears to be doing an excessive amount of useless work. [akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Mel Gorman | af1839d722 |
mm: numa: trace tasks that fail migration due to rate limiting
A low local/remote numa hinting fault ratio is potentially explained by failed migrations. This patch adds a tracepoint that fires when migration fails due to migration rate limitation. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Mel Gorman | 1c5e9c27cb |
mm: numa: limit scope of lock for NUMA migrate rate limiting
NUMA migrate rate limiting protects a migration counter and window using a lock but in some cases this can be a contended lock. It is not critical that the number of pages be perfect, lost updates are acceptable. Reduce the importance of this lock. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Santosh Shilimkar | 26f09e9b3a |
mm/memblock: add memblock memory allocation apis
Introduce memblock memory allocation APIs which allow to support PAE or LPAE extension on 32 bits archs where the physical memory start address can be beyond 4GB. In such cases, existing bootmem APIs which operate on 32 bit addresses won't work and needs memblock layer which operates on 64 bit addresses. So we add equivalent APIs so that we can replace usage of bootmem with memblock interfaces. Architectures already converted to NO_BOOTMEM use these new memblock interfaces. The architectures which are still not converted to NO_BOOTMEM continue to function as is because we still maintain the fal lback option of bootmem back-end supporting these new interfaces. So no functional change as such. In long run, once all the architectures moves to NO_BOOTMEM, we can get rid of bootmem layer completely. This is one step to remove the core code dependency with bootmem and also gives path for architectures to move away from bootmem. The proposed interface will became active if both CONFIG_HAVE_MEMBLOCK and CONFIG_NO_BOOTMEM are specified by arch. In case !CONFIG_NO_BOOTMEM, the memblock() wrappers will fallback to the existing bootmem apis so that arch's not converted to NO_BOOTMEM continue to work as is. The meaning of MEMBLOCK_ALLOC_ACCESSIBLE and MEMBLOCK_ALLOC_ANYWHERE is kept same. [akpm@linux-foundation.org: s/depricated/deprecated/] Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Grygorii Strashko | b115423357 |
mm/memblock: switch to use NUMA_NO_NODE instead of MAX_NUMNODES
It's recommended to use NUMA_NO_NODE everywhere to select "process any node" behavior or to indicate that "no node id specified". Hence, update __next_free_mem_range*() API's to accept both NUMA_NO_NODE and MAX_NUMNODES, but emit warning once on MAX_NUMNODES, and correct corresponding API's documentation to describe new behavior. Also, update other memblock/nobootmem APIs where MAX_NUMNODES is used dirrectly. The change was suggested by Tejun Heo. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Grygorii Strashko | 87029ee939 |
mm/memblock: reorder parameters of memblock_find_in_range_node
Reorder parameters of memblock_find_in_range_node to be consistent with other memblock APIs. The change was suggested by Tejun Heo <tj@kernel.org>. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Grygorii Strashko | 10e89523bf |
mm/bootmem: remove duplicated declaration of __free_pages_bootmem()
The __free_pages_bootmem is used internally by MM core and already defined in internal.h. So, remove duplicated declaration. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Oleg Nesterov | 0c740d0afc |
introduce for_each_thread() to replace the buggy while_each_thread()
while_each_thread() and next_thread() should die, almost every lockless usage is wrong. 1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits, next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, it can exec. 2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T, even the first next_thread() can point to the already freed/reused memory. This patch adds signal_struct->thread_head and task->thread_node to create the normal rcu-safe list with the stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away. Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group, we will kill it later, after we change the users of while_each_thread() to use for_each_thread(). Perhaps we can kill it even before we convert all users, we can reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because this will lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(), currently its semantics is not clear unless thread_group_leader(p) and we need to audit the callers before we can change it. So this patch adds the new interface which has to coexist with the old one for some time, hopefully the next changes will be more or less straightforward and the old one will go away soon. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Sergey Dyasly <dserrg@gmail.com> Tested-by: Sergey Dyasly <dserrg@gmail.com> Reviewed-by: Sameer Nanda <snanda@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 9f32624be9 |
mm/rmap: use rmap_walk() in page_referenced()
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in page_referenced(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> page_referenced_ksm, page_referenced_anon, page_referenced_file 2. introduce new struct page_referenced_arg and pass it to page_referenced_one(), main function of rmap_walk, in order to count reference, to store vm_flags and to check finish condition. 3. mechanical change to use rmap_walk() in page_referenced(). [liwanp@linux.vnet.ibm.com: fix BUG at rmap_walk] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | e8351ac9bf |
mm/rmap: use rmap_walk() in try_to_munlock()
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in try_to_munlock(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> try_to_unmap_ksm, try_to_unmap_anon, try_to_unmap_file 2. mechanical change to use rmap_walk() in try_to_munlock(). 3. copy and paste comments. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 5262950642 |
mm/rmap: use rmap_walk() in try_to_unmap()
Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in try_to_unmap(). In this patch, I change following things. 1. enable rmap_walk() if !CONFIG_MIGRATION. 2. mechanical change to use rmap_walk() in try_to_unmap(). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 0dd1c7bbce |
mm/rmap: extend rmap_walk_xxx() to cope with different cases
There are a lot of common parts in traversing functions, but there are also a little of uncommon parts in it. By assigning proper function pointer on each rmap_walker_control, we can handle these difference correctly. Following are differences we should handle. 1. difference of lock function in anon mapping case 2. nonlinear handling in file mapping case 3. prechecked condition: checking memcg in page_referenced(), checking VM_SHARE in page_mkclean() checking temporary vma in try_to_unmap() 4. exit condition: checking page_mapped() in try_to_unmap() So, in this patch, I introduce 4 function pointers to handle above differences. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Joonsoo Kim | 051ac83adf |
mm/rmap: make rmap_walk to get the rmap_walk_control argument
In each rmap traverse case, there is some difference so that we need function pointers and arguments to them in order to handle these For this purpose, struct rmap_walk_control is introduced in this patch, and will be extended in following patch. Introducing and extending are separate, because it clarify changes. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Tang Chen | 55ac590c2f |
memblock, mem_hotplug: make memblock skip hotpluggable regions if needed
Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable memory used by the kernel won't be able to be hot-removed. To solve this problem, the basic idea is to prevent memblock from allocating hotpluggable memory for the kernel at early time, and arrange all hotpluggable memory in ACPI SRAT(System Resource Affinity Table) as ZONE_MOVABLE when initializing zones. In the previous patches, we have marked hotpluggable memory regions with MEMBLOCK_HOTPLUG flag in memblock.memory. In this patch, we make memblock skip these hotpluggable memory regions in the default top-down allocation function if movable_node boot option is specified. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Tang Chen | e7e8de5918 |
memblock: make memblock_set_node() support different memblock_type
[sfr@canb.auug.org.au: fix powerpc build] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Tang Chen | 66b16edf9e |
memblock, mem_hotplug: introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions
In find_hotpluggable_memory, once we find out a memory region which is hotpluggable, we want to mark them in memblock.memory. So that we could control memblock allocator not to allocte hotpluggable memory for the kernel later. To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the hotpluggable memory regions in memblock and a function memblock_mark_hotplug() to mark hotpluggable memory if we find one. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Tang Chen | 66a2075721 |
memblock, numa: introduce flags field into memblock
There is no flag in memblock to describe what type the memory is. Sometimes, we may use memblock to reserve some memory for special usage. And we want to know what kind of memory it is. So we need a way to In hotplug environment, we want to reserve hotpluggable memory so the kernel won't be able to use it. And when the system is up, we have to free these hotpluggable memory to buddy. So we need to mark these memory first. In order to do so, we need to mark out these special memory in memblock. In this patch, we introduce a new "flags" member into memblock_region: struct memblock_region { phys_addr_t base; phys_addr_t size; unsigned long flags; /* This is new. */ #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP int nid; #endif }; This patch does the following things: 1) Add "flags" member to memblock_region. 2) Modify the following APIs' prototype: memblock_add_region() memblock_insert_region() 3) Add memblock_reserve_region() to support reserve memory with flags, and keep memblock_reserve()'s prototype unmodified. 4) Modify other APIs to support flags, but keep their prototype unmodified. The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>. Suggested-by: Wen Congyang <wency@cn.fujitsu.com> Suggested-by: Liu Jiang <jiang.liu@huawei.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jerome Marchand | 49f0ce5f92 |
mm: add overcommit_kbytes sysctl variable
Some applications that run on HPC clusters are designed around the availability of RAM and the overcommit ratio is fine tuned to get the maximum usage of memory without swapping. With growing memory, the 1%-of-all-RAM grain provided by overcommit_ratio has become too coarse for these workload (on a 2TB machine it represents no less than 20GB). This patch adds the new overcommit_kbytes sysctl variable that allow a much finer grain. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Mel Gorman | aec6a8889a |
mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT
Commit |
|
David Rientjes | d80be7c751 |
mm, mempolicy: remove unneeded functions for UMA configs
Mempolicies only exist for CONFIG_NUMA configurations. Therefore, a certain class of functions are unneeded in configurations where CONFIG_NUMA is disabled such as functions that duplicate existing mempolicies, lookup existing policies, set certain mempolicy traits, or test mempolicies for certain attributes. Remove the unneeded functions so that any future callers get a compile- time error and protect their code with CONFIG_NUMA as required. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kirill A. Shutemov | b35f1819ac |
mm: create a separate slab for page->ptl allocation
If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64 is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab, so we loose 24 on each. An average system can easily allocate few tens thousands of page->ptl and overhead is significant. Let's create a separate slab for page->ptl allocation to solve this. To make sure that it really works this time, some numbers from my test machine (just booted, no load): Before: # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo kmalloc-96 31987 32190 128 30 1 : tunables 120 60 8 : slabdata 1073 1073 92 After: # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo page->ptl 27516 28143 72 53 1 : tunables 120 60 8 : slabdata 531 531 9 kmalloc-96 3853 5280 128 30 1 : tunables 120 60 8 : slabdata 176 176 0 Note that the patch is useful not only for debug case, but also for PREEMPT_RT, where spinlock_t is always bloated. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Yasuaki Ishimatsu | 943dca1a1f |
mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve
Yasuaki Ishimatsu reported memory hot-add spent more than 5 _hours_ on 9TB memory machine since onlining memory sections is too slow. And we found out setup_zone_migrate_reserve spent >90% of the time. The problem is, setup_zone_migrate_reserve scans all pageblocks unconditionally, but it is only necessary if the number of reserved block was reduced (i.e. memory hot remove). Moreover, maximum MIGRATE_RESERVE per zone is currently 2. It means that the number of reserved pageblocks is almost always unchanged. This patch adds zone->nr_migrate_reserve_block to maintain the number of MIGRATE_RESERVE pageblocks and it reduces the overhead of setup_zone_migrate_reserve dramatically. The following table shows time of onlining a memory section. Amount of memory | 128GB | 192GB | 256GB| --------------------------------------------- linux-3.12 | 23.9 | 31.4 | 44.5 | This patch | 8.3 | 8.3 | 8.6 | Mel's proposal patch | 10.9 | 19.2 | 31.3 | --------------------------------------------- (millisecond) 128GB : 4 nodes and each node has 32GB of memory 192GB : 6 nodes and each node has 32GB of memory 256GB : 8 nodes and each node has 32GB of memory (*1) Mel proposed his idea by the following threads. https://lkml.org/lkml/2013/10/30/272 [akpm@linux-foundation.org: tweak comment] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Oleg Nesterov | 5eaf1a9e23 |
mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail()
get_huge_page_tail()->compound_head() looks confusing. Every caller must check PageTail(page), otherwise atomic_inc(&page->_mapcount) is simply wrong if this page is compound-trans-head. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Andrea Arcangeli | 44518d2b32 |
mm: tail page refcounting optimization for slab and hugetlbfs
This skips the _mapcount mangling for slab and hugetlbfs pages. The main trouble in doing this is to guarantee that PageSlab and PageHeadHuge remains constant for all get_page/put_page run on the tail of slab or hugetlbfs compound pages. Otherwise if they're set during get_page but not set during put_page, the _mapcount of the tail page would underflow. PageHeadHuge will remain true until the compound page is released and enters the buddy allocator so it won't risk to change even if the tail page is the last reference left on the page. PG_slab instead is cleared before the slab frees the head page with put_page, so if the tail pin is released after the slab freed the page, we would have a problem. But in the slab case the tail pin cannot be the last reference left on the page. This is because the slab code is free to reuse the compound page after a kfree/kmem_cache_free without having to check if there's any tail pin left. In turn all tail pins must be always released while the head is still pinned by the slab code and so we know PG_slab will be still set too. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Andrea Arcangeli | ca641514f4 |
mm: thp: optimize compound_trans_huge
Currently we don't clobber page_tail->first_page during split_huge_page, so compound_trans_head can be set to compound_head without adverse effects, and this mostly optimizes away a smp_rmb. It looks worthwhile to keep around the implementation that doesn't relay on page_tail->first_page not to be clobbered, because it would be necessary if we'll decide to enforce page->private to zero at all times whenever PG_private is not set, also for anonymous pages. For anonymous pages enforcing such an invariant doesn't matter as anonymous pages don't use page->private so we can get away with this microoptimization. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Dave Hansen | 0e147aed4c |
mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages
Dave Jiang reported that he was seeing oopses when running NUMA systems and default_hugepagesz=1G. I traced the issue down to migrate_page_copy() trying to use the same code for hugetlb pages and transparent hugepages. It should not have been trying to pass thp pages in there. So, add some VM_BUG_ON()s for the next hapless VM developer that tries the same thing. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Geert Uytterhoeven | f92f455f67 |
mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
{,set}page_address() are macros if WANT_PAGE_VIRTUAL. If !WANT_PAGE_VIRTUAL, they're plain C functions. If someone calls them with a void *, this pointer is auto-converted to struct page * if !WANT_PAGE_VIRTUAL, but causes a build failure on architectures using WANT_PAGE_VIRTUAL (arc, m68k and sparc64): drivers/md/bcache/bset.c: In function `__btree_sort': drivers/md/bcache/bset.c:1190: warning: dereferencing `void *' pointer drivers/md/bcache/bset.c:1190: error: request for member `virtual' in something not a structure or union Convert them to static inline functions to fix this. There are already plenty of users of struct page members inside <linux/mm.h>, so there's no reason to keep them as macros. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Andrew Morton | 0afaa12047 |
posix_acl: uninlining
Uninline vast tracts of nested inline functions in include/linux/posix_acl.h. This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by 8026 bytes. The patch also regularises the positioning of the EXPORT_SYMBOLs in posix_acl.c. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Benny Halevy <bhalevy@primarydata.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |