This is easily triggerable when fuzz-testing as an unprivileged user.
We could rate-limit it, but given we don't print similar messages
for other protocols, I just removed it.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of commit e84de0c619 [MIPS: GIO bus
support for SGI IP22/28] newport con is now taking over console from
dummy con, therefore it's necessary to resize the VC to the correct size
to avoid crashes and garbage on console
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@linux-mips.org
Cc: linux-fbdev@vger.kernel.org
Cc: FlorianSchandinat@gmx.de
Patchwork: https://patchwork.linux-mips.org/patch/4138/
Acked-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/wrppmc/pci.o
/home/ralf/src/linux/linux-mips/arch/mips/wrppmc/pci.c: In function ‘gt64120_pci_init’:
/home/ralf/src/linux/linux-mips/arch/mips/wrppmc/pci.c:41:6: error: variable ‘tmp’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
This warning exists in gcc 4.6.0 and newer. Kernels 2.6.40 and newer use
-Wunused-but-set-variable to suppress it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/sgi-ip22/ip22-eisa.o
/home/ralf/src/linux/linux-mips/arch/mips/sgi-ip22/ip22-eisa.c: In function ‘ip22_eisa_intr’:
/home/ralf/src/linux/linux-mips/arch/mips/sgi-ip22/ip22-eisa.c:77:11: error: variable ‘dma2’ set but not used [-Werror=unused-but-set-variable]
/home/ralf/src/linux/linux-mips/arch/mips/sgi-ip22/ip22-eisa.c:77:5: error: variable ‘dma1’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
This warning exists in gcc 4.6.0 and newer. Kernels 2.6.40 and newer use
-Wunused-but-set-variable to suppress it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/rb532/prom.o
/home/ralf/src/linux/linux-mips/arch/mips/rb532/prom.c: In function ‘prom_setup_cmdline’:
/home/ralf/src/linux/linux-mips/arch/mips/rb532/prom.c:75:22: error: variable ‘prom_envp’ set but not used [-Werror=unused-but-set-variable]
This warning exists in gcc 4.6.0 and newer. Kernels 2.6.40 and newer use
-Wunused-but-set-variable to suppress it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/powertv/init.o
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c: In function ‘mips_nmi_setup’:
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c:80:8: error: variable ‘base’ set but not used [-Werror=unused-but-set-variable]
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c: In function ‘mips_ejtag_setup’:
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c:94:8: error: variable ‘base’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
As these two functions are, they don't serve any useful purpose so I've
deleted them entirely.
This warning exists in gcc 4.6.0 and newer. Kernels 2.6.40 and newer use
-Wunused-but-set-variable to suppress it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
I had no idea just how broken IOC3 was until I read this.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
CC arch/mips/mm/highmem.o
/home/ralf/src/linux/linux-mips/arch/mips/mm/highmem.c: In function ‘__kunmap_atomic’:
/home/ralf/src/linux/linux-mips/arch/mips/mm/highmem.c:70:6: error: variable ‘type’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
This warning exists in gcc 4.6.0 and newer. Kernels 2.6.40 and newer use
-Wunused-but-set-variable to suppress it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Normally r4k_dma_cache_inv should only ever be called with cacheline
aligned addresses. If however, it isn't there is the theoretical
possibility of data corruption. There is no correct way of handling this
and anyway, it should only happen if the DMA API is used incorrectly
so drop
There is a different corruption scenario with these CACHE instructions
removed but again there is no way of handling this correctly and it can
be triggered only through incorrect use of the DMA API.
So just get rid of the complexity.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: James Rodriguez <jamesr@juniper.net>
The default implementation of 'cpu_has_fpu' macro calls
smp_processor_id() which causes this warning to be printed when
preemption is enabled:
[ 4.664000] Algorithmics/MIPS FPU Emulator v1.5
[ 4.676000] BUG: using smp_processor_id() in preemptible [00000000] code: ini
[ 4.700000] caller is fpu_emulator_cop1Handler+0x434/0x27b8
This problem got introduced in November 2009 by
af1d2af877ef6c36990671bc86a5b9c5bb50b1da (lmo) [MIPS: Fix emulation of
64-bit FPU on 64-bit CPUs.] rsp. da0bac3341
(kernel.org) [MIPS: Fix emulation of 64-bit FPU on FPU-less 64-bit CPUs.]
in 2.6.32.
Fixed by rewriting cop1_64bit() to return a constant whenever possible
but most importantly avoid the use pf cpu_has_fpu entirely.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Jayachandran C <jchandra@broadcom.com>
Initial-patch-by: Jayachandran C <jchandra@broadcom.com>
Patchwork: https://patchwork.linux-mips.org/patch/4225/
Our FP emulator is hardcoded for the MIPS IV FP instruction set and does
not match the FP ISA with the general ISA. However for the few MIPS IV FP
instructions that use the COP1X major opcode it relies on the Coprocessor
Unusable exception to be delivered as a COP1 rather than COP3 exception.
This includes indexed transfer (LDXC1, etc.) and FP multiply-accumulate
(MADD.D, etc.) instructions.
All the MIPS I, II, III and IV processors and some newer chips that do not
implement the FPU use the COP3 exception however. Therefore I believe the
kernel should follow and redirect any COP3 Unusable traps to the emulator
unless an actual FPU part or core is present.
This is a change that implements it. Any minor opcode encodings that are
not recognised as valid FP instructions are rejected by the emulator and
will result in a SIGILL signal being delivered as they currently do. We
do not support vendor-specific coprocessor 3 implementations supported
with MIPS I and MIPS II ISA processors; we never set CP0.Status.CU3.
[Ralf: On MIPS IV processors the kernel always enables the XX bit which
replaces the CU3 bit off earlier architecture revisions.]
If matching between the CPU and the FPU ISA is considered required one
day, this can still be done in the emulator itself. I think the CpU
exception dispatcher is not the right place to do this anyway, as there
are further differences between MIPS I, MIPS II, MIPS III, MIPS IV and
MIPS32 FP ISAs.
Corresponding explanation of this implementation is included within the
change itself.
Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/project/linux-mips/list/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When poweroff machine, kernel_power_off() call disable_nonboot_cpus().
And if we have HOTPLUG_CPU configured, disable_nonboot_cpus() is not an
empty function but attempt to actually disable the nonboot cpus. Since
system state is SYSTEM_POWER_OFF, play_dead() won't be called and thus
disable_nonboot_cpus() hangs. Therefore, we make this patch to avoid
poweroff failure.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Hongliang Tao <taohl@lemote.com>
Signed-off-by: Hua Yan <yanh@lemote.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4211/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled, plain integer checking
between different uids/gids is explicitely turned into a build failure
by making the k{uid,gid}_t types a structure containing a value:
arch/mips/kernel/mips-mt-fpaff.c: In function 'check_same_owner':
arch/mips/kernel/mips-mt-fpaff.c:53:22: error: invalid operands to
binary == (have 'kuid_t' and 'kuid_t')
arch/mips/kernel/mips-mt-fpaff.c:54:15: error: invalid operands to
binary == (have 'kuid_t' and 'kuid_t')
In order to ensure proper comparison between uids, using the helper
function uid_eq() which performs the right thing whenever this config
option is turned on or off.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4717/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This header was added in commit 39b8d52542
(kernel.org) / b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo( ([MIPS] Add
support for MIPS CMP platform.). None of the functions it declared were
ever included in the tree. Commit cb7f39d2bc
(kernel.org) / b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo) [MIPS] Remove
unused maltasmp.h.] removeed the sole file that included it because that
file was itself unused.
[ralf@linux-mips.org: The whole mess happened because somebody at MIPS
thought it was a good idea to rename VSMP ("Vitual SMP") to SMVP. Which
is an IBMeque ETLA in contrast to VSMP, so public kernels as opposed to
MTI's inhouse kernels never followed suit.]
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3950/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Some initialization errors are reported with the existing OCTEON EDAC
support patch. Also some parts have more than one memory controller.
Fix the errors and add multiple controllers if present.
Signed-off-by: David Daney <david.daney@cavium.com>
The patch needs to eliminate the definition of OCTEON_IRQ_BOOTDMA so
that the device tree code can map the interrupt, so in order to not
temporarily break things, we do a single patch to both the interrupt
registration code and the pata_octeon_cf driver.
Also rolled in is a conversion to use hrtimers and corrections to the
timing calculations.
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Manuel Lauss <manuel.lauss@gmail.com> writes:
I introduced it as a fallback because early revisions of Alchemy hardware
we shipped had a non-functional 32kHz timer and had to rely on the r4k
timer instead. Previously the r4k timer was initialized regardless, but
it's useless with the "wait" instruction.
So long story short: I need either the on-chip 32kHz timer OR the r4k
timer if the 32kHz one is unusable, but not both, and r4k timer is useless
when au1k_idle is in use.
The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm
not against removing R4K_LIB symbols.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Manuel Lauss <manuel.lauss@gmail.com> writes:
I introduced it as a fallback because early revisions of Alchemy hardware
we shipped had a non-functional 32kHz timer and had to rely on the r4k
timer instead. Previously the r4k timer was initialized regardless, but
it's useless with the "wait" instruction.
So long story short: I need either the on-chip 32kHz timer OR the r4k
timer if the 32kHz one is unusable, but not both, and r4k timer is useless
when au1k_idle is in use.
The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm
not against removing R4K_LIB symbols.
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch changes the physmap-flash platform data on AR7 to pass the
correct partition parser: ar7part to used by the "physmap-flash" mapping
driver so we get the partitions probed correctly.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: blogic@openwrt.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4654/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
With the upcoming merge of the ARC architecture there is a small likelyhood
of conflicting use for the CONFIG_ARC config symbol. Rename it to
CONFIG_FW_ARC. Also rename CONFIG_ARC32 to CONFIG_FW_ARC32, CONFIG_ARC64
to CONFIG_FW_ARC64.
For consistence also rename CONFIG_SNIPROM to CONFIG_FW_SNIPROM and
CONFIG_CFE to CONFIG_FW_CFE.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Pull networking changes from David Miller:
1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.
2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.
3) Networking user namespace support from Eric W. Biederman.
4) tuntap/virtio-net multiqueue support by Jason Wang.
5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.
6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.
7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.
8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.
9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.
10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.
11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.
12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.
13) Add "oops_only" mode to netconsole, from Amerigo Wang.
14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.
15) Support PTP in the Tigon3 driver, from Matt Carlson.
16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.
17) Support per-association statistics in SCTP, from Michele
Baldessari.
And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAUMi2WhOxKuMESys7AQK2bxAApJQL2x6/k4swH933rhdVooA2TiMVST3l
XSy6yil6Qeqz82RDnVMfxQ069N8iP5x93fE918V6UzeIrUmKEL8xD2UJCZzjW6B9
vmBNrD6VUGdiBhTcGY7er4EtlnRf1XJgUPmfdIEAJoZ8VMkKyYAGkckW2I8hiYbZ
gyF+ONc+CHxspqS1CzNUmmbP84T6rij2fydqLaSNNnQYnEfICt7dciv73KBQYMtn
AsCLcmWW4DkZ37VL6Bg8yvgRaxbNlZpS0Rl5oKS65rYX9azt/SvujSta0UEv+uYF
m/2HqExwgo8HZHKyIEpRgBLqfOfekJATbSLEq3jEgA73MLdzw2DTgpJQOmWCjtjN
7bROv2O57e8ttxb81x10YyInzOTYOd18XEb2Qa6O4wbB5TS8MxZywfuTfL+sdfsN
pquqyKNgxD7HqqxIcWSNKGxkPPZ/Xk/JmgcQFVCjpvvdCizsFTwWeiAd81Jz0Dn+
SLL345nlDJPVukgIiDiwm9UvkyG0Pg03K5k6+7QOWB/5AdPqgRUeOi6gqQE7ZQ9G
GK8/2xX4xFJ8LLPqfh2X+1PUesa8Dhph4NorsW4comJtPcLuh30XbwIKTjpBF90y
7OILeZeQ+qFu8S9lLSQOr6zxs3/9uKP8ADoOAnFmUEE2PkzvZqMrnrlezAyqtNht
LkVa/IR/z50=
=ex0l
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300
Pull MN10300 changes from David Howells:
"miscellaneous MN10300 arch patches. I've based it on top of Al Viro's
signal tree - so these patches should be pulled after that."
* tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
MN10300: Use asm-generic/pci_iomap.h
MN10300: Get rid of unused variable from ASB2305 PCI code
MN10300: ASB2305 PCI code needs linux/irq.h
mn10300/mm/fault.c: Port OOM changes to do_page_fault
MN10300: Handle cacheable PCI regions in pci_iomap()
MN10300: fix debug polling in ttySM driver
MN10300: ttySM: clean up unnecessary casting
MN10300: fix SMP synchronization between txdma and serial driver
MN10300: fix serial port vdma irq setup for SMP
MN10300: cleanup IRQ affinity setting
MN10300: ttySM: Use memory barriers correctly in circular buffer logic
reserve_bootmem_generic() has no caller,
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_mkwrite is initalized with zero and only set once, from that point
exists no way to get to the oom or oom_free_new labels.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have two different implementation of is_zero_pfn() and my_zero_pfn()
helpers: for architectures with and without zero page coloring.
Let's consolidate them in <asm-generic/pgtable.h>.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the warning from __list_del_entry() which is triggered when a process
tries to do free_huge_page() for a hwpoisoned hugepage.
free_huge_page() can be called for hwpoisoned hugepage from
unpoison_memory(). This function gets refcount once and clears
PageHWPoison, and then puts refcount twice to return the hugepage back to
free pool. The second put_page() finally reaches free_huge_page().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory error handling on hugepages can break a RSS counter, which emits a
message like "Bad rss-counter state mm:ffff88040abecac0 idx:1 val:-1".
This is because PageAnon returns true for hugepage (this behavior is
necessary for reverse mapping to work on hugetlbfs).
[akpm@linux-foundation.org: clean up code layout]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a process which used a hwpoisoned hugepage tries to exit() or
munmap(), the kernel can print out "bad pmd" message because page table
walker in free_pgtables() encounters 'hwpoisoned entry' on pmd.
This is because currently we fail to clear the hwpoisoned entry in
__unmap_hugepage_range(), so this patch simply does it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
expand_stack() runs with a shared mmap_sem lock. Because of this, there
could be multiple concurrent stack expansions in the same mm, which may
cause problems in the vma gap update code.
I propose to solve this by taking the mm->page_table_lock around such vma
expansions, in order to avoid the concurrency issue. We only have to
worry about concurrent expand_stack() calls here, since we hold a shared
mmap_sem lock and all vma modificaitons other than expand_stack() are done
under an exclusive mmap_sem lock.
I previously tried to achieve the same effect by making sure all growable
vmas in a given mm would share the same anon_vma, which we already lock
here. However this turned out to be difficult - all of the schemes I
tried for refcounting the growable anon_vma and clearing turned out ugly.
So, I'm now proposing only the minimal fix.
The overhead of taking the page table lock during stack expansion is
expected to be small: glibc doesn't use expandable stacks for the threads
it creates, so having multiple growable stacks is actually uncommon and we
don't expect the page table lock to get bounced between threads.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mm given to __mem_cgroup_count_vm_event() cannot be NULL because the
function is either called from the page fault path or vma->vm_mm is used.
So the check can be dropped.
The check was introduced by commit 456f998ec8 ("memcg: add the
pagefault count into memcg stats") because the originally proposed patch
used current->mm for shmem but this has been changed to vma->vm_mm later
on without the check being removed (thanks to Hugh for this
recollection).
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert 3.5's commit f21f806220 ("tmpfs: revert SEEK_DATA and
SEEK_HOLE") to reinstate 4fb5ef089b ("tmpfs: support SEEK_DATA and
SEEK_HOLE"), with the intervening additional arg to
generic_file_llseek_size().
In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
it on tmpfs, so let's join the party.
It's quite easy for tmpfs to scan the radix_tree to support llseek's new
SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
still on my mind (in particular, the !PageUptodate-ness of pages
fallocated but still unwritten).
[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If SPARSEMEM is enabled, it won't build page structures for non-existing
pages (holes) within a zone, so provide a more accurate estimation of
pages occupied by memmap if there are bigger holes within the zone.
And pages for highmem zones' memmap will be allocated from lowmem, so
charge nr_kernel_pages for that.
[akpm@linux-foundation.org: mark calc_memmap_size __paging_init]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Tested-by: Jianguo Wu <wujianguo@huawei.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
buffer_head comes from kmem_cache_zalloc(), no need to zero its fields.
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It makes no sense to inline an exported function.
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently a zone's present_pages is calcuated as below, which is
inaccurate and may cause trouble to memory hotplug.
spanned_pages - absent_pages - memmap_pages - dma_reserve.
During fixing bugs caused by inaccurate zone->present_pages, we found
zone->present_pages has been abused. The field zone->present_pages may
have different meanings in different contexts:
1) pages existing in a zone.
2) pages managed by the buddy system.
For more discussions about the issue, please refer to:
http://lkml.org/lkml/2012/11/5/866https://patchwork.kernel.org/patch/1346751/
This patchset tries to introduce a new field named "managed_pages" to
struct zone, which counts "pages managed by the buddy system". And revert
zone->present_pages to count "physical pages existing in a zone", which
also keep in consistence with pgdat->node_present_pages.
We will set an initial value for zone->managed_pages in function
free_area_init_core() and will adjust it later if the initial value is
inaccurate.
For DMA/normal zones, the initial value is set to:
(spanned_pages - absent_pages - memmap_pages - dma_reserve)
Later zone->managed_pages will be adjusted to the accurate value when the
bootmem allocator frees all free pages to the buddy system in function
free_all_bootmem_node() and free_all_bootmem().
The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.
This patch also adds a new field "managed_pages" to /proc/zoneinfo
and sysrq showmem.
[akpm@linux-foundation.org: small comment tweaks]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily. Inline the functions into the
pagefault handlers to clean the code up.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
out_of_memory() will already cause current to schedule if it has not been
killed, so doing it again in pagefault_out_of_memory() is redundant.
Remove it.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To lock the entire system from parallel oom killing, it's possible to pass
in a zonelist with all zones rather than using for_each_populated_zone()
for the iteration. This obsoletes try_set_system_oom() and
clear_system_oom() so that they can be removed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, memory management can handle movable node or nodes which don't have
any normal memory, so we can dynamic configure and add movable node by:
online a ZONE_MOVABLE memory from a previous offline node
offline the last normal memory which result a non-normal-memory-node
movable-node is very important for power-saving, hardware partitioning and
high-available-system(hardware fault management).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need a node which only contains movable memory. This feature is very
important for node hotplug. If a node has normal/highmem, the memory may
be used by the kernel and can't be offlined. If the node only contains
movable memory, we can offline the memory and the node.
All are prepared, we can actually introduce N_MEMORY.
add CONFIG_MOVABLE_NODE make we can use it for movable-dedicated node
[akpm@linux-foundation.org: fix Kconfig text]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While profiling numa/core v16 with cgroup_disable=memory on the command
line, I noticed mem_cgroup_count_vm_event() still showed up as high as
0.60% in perftop.
This occurs because the function is called extremely often even when memcg
is disabled.
To fix this, inline the check for mem_cgroup_disabled() so we avoid the
unnecessary function call if memcg is disabled.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>