There are many places where we need to determine the node of a zone.
Currently we use a difficult to read sequence of pointer dereferencing.
Put that into an inline function and use throughout VM. Maybe we can find
a way to optimize the lookup in the future.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the atomic counter for slab_reclaim_pages and replace the counter
and NR_SLAB with two ZVC counter that account for unreclaimable and
reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.
Change the check in vmscan.c to refer to to NR_SLAB_RECLAIMABLE. The
intend seems to be to check for slab pages that could be freed.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One of the changes necessary for shared page tables is to standardize the
pxx_page macros. pte_page and pmd_page have always returned the struct
page associated with their entry, while pte_page_kernel and pmd_page_kernel
have returned the kernel virtual address. pud_page and pgd_page, on the
other hand, return the kernel virtual address.
Shared page tables needs pud_page and pgd_page to return the actual page
structures. There are very few actual users of these functions, so it is
simple to standardize their usage.
Since this is basic cleanup, I am submitting these changes as a standalone
patch. Per Hugh Dickins' comments about it, I am also changing the
pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The uncached allocator manages per node pools. Specify __GFP_THISNODE in
order to force allocation on the indicated node or fail. The uncached
allocator has already logic to deal with failing allocations.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a notifer chain to the out of memory killer. If one of the registered
callbacks could release some memory, do not kill the process but return and
retry the allocation that forced the oom killer to run.
The purpose of the notifier is to add a safety net in the presence of
memory ballooners. If the resource manager inflated the balloon to a size
where memory allocations can not be satisfied anymore, it is better to
deflate the balloon a bit instead of killing processes.
The implementation for the s390 ballooner is included.
[akpm@osdl.org: cleanups]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We cannot check MAX_NR_ZONES since it not defined in the preprocessor
anymore.
So remove the check.
The maximum number of zones per node for i386 is 3 since i386 does not
support ZONE_DMA32.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
eventcounters: Do not display counters for zones that are not available on an
arch
Do not define or display counters for the DMA32 and the HIGHMEM zone if such
zones were not configured.
[akpm@osdl.org: s390 fix]
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_HIGHMEM optional
- ifdef out code and definitions related to CONFIG_HIGHMEM
- __GFP_HIGHMEM falls back to normal allocations if there is no
ZONE_HIGHMEM
- GFP_ZONEMASK becomes 0x01 if there is no DMA32 and no HIGHMEM
zone.
[jdike@addtoit.com: build fix]
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_DMA32 optional
- Add #ifdefs around ZONE_DMA32 specific code and definitions.
- Add CONFIG_ZONE_DMA32 config option and use that for x86_64
that alone needs this zone.
- Remove the use of CONFIG_DMA_IS_DMA32 and CONFIG_DMA_IS_NORMAL
for ia64 and fix up the way per node ZVCs are calculated.
- Fall back to prior GFP_ZONEMASK of 0x03 if there is no
DMA32 zone.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move totalhigh_pages and nr_free_highpages() into highmem.c/.h
Move the totalhigh_pages definition into highmem.c/.h. Move the
nr_free_highpages function into highmem.c
[yoichi_yuasa@tripeaks.co.jp: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix array initialization in lots of arches
The number of zones may now be reduced from 4 to 2 for many arches. Fix the
array initialization for the zones array for all architectures so that it is
not initializing a fixed number of elements.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I keep seeing zones on various platforms that are never used and wonder why we
compile support for them into the kernel. Counters show up for HIGHMEM and
DMA32 that are alway zero.
This patch allows the removal of ZONE_DMA32 for non x86_64 architectures and
it will get rid of ZONE_HIGHMEM for arches not using highmem (like 64 bit
architectures). If an arch does not define CONFIG_HIGHMEM then ZONE_HIGHMEM
will not be defined. Similarly if an arch does not define CONFIG_ZONE_DMA32
then ZONE_DMA32 will not be defined.
No current architecture uses all the 4 zones (DMA,DMA32,NORMAL,HIGH) that we
have now. The patchset will reduce the number of zones for all platforms.
On many platforms that do not have DMA32 or HIGHMEM this will reduce the
number of zones by 50%. F.e. ia64 only uses DMA and NORMAL.
Large amounts of memory can be saved for larger systemss that may have a few
hundred NUMA nodes.
With ZONE_DMA32 and ZONE_HIGHMEM support optional MAX_NR_ZONES will be 2 for
many non i386 platforms and even for i386 without CONFIG_HIGHMEM set.
Tested on ia64, x86_64 and on i386 with and without highmem.
The patchset consists of 11 patches that are following this message.
One could go even further than this patchset and also make ZONE_DMA optional
because some platforms do not need a separate DMA zone and can do DMA to all
of memory. This could reduce MAX_NR_ZONES to 1. Such a patchset will
hopefully follow soon.
This patch:
Fix strange uses of MAX_NR_ZONES
Sometimes we use MAX_NR_ZONES - x to refer to a zone. Make that explicit.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Address a long standing issue of booting with an initrd on an i386 numa
system. Currently (and always) the numa kva area is mapped into low memory
by finding the end of low memory and moving that mark down (thus creating
space for the kva). The issue with this is that Grub loads initrds into
this similar space so when the kernel check the initrd it finds it outside
max_low_pfn and disables it (it thinks the initrd is not mapped into usable
memory) thus initrd enabled kernels can't boot i386 numa :(
My solution to the problem just converts the numa kva area to use the
bootmem allocator to save it's area (instead of moving the end of low
memory). Using bootmem allows the kva area to be mapped into more diverse
addresses (not just the end of low memory) and enables the kva area to be
mapped below the initrd if present.
I have tested this patch on numaq(no initrd) and summit(initrd) i386 numa
based systems.
[akpm@osdl.org: cleanups]
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SOUND] sparc/amd7930: Use __devinit and __devinitdata as needed.
[SUNLANCE]: Mark sparc_lance_probe_one as __devinit.
[SPARC64]: Fix section-mismatch errors in solaris emul module.
If there is only 1 node in the system cpus should think they are apart of
some other node.
If cases where a real numa system boots the Flat numa option make sure the
cpus don't claim to be apart on a non-existent node.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assume that a cpu is *physically* offlined at boot time...
Because smpboot.c::smp_boot_cpu_map() canoot find cpu's sapicid,
numa.c::build_cpu_to_node_map() cannot build cpu<->node map for
offlined cpu.
For such cpus, cpu_to_node map should be fixed at cpu-hot-add.
This mapping should be done before cpu onlining.
This patch also handles cpu hotremove case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem description:
We have additional_cpus= option for allocating possible_cpus. But nid
for possible cpus are not fixed at boot time. cpus which is offlined at
boot or cpus which is not on SRAT is not tied to its node. This will
cause panic at cpu onlining.
Usually, pxm_to_nid() mapping is fixed at boot time by SRAT.
But, unfortunately, some system (my system!) do not include
full SRAT table for possible cpus. (Then, I use
additiona_cpus= option.)
For such possible cpus, pxm<->nid should be fixed at
hot-add. We now have acpi_map_pxm_to_node() which is also
used at boot. It's suitable here.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_PHYSICAL_START set to a non default values the i386
boot_ioremap code calculated its pte index wrong and users of boot_ioremap
have their areas incorrectly mapped (for me SRAT table not mapped during
early boot). This patch removes the addr < BOOT_PTE_PTRS constraint.
[ Keith says this is applicable to 2.6.16 and 2.6.17 as well ]
Signed-off-by: Keith Mannthey<kmannth@us.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
init_socksys() was marked __init but invoked from a
non-__init function.
Use the correct module_{init,exit}() faciltiies while we're
here and eliminate some seriously bogus ifdefs.
Signed-off-by: David S. Miller <davem@davemloft.net>
Unfortunately, sparc64 doesn't have an easy way to do a "64 X 64 -->
128" bit multiply like PowerPC and IA64 do. We were doing a
"64 X 64 --> 64" bit multiple which causes overflow very quickly with
a 30-bit quotient shift.
So use a quotientshift count of 10 instead of 30, just like x86 and
ARM do.
This also fixes the wrapping of printk timestamp values every ~17
seconds.
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata
[CPUFREQ] Fix some more CPU hotplug locking.
[CPUFREQ] Workaround for BIOS bug in software coordination of frequency
[CPUFREQ] Longhaul - Add voltage scaling to driver
[CPUFREQ] Fix sparse warning in ondemand
[CPUFREQ] make drivers/cpufreq/cpufreq_ondemand.c:powersave_bias_target() static
[CPUFREQ] Longhaul - Add ignore_latency option
[CPUFREQ] Longhaul - Disable arbiter
[CPUFREQ][2/2] ondemand: updated add powersave_bias tunable
[CPUFREQ][1/2] ondemand: updated tune for hardware coordination
[CPUFREQ] Fix typo.
iommu_init() and iounit_init() are never called for sun4, but that's not
enough - these calls should be ifdefed out since the functions in question
simply do not exist for CONFIG_SUN4 kernel.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Back when pci_dev had base_address[], loop of form
base = &...->base_address[0];
for (.....) {
...
*base++ = addr;
}
was fine, but when that array got spread in ->resource[...].start
replacing the initialization with
base = &...->resource[0].start;
was not a sufficient modification. IOW this code got broken for cases
when there had been more than one resource to fill. All way back in
2.3.41-pre3...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sw_any_bug_dmi_table can be used on resume, so it isn't initdata.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (64 commits)
[BLOCK] dm-crypt: trivial comment improvements
[CRYPTO] api: Deprecate crypto_digest_* and crypto_alg_available
[CRYPTO] padlock: Convert padlock-sha to use crypto_hash
[CRYPTO] users: Use crypto_comp and crypto_has_*
[CRYPTO] api: Add crypto_comp and crypto_has_*
[CRYPTO] users: Use crypto_hash interface instead of crypto_digest
[SCSI] iscsi: Use crypto_hash interface instead of crypto_digest
[CRYPTO] digest: Remove old HMAC implementation
[CRYPTO] doc: Update documentation for hash and me
[SCTP]: Use HMAC template and hash interface
[IPSEC]: Use HMAC template and hash interface
[CRYPTO] tcrypt: Use HMAC template and hash interface
[CRYPTO] hmac: Add crypto template implementation
[CRYPTO] digest: Added user API for new hash type
[CRYPTO] api: Mark parts of cipher interface as deprecated
[PATCH] scatterlist: Add const to sg_set_buf/sg_init_one pointer argument
[CRYPTO] drivers: Remove obsolete block cipher operations
[CRYPTO] users: Use block ciphers where applicable
[SUNRPC] GSS: Use block ciphers where applicable
[IPSEC] ESP: Use block ciphers where applicable
...
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (44 commits)
[S390] hypfs crashes with invalid mount option.
[S390] cio: subchannel evaluation function operates without lock
[S390] cio: always query all paths on path verification.
[S390] cio: update path groups on logical CHPID changes.
[S390] cio: subchannels in no-path state.
[S390] Replace nopav-message on VM.
[S390] set modalias for ccw bus uevents.
[S390] Get rid of DBG macro.
[S390] Use alternative user-copy operations for new hardware.
[S390] Make user-copy operations run-time configurable.
[S390] Cleanup in signal handling code.
[S390] Cleanup in page table related code.
[S390] Linux API for writing z/VM APPLDATA Monitor records.
[S390] xpram off by one error.
[S390] Remove kexec experimental flag.
[S390] cleanup appldata.
[S390] fix typo in vmcp.
[S390] Kernel stack overflow handling.
[S390] qdio slsb processing state.
[S390] Missing initialization in common i/o layer.
...
On detection of an EEH error, some Power4 systems seem to occasionally
want to be reset twice before they report themselves as fully recovered.
This patch re-arranges the code to attempt additional resets if the first
one doesn't take.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch causes fsl_soc.h to import the definition of phys_addr_t
itself, rather than relying on its includer to do so.
Signed-off-by: Scott Wood <scott@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Noticed that the U3_*CFA macros have some typos:
#define U3_HT_CFA0(devfn, off) \
((((unsigned long)devfn) << 8) | offset)
(refers to offset rather than off)
#define U3_AGP_CFA0(devfn, off) \
((1 << (unsigned long)PCI_SLOT(dev_fn)) \
| (((unsigned long)PCI_FUNC(dev_fn)) << 8) \
(refers to dev_fn rather than devfn)
Things happen to work, but there doesn't seem to be any reason these
shouldn't be functions. Overall behavior should be unchanged.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When there is a PCI-X mode 2 capable device behind the HT<->PCI-X
bridge, the pci core decides that the device has the extended 4K
config space, even though the bus is not operating in mode 2. This is
because the u3_ht pci ops silently accept offsets greater than 255 but
use only the 8 least significant bits, which means reading at offset
0x100 gets the data at offset 0x0, and causes confusion for lspci.
Reject accesses to configuration space offsets greater than 255.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch fixes the assignment of pending registers to IRQ numbers for
the IPIC; the code previously assigned all IRQs to the high pending word
regardless of which word the interrupt belonged to.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch changes the io operations so that they are out of line if
CONFIG_PPC_ISERIES is set and includes a firmware feature check in
that case.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Update to the PowerPC PCI error recovery code.
Add code to enable MMIO if a device driver reports that it is capable
of recovering on its own. One anticipated use of this having a device
driver enable MMIO so that it can take a register dump, which might
then be followed by the device driver requesting a full reset.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add wrapper around the rtas call to enable MMIO or DMA on a frozen pci
slot.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Clean up subroutine documentation; mostly formatting changes, with
some new content.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This corrects a pci_dev get/put imbalance that can occur only in
highly unlikely situations (kmalloc failures, pci devices with
overlapping resource addresses). No actual failures seen, this was
spotted during code review.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The stack frame address was being printed incorrectly in the backtrace
option of XMON on PPC. This patch fixes it to print the actual stack
address instead of the address of the local variable that contains it.
Signed-off-by: Josh Boyer <jdub@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Jakub noticed the cputable.c entry for Xilinx Virtex-4 FX was missing
a .platform value, so the AT_PLATFORM value wouldn't be set correctly.
This adds it.
Signed-off-by: Peter Bergner <bergner@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cleanup for user headers, as noted:
asm-sh/page.h requires asm-generic/memory_model.h, which does not exist in exported headers
asm-sh/ptrace.h requires asm/ubc.h, which does not exist in exported headers
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch removes obsolete block operations of the simple cipher type
from drivers. These were preserved so that existing users can make a
smooth transition. Now that the transition is complete, they are no
longer needed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds block cipher algorithms for S390. Once all users of the
old cipher type have been converted the existing CBC/ECB non-block cipher
operations will be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Accelerated versions of crypto algorithms must carry a distinct driver name
and priority in order to distinguish themselves from their generic counter-
part.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the tfm is passed directly to setkey instead of the ctx, we no
longer need to pass the &tfm->crt_flags pointer.
This patch also gets rid of a few unnecessary checks on the key length
for ciphers as the cipher layer guarantees that the key length is within
the bounds specified by the algorithm.
Rather than testing dia_setkey every time, this patch does it only once
during crypto_alloc_tfm. The redundant check from crypto_digest_setkey
is also removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>