Commit Graph

675 Commits

Author SHA1 Message Date
Matt Mackall cd7619d6bf [PATCH] Exterminate PAGE_BUG
Remove PAGE_BUG - repalce it with BUG and BUG_ON.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:01 -07:00
akpm@osdl.org d59dd4620f [PATCH] use smp_mb/wmb/rmb where possible
Replace a number of memory barriers with smp_ variants.  This means we won't
take the unnecessary hit on UP machines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:47 -07:00
Manfred Spraul 97e2bde47f [PATCH] add kmalloc_node, inline cleanup
The patch makes the following function calls available to allocate memory
on a specific node without changing the basic operation of the slab
allocator:

 kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
 kmalloc_node(size_t size, unsigned int flags, int node);

in a similar way to the existing node-blind functions:

 kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags);
 kmalloc(size, flags);

kmem_cache_alloc_node was changed to pass flags and the node information
through the existing layers of the slab allocator (which lead to some minor
rearrangements).  The functions at the lowest layer (kmem_getpages,
cache_grow) are already node aware.  Also __alloc_percpu can call
kmalloc_node now.

Performance measurements (using the pageset localization patch) yields:

w/o patches:
Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      484.27  100       484.2736     12.02      1.97   Wed Mar 30 20:50:43 2005
  100    25170.83   91       251.7083     23.12    150.10   Wed Mar 30 20:51:06 2005
  200    34601.66   84       173.0083     33.64    294.14   Wed Mar 30 20:51:40 2005
  300    37154.47   86       123.8482     46.99    436.56   Wed Mar 30 20:52:28 2005
  400    39839.82   80        99.5995     58.43    580.46   Wed Mar 30 20:53:27 2005
  500    40036.32   79        80.0726     72.68    728.60   Wed Mar 30 20:54:40 2005
  600    44074.21   79        73.4570     79.23    872.10   Wed Mar 30 20:55:59 2005
  700    44016.60   78        62.8809     92.56   1015.84   Wed Mar 30 20:57:32 2005
  800    40411.05   80        50.5138    115.22   1161.13   Wed Mar 30 20:59:28 2005
  900    42298.56   79        46.9984    123.83   1303.42   Wed Mar 30 21:01:33 2005
 1000    40955.05   80        40.9551    142.11   1441.92   Wed Mar 30 21:03:55 2005

with pageset localization and slab API patches:
Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      484.19  100       484.1930     12.02      1.98   Wed Mar 30 21:10:18 2005
  100    27428.25   92       274.2825     21.22    149.79   Wed Mar 30 21:10:40 2005
  200    37228.94   86       186.1447     31.27    293.49   Wed Mar 30 21:11:12 2005
  300    41725.42   85       139.0847     41.84    434.10   Wed Mar 30 21:11:54 2005
  400    43032.22   82       107.5805     54.10    582.06   Wed Mar 30 21:12:48 2005
  500    42211.23   83        84.4225     68.94    722.61   Wed Mar 30 21:13:58 2005
  600    40084.49   82        66.8075     87.12    873.11   Wed Mar 30 21:15:25 2005
  700    44169.30   79        63.0990     92.24   1008.77   Wed Mar 30 21:16:58 2005
  800    43097.94   79        53.8724    108.03   1155.88   Wed Mar 30 21:18:47 2005
  900    41846.75   79        46.4964    125.17   1303.38   Wed Mar 30 21:20:52 2005
 1000    40247.85   79        40.2478    144.60   1442.21   Wed Mar 30 21:23:17 2005

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:38 -07:00
William Lee Irwin III dd1d5afca8 [PATCH] sync_page() smp_mb() comment
The smp_mb() is becaus sync_page() doesn't have PG_locked while it accesses
page_mapping(page).  The comments in the patch (the entire patch is the
addition of this comment) try to explain further how and why smp_mb() is
used.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:38 -07:00
Chris Wright 93ea1d0a12 [PATCH] RLIMIT_MEMLOCK checking fix
Always use page counts when doing RLIMIT_MEMLOCK checking to avoid possible
overflow.

Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:38 -07:00
KAMEZAWA Hiroyuki edfbe2b003 [PATCH] count bounce buffer pages in vmstat
This is a patch for counting the number of pages for bounce buffers.  It's
shown in /proc/vmstat.

Currently, the number of bounce pages are not counted anywhere.  So, if
there are many bounce pages, it seems that there are leaked pages.  And
it's difficult for a user to imagine the usage of bounce pages.  So, it's
meaningful to show # of bouce pages.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:37 -07:00
Nick Piggin bd53b714d3 [PATCH] mm: use __GFP_NOMEMALLOC
Use the new __GFP_NOMEMALLOC to simplify the previous handling of
PF_MEMALLOC.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:37 -07:00
Nick Piggin 20a77776c2 [PATCH] mempool: simplify alloc
Mempool is pretty clever.  Looks too clever for its own good :) It
shouldn't really know so much about page reclaim internals.

- don't guess about what effective page reclaim might involve.

- don't randomly flush out all dirty data if some unlikely thing
  happens (alloc returns NULL). page reclaim can (sort of :P) handle
  it.

I think the main motivation is trying to avoid pool->lock at all costs.
However the first allocation is attempted with __GFP_WAIT cleared, so it
will be 'can_try_harder' if it hits the page allocator.  So if allocation
still fails, then we can probably afford to hit the pool->lock - and what's
the alternative?  Try page reclaim and hit zone->lru_lock?

A nice upshot is that we don't need to do any fancy memory barriers or do
(intentionally) racy access to pool-> fields outside the lock.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:37 -07:00
Nick Piggin b84a35be02 [PATCH] mempool: NOMEMALLOC and NORETRY
Mempools have 2 problems.

The first is that mempool_alloc can possibly get stuck in __alloc_pages
when they should opt to fail, and take an element from their reserved pool.

The second is that it will happily eat emergency PF_MEMALLOC reserves
instead of going to their reserved pools.

Fix the first by passing __GFP_NORETRY in the allocation calls in
mempool_alloc.  Fix the second by introducing a __GFP_MEMPOOL flag which
directs the page allocator not to allocate from the reserve pool.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:36 -07:00
Nick Piggin 8e30f272a9 [PATCH] mm: pcp use non powers of 2 for batch size
Jack Steiner reported this to have fixed his problem (bad colouring):
"The patches fix both problems that I found - bad
 coloring & excessive pages in pagesets."

In most workloads this is not likely to be such a pronounced problem,
however it should help corner cases.  And avoiding powers of 2 in these
types of memory operations is always a good idea.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:36 -07:00
Nikita Danilov 81b4082dc7 [PATCH] mm: rmap.c cleanup
mm/rmap.c:page_referenced_one() and mm/rmap.c:try_to_unmap_one() contain
identical code that

 - takes mm->page_table_lock;

 - drills through page tables;

 - checks that correct pte is reached.

Coalesce this into page_check_address()

Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:36 -07:00
akpm@osdl.org 119f657c72 [PATCH] RLIMIT_AS checking fix
Address bug #4508: there's potential for wraparound in the various places
where we perform RLIMIT_AS checking.

(I'm a bit worried about acct_stack_growth().  Are we sure that vma->vm_mm is
always equal to current->mm?  If not, then we're comparing some other
process's total_vm with the calling process's rlimits).

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:35 -07:00
akpm@osdl.org f021e92101 [PATCH] generic_file_buffered_write fixes
Anton Altaparmakov <aia21@cam.ac.uk> points out:

- It calls fault_in_pages_readable() which is completely bogus if @nr_segs >
  1.  It needs to be replaced by a to be written
  "fault_in_pages_readable_iovec()".

- It increments @buf even in the iovec case thus @buf can point to random
  memory really quickly (in the iovec case) and then it calls
  fault_in_pages_readable() on this random memory.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:35 -07:00
Al Viro 01424961e6 [PATCH] mempolicy.c GFP fix
zonelist_policy() forgot to mask non-zone bits from gfp when comparing
zone number with policy_zone. 

ACKed-by: Andi Kleen <ak@suse.de>
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-24 12:28:34 -07:00
Hugh Dickins 561bbe3235 [PATCH] freepgt: remove FIRST_USER_ADDRESS hack
Once all the MMU architectures define FIRST_USER_ADDRESS, remove hack from
mmap.c which derived it from FIRST_USER_PGD_NR.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:23 -07:00
Hugh Dickins 8462e20175 [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR
Remove use of FIRST_USER_PGD_NR from sys_mincore: it's inconsistent (no other
syscall refers to it), unnecessary (sys_mincore loops over vmas further down)
and incorrect (misses user addresses in ARM's first pgd).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:20 -07:00
Hugh Dickins e2cdef8c84 [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS
The patches to free_pgtables by vma left problems on any architectures which
leave some user address page table entries unencapsulated by vma.  Andi has
fixed the 32-bit vDSO on x86_64 to use a vma.  Now fix arm (and arm26), whose
first PAGE_SIZE is reserved (perhaps) for machine vectors.

Our calls to free_pgtables must not touch that area, and exit_mmap's
BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
allocated an extra page table, which its free_pgd_slow would free later.

FIRST_USER_PGD_NR has misled me and others: until all the arches define
FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other.  This
patch fixes the bugs, the remaining patches just clean it up.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:19 -07:00
Hugh Dickins 146425a316 [PATCH] freepgt: mpnt to vma cleanup
While dabbling here in mmap.c, clean up mysterious "mpnt"s to "vma"s.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:18 -07:00
Hugh Dickins 3bf5ee9564 [PATCH] freepgt: hugetlb_free_pgd_range
ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
called, and it wasn't obvious what to do about them.

The ppc64 case turns out to be easy: the associated tables are noted elsewhere
and freed later, safe to either skip its hugetlb areas or go through the
motions of freeing nothing.  Since ia64 does need a special case, restore to
ppc64 the special case of skipping them.

The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
probably appeared to work okay if you just had one such area; in fact it's
been broken much longer if you consider a long munmap spanning from another
region into the hugetlb region.

In the ia64 hugetlb region, more virtual address bits are available than in
the other regions, yet the page tables are structured the same way: the page
at the bottom is larger.  Here we need to scale down each addr before passing
it to the standard free_pgd_range.  Was about to write a hugely_scaled_down
macro, but found htlbpage_to_page already exists for just this purpose.  Fixed
off-by-one in ia64 is_hugepage_only_range.

Uninline free_pgd_range to make it available to ia64.  Make sure the
vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
other (safe to join huges?  probably but don't bother).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:16 -07:00
Hugh Dickins ee39b37b23 [PATCH] freepgt: remove MM_VM_SIZE(mm)
There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro
because mm doesn't contain the (32-bit emulation?) info needed.  But it too is
only needed because we ignore the end from the vma list.

We could make flush_pgtables return that end, or unmap_vmas.  Choose the
latter, since it's a natural fit with unmap_mapping_range_vma needing to know
its restart addr.  This does make more than minimal change, but if unmap_vmas
had returned the end before, this is how we'd have done it, rather than
storing the break_addr in zap_details.

unmap_vmas used to return count of vmas scanned, but that's just debug which
hasn't been useful in a while; and if we want the map_count 0 on exit check
back, it can easily come from the final remove_vm_struct loop.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:15 -07:00
Hugh Dickins e0da382c92 [PATCH] freepgt: free_pgtables use vma list
Recent woes with some arches needing their own pgd_addr_end macro; and 4-level
clear_page_range regression since 2.6.10's clear_page_tables; and its
long-standing well-known inefficiency in searching throughout the higher-level
page tables for those few entries to clear and free: all can be blamed on
ignoring the list of vmas when we free page tables.

Replace exit_mmap's clear_page_range of the total user address space by
free_pgtables operating on the mm's vma list; unmap_region use it in the same
way, giving floor and ceiling beyond which it may not free tables.  This
brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled,
in which case latency fixes spoil unmap_vmas throughput).

Beware: the do_mmap_pgoff driver failure case must now use unmap_region
instead of zap_page_range, since a page table might have been allocated, and
can only be freed while it is touched by some vma.

Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted
from the clear_page_range levels.  (Most of free_pgtables' old code was
actually for a non-existent case, prev not properly set up, dating from before
hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we
might want to add latency lockdrops later; but no attempt to do so yet, going
by vma should itself reduce latency.

But what if is_hugepage_only_range?  Those ia64 and ppc64 cases need careful
examination: put that off until a later patch of the series.

What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?

And the range to sparc64's flush_tlb_pgtables?  It's less clear to me now that
we need to do more than is done here - every PMD_SIZE ever occupied will be
flushed, do we really have to flush every PGDIR_SIZE ever partially occupied? 
A shame to complicate it unnecessarily.

Special thanks to David Miller for time spent repairing my ceilings.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19 13:29:15 -07:00
akpm@osdl.org 323aca6c0b [PATCH] vmscan: pageout(): remove unneeded test
)



We only call pageout() for dirty pages, so this test is redundant.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:06 -07:00
Andrea Arcangeli 79befd0c08 [PATCH] oom-killer disable for iscsi/lvm2/multipath userland critical sections
iscsi/lvm2/multipath needs guaranteed protection from the oom-killer, so
make the magical value of -17 in /proc/<pid>/oom_adj defeat the oom-killer
altogether.

(akpm: we still need to document oom_adj and friends in
Documentation/filesystems/proc.txt!)

Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:05 -07:00
Jeff Moyer d345734267 [PATCH] filemap_getpage can block when MAP_NONBLOCK specified
We will return NULL from filemap_getpage when a page does not exist in the
page cache and MAP_NONBLOCK is specified, here:

	page = find_get_page(mapping, pgoff);
	if (!page) {
		if (nonblock)
			return NULL;
		goto no_cached_page;
	}

But we forget to do so when the page in the cache is not uptodate.  The
following could result in a blocking call:

	/*
	 * Ok, found a page in the page cache, now we need to check
	 * that it's up-to-date.
	 */
	if (!PageUptodate(page))
		goto page_not_uptodate;



Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:24:05 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00