GDB uses a PTRACE_PEEKUSR call with offset 0 to see
if a thread is alive, so provide a success return for
this particular special case.
Signed-off-by: David S. Miller <davem@davemloft.net>
switch_mm() changes the mm state and does a tsb_context_switch()
first, then we do the cpu register state switch which changes
current_thread_info() and current().
So it's safer to check the PGD physical address stored in the
trap block (which will be updated by the tsb_context_switch() in
switch_mm()) than current->active_mm.
Technically we should never run here in between those two
updates, because interrupts are disabled during the entire
context switch operation. But some day we might like to leave
interrupts enabled during the context switch and this change
allows that to happen without any surprises.
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide proper kprobes fault handling, if a user-specified pre/post handlers
tries to access user address space, through copy_from_user(), get_user() etc.
The user-specified fault handler gets called only if the fault occurs while
executing user-specified handlers. In such a case user-specified handler is
allowed to fix it first, later if the user-specifed fault handler does not fix
it, we try to fix it by calling fix_exception().
The user-specified handler will not be called if the fault happens when single
stepping the original instruction, instead we reset the current probe and
allow the system page fault handler to fix it up.
I could not test this patch for sparc64.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently kprobe handler traps only happen in kernel space, so function
kprobe_exceptions_notify should skip traps which happen in user space.
This patch modifies this, and it is based on 2.6.16-rc4.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: <hiramatu@sdl.hitachi.co.jp>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create compat_sys_adjtimex and use it an all appropriate places.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a copy of the compatibility version of struct timex in each 64 bit
architecture. This patch just creates a global one and replaces all the
usages of the old ones.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
init/do_mounts_rd.c depends upon CONFIG_BLK_DEV_RAM, not CONFIG_BLK_DEV_INITRD.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only need to write an invalid tag every 16 bytes,
so taking advantage of this can save many instructions
compared to the simple memset() call we make now.
A prefetching implementation is implemented for sun4u
and a block-init store version if implemented for Niagara.
The next trick is to be able to perform an init and
a copy_tsb() in parallel when growing a TSB table.
Signed-off-by: David S. Miller <davem@davemloft.net>
Put it one page below the top of the 32-bit address space.
This gives us ~16MB more address space to work with.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently allocations are very constrained for 32-bit processes.
It grows down-up from 0x70000000 to 0xf0000000 which gives about
2GB of stack + dynamic mmap() space.
So support the top-down method, and we need to override the
generic helper function in order to deal with D-cache coloring.
With these changes I was able to squeeze out a mmap() just over
3.6GB in size in a 32-bit process.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is good for up to %50 performance improvement of some test cases.
The problem has been the race conditions, and hopefully I've plugged
them all up here.
1) There was a serious race in switch_mm() wrt. lazy TLB
switching to and from kernel threads.
We could erroneously skip a tsb_context_switch() and thus
use a stale TSB across a TSB grow event.
There is a big comment now in that function describing
exactly how it can happen.
2) All code paths that do something with the TSB need to be
guarded with the mm->context.lock spinlock. This makes
page table flushing paths properly synchronize with both
TSB growing and TLB context changes.
3) TSB growing events are moved to the end of successful fault
processing. Previously it was in update_mmu_cache() but
that is deadlock prone. At the end of do_sparc64_fault()
we hold no spinlocks that could deadlock the TSB grow
sequence. We also have dropped the address space semaphore.
While we're here, add prefetching to the copy_tsb() routine
and put it in assembler into the tsb.S file. This piece of
code is quite time critical.
There are some small negative side effects to this code which
can be improved upon. In particular we grab the mm->context.lock
even for the tsb insert done by update_mmu_cache() now and that's
a bit excessive. We can get rid of that locking, and the same
lock taking in flush_tsb_user(), by disabling PSTATE_IE around
the whole operation including the capturing of the tsb pointer
and tsb_nentries value. That would work because anyone growing
the TSB won't free up the old TSB until all cpus respond to the
TSB change cross call.
I'm not quite so confident in that optimization to put it in
right now, but eventually we might be able to and the description
is here for reference.
This code seems very solid now. It passes several parallel GCC
bootstrap builds, and our favorite "nut cruncher" stress test which is
a full "make -j8192" build of a "make allmodconfig" kernel. That puts
about 256 processes on each cpu's run queue, makes lots of process cpu
migrations occur, causes lots of page table and TLB flushing activity,
incurs many context version number changes, and it swaps the machine
real far out to disk even though there is 16GB of ram on this test
system. :-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Report 'sun4v' when appropriate in /proc/cpuinfo
Remove all the verifications of the OBP version string. Just
make sure it's there, and report it raw in the bootup logs and
via /proc/cpuinfo.
Signed-off-by: David S. Miller <davem@davemloft.net>
The mapping is a simple "(cpuid >> 2) == core" for now.
Later we'll add more sophisticated code that will walk
the sun4v machine description and figure this out from
there.
We should also add core mappings for jaguar and panther
processors.
Signed-off-by: David S. Miller <davem@davemloft.net>
This has been pending for a long time, and the fact
that we waste a ton of ram on some configurations
kind of pushed things over the edge.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't piggy back the SMP receive signal code to do the
context version change handling.
Instead allocate another fixed PIL number for this
asynchronous cross-call. We can't use smp_call_function()
because this thing is invoked with interrupts disabled
and a few spinlocks held.
Also, fix smp_call_function_mask() to count "cpus" correctly.
There is no guarentee that the local cpu is in the mask
yet that is exactly what this code was assuming.
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch converts arch/sparc64 to kzalloc usage.
Crosscompile tested with allyesconfig.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't try to avoid putting non-base page sized entries
into the user TSB. It actually costs us more to check
this than it helps.
Eventually we'll have a multiple TSB scheme for user
processes. Once a process starts using larger pages,
we'll allocate and use such a TSB.
Signed-off-by: David S. Miller <davem@davemloft.net>
This cpu mondo sending interface isn't all that easy to
use correctly...
We were clearing out the wrong bits from the "mask" after getting
something other than EOK from the hypervisor.
It turns out the hypervisor can just be resent the same cpu_list[]
array, with the 0xffff "done" entries still in there, and it will do
the right thing.
So don't update or try to rebuild the cpu_list[] array to condense it.
This requires the "forward_progress" check to be done slightly
differently, but this new scheme is less bug prone than what we were
doing before.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were clobbering a base register before we were done
using it. Fix a comment typo while we're here.
Signed-off-by: David S. Miller <davem@davemloft.net>
The UltraSPARC T1 manual recommends this because the chip
could instruction prefetch into the VA hole, and this would
also make decoding certain kinds of memory access traps
more difficult (because the chip sign extends certain pieces
of trap state).
Signed-off-by: David S. Miller <davem@davemloft.net>
First of all, use the known _PAGE_EXEC_{4U,4V} value instead
of loading _PAGE_EXEC from memory. We either know which one
to use by context, or we can code patch the test.
Next, we need to check executability of a PTE in the generic
TSB miss handler.
Signed-off-by: David S. Miller <davem@davemloft.net>
There were several bugs in the SUN4V cpu mondo dispatch code.
In fact, if we ever got a EWOULDBLOCK or other error from
the hypervisor call, we'd potentially send a cpu mondo multiple
times to the same cpu and even worse we could loop until the
timeout resending the same mondo over and over to such cpus.
So let's bulletproof this thing as follows:
1) Implement cpu_mondo_send() and cpu_state() hypervisor calls
in arch/sparc64/kernel/entry.S, add prototypes to asm/hypervisor.h
2) Don't build and update the cpulist using inline functions, this
was causing the cpu mask to not get updated in the caller.
3) Disable interrupts during the entire mondo send, otherwise our
cpu list and/or mondo block could get overwritten if we take
an interrupt and do a cpu mondo send on the current cpu.
4) Check for all possible error return types from the cpu_mondo_send()
hypervisor call. In particular:
HV_EOK) Our work is done, all cpus have received the mondo.
HV_CPUERROR) One or more of the cpus in the cpu list we passed
to the hypervisor are in error state. Use cpu_state()
calls over the entries in the cpu list to see which
ones. Record them in "error_mask" and report this
after we are done sending the mondo to cpus which are
not in error state.
HV_EWOULDBLOCK) We need to keep trying.
Any other error we consider fatal, we report the event and exit
immediately.
5) We only timeout if forward progress is not made. Forward progress
is defined as having at least one cpu get the mondo successfully
in a given cpu_mondo_send() call. Otherwise we bump a counter
and delay a little. If the counter hits a limit, we signal an
error and report the event.
Also, smp_call_function_mask() error handling reports the number
of cpus incorrectly.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) We must flush the TLB, duh.
2) Even if the sw context was seen to be valid, the local cpu's
hw context can be out of date, so reload it unconditionally.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check TLB flush hypervisor calls for errors and report them.
Pass HV_MMU_ALL always for now, we can add back the optimization
to avoid the I-TLB flush later.
Always explicitly page align the virtual address arguments.
Signed-off-by: David S. Miller <davem@davemloft.net>
The context allocation scheme we use depends upon there being a 1<-->1
mapping from cpu to physical TLB for correctness. Chips like Niagara
break this assumption.
So what we do is notify all cpus with a cross call when the context
version number changes, and if necessary this makes them allocate
a valid context for the address space they are running at the time.
Stress tested with make -j1024, make -j2048, and make -j4096 kernel
builds on a 32-strand, 8 core, T2000 with 16GB of ram.
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise with too much stuff enabled in the kernel config
we can end up with an unaligned trap table.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we take a window fault, on SUN4V set %gl to zero before we
turn PSTATE_IE back on in %pstate. Otherwise if we take an
interrupt we'll end up with corrupt register state.
Signed-off-by: David S. Miller <davem@davemloft.net>
It can map all of the linear kernel mappings with zero TSB hash
conflicts for systems with 16GB or less ram. In such cases, on
SUN4V, once we load up this TSB the first time with all the
mappings, we never take a linear kernel mapping TLB miss ever
again, the hypervisor handles them all.
Signed-off-by: David S. Miller <davem@davemloft.net>
We use a bitmap, one bit for every 256MB of memory. If the
bit is set we can use a 256MB PTE for linear mappings, else
we have to use a 4MB PTE.
SUN4V support is there, and we can very easily add support
for Panther cpu 256MB PTEs in the future.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to turn off the "polling nrflag" bit when we sleep
the cpu like this, so that we'll get a cross-cpu interrupt
to wake the processor up from the yield.
We also have to disable PSTATE_IE in %pstate around the yield
call and recheck need_resched() in order to avoid any races.
Signed-off-by: David S. Miller <davem@davemloft.net>
Set, but never used.
We used to use this for dynamic IRQ retargetting, but that
code died a long time ago.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's extremely noisy and causes much grief on slow
consoles with large numbers of cpus.
We'll have to provide this some saner way in order
to re-enable this.
Signed-off-by: David S. Miller <davem@davemloft.net>
We're about to seriously die in these cases so it is important
that the messages make it to the console.
Signed-off-by: David S. Miller <davem@davemloft.net>
Another case where we have to force ourselves into global register
level one. Also make sure the arguments passed to sun4v_do_mna() are
correct.
This area actually needs some more work, for example spill fixup is
not necessarily going to do the right thing for this case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like kvmap_dtlb_longpath we have to force the
global register level to one in order to mimick the
PSTATE_MG --> PSTATE_AG trasition done on SUN4U.
Signed-off-by: David S. Miller <davem@davemloft.net>
The SUN4V convention with non-shared TSBs is that the context
bit of the TAG is clear. So we have to choose an "invalid"
bit and initialize new TSBs appropriately. Otherwise a zero
TAG looks "valid".
Make sure, for the window fixup cases, that we use the right
global registers and that we don't potentially trample on
the live global registers in etrap/rtrap handling (%g2 and
%g6) and that we put the missing virtual address properly
in %g5.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Add error return checking for TLB load hypervisor
calls.
2) Don't fallthru to dtlb tsb miss handler from itlb tsb
miss handler, oops.
3) On window fixups, propagate fault information to fixup
handler correctly.
Signed-off-by: David S. Miller <davem@davemloft.net>
This gives more consistent bogomips and delay() semantics,
especially on sun4v. It gives weird looking values though...
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to use the real hardware processor ID when
targetting interrupts, not the "define to 0" thing
the uniprocessor build gives us.
Also, fill in the Node-ID and Agent-ID fields properly
on sun4u/Safari.
Signed-off-by: David S. Miller <davem@davemloft.net>
If the top-level cnode had multi entries in it's "reg"
property, we'd fail. The buffer wasn't large enough in
such cases.
Signed-off-by: David S. Miller <davem@davemloft.net>
The sibling cpu bringup is extremely fragile. We can only
perform the most basic calls until we take over the trap
table from the firmware/hypervisor on the new cpu.
This means no accesses to %g4, %g5, %g6 since those can't be
TLB translated without our trap handlers.
In order to achieve this:
1) Change sun4v_init_mondo_queues() so that it can operate in
several modes.
It can allocate the queues, or install them in the current
processor, or both.
The boot cpu does both in it's call early on.
Later, the boot cpu allocates the sibling cpu queue, starts
the sibling cpu, then the sibling cpu loads them in.
2) init_cur_cpu_trap() is changed to take the current_thread_info()
as an argument instead of reading %g6 directly on the current
cpu.
3) Create a trampoline stack for the sibling cpus. We do our basic
kernel calls using this stack, which is locked into the kernel
image, then go to our proper thread stack after taking over the
trap table.
4) While we are in this delicate startup state, we put 0xdeadbeef
into %g4/%g5/%g6 in order to catch accidental accesses.
5) On the final prom_set_trap_table*() call, we put &init_thread_union
into %g6. This is a hack to make prom_world(0) work. All that
wants to do is restore the %asi register using
get_thread_current_ds().
Longer term we should just do the OBP calls to set the trap table by
hand just like we do for everything else. This would avoid that silly
prom_world(0) issue, then we can remove the init_thread_union hack.
Signed-off-by: David S. Miller <davem@davemloft.net>
For 32 cpus and a slow console, it just wedges the
machine especially with DETECT_SOFTLOCKUP enabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole algorithm was wrong. What we need to do is:
1) Walk each PCI bus above this device on the path to the
PCI controller nexus, and for each:
a) If interrupt-map exists, apply it, record IRQ controller node
b) Else, swivel interrupt number using PCI_SLOT(), use PCI bus
parent OBP node as controller node
c) Walk up to "controller node" until we hit the first PCI bus
in this domain, or "controller node" is the PCI controller
OBP node
2) If we walked to PCI controller OBP node, we're done.
3) Else, apply PCI controller interrupt-map to interrupt.
There is some stuff that needs to be checked out for ebus and
isa, but the PCI part is good to go.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to set the global register set _AND_ disable
PSTATE_IE in %pstate. The original patch sequence was
leaving PSTATE_IE enabled when returning to kernel mode,
oops.
This fixes the random register corruption being seen
on SUN4V.
Signed-off-by: David S. Miller <davem@davemloft.net>
Forgot to multiply by 8 * 1024, oops. Correct the size constant when
the virtual-dma arena is 2GB in size, it should bet 256 not 128.
Finally, log some info about the TSB at probe time.
Signed-off-by: David S. Miller <davem@davemloft.net>
For SUN4V, we were clobbering %o5 to do the hypervisor call.
This clobbers the saved %pstate value and we end up writing
garbage into that register as a result. Oops.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use prom_startcpu_cpuid() on SUN4V instead of prom_startcpu().
We should really test for "SUNW,start-cpu-by-cpuid" presence
and use it if present even on SUN4U.
Signed-off-by: David S. Miller <davem@davemloft.net>
When crawling up the PCI bus chain, stop at the first node
that has an interrupt-map property before we hit the root.
Also, if we use a bus interrupt-{map,mask} do not forget to
update the 'intmask' pointer as we do for the 'intmap' pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
On SUN4V, force IRQ state to idle in enable_irq(). However,
I'm still not sure this is %100 correct.
Call add_interrupt_randomness() on SUN4V too.
Signed-off-by: David S. Miller <davem@davemloft.net>
On the PBM's first bus number, only allow device 0, function 0, to be
poked at with PCI config space accesses.
For some reason, this single device responds to all device numbers.
Also, reduce the verbiage of the debugging log printk's for PCI cfg
space accesses in the SUN4V PCI controller driver, so that it doesn't
overwhelm the slow SUN4V hypervisor console.
Signed-off-by: David S. Miller <davem@davemloft.net>
We should dynamically allocate the per-cpu pglist not use
an in-kernel-image datum, since __pa() does not work on
such addresses.
Also, consistently use "u32" for devhandle.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add udelay to polling console write loop, and increment
the loop limit.
Name the device "ttyHV" and pass that to add_preferred_console()
when we're using hypervisor console.
Kill sunhv_console_setup(), it's empty.
Handle the case where we don't want to use hypervisor console.
(ie. we have a head attached to a sun4v machine)
Signed-off-by: David S. Miller <davem@davemloft.net>
Get bus range from child of PCI controller root nexus.
This is actually a hack, but the PCI-E bridge sitting
at the top of the PCI tree responds to PCI config cycles
for every device number, so best to just ignore it for now.
Preliminary PCI irq routing, needs lots of work.
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear top 8-bits of physical addresses in "ranges" property.
This gives the actual physical address.
Detect PBM-A vs. PBM-B by checking bit 0x40 of the devhandle.
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI cfg space is accessed transparently through the Hypervisor and not
through direct cpu PIO operations.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to use bootmem during init_IRQ and page alloc
for sibling cpu calls.
Also, fix incorrect hypervisor call return value
checks in the hypervisor SMP cpu mondo send code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Yes, you heard it right, they changed the PTE layout for
SUN4V. Ho hum...
This is the simple and inefficient way to support this.
It'll get optimized, don't worry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Code patching did not sign extend negative branch
offsets correctly.
Kernel TLB miss path needs patching and %g4 register
preservation in order to handle SUN4V correctly.
Signed-off-by: David S. Miller <davem@davemloft.net>
prom_sun4v_name should be "sun4v" not "SUNW,sun4v"
Also, this is too early to make use of the
.sun4v_Xinsn_patch code patching, so just check
things manually.
This gets us at least to prom_init() on Niagara.
Signed-off-by: David S. Miller <davem@davemloft.net>
There was also a bug in sun4v_itlb_miss, it loaded the
MMU Fault Status base into %g3 instead of %g2.
This pointed out a fast path for TSB miss processing,
since we have %g2 with the MMU Fault Status base, we
can use that to quickly load up the PGD phys address.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is where the virtual address of the fault status
area belongs.
To set it up we don't make a hypervisor call, instead
we call OBP's SUNW,set-trap-table with the real address
of the fault status area as the second argument. And
right before that call we write the virtual address into
ASI_SCRATCHPAD vaddr 0x0.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add assembler file for PCI hypervisor calls.
Setup basic skeleton of SUN4V PCI controller driver.
Add 32-bit devhandle to PBM struct, as this is needed for
hypervisor calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
Abstract out IOMMU operations so that we can have a different
set of calls on sun4v, which needs to do things through
hypervisor calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we register a TSB with the hypervisor, so that it or hardware can
handle TLB misses and do the TSB walk for us, the hypervisor traps
down to these trap when it incurs a TSB miss.
Processing is simple, we load the missing virtual address and context,
and do a full page table walk.
Signed-off-by: David S. Miller <davem@davemloft.net>
We look for "SUNW,sun4v" in the 'compatible' property
of the root OBP device tree node.
Protect every %ver register access, to make sure it is
not touched on sun4v, as %ver is hyperprivileged there.
Lock kernel TLB entries using hypervisor calls instead of
calls into OBP.
Signed-off-by: David S. Miller <davem@davemloft.net>
Technically the hypervisor call supports sending in a list
of all cpus to get the cross-call, but I only pass in one
cpu at a time for now.
The multi-cpu support is there, just ifdef'd out so it's easy to
enable or delete it later.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sun4v has 4 interrupt queues: cpu, device, resumable errors,
and non-resumable errors. A set of head/tail offset pointers
help maintain a work queue in physical memory. The entries
are 64-bytes in size.
Each queue is allocated then registered with the hypervisor
as we bring cpus up.
The two error queues each get a kernel side buffer that we
use to quickly empty the main interrupt queue before we
call up to C code to log the event and possibly take evasive
action.
Signed-off-by: David S. Miller <davem@davemloft.net>
Happily we have no D-cache aliasing issues on these
chips, so the implementation is very straightforward.
Add a stub in bootup which will be where the patching
calls will be made for niagara/sun4v/hypervisor.
Signed-off-by: David S. Miller <davem@davemloft.net>
Things are a little tricky because, unlike sun4u, we have
to:
1) do a hypervisor trap to do the TLB load.
2) do the TSB lookup calculations by hand
Signed-off-by: David S. Miller <davem@davemloft.net>
If we're just switching between different alternate global
sets, nop it out on sun4v. Also, get rid of all of the
alternate global save/restore in the OBP CIF trampoline code.
Signed-off-by: David S. Miller <davem@davemloft.net>
They are totally unnecessary because:
1) Interrupts are already disabled when switch_to()
runs.
2) We don't use hard-coded alternate globals any longer.
This found a case in rtrap, which still assumed alternate
global %g6 was current_thread_info(), and that is fixed
by this changeset as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
As we save trap state onto the stack, the store buffer fills up
mid-way through and we stall for several cycles as the store buffer
trickles out to the L2 cache. Meanwhile we can do some privileged
register reads and other calculations, essentially for free.
Signed-off-by: David S. Miller <davem@davemloft.net>
And more consistently check cheetah{,_plus} instead
of assuming anything not spitfire is cheetah{,_plus}.
Signed-off-by: David S. Miller <davem@davemloft.net>
When saving and restoing trap state, do the window spill/fill
handling inline so that we never trap deeper than 2 trap levels.
This is important for chips like Niagara.
The window fixup code is massively simplified, and many more
improvements are now possible.
Signed-off-by: David S. Miller <davem@davemloft.net>
On uniprocessor, it's always zero for optimize that.
On SMP, the jmpl to the stub kills the return address stack in the cpu
branch prediction logic, so expand the code sequence inline and use a
code patching section to fix things up. This also always better and
explicit register selection, which will be taken advantage of in a
future changeset.
The hard_smp_processor_id() function is big, so do not inline it.
Fix up tests for Jalapeno to also test for Serrano chips too. These
tests want "jbus Ultra-IIIi" cases to match, so that is what we should
test for.
Signed-off-by: David S. Miller <davem@davemloft.net>
The are distrupting, which by the sparc v9 definition means they
can only occur when interrupts are enabled in the %pstate register.
This never occurs in any of the trap handling code running at
trap levels > 0.
So just mark it as an unexpected trap.
This allows us to kill off the cee_stuff member of struct thread_info.
Signed-off-by: David S. Miller <davem@davemloft.net>
This way we don't need to lock the TSB into the TLB.
The trick is that every TSB load/store is registered into
a special instruction patch section. The default uses
virtual addresses, and the patch instructions use physical
address load/stores.
We can't do this on all chips because only cheetah+ and later
have the physical variant of the atomic quad load.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we are returning back to kernel mode, %g4 could be live
(for example, in the case where we window spill in the etrap
code). So do not change it's value if going back to kernel.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we use %g5 itself as a temporary, it can get clobbered
if we take an interrupt mid-stream and thus cause end up with
the final %g5 value too early as a result of rtrap processing.
Set %g5 at the very end, atomically, to avoid this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
%g6 is not necessarily set to current_thread_info()
at sparc64_realfault_common. So store the fault
code and address after we invoke etrap and %g6 is
properly set up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Just flip the bit off of whatever it's currently set to.
PSTATE_IE is guarenteed to be enabled when we get here.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is totally unnecessary complexity. After we take over
the trap table, we handle all PROM tlb misses fully.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the trap code was still assuming that alternate
global %g6 was hard coded with current_thread_info().
Let's just consistently flush at KERNBASE when we need
a pipeline synchronization. That's locked into the TLB
and will always work.
Signed-off-by: David S. Miller <davem@davemloft.net>
As the RSS grows, grow the TSB in order to reduce the likelyhood
of hash collisions and thus poor hit rates in the TSB.
This definitely needs some serious tuning.
Signed-off-by: David S. Miller <davem@davemloft.net>
This also cleans up tsb_context_switch(). The assembler
routine is now __tsb_context_switch() and the former is
an inline function that picks out the bits from the mm_struct
and passes it into the assembler code as arguments.
setup_tsb_parms() computes the locked TLB entry to map the
TSB. Later when we support using the physical address quad
load instructions of Cheetah+ and later, we'll simply use
the physical address for the TSB register value and set
the map virtual and PTE both to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move {init_new,destroy}_context() out of line.
Do not put huge pages into the TSB, only base page size translations.
There are some clever things we could do here, but for now let's be
correct instead of fancy.
Signed-off-by: David S. Miller <davem@davemloft.net>
UltraSPARC has special sets of global registers which are switched to
for certain trap types. There is one set for MMU related traps, one
set of Interrupt Vector processing, and another set (called the
Alternate globals) for all other trap types.
For what seems like forever we've hard coded the values in some of
these trap registers. Some examples include:
1) Interrupt Vector global %g6 holds current processors interrupt
work struct where received interrupts are managed for IRQ handler
dispatch.
2) MMU global %g7 holds the base of the page tables of the currently
active address space.
3) Alternate global %g6 held the current_thread_info() value.
Such hardcoding has resulted in some serious issues in many areas.
There are some code sequences where having another register available
would help clean up the implementation. Taking traps such as
cross-calls from the OBP firmware requires some trick code sequences
wherein we have to save away and restore all of the special sets of
global registers when we enter/exit OBP.
We were also using the IMMU TSB register on SMP to hold the per-cpu
area base address, which doesn't work any longer now that we actually
use the TSB facility of the cpu.
The implementation is pretty straight forward. One tricky bit is
getting the current processor ID as that is different on different cpu
variants. We use a stub with a fancy calling convention which we
patch at boot time. The calling convention is that the stub is
branched to and the (PC - 4) to return to is in register %g1. The cpu
number is left in %g6. This stub can be invoked by using the
__GET_CPUID macro.
We use an array of per-cpu trap state to store the current thread and
physical address of the current address space's page tables. The
TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this
table, it uses __GET_CPUID and also clobbers %g1.
TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load
the current processor's IRQ software state into %g6. It also uses
__GET_CPUID and clobbers %g1.
Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the
current address space's page tables into %g7, it clobbers %g1 and uses
__GET_CPUID.
Many refinements are possible, as well as some tuning, with this stuff
in place.
Signed-off-by: David S. Miller <davem@davemloft.net>
Taking a nod from the powerpc port.
With the per-cpu caching of both the page allocator and SLAB, the
pgtable quicklist scheme becomes relatively silly and primitive.
Signed-off-by: David S. Miller <davem@davemloft.net>
We now use the TSB hardware assist features of the UltraSPARC
MMUs.
SMP is currently knowingly broken, we need to find another place
to store the per-cpu base pointers. We hid them away in the TSB
base register, and that obviously will not work any more :-)
Another known broken case is non-8KB base page size.
Also noticed that flush_tlb_all() is not referenced anywhere, only
the internal __flush_tlb_all() (local cpu only) is used by the
sparc64 port, so we can get rid of flush_tlb_all().
The kernel gets it's own 8KB TSB (swapper_tsb) and each address space
gets it's own private 8K TSB. Later we can add code to dynamically
increase the size of per-process TSB as the RSS grows. An 8KB TSB is
good enough for up to about a 4MB RSS, after which the TSB starts to
incur many capacity and conflict misses.
We even accumulate OBP translations into the kernel TSB.
Another area for refinement is large page size support. We could use
a secondary address space TSB to handle those.
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch "[SPARC64]: Get rid of fast IRQ feature"
moved the the code from arch/sparc64/kernel/entry.S:
lduba [%g7] ASI_PHYS_BYPASS_EC_E, %g5
or %g5, AUXIO_AUX1_FTCNT, %g5
stba %g5, [%g7] ASI_PHYS_BYPASS_EC_E
andn %g5, AUXIO_AUX1_FTCNT, %g5
stba %g5, [%g7] ASI_PHYS_BYPASS_EC_E
to arch/sparc64/kernel/irq.c:
val = readb(auxio_register);
val |= AUXIO_AUX1_FTCNT;
writeb(val, auxio_register);
val &= AUXIO_AUX1_FTCNT;
writeb(val, auxio_register);
This looks like it it missing a bitwise not, which is reintroduced
by this patch.
Due to lack of a floppy device, I could not test it, but it looks
evident.
Signed-off-by: Bernhard R Link <brlink@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must use the "a" (allocate) attribute every time we
emit an entry into the __ex_table section.
For consistency, use "a" instead of #alloc which is some
Solaris compat cruft GNU as provides on Sparc.
Signed-off-by: David S. Miller <davem@davemloft.net>
The change to kernel/sched.c's init code to use for_each_cpu()
requires that the cpu_possible_map be setup much earlier.
Set it up via setup_arch(), constrained to NR_CPUS, and later
constrain it to max_cpus in smp_prepare_cpus().
This fixes SMP booting on sparc64.
Signed-off-by: David S. Miller <davem@davemloft.net>
The sparc64 64 bit syscall table seems to be broken as it has
compat_sys_newfstatat in its syscall table instead of sys_newfstatat.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, the Solaris syscall table is sized differrently,
and does not go beyond entry 255, so trim off the excess
entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
This also includes by necessity _TIF_RESTORE_SIGMASK support,
which actually resulted in a lot of cleanups.
The sparc signal handling code is quite a mess and I should
clean it up some day.
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name. These functions, openat etc, have been discussed on numerous
occasions. They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.
We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic. But this code is rather expensive. Here are some
results (similar to what Jim Meyering posted before).
The test creates a deep directory hierarchy on a tmpfs filesystem. Then
rm -fr is used to remove all directories. Without syscall support I get
this:
real 0m31.921s
user 0m0.688s
sys 0m31.234s
With syscall support the results are much better:
real 0m20.699s
user 0m0.536s
sys 0m20.149s
The interfaces are for obvious reasons currently not much used. But they'll
be used. coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them. I expect a patch to make follow soon. Every program which is walking
the filesystem tree will benefit.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Eddie C. Dost <ecd@brainaid.de>
I have the following patch for serial console over the RSC
(remote system controller) on my E250 machine. It basically adds
support for input-device=rsc and output-device=rsc from OBP, and
allows 115200,8,n,1,- serial mode setting.
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure a consistent value is read from the STICK register by ensuring
that both high and low are read without high changing due to a roll
over of the low register.
Various Debian/SPARC users (myself include) have noticed problems with
Hummingbird based systems. The symptoms are that the system time is
seen to jump forward 3 days, 6 hours, 11 minutes give or take a few
seconds. In many cases the system then hangs some time afterwards.
I've spotted a race condition in the code to read the STICK register.
I could not work out why 3d, 6h, 11m is important but guess that it is
due to the 2^32 jump of STICK (forwards on one read and then the next
read will seem to be backwards) during a timer interrupt. I'm guessing
that a change of -2^32 will get converted to a large unsigned
increment after the arithmetic manipulation between STICK,
nanoseconds, jiffies etc.
I did a test where I modified __hbird_read_stick to artificially
inject rollover faults forcefully every few seconds. With this I saw
the clock jump over 6 times in 12 hours compared to once every month
or so.
Signed-off-by: Richard Mortimer <richm@oldelvet.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
arch: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a window where a probe gets removed right after the probe is hit
on some different cpu. In this case probe handlers can't find a matching
probe instance related to break address. In this case we need to read the
original instruction at break address to see if that is not a break/int3
instruction and recover safely.
Previous code had a bug where we were not checking for the above race in
case of reentrant probes and the below patch fixes this race.
Tested on IA64, Powerpc, x86_64.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc. All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.
This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s) do { } while(0)
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since Kprobes runtime exception handlers is now lock free as this code path is
now using RCU to walk through the list, there is no need for the
register/unregister{_kprobe} to use spin_{lock/unlock}_isr{save/restore}. The
serialization during registration/unregistration is now possible using just a
mutex.
In the above process, this patch also fixes a minor memory leak for x86_64 and
powerpc.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft. We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures. Also remove some
superflous includes in fs/compat_ioctl.c
Tested on ppc64.
[1] parisc still had it's PPP handler left, which is not fully correct
for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
kick in for all netdevice users. We can introduce a proper handler
in one of the next patch series by adding a compat_ioctl method to
struct net_device but for now let's just kill it - parisc doesn't
compile in mainline anyway and I don't want this to block this
patchset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The comment in compat.c is wrong, every architecture provides a
get_compat_sigevent() for the IPC compat code already.
This basically moves the x86_64 version to common code and removes all the
others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Adrian Bunk <bunk@stusta.de>
- create one common dump_thread() prototype in kernel.h
- dump_thread() is only used in fs/binfmt_aout.c and can therefore be
removed on all architectures where CONFIG_BINFMT_AOUT is not
available
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't clobber register %l0 while checking TI_SYS_NOERROR value in
syscall return path. This bug was introduced by:
db7d9a4eb7
Problem narrowed down by Luis F. Ortiz and Richard Mortimer.
I tried using %l2 as suggested by Luis and that works for me.
Looking at the code I wonder if it makes sense to simplify the code
a little bit. The following works for me but I'm not sure how to
exercise the "NOERROR" codepath.
Signed-off-by: David S. Miller <davem@davemloft.net>
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.
Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface. We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns. It's a bit more code in the callers, but we have two sane routines
that do one thing well now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's definition is wrong (-1 means "no limit" not 999),
only the Sparc SunOS/Solaris compat code uses it, so
let's just kill it off completely from limits.h and
all referencing code.
Noticed by Ulrich Drepper.
Signed-off-by: David S. Miller <davem@davemloft.net>
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.
The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Earlier I unifdefed PageCompound, so that snd_pcm_mmap_control_nopage and
others can give out a 0-order component of a higher-order page, which won't
be mistakenly freed when zap_pte_range unmaps it. But many Bad page states
reported a PG_reserved was freed after all: I had missed that we need to
say __GFP_COMP to get compound page behaviour.
Some of these higher-order pages are allocated by snd_malloc_pages, some by
snd_malloc_dev_pages; or if SBUS, by sbus_alloc_consistent - but that has
no gfp arg, so add __GFP_COMP into its sparc32/64 implementations.
I'm still rather puzzled that DRM seems not to need a similar change.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new function, sbusfb_compat_ioctl() to
drivers/video/sbuslib.c and uses it as compat_ioctl in all sbus fb
drivers
This remove the last per-arch compat ioctl bits in
arch/sparc64/kernel/ioctl32.c so it would be nice if people could test
if this actually copiles and works and if yes apply it :)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed by Tom 'spot' Callaway.
Even on uniprocessor we always reported the number of physical
cpus in the system via /proc/cpuinfo. But when this got changed
to use num_possible_cpus() it always reads as "1" on uniprocessor.
This change was unintentional.
So scan the firmware device tree and count the number of cpu
nodes, and report that, as we always did.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE which is never used anyways.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid. Improves efficiency of
resched_task and some cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. The only other time this should be set is
when the task's quantum expires, in the timer interrupt - this is
protected against because the rq lock is irq-safe.
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task get's schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
most polling idle loops. The above resched_task semantics allow it to be
set until before the last time need_resched() is checked before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Run idle threads with preempt disabled.
Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, then the CPU offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some architectures define and use this type in their compat_ioctl code, but
all of them can easily use the identical ioctl_trans_handler_t type that is
defined in common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/drm/ now implements proper ->compat_ioctl methods, so this isn't
needed anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
all ioctls are 32bit compat clean, so the driver can use ->compat_ioctl
and ->unlocked_ioctl easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
implement a compat_ioctl handle in the driver instead of having table
entries in sparc64 ioctl32.c (I plan to get rid of the arch ioctl32.c
file eventually)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
all the ioctls in the driver are 32bit compat clean and don't need BKL,
so we can switch it to ->unlocked_ioctl and ->compat_ioctl trivially.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Would you mind applying the following patch that kills those two + the
m68k and Documentation/ references?
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
all these are handled by fs/compat_ioctls.c already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't know if we ever implemented this, but the only user in any 2.6
tree are the compat ioctls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The old keyboard driver is gone in 2.6, so the only user left are the
compat ioctls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The old sound drivers are gone in 2.6, so the only user left are the
compat ioctls.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
this inline routine in arch/sparc64/kernel/ioctl32.c is completely
unused and superceeded by compat_alloc_user_space()
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
It only serves to generate false-positive buildcheck warnings.
Just set it initially to tick_operations which uses the v9
%tick register which every sparc64 processor has.
Signed-off-by: David S. Miller <davem@davemloft.net>
It isn't needed any longer, as noted by Hugh Dickins.
We still need the flush routines, due to the one remaining
call site in hugetlb_prefault_arch_hook(). That can be
eliminated at some later point, however.
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc64 is unique among architectures in taking the page_table_lock in
its context switch (well, cris does too, but erroneously, and it's not
yet SMP anyway).
This seems to be a private affair between switch_mm and activate_mm,
using page_table_lock as a per-mm lock, without any relation to its uses
elsewhere. That's fine, but comment it as such; and unlock sooner in
switch_mm, more like in activate_mm (preemption is disabled here).
There is a block of "if (0)"ed code in smp_flush_tlb_pending which would
have liked to rely on the page_table_lock, in switch_mm and elsewhere;
but its comment explains how dup_mmap's flush_tlb_mm defeated it. And
though that could have been changed at any time over the past few years,
now the chance vanishes as we push the page_table_lock downwards, and
perhaps split it per page table page. Just delete that block of code.
Which leaves the mysterious spin_unlock_wait(&oldmm->page_table_lock)
in kernel/fork.c copy_mm. Textual analysis (supported by Nick Piggin)
suggests that the comment was written by DaveM, and that it relates to
the defeated approach in the sparc64 smp_flush_tlb_pending. Just delete
this block too.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc64 prom_callback and new_setup_frame32 each operates on a user page
table without holding lock, and no doubt they've good reason. But I'd
feel more confident if they were to do a "pte = *ptep" and then operate
on pte, rather than re-evaluating *ptep.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the arch/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in arch/.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reorganize the preempt_disable/enable calls to eliminate the extra preempt
depth. Changes based on Paul McKenney's review suggestions for the kprobes
RCU changeset.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes to the arch kprobes infrastructure to take advantage of the locking
changes introduced by usage of RCU for synchronization. All handlers are now
run without any locks held, so they have to be re-entrant or provide their own
synchronization.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sparc64 changes to track kprobe execution on a per-cpu basis. We now track
the kprobe state machine independently on each cpu using an arch specific
kprobe control block.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following set of patches are aimed at improving kprobes scalability. We
currently serialize kprobe registration, unregistration and handler execution
using a single spinlock - kprobe_lock.
With these changes, kprobe handlers can run without any locks held. It also
allows for simultaneous kprobe handler executions on different processors as
we now track kprobe execution on a per processor basis. It is now necessary
that the handlers be re-entrant since handlers can run concurrently on
multiple processors.
All changes have been tested on i386, ia64, ppc64 and x86_64, while sparc64
has been compile tested only.
The patches can be viewed as 3 logical chunks:
patch 1: Reorder preempt_(dis/en)able calls
patches 2-7: Introduce per_cpu data areas to track kprobe execution
patches 8-9: Use RCU to synchronize kprobe (un)registration and handler
execution.
Thanks to Maneesh Soni, James Keniston and Anil Keshavamurthy for their
review and suggestions. Thanks again to Anil, Hien Nguyen and Kevin Stafford
for testing the patches.
This patch:
Reorder preempt_disable/enable() calls in arch kprobes files in preparation to
introduce locking changes. No functional changes introduced by this patch.
Signed-off-by: Ananth N Mavinakayahanalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define jiffies_64 in kernel/timer.c rather than having 24 duplicated
defines in each architecture.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
TIOCSTART and TIOCSTOP are defined in asm/ioctls.h and asm/termios.h by
various architectures but not actually implemented anywhere but in the IRIX
compatibility layer, so remove their COMPATIBLE_IOCTL from parisc, ppc64
and sparc64.
Move the TIOCSLTC COMPATIBLE_IOCTL to common code, guided by an ifdef to
only show up on architectures that support it (same as the code handling it
in tty_ioctl.c), aswell as it's brother TIOCGLTC that wasn't handled so
far.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type. And how is rss
initialized? By set_mm_counter, all over the place. Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Doing a "SUNW,stop-self" firmware call on the other cpus is not the
correct thing to do when dropping into the firmware for a halt,
reboot, or power-off.
For now, just do nothing to quiet the other cpus, as the system should
be quiescent enough. Later we may decide to implement smp_send_stop()
like the other SMP platforms do.
Based upon a report from Christopher Zimmermann.
Signed-off-by: David S. Miller <davem@davemloft.net>
The hairy fast allocator in the sparc64 PCI IOMMU code
has a hard limit of 256 pages. Certain devices can
exceed this when performing very large I/Os.
So replace with a more simple allocator, based largely
upon the arch/ppc64/kernel/iommu.c code.
Signed-off-by: David S. Miller <davem@davemloft.net>
All the PCI controller drivers were doing the same thing
setting up the IOMMU software state, put it all in one spot.
Signed-off-by: David S. Miller <davem@davemloft.net>
The sequence to move over to the Linux trap tables from
the firmware ones needs to be more air tight. It turns
out that to be %100 safe we do need to be able to translate
OBP mappings in our TLB miss handlers early.
In order not to eat up a lot of kernel image memory with
static page tables, just use the translations array in
the OBP TLB miss handlers. That solves the bulk of the
problem.
Furthermore, to make sure the OBP TLB miss path will work
even before the fixed MMU globals are loaded, explicitly
load %g1 to TLB_SFSR at the beginning of the i-TLB and
d-TLB miss handlers.
To ease the OBP TLB miss walking of the prom_trans[] array,
we sort it then delete all of the non-OBP entries in there
(for example, there are entries for the kernel image itself
which we're not interested in at all).
We also save about 32K of kernel image size with this change.
Not a bad side effect :-)
There are still some reasons why trampoline.S can't use the
setup_trap_table() yet. The most noteworthy are:
1) OBP boots secondary processors with non-bias'd stack for
some reason. This is easily fixed by using a small bootup
stack in the kernel image explicitly for this purpose.
2) Doing a firmware call via the normal C call prom_set_trap_table()
goes through the whole OBP enter/exit sequence that saves and
restores OBP and Linux kernel state in the MMUs. This path
unfortunately does a "flush %g6" while loading up the OBP locked
TLB entries for the firmware call.
If we setup the %g6 in the trampoline.S code properly, that
is in the PAGE_OFFSET linear mapping, but we're not on the
kernel trap table yet so those addresses won't translate properly.
One idea is to do a by-hand firmware call like we do in the
early bootup code and elsewhere here in trampoline.S But this
fails as well, as aparently the secondary processors are not
booted with OBP's special locked TLB entries loaded. These
are necessary for the firwmare to processes TLB misses correctly
up until the point where we take over the trap table.
This does need to be resolved at some point.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were not doing alignment properly when remapping the kernel image.
What we want is a 4MB aligned physical address to map at KERNBASE.
Mistakedly we were 4MB aligning the virtual address where the kernel
initially sits, that's wrong.
Instead, we should PAGE align the virtual address, then 4MB align the
physical address result the prom gives to us.
Signed-off-by: David S. Miller <davem@davemloft.net>
On the boot processor, we need to do the move onto the Linux trap
table a little bit differently else we'll take unhandlable faults in
the firmware address space.
Previously we would do the following:
1) Disable PSTATE_IE in %pstate.
2) Set %tba by hand to sparc64_ttable_tl0
3) Initialize alternate, mmu, and interrupt global
trap registers.
4) Call prom_set_traptable()
That doesn't work very well actually with the way we boot the kernel
VM these days. It worked by luck on many systems because the firmware
accesses for the prom_set_traptable() call happened to be loaded into
the TLB already, something we cannot assume.
So the new scheme is this:
1) Clear PSTATE_IE in %pstate and set %pil to 15
2) Call prom_set_traptable()
3) Initialize alternate, mmu, and interrupt global
trap registers.
and this works quite well. This sequence has been moved into a
callable function in assembler named setup-trap_table(). The idea is
that eventually trampoline.S can use this code as well. That isn't
possible currently due to some complications, but eventually we should
be able to do it.
Thanks to Meelis Roos for the Ultra5 boot failure report.
Signed-off-by: David S. Miller <davem@davemloft.net>
irq.c is missing the inclusion of asm/io.h, which causes
readb() and writeb() the be undefined.
Signed-off-by: Sven Hartge <hartge@ds9.argh.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to use stricter memory barriers around the block
load and store instructions we use to save and restore the
FPU register file.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of code patching to handle the page size fields in
the context registers, just use variables from which we get
the proper values.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Use cpudata cache line sizes, not magic constants.
2) Align start address in cheetah case so we do not get
unaligned address traps. (pgrep was good at triggering
this, via /proc/${pid}/cmdline accesses)
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete all of the code working with sp_banks[] and replace
with clean acquisition and sorting of physical memory
parameters from the firmware.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were not calling kernel_mna_trap_fault() correctly.
Instead of being fancy, just return 0 vs. -EFAULT from
the assembler stubs, and handle that return value as
appropriate.
Create an "__retl_efault" stub for assembler exception
table entries and use it where possible.
Signed-off-by: David S. Miller <davem@davemloft.net>
The funny "range" exception table entries we had were only
used by the compat layer socketcall assembly, and it wasn't
even needed there.
For free we now get proper exception table sorting and fast
binary searching.
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, the us3_cpufreq driver can work on Ultra-IV and IV+.
They use the SAFARI bus register to control the clock divider
just like Ultra-III and III+ do.
Signed-off-by: David S. Miller <davem@davemloft.net>
At boot time, determine the D-cache, I-cache and E-cache size and
line-size. Use them in cache flushes when appropriate.
This change was motivated by discovering that the D-cache on
UltraSparc-IIIi and later are 64K not 32K, and the flushes done by the
Cheetah error handlers were assuming a 32K size.
There are still some pieces of code that are hard coding things and
will need to be fixed up at some point.
While we're here, fix the D-cache and I-cache parity error handlers
to run with interrupts disabled, and when the trap occurs at trap
level > 1 log the event via a counter displayed in /proc/cpuinfo.
Signed-off-by: David S. Miller <davem@davemloft.net>
The trick is that we do the kernel linear mapping TLB miss starting
with an instruction sequence like this:
ba,pt %xcc, kvmap_load
xor %g2, %g4, %g5
succeeded by an instruction sequence which performs a full page table
walk starting at swapper_pg_dir.
We first take over the trap table from the firmware. Then, using this
constant PTE generation for the linear mapping area above, we build
the kernel page tables for the linear mapping.
After this is setup, we patch that branch above into a "nop", which
will cause TLB misses to fall through to the full page table walk.
With this, the page unmapping for CONFIG_DEBUG_PAGEALLOC is trivial.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of all of this cpu-specific code to remap the kernel
to the correct location, use portable firmware calls to do
this instead.
What we do now is the following in position independant
assembler:
chosen_node = prom_finddevice("/chosen");
prom_mmu_ihandle_cache = prom_getint(chosen_node, "mmu");
vaddr = 4MB_ALIGN(current_text_addr());
prom_translate(vaddr, &paddr_high, &paddr_low, &mode);
prom_boot_mapping_mode = mode;
prom_boot_mapping_phys_high = paddr_high;
prom_boot_mapping_phys_low = paddr_low;
prom_map(-1, 8 * 1024 * 1024, KERNBASE, paddr_low);
and that replaces the massive amount of by-hand TLB probing and
programming we used to do here.
The new code should also handle properly the case where the kernel
is mapped at the correct address already (think: future kexec
support).
Consequently, the bulk of remap_kernel() dies as does the entirety
of arch/sparc64/prom/map.S
We try to share some strings in the PROM library with the ones used
at bootup, and while we're here mark input strings to oplib.h routines
with "const" when appropriate.
There are many more simplifications now possible. For one thing, we
can consolidate the two copies we now have of a lot of cpu setup code
sitting in head.S and trampoline.S.
This is a significant step towards CONFIG_DEBUG_PAGEALLOC support.
Signed-off-by: David S. Miller <davem@davemloft.net>
This was kind of ugly, and actually buggy. The bug was that
we didn't handle a machine with memory starting > 4GB. If
the 'prompmd' was allocated in physical memory > 4GB we'd
croak because the obp_iaddr_patch and obp_daddr_patch things
only supported a 32-bit physical address.
So fix this by just loading the appropriate values from two
variables in the kernel image, which is locked into the TLB
and thus accesses to them can't cause a recursive TLB miss.
Signed-off-by: David S. Miller <davem@davemloft.net>
Arrange the modules, OBP, and vmalloc areas such that a range
verification can be done quite minimally.
Signed-off-by: David S. Miller <davem@davemloft.net>
This showed that arch/sparc64/kernel/ptrace.c was not getting
the define properly, and thus the code protected by this ifdef
was never actually compiled before. So fix that too.
Signed-off-by: David S. Miller <davem@davemloft.net>
Because we use byte loads/stores to cons up the value
in and out of registers, we can't expect the ASI endianness
setting to take care of this for us. So do it by hand.
This case is triggered by drivers/block/aoe/aoecmd.c in the
ataid_complete() function where it goes:
/* word 100: number lba48 sectors */
ssize = le64_to_cpup((__le64 *) &id[100<<1]);
This &id[100<<1] address is 4 byte, rather than 8 byte aligned,
thus triggering the unaligned exception.
Signed-off-by: David S. Miller <davem@davemloft.net>
Several implementations were essentialy a common piece of C code using
the cmpxchg() macro. Put the implementation in one spot that everyone
can share, and convert sparc64 over to using this.
Alpha is the lone arch-specific implementation, which codes up a
special fast path for the common case in order to avoid GP reloading
which a pure C version would require.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code. It does the following
things:
- consolidates and enhances the spinlock/rwlock debugging code
- simplifies the asm/spinlock.h files
- encapsulates the raw spinlock type and moves generic spinlock
features (such as ->break_lock) into the generic code.
- cleans up the spinlock code hierarchy to get rid of the spaghetti.
Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c. (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)
Also, i've enhanced the rwlock debugging facility, it will now track
write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.
The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:
include/asm-i386/spinlock_types.h | 16
include/asm-x86_64/spinlock_types.h | 16
I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:
SMP | UP
----------------------------|-----------------------------------
asm/spinlock_types_smp.h | linux/spinlock_types_up.h
linux/spinlock_types.h | linux/spinlock_types.h
asm/spinlock_smp.h | linux/spinlock_up.h
linux/spinlock_api_smp.h | linux/spinlock_api_up.h
linux/spinlock.h | linux/spinlock.h
/*
* here's the role of the various spinlock/rwlock related include files:
*
* on SMP builds:
*
* asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
* initializers
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel
* implementations, mostly inline assembly code
*
* (also included on UP-debug builds:)
*
* linux/spinlock_api_smp.h:
* contains the prototypes for the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*
* on UP builds:
*
* linux/spinlock_type_up.h:
* contains the generic, simplified UP spinlock type.
* (which is an empty structure on non-debug builds)
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* linux/spinlock_up.h:
* contains the __raw_spin_*()/etc. version of UP
* builds. (which are NOPs on non-debug, non-preempt
* builds)
*
* (included on UP-non-debug builds:)
*
* linux/spinlock_api_up.h:
* builds the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*/
All SMP and UP architectures are converted by this patch.
arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.
From: Grant Grundler <grundler@parisc-linux.org>
Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
Builds 32-bit SMP kernel (not booted or tested). I did not try to build
non-SMP kernels. That should be trivial to fix up later if necessary.
I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids
some ugly nesting of linux/*.h and asm/*.h files. Those particular locks
are well tested and contained entirely inside arch specific code. I do NOT
expect any new issues to arise with them.
If someone does ever need to use debug/metrics with them, then they will
need to unravel this hairball between spinlocks, atomic ops, and bit ops
that exist only because parisc has exactly one atomic instruction: LDCW
(load and clear word).
From: "Luck, Tony" <tony.luck@intel.com>
ia64 fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There were three changes necessary in order to allow
sparc64 to use setup-res.c:
1) Sparc64 roots the PCI I/O and MEM address space using
parent resources contained in the PCI controller structure.
I'm actually surprised no other platforms do this, especially
ones like Alpha and PPC{,64}. These resources get linked into the
iomem/ioport tree when PCI controllers are probed.
So the hierarchy looks like this:
iomem --|
PCI controller 1 MEM space --|
device 1
device 2
etc.
PCI controller 2 MEM space --|
...
ioport --|
PCI controller 1 IO space --|
...
PCI controller 2 IO space --|
...
You get the idea. The drivers/pci/setup-res.c code allocates
using plain iomem_space and ioport_space as the root, so that
wouldn't work with the above setup.
So I added a pcibios_select_root() that is used to handle this.
It uses the PCI controller struct's io_space and mem_space on
sparc64, and io{port,mem}_resource on every other platform to
keep current behavior.
2) quirk_io_region() is buggy. It takes in raw BUS view addresses
and tries to use them as a PCI resource.
pci_claim_resource() expects the resource to be fully formed when
it gets called. The sparc64 implementation would do the translation
but that's absolutely wrong, because if the same resource gets
released then re-claimed we'll adjust things twice.
So I fixed up quirk_io_region() to do the proper pcibios_bus_to_resource()
conversion before passing it on to pci_claim_resource().
3) I was mistakedly __init'ing the function methods the PCI controller
drivers provide on sparc64 to implement some parts of these
routines. This was, of course, easy to fix.
So we end up with the following, and that nasty SPARC64 makefile
ifdef in drivers/pci/Makefile is finally zapped.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some PCI devices (e.g. 3c905B, 3c556B) lose all configuration
(including BARs) when transitioning from D3hot->D0. This leaves such
a device in an inaccessible state. The patch below causes the BARs
to be restored when enabling such a device, so that its driver will
be able to access it.
The patch also adds pci_restore_bars as a new global symbol, and adds a
correpsonding EXPORT_SYMBOL_GPL for that.
Some firmware (e.g. Thinkpad T21) leaves devices in D3hot after a
(re)boot. Most drivers call pci_enable_device very early, so devices
left in D3hot that lose configuration during the D3hot->D0 transition
will be inaccessible to their drivers.
Drivers could be modified to account for this, but it would
be difficult to know which drivers need modification. This is
especially true since often many devices are covered by the same
driver. It likely would be necessary to replicate code across dozens
of drivers.
The patch below should trigger only when transitioning from D3hot->D0
(or at boot), and only for devices that have the "no soft reset" bit
cleared in the PM control register. I believe it is safe to include
this patch as part of the PCI infrastructure.
The cleanest implementation of pci_restore_bars was to call
pci_update_resource. Unfortunately, that does not currently exist
for the sparc64 architecture. The patch below includes a null
implemenation of pci_update_resource for sparc64.
Some have expressed interest in making general use of the the
pci_restore_bars function, so that has been exported to GPL licensed
modules.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since GCC has to emit a call and a delay slot to the
out-of-line "membar" routines in arch/sparc64/lib/mb.S
it is much better to just do the necessary predicted
branch inline instead as:
ba,pt %xcc, 1f
membar #whatever
1:
instead of the current:
call membar_foo
dslot
because this way GCC is not required to allocate a stack
frame if the function can be a leaf function.
This also makes this bug fix easier to backport to 2.4.x
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the sparc64 architecture specific changes to prevent the
possible race conditions.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
64 bit architectures all implement their own compatibility sys_open(),
when in fact the difference is simply not forcing the O_LARGEFILE
flag. So use the a common function instead.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch cleans up a commonly repeated set of changes to the NTP state
variables by adding two helper inline functions:
ntp_clear(): Clears the ntp state variables
ntp_synced(): Returns 1 if the system is synced with a time server.
This was compile tested for alpha, arm, i386, x86-64, ppc64, s390, sparc,
sparc64.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Need to use compat struct sizes and compat_sys_ioctl().
Reported by Adrian Bunk via kernel bugzilla #2683
Signed-off-by: David S. Miller <davem@davemloft.net>
We can put the __softirq_pending mask in the cpudata,
no need for the silly NR_CPUS array in kernel/softirq.c
Signed-off-by: David S. Miller <davem@davemloft.net>
It appears that a memory barrier soon after a mispredicted
branch, not just in the delay slot, can cause the hang
condition of this cpu errata.
So move them out-of-line, and explicitly put them into
a "branch always, predict taken" delay slot which should
fully kill this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the spinlock routines were moved out of line into
kernel/spinlock.c this made it so that the debugging
spinlocks record lock acquisition program counts in the
kernel/spinlock.c functions not in their callers.
This makes the debugging info kind of useless.
So record the correct caller's program counter and
now this feature is useful once more.
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed sparc64 architecture specific users of asm/segment.h and
asm-sparc64/segment.h itself
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current uncorrectable error handling was poor enough
that the processor could just loop taking the same
trap over and over again. Fix things up so that we
at least get a log message and perhaps even some register
state.
In the process, much consolidation became possible,
particularly with the correctable error handler.
Prefix assembler and C function names with "spitfire"
to indicate that these are for Ultra-I/II/IIi/IIe only.
More work is needed to make these routines robust and
featureful to the level of the Ultra-III error handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
Verify we really are taking a data access exception trap, at TL1, from
one of the window spill/fill handlers.
Else call a new function, data_access_exception_tl1, to log the error.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Read ASI_IMMU SFSR not ASI_DMMU.
2) IMMU has no SFAR, read TPC instead
3) Delete old and incorrect comment about the DTLB protection
trap having a dependency on the SFSR contents in order to
function correctly
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been reported that the way Linux handles NODEFER for signals is
not consistent with the way other Unix boxes handle it. I've written a
program to test the behavior of how this flag affects signals and had
several reports from people who ran this on various Unix boxes,
confirming that Linux seems to be unique on the way this is handled.
The way NODEFER affects signals on other Unix boxes is as follows:
1) If NODEFER is set, other signals in sa_mask are still blocked.
2) If NODEFER is set and the signal is in sa_mask, then the signal is
still blocked. (Note: this is the behavior of all tested but Linux _and_
NetBSD 2.0 *).
The way NODEFER affects signals on Linux:
1) If NODEFER is set, other signals are _not_ blocked regardless of
sa_mask (Even NetBSD doesn't do this).
2) If NODEFER is set and the signal is in sa_mask, then the signal being
handled is not blocked.
The patch converts signal handling in all current Linux architectures to
the way most Unix boxes work.
Unix boxes that were tested: DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
* NetBSD was the only other Unix to behave like Linux on point #2. The
main concern was brought up by point #1 which even NetBSD isn't like
Linux. So with this patch, we leave NetBSD as the lonely one that
behaves differently here with #2.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pcibios_bus_to_resource is exported on all architectures except ia64
and sparc. Add exports for the two missing architectures. Needed when
Yenta socket support is compiled as a module.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC 4.x really dislikes the games we are playing in
unaligned.c, and the cleanest way to fix this is to
move things into assembler.
Noted by Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert commit fec59a711e, which is
breaking sparc64 that doesn't have a working pci_update_resource.
We'll re-do this after 2.6.13 when we'll do it all properly.
Some PCI devices (e.g. 3c905B, 3c556B) lose all configuration
(including BARs) when transitioning from D3hot->D0. This leaves such
a device in an inaccessible state. The patch below causes the BARs
to be restored when enabling such a device, so that its driver will
be able to access it.
The patch also adds pci_restore_bars as a new global symbol, and adds a
correpsonding EXPORT_SYMBOL_GPL for that.
Some firmware (e.g. Thinkpad T21) leaves devices in D3hot after a
(re)boot. Most drivers call pci_enable_device very early, so devices
left in D3hot that lose configuration during the D3hot->D0 transition
will be inaccessible to their drivers.
Drivers could be modified to account for this, but it would
be difficult to know which drivers need modification. This is
especially true since often many devices are covered by the same
driver. It likely would be necessary to replicate code across dozens
of drivers.
The patch below should trigger only when transitioning from D3hot->D0
(or at boot), and only for devices that have the "no soft reset" bit
cleared in the PM control register. I believe it is safe to include
this patch as part of the PCI infrastructure.
The cleanest implementation of pci_restore_bars was to call
pci_update_resource. Unfortunately, that does not currently exist
for the sparc64 architecture. The patch below includes a null
implemenation of pci_update_resource for sparc64.
Some have expressed interest in making general use of the the
pci_restore_bars function, so that has been exported to GPL licensed
modules.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with. Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These two bits were accesses non-atomically from assembler
code. So, in order to eliminate any potential races resulting
from that, move these pieces of state into two bytes elsewhere
in struct thread_info.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is only used by some localized code in irq.c, and also
delete enable_prom_timer() as that is totally unused.
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/sparc64/kernel/smp.c:48: error: parse error before "__attribute__"
arch/sparc64/kernel/smp.c:49: error: parse error before "__attribute__"
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also fix a bug in 32-bit syscall tracing. We forgot to update
this code when we moved over to the convention that all 32-bit
syscall arguments are zero extended by default.
Signed-off-by: David S. Miller <davem@davemloft.net>
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.
Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use macro instead of magic value for Tomatillo discard-
timeout interrupt enable register bit.
Leave OBP programming PTO value unless Tomatillo and
version >= 0x2.
If no-bus-parking property is present, explicitly clear
PCICTRL_PARK bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
This was the main impetus behind adding the PCI IRQ shim.
In order to properly order DMA writes wrt. interrupts, you have to
write to a PCI controller register, then poll for that bit clearing.
There is one bit for each interrupt source, and setting this register
bit tells Tomatillo to drain all pending DMA from that device.
Furthermore, Tomatillo's with revision less than 4 require us to do a
block store due to some memory transaction ordering issues it has on
JBUS.
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows a PCI controller to shim into IRQ delivery
so that DMA queues can be drained, if necessary.
If some bus specific code needs to run before an IRQ
handler is invoked, the bus driver simply needs to setup
the function pointer in bucket->irq_info->pre_handler and
the two args bucket->irq_info->pre_handler_arg[12].
The Schizo PCI driver is converted over to use a pre-handler
for the DMA write-sync processing it needs when a device
is behind a PCI->PCI bus deeper than the top-level APB
bridges.
While we're here, clean up all of the action allocation
and handling. Now, we allocate the irqaction as part of
the bucket->irq_info area. There is an array of 4 irqaction
(for PCI irq sharing) and a bitmask saying which entries
are active.
The bucket->irq_info is allocated at build_irq() time, not
at request_irq() time. This simplifies request_irq() and
free_irq() tremendously.
The SMP dynamic IRQ retargetting code got removed in this
change too. It was disabled for a few months now, and we
can resurrect it in the future if we want.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only real user was the assembler floppy interrupt
handler, which does not need to be in assembly.
This makes it so that there are less pieces of code which
know about the internal layout of ivector_table[] and
friends.
Signed-off-by: David S. Miller <davem@davemloft.net>
In particular, avoid membar instructions in the delay
slot of a jmpl instruction.
UltraSPARC-I, II, IIi, and IIe have a bug, documented in
the UltraSPARC-IIi User's Manual, Appendix K, Erratum 51
The long and short of it is that if the IMU unit misses
on a branch or jmpl, and there is a store buffer synchronizing
membar in the delay slot, the chip can stop fetching instructions.
If interrupts are enabled or some other trap is enabled, the
chip will unwedge itself, but performance will suffer.
We already had a workaround for this bug in a few spots, but
it's better to have the entire tree sanitized for this rule.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is based on work by Carlos O'Donell and Matthew Wilcox. It
introduces/updates the compat_time_t type and uses it for compat siginfo
structures. I have built this on ppc64 and x86_64.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch includes sparc64 architecture specific changes to support temporary
disarming on reentrancy of probes.
Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The architecture independent code of the current kprobes implementation is
arming and disarming kprobes at registration time. The problem is that the
code is assuming that arming and disarming is a just done by a simple write
of some magic value to an address. This is problematic for ia64 where our
instructions look more like structures, and we can not insert break points
by just doing something like:
*p->addr = BREAKPOINT_INSTRUCTION;
The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
functions:
* void arch_arm_kprobe(struct kprobe *p)
* void arch_disarm_kprobe(struct kprobe *p)
and then adds the new functions for each of the architectures that already
implement kprobes (spar64/ppc64/i386/x86_64).
I thought arch_[dis]arm_kprobe was the most descriptive of what was really
happening, but each of the architectures already had a disarm_kprobe()
function that was really a "disarm and do some other clean-up items as
needed when you stumble across a recursive kprobe." So... I took the
liberty of changing the code that was calling disarm_kprobe() to call
arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
with the recursive kprobe case.
So far this patch as been tested on i386, x86_64, and ppc64, but still
needs to be tested in sparc64.
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo recently introduced a great speedup for allocating new mmaps using the
free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
causes huge performance increases in thread creation.
The downside of this patch is that it does lead to fragmentation in the
mmap-ed areas (visible via /proc/self/maps), such that some applications
that work fine under 2.4 kernels quickly run out of memory on any 2.6
kernel.
The problem is twofold:
1) the free_area_cache is used to continue a search for memory where
the last search ended. Before the change new areas were always
searched from the base address on.
So now new small areas are cluttering holes of all sizes
throughout the whole mmap-able region whereas before small holes
tended to close holes near the base leaving holes far from the base
large and available for larger requests.
2) the free_area_cache also is set to the location of the last
munmap-ed area so in scenarios where we allocate e.g. five regions of
1K each, then free regions 4 2 3 in this order the next request for 1K
will be placed in the position of the old region 3, whereas before we
appended it to the still active region 1, placing it at the location
of the old region 2. Before we had 1 free region of 2K, now we only
get two free regions of 1K -> fragmentation.
The patch addresses thes issues by introducing yet another cache descriptor
cached_hole_size that contains the largest known hole size below the
current free_area_cache. If a new request comes in the size is compared
against the cached_hole_size and if the request can be filled with a hole
below free_area_cache the search is started from the base instead.
The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
(earlier posted) leakme.c test program terminates after 50000+ iterations
with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
(as expected) with thread creation, Ingo's test_str02 with 20000 threads
requires 0.7s system time.
Taking out Ingo's patch (un-patch available per request) by basically
deleting all mentions of free_area_cache from the kernel and starting the
search for new memory always at the respective bases we observe: leakme
terminates successfully with 11 distinctive hardly fragmented areas in
/proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
time for Ingo's test_str02 with 20000 threads.
Now - drumroll ;-) the appended patch works fine with leakme: it ends with
only 7 distinct areas in /proc/self/maps and also thread creation seems
sufficiently fast with 0.71s for 20000 threads.
Signed-off-by: Wolfgang Wander <wwc@rentec.com>
Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The initial peek read PIO of the match register is just a waste.
Just do the flush writes first, as that is more efficient.
Signed-off-by: David S. Miller <davem@davemloft.net>
Firstly, if the direction is TODEVICE, then dirty data in the
streaming cache is impossible so we can elide the flush-flag
synchronization in that case.
Next, the context allocator is broken. It is highly likely
that contexts get used multiple times for different dma
mappings, which confuses the strbuf flushing code and makes
it run inefficiently.
Signed-off-by: David S. Miller <davem@davemloft.net>
Older UltraSPARC-III chips have a P-Cache bug that makes us disable it
by default at boot time.
However, this does hurt performance substantially, particularly with
memcpy(), and the bug is _incredibly_ obscure. I have never seen it
triggered in practice, ever.
So provide a "-P" boot option that forces the P-Cache on. It taints
the kernel, so if it does trigger and cause some data corruption or
OOPS, we will find out in the logs that this option was on when it
happened.
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent change to add a timeout to strbuf flushing had
a negative performance impact. The udelay()'s are too long,
and they were done in the wrong order wrt. the register read
checks. Fix both, and things are happy again.
There are more possible improvements in this area. In fact,
PCI streaming buffer flushing seems to be part of the bottleneck
in network receive performance on my SunBlade1000 box.
Signed-off-by: David S. Miller <davem@davemloft.net>
If some hardware error occurs and the flush flag never updates,
we will hang forever in these routines. Add a timeout, and
print out a diagnostic if it is reached.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently sparc and sparc64's UP cpu_idle() checks current pid. This
is old time legacy. Now it's paranoia.
Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is some race whereby IRQs get stuck, the IRQ status
is pending but no processor actually handles the IRQ vector
and thus the interrupt.
This is a temporary workaround.
Signed-off-by: David S. Miller <davem@davemloft.net>
We would never advance the goal_cpu counter like we
should, so all IRQs would go to a single processor.
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert most of the current code that uses _NSIG directly to instead use
valid_signal(). This avoids gcc -W warnings and off-by-one errors.
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
void * __iomem foo is not a pointer to iomem - it's an iomem variable
containing void *. A pile of such guys in arch/sparc64/kernel/time.c,
drivers/sbus/char/rtc.c and include/asm-sparc64/mostek.h turned into
intended void __iomem *.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide support for drivers/char/rtc.c ioctls in the
Mostek rtc driver as well as the Sparc specific RTCGET
and RTCSET.
This allows userspace to be much less messy. Currently
util-linux and other spots jump through hoops trying
various ioctl variants until it hits the right one whatever
driver actually being used supports.
Eventually all of this should move over to the genrtc.c
driver, but not today...
While we are here, fix up the register types for sparse.
Thanks to Frans Pop for helping point out this issue.
Signed-off-by: David S. Miller <davem@davemloft.net>
Like Alpha, sparc64's struct stat was defined before we had the
nanosecond et al. fields added. So like Alpha I have to cons up a
struct stat64 to get this stuff. I'll work on the glibc bits soon.
Also, we were forgetting to fill in the nanosecond fields in the sparc
compat stat64 syscalls.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The compat routine to copy over this data structure was not
handling SI_POLL correctly, breaking various fcntl() variants
in compat tasks.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We were flushing the D-cache excessively for ptrace() processing
and this makes debugging threads so slow as to be totally unusable.
All process page accesses via ptrace() go via access_process_vm().
This routine, for each process page, uses get_user_pages(). That
in turn does a flush_dcache_page() on the child pages before we
copy in/out the ptrace request data.
Therefore, all we need to do after the data movement is:
1) Flush the D-cache pages if the kernel maps the page to a different
color than userspace does.
2) If we wrote to the page, we need to flush the I-cache on older cpus.
Previously we just flushed the entire cache at the end of a ptrace()
request, and that was beyond stupid.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SunOS aparently had this weird PTRACE_CONT semantic which
we copied. If the addr argument is something other than
1, it sets the process program counter to whatever that
value is.
This is different from every other Linux architecture, which
don't do anything with the addr and data args.
This difference in particular breaks the Linux native GDB support
for fork and vfork tracing on sparc and sparc64.
There is no interest in running SunOS binaries using this weird
PTRACE_CONT behavior, so just delete it so we behave like other
platforms do.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple message queue system call entries for compat tasks
were not using the necessary compat_sys_*() functions, causing
some glibc test cases to fail.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!