Add support for the flexible mmap memory layout (as described in
http://lwn.net/Articles/91829). This is especially very interesting on
parisc since we currently only support 32bit userspace (even with a
64bit Linux kernel).
Signed-off-by: Helge Deller <deller@gmx.de>
This commit:
f8dae00684d678afa13041ef170cecfd1297ed40: parisc: Ensure full cache coherency for kmap/kunmap
caused negative caching side-effects, e.g. hanging processes with expect and
too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems.
This patch now partly reverts it and has been in production use on our debian buildd
makeservers since a week without any major problems.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Helge Deller noted a few weeks ago problems with the AIO support on
parisc. This change is the result of numerous iterations on how best to
deal with this problem.
The solution adopted here is to provide full cache coherency in a
uniform manner on all parisc systems. This involves calling
flush_dcache_page() on kmap operations and flush_kernel_dcache_page() on
kunmap operations. As a result, the copy_user_page() and
clear_user_page() functions can be removed and the overall code is
simpler.
The change ensures that both userspace and kernel aliases to a mapped
page are invalidated and flushed. This is necessary for the correct
operation of PA8800 and PA8900 based systems which do not support
inequivalent aliases.
With this change, I have observed no cache related issues on c8000 and
rp3440. It is now possible for example to do kernel builds with "-j64"
on four way systems.
On systems using XFS file systems, the patch recently posted by Mikulas
Patocka to "fix crash using XFS on loopback" is needed to avoid a hang
caused by an uninitialized lock passed to flush_dcache_page() in the
page struct.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Use dev_is_pci() instead of equivalent local function.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When building a 64bit kernel sometimes functions in the .init section were not
able to reach the standard kernel function. Main reason for this problem is,
that the linkage tables (.plt, .opd, .dlt) tend to become pretty huge and thus
the distance gets too big for short calls.
One option to avoid this is to use the -mlong-calls compiler option, but this
increases the binary size and introduces a performance penalty.
Instead, with this patch we just lay out the binary differently. Init code is
stored first, followed by text, R/O and finally R/W data. This means, that init
and text code is now much closer to each other, which is sufficient to reach
each other by short calls.
Signed-off-by: Helge Deller <deller@gmx.de>
Sadly the correct names for machines which end with a question-mark aren't
known, so let's give it a best-guessed-name.
Signed-off-by: Helge Deller <deller@gmx.de>
locale-gen on Debian showed a strange problem on parisc:
mmap2(NULL, 536870912, PROT_NONE, MAP_SHARED, 3, 0) = 0x42a54000
mmap2(0x42a54000, 103860, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 3, 0) = -1 EINVAL (Invalid argument)
Basically it was just trying to re-mmap() a file at the same address
which it was given by a previous mmap() call. But this remapping failed
with EINVAL.
The problem is, that when MAP_FIXED and MAP_SHARED flags were used, we didn't
included the mapping-based offset when we verified the alignment of the given
fixed address against the offset which we calculated it in the previous call.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # 3.10+
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc()
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull IRQ changes from Ingo Molnar:
"The biggest change this cycle are the softirq/hardirq stack
interaction and nesting fixes, cleanups and reorganizations from
Frederic. This is the longer followup story to the softirq nesting
fix that is already upstream (commit ded797547548: "irq: Force hardirq
exit's softirq processing on its own stack")"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: bcm2835: Convert to use IRQCHIP_DECLARE macro
powerpc: Tell about irq stack coverage
x86: Tell about irq stack coverage
irq: Optimize softirq stack selection in irq exit
irq: Justify the various softirq stack choices
irq: Improve a bit softirq debugging
irq: Optimize call to softirq on hardirq exit
irq: Consolidate do_softirq() arch overriden implementations
x86/irq: Correct comment about i8259 initialization
In case we fail to power up other CPUs in a SMP system, the kernel
currently shows a wrong number of online CPUs. This change makes the
output more verbose on how many of the CPUs are online. Example:
CPU(s): 1 out of 2 PA8800 (Mako) at 900.000000 MHz online.
Signed-off-by: Helge Deller <deller@gmx.de>
The number of IPI calls is already visible as per-cpu IPI irq counters
in/proc/cpuinfo, so let's drop this additional counting.
This partly reverts:
cd85d55 parisc: more irq statistics in /proc/interrupts
Signed-off-by: Helge Deller <deller@gmx.de>
Provide a macro ASM_EXCEPTIONTABLE_ENTRY() to create exception table
entries and convert all open-coded places to use that macro.
This patch is a first step toward creating a exception table which only
holds 32bit pointers even on a 64bit kernel. That way in my own kernel
I was able to reduce the in-kernel exception table from 44kB to 22kB.
Signed-off-by: Helge Deller <deller@gmx.de>
Since the beginning of the parisc-linux port, sometimes 64bit SMP kernels were
not able to bring up other CPUs than the monarch CPU and instead crashed the
kernel. The reason was unclear, esp. since it involved various machines (e.g.
J5600, J6750 and SuperDome). Testing showed, that those crashes didn't happened
when less than 4GB were installed, or if a 32bit Linux kernel was booted.
In the end, the fix for those SMP problems is trivial:
During the early phase of the initialization of the CPUs, including the monarch
CPU, the PDC_PSW firmware function to enable WIDE (=64bit) mode is called.
It's documented that this firmware function may clobber various registers, and
one one of those possibly clobbered registers is %cr30 which holds the task
thread info pointer.
Now, if %cr30 would always have been clobbered, then this bug would have been
detected much earlier. But lots of testing finally showed, that - at least for
%cr30 - on some machines only the upper 32bits of the 64bit register suddenly
turned zero after the firmware call.
So, after finding the root cause, the explanation for the various crashes
became clear:
- On 32bit SMP Linux kernels all upper 32bit were zero, so we didn't faced this
problem.
- Monarch CPUs in 64bit mode always booted sucessfully, because the inital task
thread info pointer was below 4GB.
- Secondary CPUs booted sucessfully on machines with less than 4GB RAM because
the upper 32bit were zero anyay.
- Secondary CPus failed to boot if we had more than 4GB RAM and the task thread
info pointer was located above the 4GB boundary.
Finally, the patch to fix this problem is trivial by saving the %cr30 register
before the firmware call and restoring it afterwards.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # 2.6.12+
Signed-off-by: Helge Deller <deller@gmx.de>
This reverts commit 320c90be7b.
Christoph Hellwig <hch@infradead.org> commented:
This one shouldn't go in - Geert sent it a bit prematurely, as Lustre
shouldn't use it just to reimplement core VM functionality (which it
shouldn't use either, but that's a separate story).
Signed-off-by: Helge Deller <deller@gmx.de>
Running an "echo t > /proc/sysrq-trigger" crashes the parisc kernel. The
problem is, that in print_worker_info() we try to read the workqueue info via
the probe_kernel_read() functions which use pagefault_disable() to avoid
crashes like this:
probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
probe_kernel_read(name, wq->name, sizeof(name) - 1);
The problem here is, that the first probe_kernel_read(&pwq) might return zero
in pwq and as such the following probe_kernel_reads() try to access contents of
the page zero which is read protected and generate a kernel segfault.
With this patch we fix the interruption handler to call parisc_terminate()
directly only if pagefault_disable() was not called (in which case
preempt_count()==0). Otherwise we hand over to the pagefault handler which
will try to look up the faulting address in the fixup tables.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v3.0+
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 9a46ad6d6d "smp: make smp_call_function_many() use logic
similar to smp_call_function_single()" has unified the way to handle
single and multiple cross-CPU function calls. Now only one interrupt
is needed for architecture specific code to support generic SMP function
call interfaces, so kill the redundant single function call interrupt.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
All arch overriden implementations of do_softirq() share the following
common code: disable irqs (to avoid races with the pending check),
check if there are softirqs pending, then execute __do_softirq() on
a specific stack.
Consolidate the common parts such that archs only worry about the
stack switch.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Pull trivial tree from Jiri Kosina:
"The usual trivial updates all over the tree -- mostly typo fixes and
documentation updates"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
doc: Documentation/cputopology.txt fix typo
treewide: Convert retrun typos to return
Fix comment typo for init_cma_reserved_pageblock
Documentation/trace: Correcting and extending tracepoint documentation
mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
power: Documentation: Update s2ram link
doc: fix a typo in Documentation/00-INDEX
Documentation/printk-formats.txt: No casts needed for u64/s64
doc: Fix typo "is is" in Documentations
treewide: Fix printks with 0x%#
zram: doc fixes
Documentation/kmemcheck: update kmemcheck documentation
doc: documentation/hwspinlock.txt fix typo
PM / Hibernate: add section for resume options
doc: filesystems : Fix typo in Documentations/filesystems
scsi/megaraid fixed several typos in comments
ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
page_isolation: Fix a comment typo in test_pages_isolated()
doc: fix a typo about irq affinity
...
Using 0x%# emits 0x0x. Only one is necessary.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We can't use dev->mod_index for selecting the interrupt routing entry,
because it's not an index into interrupt routing table. It will be even
wrong on a machine with 2 CPUs (4 cores). But all needed information is
contained in the PAT entries for the serial ports. mod[0] contains the
iosapic address and mod_info has some indications for the interrupt
input (at least it looks like it). This patch implements the searching
for the right iosapic and uses this interrupt input information.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: Helge Deller <deller@gmx.de>
The KERNEL_SYSCALL define is not used anymore so the header can be
removed.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
The parisc architecture does not have a pte special bit. As a result,
special mappings are handled with the VM_PFNMAP and VM_MIXEDMAP flags.
VM_MIXEDMAP mappings may or may not have a "struct page" backing. When
pfn_valid() is false, there is no "struct page" backing. Otherwise, they
are treated as normal pages.
The FireGL driver uses the VM_MIXEDMAP without a backing "struct page".
This treatment caused a panic due to a TLB data miss in
update_mmu_cache. This appeared to be in the code generated for
page_address(). We were in fact using a very circular bit of code to
determine the physical address of the PFN in various cache routines.
This wasn't valid when there was no "struct page" backing. The needed
address can in fact be determined simply from the PFN itself without
using the "struct page".
The attached patch updates update_mmu_cache(), flush_cache_mm(),
flush_cache_range() and flush_cache_page() to check pfn_valid() and to
directly compute the PFN physical and virtual addresses.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: Helge Deller <deller@gmx.de>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the parisc uses of the __cpuinit macros.
[1] https://lkml.org/lkml/2013/5/20/589
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
I still see the occasional random segv on rp3440. Looking at one of
these (a code 15), it appeared the problem must be with the cache
handling of anonymous pages. Reviewing this, I noticed that the space
register %sr1 might be being clobbered when we flush an anonymous page.
Register %sr1 is used for TLB purges in a couple of places. These
purges are needed on PA8800 and PA8900 processors to ensure cache
consistency of flushed cache lines.
The solution here is simply to move the %sr1 load into the TLB lock
region needed to ensure that one purge executes at a time on SMP
systems. This was already the case for one use. After a few days of
operation, I haven't had a random segv on my rp3440.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: Helge Deller <deller@gmx.de>
The comment at the start of pacache.S states that the base and index
registers used for fdc,fic, and pdc instructions should not use shadowed
registers. Although this is probably unnecessary for tmpalias flushes,
there is also no reason not to comply.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
pci_mmap_page_range() is needed for X11-server support on C8000 with ATI
FireGL card.
Signed-off-by Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The C8000 workstation (64 bit kernel only) has a somewhat different
serial port configuration than other models.
Thomas Bogendoerfer sent a patch to fix this in September 2010, which
was now minimally modified by me.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Helge Deller <deller@gmx.de>
'boot_args' is an input args, and 'boot_command_line' has a fix length.
So use strlcpy() instead of strcpy() to avoid memory overflow.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Helge Deller <deller@gmx.de>
'path.bc[i]' can be asigned by PCI_SLOT() which can '> 10', so sizeof(6
* "%u:" + "%u" + '\0') may be 21.
Since 'name' length is 20, it may be memory overflow.
And 'path.bc[i]' is 'unsigned char' for printing, we can be sure the
max length of 'name' must be less than 28.
So simplify thinking, we can use 28 instead of 20 directly, and do not
think of whether 'patchc.bc[i]' can '> 100'.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The logic to detect if the irq stack was already in use with
raw_spin_trylock() is wrong, because it will generate a "trylock failure
on UP" error message with CONFIG_SMP=n and CONFIG_DEBUG_SPINLOCK=y.
arch_spin_trylock() can't be used either since in the CONFIG_SMP=n case
no atomic protection is given and we are reentrant here. A mutex didn't
worked either and brings more overhead by turning off interrupts.
So, let's use the fastest path for parisc which is the ldcw instruction.
Counting how often the irq stack was used is pretty useless, so just
drop this piece of code.
Signed-off-by: Helge Deller <deller@gmx.de>
The get_stack_use_cr30 and get_stack_use_r30 macros allocate a stack
frame for external interrupts and interruptions requiring a stack frame.
They are currently not reentrant in that they save register context
before the stack is set or adjusted.
I have observed a number of system crashes where there was clear
evidence of stack corruption during interrupt processing, and as a
result register corruption. Some interruptions can still occur during
interruption processing, however external interrupts are disabled and
data TLB misses don't occur for absolute accesses. So, it's not entirely
clear what triggers this issue. Also, if an interruption occurs when
Q=0, it is generally not possible to recover as the shadowed registers
are not copied.
The attached patch reworks the get_stack_use_cr30 and get_stack_use_r30
macros to allocate stack before doing register saves. The new code is a
couple of instructions shorter than the old implementation. Thus, it's
an improvement even if it doesn't fully resolve the stack corruption
issue. Based on limited testing, it improves SMP system stability.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Impact:
1:convert all remain take_over_console to do_take_over_console
2:update take_over_console to do_take_over_console in comment
Commit dc9641895a ("vt: delete unneeded functions
register_con_driver|take_over_console") delete take_over_console,
but forget to convert remain take_over_console's users to new API
do_take_over_console, this patch fix it.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, race conditions exist in the handling of TLB interruptions in
entry.S. In particular, dirty bit updates can be lost if an accessed
interruption occurs just after the dirty bit interruption on a different
cpu. Lost dirty bit updates result in user pages not being flushed and
general system instability. This change adds lock and unlock macros to
synchronize all PTE and TLB updates done in entry.S. As a result,
userspace stability is significantly improved.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
This patch fixes few build issues which were introduced with the last
irq stack patch, e.g. the combination of stack overflow check and irq
stack.
Furthermore we now do proper locking and change the irq bh handler
to use the irq stack as well.
In /proc/interrupts one now can monitor how huge the irq stack has grown
and how often it was preferred over the kernel stack.
IRQ stacks are now enabled by default just to make sure that we not
overflow the kernel stack by accident.
Signed-off-by: Helge Deller <deller@gmx.de>