Commit Graph

33 Commits

Author SHA1 Message Date
John David Anglin 4c5fe5db1a parisc: Optimze cache flush algorithms
The attached patch implements three optimizations:

1) Loops in flush_user_dcache_range_asm, flush_kernel_dcache_range_asm,
purge_kernel_dcache_range_asm, flush_user_icache_range_asm, and
flush_kernel_icache_range_asm are unrolled to reduce branch overhead.

2) The static branch prediction for cmpb instructions in pacache.S have
been reviewed and the operand order adjusted where necessary.

3) For flush routines in cache.c, we purge rather flush when we have no
context.  The pdc instruction at level 0 is not required to write back
dirty lines to memory. This provides a performance improvement over the
fdc instruction if the feature is implemented.

Version 2 adds alternative patching.

The patch provides an average improvement of about 2%.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-20 21:10:26 +02:00
Helge Deller 3847dab774 parisc: Add alternative coding infrastructure
This patch adds the necessary code to patch a running kernel at runtime
to improve performance.

The current implementation offers a few optimizations variants:

- When running a SMP kernel on a single UP processor, unwanted assembler
  statements like locking functions are overwritten with NOPs. When
  multiple instructions shall be skipped, one branch instruction is used
  instead of multiple nop instructions.

- In the UP case, some pdtlb and pitlb instructions are patched to
  become pdtlb,l and pitlb,l which only flushes the CPU-local tlb
  entries instead of broadcasting the flush to other CPUs in the system
  and thus may improve performance.

- fic and fdc instructions are skipped if no I- or D-caches are
  installed.  This should speed up qemu emulation and cacheless systems.

- If no cache coherence is needed for IO operations, the relevant fdc
  and sync instructions in the sba and ccio drivers are replaced by
  nops.

- On systems which share I- and D-TLBs and thus don't have a seperate
  instruction TLB, the pitlb instruction is replaced by a nop.

Live-patching is done early in the boot process, just after having run
the system inventory. No drivers are running and thus no external
interrupts should arrive. So the hope is that no TLB exceptions will
occur during the patching. If this turns out to be wrong we will
probably need to do the patching in real-mode.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-17 17:22:26 +02:00
Helge Deller c8921d72e3 parisc: Fix and improve kernel stack unwinding
This patchset fixes and improves stack unwinding a lot:
1. Show backward stack traces with up to 30 callsites
2. Add callinfo to ENTRY_CFI() such that every assembler function will get an
   entry in the unwind table
3. Use constants instead of numbers in call_on_stack()
4. Do not depend on CONFIG_KALLSYMS to generate backtraces.
5. Speed up backtrace generation

Make sure you have this patch to GNU as installed:
https://sourceware.org/ml/binutils/2018-07/msg00474.html
Without this patch, unwind info in the kernel is often wrong for various
functions.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13 09:54:17 +02:00
John David Anglin fedb8da963 parisc: Define mb() and add memory barriers to assembler unlock sequences
For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.

This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml

For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-08 22:13:32 +02:00
Helge Deller 2a03bb9e7a parisc: Move cache flush functions into .text.hot section
and move the disable_sr_hashing() C and assembly functions into the
.init section.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11 11:40:35 +02:00
John David Anglin 0adb24e03a parisc: Fix ordering of cache and TLB flushes
The change to flush_kernel_vmap_range() wasn't sufficient to avoid the
SMP stalls.  The problem is some drivers call these routines with
interrupts disabled.  Interrupts need to be enabled for flush_tlb_all()
and flush_cache_all() to work.  This version adds checks to ensure
interrupts are not disabled before calling routines that need IPI
interrupts.  When interrupts are disabled, we now drop into slower code.

The attached change fixes the ordering of cache and TLB flushes in
several cases.  When we flush the cache using the existing PTE/TLB
entries, we need to flush the TLB after doing the cache flush.  We don't
need to do this when we flush the entire instruction and data caches as
these flushes don't use the existing TLB entries.  The same is true for
tmpalias region flushes.

The flush_kernel_vmap_range() and invalidate_kernel_vmap_range()
routines have been updated.

Secondly, we added a new purge_kernel_dcache_range_asm() routine to
pacache.S and use it in invalidate_kernel_vmap_range().  Nominally,
purges are faster than flushes as the cache lines don't have to be
written back to memory.

Hopefully, this is sufficient to resolve the remaining problems due to
cache speculation.  So far, testing indicates that this is the case.  I
did work up a patch using tmpalias flushes, but there is a performance
hit because we need the physical address for each page, and we also need
to sequence access to the tmpalias flush code.  This increases the
probability of stalls.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
2018-03-02 10:03:28 +01:00
Helge Deller 88776c0e70 parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures
about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9."

Those opcodes evaluate to the ldcw() assembly instruction which requires
(on 32bit) an alignment of 16 bytes to ensure atomicity.

As it turns out, qemu is correct and in our assembly code in entry.S and
pacache.S we don't pay attention to the required alignment.

This patch fixes the problem by aligning the lock offset in assembly
code in the same manner as we do in our C-code.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.0+
2018-01-02 22:21:54 +01:00
John David Anglin febe42964f parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asm
We have four routines in pacache.S that use temporary alias pages:
copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and
flush_icache_page_asm().  copy_user_page_asm() and clear_user_page_asm()
don't purge the TLB entry used for the operation.
flush_dcache_page_asm() and flush_icache_page_asm do purge the entry.

Presumably, this was thought to optimize TLB use.  However, the
operation is quite heavy weight on PA 1.X processors as we need to take
the TLB lock and a TLB broadcast is sent to all processors.

This patch removes the purges from flush_dcache_page_asm() and
flush_icache_page_asm.

Signed-off-by: John David Anglin  <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Helge Deller <deller@gmx.de>
2016-12-07 09:01:21 +01:00
John David Anglin 5035b230e7 parisc: Also flush data TLB in flush_icache_page_asm
This is the second issue I noticed in reviewing the parisc TLB code.

The fic instruction may use either the instruction or data TLB in
flushing the instruction cache.  Thus, on machines with a split TLB, we
should also flush the data TLB after setting up the temporary alias
registers.

Although this has no functional impact, I changed the pdtlb and pitlb
instructions to consistently use the index register %r0.  These
instructions do not support integer displacements.

Tested on rp3440 and c8000.

Signed-off-by: John David Anglin  <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Helge Deller <deller@gmx.de>
2016-11-25 12:32:01 +01:00
Helge Deller f39cce654f parisc: Add cfi_startproc and cfi_endproc to assembly code
Add ENTRY_CFI() and ENDPROC_CFI() macros for dwarf debug info and
convert assembly users to new macros.

Signed-off-by: Helge Deller <deller@gmx.de>
2016-10-05 22:54:40 +02:00
John David Anglin 910a86435d parisc: Update comment regarding implementation of copy_user_page_asm
The attached patch describes the current implementation of
copy_user_page_asm().  It is possible to implement this routine using
either the kernel page mappings or equivalent aliases.  I tested both
and decided the former was more efficient.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2016-09-20 20:00:12 +02:00
John David Anglin d65ea48dc6 parisc: Use unshadowed index register for flush instructions in flush_dcache_page_asm and flush_icache_page_asm
The comment at the start of pacache.S states that the base and index
registers used for fdc,fic, and pdc instructions should not use shadowed
registers. Although this is probably unnecessary for tmpalias flushes,
there is also no reason not to comply.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-18 20:29:10 +02:00
Helge Deller d845b5fb36 parisc: use PAGE_SHIFT instead of hardcoded value 12 in pacache.S
additionally clean up some whitespaces & tabs.

Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-24 22:30:28 +02:00
Helge Deller 6a45716abb parisc: fix partly 16/64k PAGE_SIZE boot
This patch fixes partly PAGE_SIZEs of 16K or 64K by adjusting the
assembler PTE lookup code and the assembler TEMPALIAS code.  Furthermore
some data alignments for PAGE_SIZE have been limited to 4K (or less) to
not waste too much memory with greater page sizes. As a side note, the
palo loader can (currently) only handle up to 10 ELF segments which is
fixed with tighter aligning as well.

My testings indicated that the ldci command in the sba iommu coding
needed adjustment by the PAGE_SHIFT value and that the I/O PDIR Page
size was only set to 4K for my machine (C3000).

All this fixes partly the boot, but there are still quite some caching
problems left.  Examples are e.g. the symbios logic driver which is
failing:

sym0: <896> rev 0x7 at pci 0000:00:0f.0 irq 69
sym0: PA-RISC Firmware, ID 7, Fast-40, SE, parity checking
CACHE TEST FAILED: DMA error (dstat=0x81).sym0: CACHE INCORRECTLY CONFIGURED.

and the tulip network driver which doesn't seem to work correctly
either:

Sending BOOTP requests .net eth0: Setting full-duplex based on MII#1
link partner capability of 05e1
..... timed out!

Beside those kernel fixes glibc will need fixes too to be able to handle
>4K page sizes.

Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-06 23:08:32 +02:00
John David Anglin 6d2ddc2f94 parisc: fixes and cleanups in page cache flushing (2/4)
Implement clear_page_asm and copy_page_asm. These are optimized routines to
clear and copy a page.  I tested prefetch optimizations in clear_page_asm and
copy_page_asm but didn't see any significant performance improvement on rp3440.
I'm not sure if these are routines are significantly faster than memset and/or
memcpy, but they are there for further performance evaluation.

TLB purge operations on PA 1.X SMP machines are now serialized with the help of
the new tlb_lock() and tlb_unlock() macros, since on some PA-RISC machines, TLB
purges need to be serialized in software.  Obviously, lock isn't needed in UP
kernels.  On PA 2.0 machines, there is a local TLB instruction which is much
less disruptive to the memory subsystem.  No lock is needed for local purge.

Loops are also unrolled in flush_instruction_cache_local and
flush_data_cache_local.

The implementation of what used to be copy_user_page (now copy_user_page_asm)
is now fixed. Additionally 64-bit support is now added. Read the preceding
comment which I didn't change.  I left the comment but it is now inaccurate.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2013-02-20 22:49:29 +01:00
John David Anglin 207f583d71 [PARISC] fix crash in flush_icache_page_asm on PA1.1
As pointed out by serveral people, PA1.1 only has a type 26 instruction
meaning that the space register must be explicitly encoded.  Not giving an
explicit space means that the compiler uses the type 24 version which is PA2.0
only resulting in an illegal instruction crash.

This regression was caused by

    commit f311847c2f
    Author: James Bottomley <James.Bottomley@HansenPartnership.com>
    Date:   Wed Dec 22 10:22:11 2010 -0600

        parisc: flush pages through tmpalias space

Reported-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org	#2.6.39+
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-16 13:15:06 +01:00
Meelis Roos b54cd0d505 [PARISC] fix pacache .size with new binutils
Fix style of flush_user_dcache_range_asm procedure declaration in
arch/parisc/kernel/pacache.s to be consistent with other assembly
procedures.

Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-15 12:50:41 -05:00
James Bottomley f311847c2f parisc: flush pages through tmpalias space
The kernel has an 8M tmpailas space (originally designed for copying
and clearing pages but now only used for clearing).  The idea is
to place zeros into the cache above a physical page rather than into
the physical page and flush the cache, because often the zeros end up
being replaced quickly anyway.

We can also use the tmpalias space for flushing a page.  The difference
here is that we have to do tmpalias processing in the non access data and
instruction traps.  The principle is the same: as long as we know the physical
address and have a virtual address congruent to the real one, the flush will
be effective.

In order to use the tmpalias space, the icache miss path has to be enhanced to
check for the alias region to make the fic instruction effective.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-01-15 08:44:40 -06:00
Kyle McMartin dfcf753bd3 Revert "parisc: fix trivial section name warnings"
This reverts commit bd3bb8c15b.

Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2008-06-13 10:49:45 -04:00
Kyle McMartin 872f6debca parisc: use conditional macro for 64-bit wide ops
This work enables us to remove -traditional from $AFLAGS on
parisc.

Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2008-05-15 11:03:43 -04:00
Helge Deller bd3bb8c15b parisc: fix trivial section name warnings
This trivial patch fixes the following section warnings on PARISC:
> WARNING: vmlinux.o (.text.1): unexpected section name.
>The (.[number]+) following section name are ld generated and not expected.
> Did you forget to use "ax"/"aw" in a .S file?
> Note that for example <linux/init.h> contains
> section definitions for use in .S files.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2008-05-15 10:38:54 -04:00
Kyle McMartin 6ebeafff64 [PARISC] Clean up pointless ASM_PAGE_SIZE_DIV use
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2007-10-18 00:59:23 -07:00
Helge Deller 0b3d643f9e [PARISC] add ASM_EXCEPTIONTABLE_ENTRY() macro
- this macro unifies the code to add exception table entries
- additionally use ENTRY()/ENDPROC() at more places

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2007-02-17 01:16:26 -05:00
Helge Deller 8e9e9844b4 [PARISC] more ENTRY(), ENDPROC(), END() conversions
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2007-02-17 01:16:12 -05:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Helge Deller 2fd8303816 [PARISC] Further work for multiple page sizes
More work towards supporing multiple page sizes on 64-bit. Convert
some assumptions that 64bit uses 3 level page tables into testing
PT_NLEVELS. Also some BUG() to BUG_ON() conversions and some cleanups
to assembler.

Signed-off-by: Helge Deller <deller@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-04-21 22:20:34 +00:00
James Bottomley ba57583396 [PARISC] Add parisc implementation of flush_kernel_dcache_page()
We need to do a little renaming of our original syntax because
of the difference in arguments.

Signed-off-by: James Bottomley <jejb@parisc-linux.org>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30 17:48:44 +00:00
Matthew Wilcox e635c96ed6 [PARISC] Explicitly specify sr4 when flushing kernel space
Specify sr4 when flushing kernel space (we could equally well use sr5-7,
but must not use sr0).

Signed-off-by: Matthew Wilcox <willy@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:56:14 -04:00
Grant Grundler 9b3b331d03 [PARISC] Properly specify index field to I/D cache flush ops
replace use of "0" with "%r0" since PA 1.1 I/D flush ops only take a
general register and not an immediate value for the index field.
This just forces the code to always be PA 1.1 "clean".

From: Joel Soete <soete.joel@tiscali.be>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:55:51 -04:00
Grant Grundler 37318a3cb1 [PARISC] Fix copy_user_page_asm to NOT access past end of page
2.6.12-rc2-pa3 fix copy_user_page_asm to NOT access past end of page.

My bad. /o\
Lamont confirmed that instructions following a conditional
branch are *alway* executed regardless if the branch is taken or not.
Unless they are nullified (which was missing in this case).

He also noted:
Conditional branches nullify on forward taken branch, and on
non-taken backward branch. Note that .+4 is a backwards branch.

This makes alot more sense than the giberish in the PA20 arch book.

Compiles and boots on both 64-bit (a500) and 32-bit (j6k).

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:55:34 -04:00
Grant Grundler 413059f28e [PARISC] Replace uses of __LP64__ with CONFIG_64BIT
2.6.12-rc4-pa3 s/__LP64__/CONFIG_64BIT/ and fixup config.h usage

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:46:48 -04:00
Grant Grundler 896a375623 [PARISC] Make sure use of RFI conforms to PA 2.0 and 1.1 arch docs
2.6.12-rc4-pa3 : first pass at making sure use of RFI conforms to
PA 2.0 arch pages F-4 and F-5, PA 1.1 Arch page 3-19 and 3-20.

The discussion revolves around all the rules for clearing PSW Q-bit.
The hard part is meeting all the rules for "relied upon translation".

.align directive is used to guarantee the critical sequence ends more than
8 instructions (32 bytes) from the end of page.

Signed-off-by: Grant Grundler <grundler@parisc-linux.org>

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21 22:40:07 -04:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00