Commit Graph

39 Commits

Author SHA1 Message Date
Vineet Gupta ee40bd1e0c ARCv2: mm: Merge 2 updates to DC_CTRL for region flush
Region Flush has a weird programming model.

 1. Flush or Invalidate is selected by DC_CTRL.RGN_OP
 2 Flush-n-Invalidate is done by DC_CTRL.IM

Given the code structuring before, case #2 above was generating two
seperate updates to DC_CTRL which was pointless.

| 80a342b0 <__dma_cache_wback_inv_l1>:
| 80a342b0:	clri	r4
| 80a342b4:	lr	r2,[dc_ctrl]
| 80a342b8:	bset_s	r2,r2,0x6
| 80a342ba:	sr	r2,[dc_ctrl]	<-- FIRST
|
| 80a342be:	bmskn	r3,r0,0x5
|
| 80a342c2:	lr	r2,[dc_ctrl]
| 80a342c6:	and	r2,r2,0xfffff1ff
| 80a342ce:	bset_s	r2,r2,0x9
| 80a342d0:	sr	r2,[dc_ctrl]	<-- SECOND
|
| 80a342d4:	add_s	r1,r1,0x3f
| 80a342d6:	bmsk_s	r0,r0,0x5
| 80a342d8:	add_s	r0,r0,r1
| 80a342da:	add_s	r0,r0,r3
| 80a342dc:	sr	r0,[78]
| 80a342e0:	sr	r3,[77]
|...
|...

So move setting of DC_CTRL.RGN_OP into __before_dc_op() and combine with
any other update.

| 80b63324 <__dma_cache_wback_inv_l1>:
| 80b63324:	clri	r3
| 80b63328:	lr	r2,[dc_ctrl]
| 80b6332c:	and	r2,r2,0xfffff1ff
| 80b63334:	or	r2,r2,576
| 80b63338:	sr	r2,[dc_ctrl]
|
| 80b6333c:	add_s	r1,r1,0x3f
| 80b6333e:	bmskn	r2,r0,0x5
| 80b63342:	add_s	r0,r0,r1
| 80b63344:	sr	r0,[78]
| 80b63348:	sr	r2,[77]

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-05-02 16:16:07 -07:00
Vineet Gupta 0d77117fc5 ARCv2: mm: Implement cache region flush operations
These are more efficient than the per-line ops

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-05-02 15:57:22 -07:00
Vineet Gupta 7d3d162bbd ARC: mm: Move full_page computation into cache version agnostic wrapper
This reduces code duplication in each of cache version specific handlers

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-05-02 15:19:34 -07:00
Alexey Brodkin c70c473396 ARCv2: SLC: Make sure busy bit is set properly on SLC flushing
As reported in STAR 9001165532, an SLC control reg read (for checking
busy state) right after SLC invalidate command may incorrectly return
NOT busy causing software to NOT spin-wait while operation is underway.
(and for some reason this only happens if L1 cache is also disabled - as
required by IOC programming model)

Suggested workaround is to do an additional Control Reg read, which
ensures the 2nd read gets the right status.

Cc: stable@vger.kernel.org  #4.10
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: reworte changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-03-30 17:32:33 -07:00
Vineet Gupta d0e73e2ac6 ARC: Revert "ARC: mm: IOC: Don't enable IOC by default"
The programming model has been fixed with prev patches so re-enable it
by default

This reverts commit 23cb1f6440.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-01-18 19:21:06 -08:00
Vineet Gupta 76894a72a0 ARC: mm: split arc_cache_init to allow __init reaping of bulk
arc_cache_init() is called for each core so can't be tagged __init.
However bulk of it is only executed by master core and thus is candidate
for __init reaping.

So split it up to allow that.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-01-18 19:21:02 -08:00
Vineet Gupta e497c8e52a ARCv2: IOC: Use actual memory size to setup aperture size
vs. fixed 512M before.

But this still assumes that all of memory is under IOC which may not be
true for the SoC. Improve that later when this becomes a real issue, by
specifying this from DT.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-01-18 14:52:43 -08:00
Vineet Gupta 8c47f83ba4 ARCv2: IOC: Adhere to progamming model guidelines to avoid DMA corruption
On AXS103 release bitfiles, DMA data corruptions were seen because IOC
setup was not following the recommended way in documentation.

Flipping IOC on when caches are enabled or coherency transactions are in
flight, might cause some of the memory operations to not observe
coherency as expected.

So strictly follow the programming model recommendations as documented
in comment header above arc_ioc_setup()

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-01-18 14:48:33 -08:00
Vineet Gupta d4911cdd32 ARCv2: IOC: refactor the IOC and SLC operations into own functions
- Move IOC setup into arc_ioc_setup()
 - Move SLC disabling into arc_slc_disable()

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-01-18 14:35:10 -08:00
Vineet Gupta fa84d7310d ARC: mmu: clarify the MMUv3 programming model
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-01-04 17:12:09 -08:00
Vineet Gupta 08fe007968 ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache
An ARC700 customer reported linux boot crashes when upgrading to bigger
L1 dcache (64K from 32K). Turns out they had an aliasing VIPT config and
current code only assumed 2 colours, while theirs had 4. So default to 4
colours and complain if there are fewer. Ideally this needs to be a
Kconfig option, but heck that's too much of hassle for a single user.

Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-12-19 11:55:17 -08:00
Vineet Gupta f64915be2d ARC: mm: No need to save cache version in @cpuinfo
Historical MMU revisions have been paired with Cache revision updates
which are captured in MMU and Cache Build Configuration Registers respectively.

This was used in boot code to check for configurations mismatches,
speically in simulations (such as running with non existent caches,
non pairing MMU and Cache version etc). This can instead be inferred
from other cache params such as line size. So remove @ver from post
processed @cpuinfo which could be used later to save soem other
interesting info.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-12-19 11:54:41 -08:00
Vineet Gupta 23cb1f6440 ARC: mm: IOC: Don't enable IOC by default
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-11-28 09:18:36 -08:00
Vineet Gupta 711c1f2671 ARCv2: boot log: print IOC exists as well as enabled status
Previously we would not print the case when IOC existed but was not
enabled.

And while at it, reduce one line off boot printing by consolidating
the Peripheral address space and IO-Coherency which in a way
applies to them

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-10-28 10:06:48 -07:00
Vineet Gupta cf986d4702 ARCv2: IOC: use @ioc_enable not @ioc_exist where intended
if user disables IOC from debugger at startup (by clearing @ioc_enable),
@ioc_exists is cleared too. This means boot prints don't capture the
fact that IOC was present but disabled which could be misleading.

So invert how we use @ioc_enable and @ioc_exists and make it more
canonical. @ioc_exists represent whether hardware is present or not and
stays same whether enabled or not. @ioc_enable is still user driven,
but will be auto-disabled if IOC hardware is not present, i.e. if
@ioc_exist=0. This is opposite to what we were doing before, but much
clearer.

This means @ioc_enable is now the "exported" toggle in rest of code such
as dma mapping API.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-10-24 09:24:47 -07:00
Vineet Gupta 26c01c49d5 ARCv2: Support dynamic peripheral address space in HS38 rel 3.0 cores
HS release 3.0 provides for even more flexibility in specifying the
volatile address space for mapping peripherals.

With HS 2.1 @start was made flexible / programmable - with HS 3.0 even
@end can be setup (vs. fixed to 0xFFFF_FFFF before).

So add code to reflect that and while at it remove an unused struct
defintion

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-09-30 14:48:17 -07:00
Vineet Gupta 45c3b08a11 ARC: Elide redundant setup of DMA callbacks
For resources shared by all cores such as SLC and IOC, only the master
core needs to do any setups / enabling / disabling etc.

Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-08-10 10:16:46 -07:00
Andrea Gelmini 2547476a5e Fix typos
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-05-30 10:07:32 +05:30
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Vineet Gupta deaf7565eb ARCv2: ioremap: Support dynamic peripheral address space
The peripheral address space is architectural address window which is
uncached and typically used to wire up peripherals.

For ARC700 cores (ARCompact ISA based) this was fixed to 1GB region
0xC000_0000 - 0xFFFF_FFFF.

For ARCv2 based HS38 cores the start address is flexible and can be
0xC, 0xD, 0xE, 0xF 000_000 by programming AUX_NON_VOLATILE_LIMIT reg
(typically done in bootloader)

Further in cas of PAE, the physical address can extend beyond 4GB so
need to confine this check, otherwise all pages beyond 4GB will be
treated as uncached

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-03-19 14:34:10 +05:30
Vineet Gupta f5db19e93f ARC: dma: ioremap: use phys_addr_t consistenctly in code paths
To support dma in physical memory beyond 4GB with PAE40

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-03-19 14:34:09 +05:30
Adam Buchbinder 7423cc0cae ARC: Fix misspellings in comments.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-03-11 14:59:53 +05:30
Kirill A. Shutemov e1534ae950 mm: differentiate page_mapped() from page_mapcount() for compound pages
Let's define page_mapped() to be true for compound pages if any
sub-pages of the compound page is mapped (with PMD or PTE).

On other hand page_mapcount() return mapcount for this particular small
page.

This will make cases like page_get_anon_vma() behave correctly once we
allow huge pages to be mapped with PTE.

Most users outside core-mm should use page_mapcount() instead of
page_mapped().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Vineet Gupta 5a364c2a17 ARC: mm: PAE40 support
This is the first working implementation of 40-bit physical address
extension on ARCv2.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-29 18:41:30 +05:30
Vineet Gupta 28b4af729f ARC: mm: PAE40: switch to using phys_addr_t for physical addresses
That way a single flip of phys_addr_t to 64 bit ensures all places
dealing with physical addresses get correct data

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 19:50:29 +05:30
Vineet Gupta 336e2136e1 ARC: mm: preps ahead of HIGHMEM support
Before we plug in highmem support, some of code needs to be ready for it
 - copy_user_highpage() needs to be using the kmap_atomic API
 - mk_pte() can't assume page_address()
 - do_page_fault() can't assume VMALLOC_END is end of kernel vaddr space

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 19:31:05 +05:30
Vineet Gupta 964cf28f9d ARC: boot log: move helper macros to header for reuse
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-17 17:48:25 +05:30
Vineet Gupta fd0881a24a ARC: Eliminate some ARCv2 specific code for ARCompact build
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-21 15:06:43 +05:30
Alexey Brodkin 1648c70d30 ARCv2: IOC: Allow boot time disable
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:15:31 +05:30
Vineet Gupta 79335a2ca0 ARCv2: SLC: Allow boot time disable
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:11:52 +05:30
Alexey Brodkin f2b0b25a37 ARCv2: Support IO Coherency and permutations involving L1 and L2 caches
In case of ARCv2 CPU there're could be following configurations
that affect cache handling for data exchanged with peripherals
via DMA:
 [1] Only L1 cache exists
 [2] Both L1 and L2 exist, but no IO coherency unit
 [3] L1, L2 caches and IO coherency unit exist

Current implementation takes care of [1] and [2].
Moreover support of [2] is implemented with run-time check
for SLC existence which is not super optimal.

This patch introduces support of [3] and rework of DMA ops
usage. Instead of doing run-time check every time a particular
DMA op is executed we'll have 3 different implementations of
DMA ops and select appropriate one during init.

As for IOC support for it we need:
 [a] Implement empty DMA ops because IOC takes care of cache
     coherency with DMAed data
 [b] Route dma_alloc_coherent() via dma_alloc_noncoherent()
     This is required to make IOC work in first place and also
     serves as optimization as LD/ST to coherent buffers can be
     srviced from caches w/o going all the way to memory

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta:
  -Added some comments about IOC gains
  -Marked dma ops as static,
  -Massaged changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20 18:11:17 +05:30
Alexey Brodkin b607eddd71 ARCv2: guard SLC DMA ops with spinlock
SLC maintenance ops need to be serialized by software as there is no
inherent buffering / quequing of aux commands. It can silently ignore a
new aux operation if previous one is still ongoing (SLC_CTRL_BUSY)

So gaurd the SLC op using a spin lock

The spin lock doesn't seem to be contended even in heavy workloads such
as iperf. On FPGA @ 75 MHz.

 [1] Before this change:
 ============================================================
  # iperf -c 10.42.0.1
 ------------------------------------------------------------
 Client connecting to 10.42.0.1, TCP port 5001
 TCP window size: 43.8 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.42.0.110 port 38935 connected with 10.42.0.1 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec  48.4 MBytes  40.6 Mbits/sec
 ============================================================

 [2] After this change:
 ============================================================
 # iperf -c 10.42.0.1
 ------------------------------------------------------------
 Client connecting to 10.42.0.1, TCP port 5001
 TCP window size: 43.8 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.42.0.243 port 60248 connected with 10.42.0.1 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec  47.5 MBytes  39.8 Mbits/sec
 # iperf -c 10.42.0.1
 ------------------------------------------------------------
 Client connecting to 10.42.0.1, TCP port 5001
 TCP window size: 43.8 KByte (default)
 ------------------------------------------------------------
 [  3] local 10.42.0.243 port 60249 connected with 10.42.0.1 port 5001
 [ ID] Interval       Transfer     Bandwidth
 [  3]  0.0-10.0 sec  54.9 MBytes  46.0 Mbits/sec
 ============================================================

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: arc-linux-dev@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-07-06 10:12:39 +05:30
Vineet Gupta 795f455856 ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency)
L2 cache on ARCHS processors is called SLC (System Level Cache)
For working DMA (in absence of hardware assisted IO Coherency) we need
to manage SLC explicitly when buffers transition between cpu and
controllers.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-25 06:00:19 +05:30
Vineet Gupta bcc4d65abe ARCv2: MMUv4: support aliasing icache config
This is also default for AXS103 release

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-22 14:06:56 +05:30
Vineet Gupta d1f317d825 ARCv2: MMUv4: cache programming model changes
Caveats about cache flush on ARCv2 based cores

- dcache is PIPT so paddr is sufficient for cache maintenance ops (no
  need to setup PTAG reg

- icache is still VIPT but only aliasing configs need PTAG setup

So basically this is departure from MMU-v3 which always need vaddr in
line ops registers (DC_IVDL, DC_FLDL, IC_IVIL) but paddr in DC_PTAG,
IC_PTAG respectively.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-22 14:06:55 +05:30
Vineet Gupta 11e14896ea ARC: untangle cache flush loop
- Remove the ifdef'ery and write distinct versions for each mmu ver even
  if there is some code duplication

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-19 18:09:33 +05:30
Vineet Gupta 6c31068137 ARC: cacheflush: No need to retain DC_CTRL from __before_dc_op()
That is because __after_dc_op() already reads it for status check, so it
is better anyways to use that "newer" value.

Also reduces the clutter in callers for passing from/to these routines.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-19 18:09:33 +05:30
Vineet Gupta 8ea2ddff41 ARC: cacheflush: move some code around, delete old comments
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-19 18:09:33 +05:30
Vineet Gupta 8362c389a4 ARC: mm/cache_arc700.c -> mm/cache.c
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-19 18:09:32 +05:30