Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax

Pull XArray conversion from Matthew Wilcox:
 "The XArray provides an improved interface to the radix tree data
  structure, providing locking as part of the API, specifying GFP flags
  at allocation time, eliminating preloading, less re-walking the tree,
  more efficient iterations and not exposing RCU-protected pointers to
  its users.

  This patch set

   1. Introduces the XArray implementation

   2. Converts the pagecache to use it

   3. Converts memremap to use it

  The page cache is the most complex and important user of the radix
  tree, so converting it was most important. Converting the memremap
  code removes the only other user of the multiorder code, which allows
  us to remove the radix tree code that supported it.

  I have 40+ followup patches to convert many other users of the radix
  tree over to the XArray, but I'd like to get this part in first. The
  other conversions haven't been in linux-next and aren't suitable for
  applying yet, but you can see them in the xarray-conv branch if you're
  interested"

* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
  radix tree: Remove multiorder support
  radix tree test: Convert multiorder tests to XArray
  radix tree tests: Convert item_delete_rcu to XArray
  radix tree tests: Convert item_kill_tree to XArray
  radix tree tests: Move item_insert_order
  radix tree test suite: Remove multiorder benchmarking
  radix tree test suite: Remove __item_insert
  memremap: Convert to XArray
  xarray: Add range store functionality
  xarray: Move multiorder_check to in-kernel tests
  xarray: Move multiorder_shrink to kernel tests
  xarray: Move multiorder account test in-kernel
  radix tree test suite: Convert iteration test to XArray
  radix tree test suite: Convert tag_tagged_items to XArray
  radix tree: Remove radix_tree_clear_tags
  radix tree: Remove radix_tree_maybe_preload_order
  radix tree: Remove split/join code
  radix tree: Remove radix_tree_update_node_t
  page cache: Finish XArray conversion
  dax: Convert page fault handlers to XArray
  ...
Linus Torvalds 2018-10-28 11:35:40 -07:00
commit dad4f140ed
93 changed files with 7056 additions and 3825 deletions


@ -323,7 +323,6 @@ ForEachMacros:
- 'protocol_for_each_card'
- 'protocol_for_each_dev'
- 'queue_for_each_hw_ctx'
- 'radix_tree_for_each_contig'
- 'radix_tree_for_each_slot'
- 'radix_tree_for_each_tagged'
- 'rbtree_postorder_for_each_entry_safe'


@ -119,6 +119,13 @@ Mark Brown <broonie@sirena.org.uk>
Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
Matthew Wilcox <willy@infradead.org> <matthew@wil.cx>
Matthew Wilcox <willy@infradead.org> <mawilcox@linuxonhyperv.com>
Matthew Wilcox <willy@infradead.org> <mawilcox@microsoft.com>
Matthew Wilcox <willy@infradead.org> <willy@debian.org>
Matthew Wilcox <willy@infradead.org> <willy@linux.intel.com>
Matthew Wilcox <willy@infradead.org> <willy@parisc-linux.org>
Matthieu CASTET <castet.matthieu@free.fr>
Mauro Carvalho Chehab <mchehab@kernel.org> <mchehab@brturbo.com.br>
Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com>


@ -21,6 +21,7 @@ Core utilities
local_ops
workqueue
genericirq
xarray
flexible-arrays
librs
genalloc


@ -0,0 +1,435 @@
.. SPDX-License-Identifier: GPL-2.0+
======
XArray
======
:Author: Matthew Wilcox
Overview
========
The XArray is an abstract data type which behaves like a very large array
of pointers. It meets many of the same needs as a hash or a conventional
resizable array. Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner. In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array. It is more memory-efficient, parallelisable
and cache friendly than a doubly-linked list. It takes advantage of
RCU to perform lookups without locking.
The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
``ULONG_MAX`` then the XArray is not the data type for you. The most
important user of the XArray is the page cache.
Each non-``NULL`` entry in the array has three bits associated with
it called marks. Each mark may be set or cleared independently of
the others. You can iterate over entries which are marked.
Normal pointers may be stored in the XArray directly. They must be 4-byte
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
:c:func:`alloc_page`. It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.
You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert an integer into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.
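As a brief sketch, using the :c:func:`xa_store` and :c:func:`xa_load` calls
described below (the ``counters`` array here is a hypothetical example, not
part of the XArray API), storing and reading back a small integer looks
like this::

    DEFINE_XARRAY(counters);

    void record_count(unsigned long index, unsigned long count)
    {
        /* Encode the integer as a value entry before storing it. */
        xa_store(&counters, index, xa_mk_value(count), GFP_KERNEL);
    }

    unsigned long read_count(unsigned long index)
    {
        void *entry = xa_load(&counters, index);

        /* A value entry must be decoded; anything else is a pointer or NULL. */
        return xa_is_value(entry) ? xa_to_value(entry) : 0;
    }
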
Some users want to store tagged pointers instead of using the marks
described above. They can call :c:func:`xa_tag_pointer` to create an
entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
the tag of an entry. Tagged pointers use the same bits that are used
to distinguish value entries from normal pointers, so each user must
decide whether they want to store value entries or tagged pointers in
any particular XArray.
The XArray does not support storing :c:func:`IS_ERR` pointers as some
conflict with value entries or internal entries.
An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Setting a mark on one index will set it on all of them.
Storing to any index will store to all of them. Multi-index entries can
be explicitly split into smaller entries, or storing ``NULL`` into any
entry will cause the XArray to forget about the range.
Normal API
==========
Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones. A freshly-initialised XArray contains a ``NULL``
pointer at every index.
You can then set entries using :c:func:`xa_store` and get entries
using :c:func:`xa_load`. :c:func:`xa_store` will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
``NULL`` entry. There is no difference between an entry that has never
been stored to and one that has most recently had ``NULL`` stored to it.
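For example (a sketch only; ``struct foo`` stands in for whatever object
you are caching)::

    /* Remember a foo at the given index, returning whatever was there
     * before.  On allocation failure an error entry is returned; see the
     * section on memory allocation below. */
    void *remember_foo(struct xarray *xa, unsigned long index, struct foo *foo)
    {
        return xa_store(xa, index, foo, GFP_KERNEL);
    }

    struct foo *lookup_foo(struct xarray *xa, unsigned long index)
    {
        return xa_load(xa, index);
    }

    void forget_foo(struct xarray *xa, unsigned long index)
    {
        /* Equivalent to storing NULL at this index. */
        xa_erase(xa, index);
    }
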
You can conditionally replace an entry at an index by using
:c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if
the entry at that index has the 'old' value. It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then :c:func:`xa_cmpxchg` succeeded.
If you want to only store a new entry to an index if the current entry
at that index is ``NULL``, you can use :c:func:`xa_insert` which
returns ``-EEXIST`` if the entry is not empty.
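A sketch of both calls, again using the hypothetical ``struct foo``::

    /* Swap in 'new' only if 'old' is still the current entry. */
    bool replace_foo(struct xarray *xa, unsigned long index,
                     struct foo *old, struct foo *new)
    {
        return xa_cmpxchg(xa, index, old, new, GFP_KERNEL) == old;
    }

    /* Returns 0 on success, -EEXIST if the index was already in use,
     * or another negative errno on failure. */
    int insert_foo(struct xarray *xa, unsigned long index, struct foo *foo)
    {
        return xa_insert(xa, index, foo, GFP_KERNEL);
    }
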
You can enquire whether a mark is set on an entry by using
:c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
calling :c:func:`xa_clear_mark`. You can ask whether any entry in the
XArray has a particular mark set by calling :c:func:`xa_marked`.
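For example, you might treat ``XA_MARK_0`` as a driver-private "dirty"
mark (a sketch; the meaning of each mark is entirely up to you)::

    void mark_foo_dirty(struct xarray *xa, unsigned long index)
    {
        xa_set_mark(xa, index, XA_MARK_0);
    }

    bool foo_is_dirty(struct xarray *xa, unsigned long index)
    {
        return xa_get_mark(xa, index, XA_MARK_0);
    }

    bool any_foo_dirty(struct xarray *xa)
    {
        return xa_marked(xa, XA_MARK_0);
    }
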
You can copy entries out of the XArray into a plain array by calling
:c:func:`xa_extract`. Or you can iterate over the present entries in
the XArray by calling :c:func:`xa_for_each`. You may prefer to use
:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
entry in the XArray.
Calling :c:func:`xa_store_range` stores the same entry in a range
of indices. If you do this, some of the other operations will behave
in a slightly odd way. For example, marking the entry at one index
may result in the entry being marked at some, but not all of the other
indices. Storing into one index may result in the entry retrieved by
some, but not all of the other indices changing.
Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`. If the XArray entries are pointers, you may wish
to free the entries first. You can do this by iterating over all present
entries in the XArray using the :c:func:`xa_for_each` iterator.
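For example, if every entry was obtained from :c:func:`kmalloc`, teardown
might look like this sketch, which iterates with :c:func:`xa_find` and
:c:func:`xa_find_after` (``XA_PRESENT`` asks for any present entry,
regardless of marks)::

    void destroy_foos(struct xarray *xa)
    {
        unsigned long index = 0;
        void *entry;

        /* Free each present entry, then remove them all from the array. */
        entry = xa_find(xa, &index, ULONG_MAX, XA_PRESENT);
        while (entry) {
            kfree(entry);
            entry = xa_find_after(xa, &index, ULONG_MAX, XA_PRESENT);
        }
        xa_destroy(xa);
    }
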
ID assignment
-------------
You can call :c:func:`xa_alloc` to store the entry at any unused index
in the XArray. If you need to modify the array from interrupt context,
you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable
interrupts while allocating the ID. Unlike :c:func:`xa_store`, allocating
a ``NULL`` pointer does not delete an entry. Instead it reserves an
entry like :c:func:`xa_reserve` and you can release it using either
:c:func:`xa_erase` or :c:func:`xa_release`. To use ID assignment, the
XArray must be defined with :c:func:`DEFINE_XARRAY_ALLOC`, or initialised
by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`.
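A sketch of allocating an ID for an object; the argument order shown here
(ID pointer, maximum, entry, GFP flags) is an assumption based on the
interface as merged, and ``foo_ids`` and the members of ``struct foo`` are
hypothetical::

    DEFINE_XARRAY_ALLOC(foo_ids);

    int assign_foo_id(struct foo *foo)
    {
        u32 id;
        int err;

        /* Store foo at an unused index and report that index in 'id'. */
        err = xa_alloc(&foo_ids, &id, UINT_MAX, foo, GFP_KERNEL);
        if (err)
            return err;     /* negative errno: no memory or no free IDs */
        foo->id = id;       /* assumes struct foo has an 'id' member */
        return 0;
    }
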
Memory allocation
-----------------
The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t
parameter in case the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.
It is possible for no memory to be allocatable, particularly if you pass
a restrictive set of GFP flags. In that case, the functions return a
special value which can be turned into an errno using :c:func:`xa_err`.
If you don't need to know exactly which error occurred, using
:c:func:`xa_is_err` is slightly more efficient.
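For example, a store attempted with restrictive GFP flags might be checked
like this (``struct foo`` again being a stand-in)::

    int store_foo_atomic(struct xarray *xa, unsigned long index,
                         struct foo *foo)
    {
        void *curr = xa_store(xa, index, foo, GFP_NOWAIT);

        /* xa_is_err() spots the special error value; xa_err() decodes it. */
        if (xa_is_err(curr))
            return xa_err(curr);
        return 0;
    }
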
Locking
-------
When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:
No lock needed:
* :c:func:`xa_empty`
* :c:func:`xa_marked`
Takes RCU read lock:
* :c:func:`xa_load`
* :c:func:`xa_for_each`
* :c:func:`xa_find`
* :c:func:`xa_find_after`
* :c:func:`xa_extract`
* :c:func:`xa_get_mark`
Takes xa_lock internally:
* :c:func:`xa_store`
* :c:func:`xa_insert`
* :c:func:`xa_erase`
* :c:func:`xa_erase_bh`
* :c:func:`xa_erase_irq`
* :c:func:`xa_cmpxchg`
* :c:func:`xa_store_range`
* :c:func:`xa_alloc`
* :c:func:`xa_alloc_bh`
* :c:func:`xa_alloc_irq`
* :c:func:`xa_destroy`
* :c:func:`xa_set_mark`
* :c:func:`xa_clear_mark`
Assumes xa_lock held on entry:
* :c:func:`__xa_store`
* :c:func:`__xa_insert`
* :c:func:`__xa_erase`
* :c:func:`__xa_cmpxchg`
* :c:func:`__xa_alloc`
* :c:func:`__xa_set_mark`
* :c:func:`__xa_clear_mark`
If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call :c:func:`xa_lock`
before calling :c:func:`xa_load`, then take a reference count on the
object you have found before calling :c:func:`xa_unlock`. This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount. You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.
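A sketch of that pattern, assuming ``struct foo`` embeds a ``struct kref``
named ``ref``::

    struct foo *foo_get(struct xarray *xa, unsigned long index)
    {
        struct foo *foo;

        xa_lock(xa);
        foo = xa_load(xa, index);
        if (foo)
            /* An eraser must hold xa_lock, so it cannot remove and free
             * foo between the load and this reference being taken. */
            kref_get(&foo->ref);
        xa_unlock(xa);
        return foo;
    }
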
The XArray does not disable interrupts or softirqs while modifying
the array. It is safe to read the XArray from interrupt or softirq
context as the RCU lock provides enough protection.
If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::

    void foo_init(struct foo *foo)
    {
        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
    }

    int foo_store(struct foo *foo, unsigned long index, void *entry)
    {
        int err;

        xa_lock_bh(&foo->array);
        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
        if (!err)
            foo->count++;
        xa_unlock_bh(&foo->array);
        return err;
    }

    /* foo_erase() is only called from softirq context */
    void foo_erase(struct foo *foo, unsigned long index)
    {
        xa_lock(&foo->array);
        __xa_erase(&foo->array, index);
        foo->count--;
        xa_unlock(&foo->array);
    }

If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using :c:func:`xa_init_flags`, passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.
Sharing the XArray with interrupt context is also possible, either
using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
in the interrupt handler. Some of the more common patterns have helper
functions such as :c:func:`xa_erase_bh` and :c:func:`xa_erase_irq`.
Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy. That does
not entitle you to use functions like :c:func:`__xa_erase` without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.
The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark. It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.
Advanced API
============
The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array. You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array. You can mix advanced and normal operations on the same array;
indeed the normal API is implemented in terms of the advanced API. The
advanced API is only available to modules with a GPL-compatible license.
The advanced API is based around the xa_state. This is an opaque data
structure which you declare on the stack using the :c:func:`XA_STATE`
macro. This macro initialises the xa_state ready to start walking
around the XArray. It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.
The xa_state is also used to store errors. You can call
:c:func:`xas_error` to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.
If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail. Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.
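Putting that together, the usual shape of a store through the advanced API
is a short retry loop (a sketch; what to do with the returned error is up
to the caller)::

    int advanced_store(struct xarray *xa, unsigned long index, void *entry)
    {
        XA_STATE(xas, xa, index);

        do {
            xas_lock(&xas);
            xas_store(&xas, entry);
            xas_unlock(&xas);
            /* On ENOMEM, allocate outside the lock and go round again. */
        } while (xas_nomem(&xas, GFP_KERNEL));

        return xas_error(&xas);
    }
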
Internal Entries
----------------
The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to :c:func:`xas_retry`, and retry the operation if it returns ``true``.

.. flat-table::
   :widths: 1 1 6

   * - Name
     - Test
     - Usage

   * - Node
     - :c:func:`xa_is_node`
     - An XArray node. May be visible when using a multi-index xa_state.

   * - Sibling
     - :c:func:`xa_is_sibling`
     - A non-canonical entry for a multi-index entry. The value indicates
       which slot in this node has the canonical entry.

   * - Retry
     - :c:func:`xa_is_retry`
     - This entry is currently being modified by a thread which has the
       xa_lock. The node containing this entry may be freed at the end
       of this RCU period. You should restart the lookup from the head
       of the array.

   * - Zero
     - :c:func:`xa_is_zero`
     - Zero entries appear as ``NULL`` through the Normal API, but occupy
       an entry in the XArray which can be used to reserve the index for
       future use.

Other internal entries may be added in the future. As far as possible, they
will be handled by :c:func:`xas_retry`.
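For example, a read-side walk which tolerates internal entries might look
like this sketch::

    unsigned long count_present(struct xarray *xa)
    {
        XA_STATE(xas, xa, 0);
        unsigned long count = 0;
        void *entry;

        rcu_read_lock();
        xas_for_each(&xas, entry, ULONG_MAX) {
            /* Restart the walk if we raced with a concurrent modification. */
            if (xas_retry(&xas, entry))
                continue;
            count++;
        }
        rcu_read_unlock();
        return count;
    }
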
Additional functionality
------------------------
The :c:func:`xas_create_range` function allocates all the necessary memory
to store every entry in a range. It will set ENOMEM in the xa_state if
it cannot allocate memory.
You can use :c:func:`xas_init_marks` to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
and all other marks are clear. Replacing one entry with another using
:c:func:`xas_store` will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.
The :c:func:`xas_load` function will walk the xa_state as close to the entry
as it can. If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
:c:func:`xas_reload` to save a function call.
If you need to move to a different index in the XArray, call
:c:func:`xas_set`. This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree. If you want to move to the next or previous index,
call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does
not walk the cursor around the array so does not require a lock to be
held, while moving to the next or previous index does.
You can search for the next present entry using :c:func:`xas_find`. This
is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced. If not, it will return the
entry at the index of the xa_state. Using :c:func:`xas_next_entry` to
move to the next present entry instead of :c:func:`xas_find` will save
a function call in the majority of cases at the expense of emitting more
inline code.
The :c:func:`xas_find_marked` function is similar. If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked. Otherwise, it will return the first marked entry after
the entry referenced by the xa_state. The :c:func:`xas_next_marked`
function is the equivalent of :c:func:`xas_next_entry`.
When iterating over a range of the XArray using :c:func:`xas_for_each`
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
the iteration. The :c:func:`xas_pause` function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and reenable interrupts
every ``XA_CHECK_SCHED`` entries.
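A sketch of a well-mannered iteration with interrupts disabled, dropping
the lock every ``XA_CHECK_SCHED`` entries::

    void scan_foos(struct xarray *xa)
    {
        XA_STATE(xas, xa, 0);
        unsigned int scanned = 0;
        void *entry;

        xas_lock_irq(&xas);
        xas_for_each(&xas, entry, ULONG_MAX) {
            /* ... examine 'entry' here ... */
            if (++scanned % XA_CHECK_SCHED)
                continue;
            xas_pause(&xas);
            xas_unlock_irq(&xas);
            cond_resched();
            xas_lock_irq(&xas);
        }
        xas_unlock_irq(&xas);
    }
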
The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
:c:func:`xas_clear_mark` functions require the xa_state cursor to have
been moved to the appropriate location in the xarray; they will do
nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
immediately before.
You can call :c:func:`xas_set_update` to have a callback function
called each time the XArray updates a node. This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.
Multi-Index Entries
-------------------
The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices. For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together. The current implementation
only allows tying ranges which are aligned powers of two together;
e.g. indices 64-127 may be tied together, but 2-6 may not be. This may
save substantial quantities of memory; for example tying 512 entries
together will save over 4kB.
You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range. Calling :c:func:`xas_find_conflict`
will return the first entry within the range or ``NULL`` if there are no
entries in the range. The :c:func:`xas_for_each_conflict` iterator will
iterate over every entry which overlaps the specified range.
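Creating a multi-index entry follows the same retry pattern as an ordinary
advanced-API store (a sketch, assuming multi-index support is built into
the kernel)::

    /* Store 'entry' so that it covers the 2^order indices starting at
     * 'index'; 'index' must be aligned to that order. */
    int store_order(struct xarray *xa, unsigned long index,
                    unsigned int order, void *entry)
    {
        XA_STATE_ORDER(xas, xa, index, order);

        do {
            xas_lock(&xas);
            xas_store(&xas, entry);
            xas_unlock(&xas);
        } while (xas_nomem(&xas, GFP_KERNEL));

        return xas_error(&xas);
    }
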
If :c:func:`xas_load` encounters a multi-index entry, the xa_index
in the xa_state will not be changed. When iterating over an XArray
or calling :c:func:`xas_find`, if the initial index is in the middle
of a multi-index entry, it will not be altered. Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.
Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
is not supported. Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.
Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.
Functions and structures
========================
.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c


@ -535,7 +535,7 @@ F: Documentation/hwmon/adt7475
F: drivers/hwmon/adt7475.c
ADVANSYS SCSI DRIVER
M: Matthew Wilcox <matthew@wil.cx>
M: Matthew Wilcox <willy@infradead.org>
M: Hannes Reinecke <hare@suse.com>
L: linux-scsi@vger.kernel.org
S: Maintained
@ -4393,7 +4393,7 @@ S: Maintained
F: drivers/i2c/busses/i2c-diolan-u2c.c
FILESYSTEM DIRECT ACCESS (DAX)
M: Matthew Wilcox <mawilcox@microsoft.com>
M: Matthew Wilcox <willy@infradead.org>
M: Ross Zwisler <zwisler@kernel.org>
M: Jan Kara <jack@suse.cz>
L: linux-fsdevel@vger.kernel.org
@ -8697,7 +8697,7 @@ F: drivers/message/fusion/
F: drivers/scsi/mpt3sas/
LSILOGIC/SYMBIOS/NCR 53C8XX and 53C1010 PCI-SCSI drivers
M: Matthew Wilcox <matthew@wil.cx>
M: Matthew Wilcox <willy@infradead.org>
L: linux-scsi@vger.kernel.org
S: Maintained
F: drivers/scsi/sym53c8xx_2/
@ -16137,6 +16137,17 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/vdso
S: Maintained
F: arch/x86/entry/vdso/
XARRAY
M: Matthew Wilcox <willy@infradead.org>
L: linux-fsdevel@vger.kernel.org
S: Supported
F: Documentation/core-api/xarray.rst
F: lib/idr.c
F: lib/xarray.c
F: include/linux/idr.h
F: include/linux/xarray.h
F: tools/testing/radix-tree
XC2028/3028 TUNER DRIVER
M: Mauro Carvalho Chehab <mchehab@kernel.org>
L: linux-media@vger.kernel.org


@ -2,7 +2,7 @@
* Linux/PA-RISC Project (http://www.parisc-linux.org/)
*
* System call entry code / Linux gateway page
* Copyright (c) Matthew Wilcox 1999 <willy@bofh.ai>
* Copyright (c) Matthew Wilcox 1999 <willy@infradead.org>
* Licensed under the GNU GPL.
* thanks to Philipp Rumpf, Mike Shaver and various others
* sorry about the wall, puffin..


@ -716,9 +716,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY); \
} while (0)
/*
* on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
*/
#define SWP_TYPE_BITS 5
#define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \
& ((1UL << SWP_TYPE_BITS) - 1))


@ -350,9 +350,7 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
#define MAX_SWAPFILES_CHECK() do { \
BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
} while (0)
/*
* on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
*/
#define SWP_TYPE_BITS 5
#define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \
& ((1UL << SWP_TYPE_BITS) - 1))


@ -5996,7 +5996,8 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
count = __sg_page_count(sg);
while (idx + count <= n) {
unsigned long exception, i;
void *entry;
unsigned long i;
int ret;
/* If we cannot allocate and insert this entry, or the
@ -6011,12 +6012,9 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
if (ret && ret != -EEXIST)
goto scan;
exception =
RADIX_TREE_EXCEPTIONAL_ENTRY |
idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
entry = xa_mk_value(idx);
for (i = 1; i < count; i++) {
ret = radix_tree_insert(&iter->radix, idx + i,
(void *)exception);
ret = radix_tree_insert(&iter->radix, idx + i, entry);
if (ret && ret != -EEXIST)
goto scan;
}
@ -6054,15 +6052,14 @@ i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
GEM_BUG_ON(!sg);
/* If this index is in the middle of multi-page sg entry,
* the radixtree will contain an exceptional entry that points
* the radix tree will contain a value entry that points
* to the start of that range. We will return the pointer to
* the base page and the offset of this page within the
* sg entry's range.
*/
*offset = 0;
if (unlikely(radix_tree_exception(sg))) {
unsigned long base =
(unsigned long)sg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
if (unlikely(xa_is_value(sg))) {
unsigned long base = xa_to_value(sg);
sg = radix_tree_lookup(&iter->radix, base);
GEM_BUG_ON(!sg);


@ -2,7 +2,7 @@
* linux/drivers/hil/hilkbd.c
*
* Copyright (C) 1998 Philip Blundell <philb@gnu.org>
* Copyright (C) 1999 Matthew Wilcox <willy@bofh.ai>
* Copyright (C) 1999 Matthew Wilcox <willy@infradead.org>
* Copyright (C) 1999-2007 Helge Deller <deller@gmx.de>
*
* Very basic HP Human Interface Loop (HIL) driver.


@ -8,7 +8,7 @@
* Copyright (C) 2002 Hiroshi Aono (h-aono@ap.jp.nec.com)
* Copyright (C) 2002,2003 Takayoshi Kochi (t-kochi@bq.jp.nec.com)
* Copyright (C) 2002,2003 NEC Corporation
* Copyright (C) 2003-2005 Matthew Wilcox (matthew.wilcox@hp.com)
* Copyright (C) 2003-2005 Matthew Wilcox (willy@infradead.org)
* Copyright (C) 2003-2005 Hewlett Packard
*
* All rights reserved.


@ -8,7 +8,7 @@
* Copyright (C) 2002 Hiroshi Aono (h-aono@ap.jp.nec.com)
* Copyright (C) 2002,2003 Takayoshi Kochi (t-kochi@bq.jp.nec.com)
* Copyright (C) 2002,2003 NEC Corporation
* Copyright (C) 2003-2005 Matthew Wilcox (matthew.wilcox@hp.com)
* Copyright (C) 2003-2005 Matthew Wilcox (willy@infradead.org)
* Copyright (C) 2003-2005 Hewlett Packard
*
* All rights reserved.
@ -40,7 +40,7 @@ bool acpiphp_disabled;
static struct acpiphp_attention_info *attention_info;
#define DRIVER_VERSION "0.5"
#define DRIVER_AUTHOR "Greg Kroah-Hartman <gregkh@us.ibm.com>, Takayoshi Kochi <t-kochi@bq.jp.nec.com>, Matthew Wilcox <willy@hp.com>"
#define DRIVER_AUTHOR "Greg Kroah-Hartman <gregkh@us.ibm.com>, Takayoshi Kochi <t-kochi@bq.jp.nec.com>, Matthew Wilcox <willy@infradead.org>"
#define DRIVER_DESC "ACPI Hot Plug PCI Controller Driver"
MODULE_AUTHOR(DRIVER_AUTHOR);


@ -5,7 +5,7 @@
* Copyright (C) 2002,2003 Takayoshi Kochi (t-kochi@bq.jp.nec.com)
* Copyright (C) 2002 Hiroshi Aono (h-aono@ap.jp.nec.com)
* Copyright (C) 2002,2003 NEC Corporation
* Copyright (C) 2003-2005 Matthew Wilcox (matthew.wilcox@hp.com)
* Copyright (C) 2003-2005 Matthew Wilcox (willy@infradead.org)
* Copyright (C) 2003-2005 Hewlett Packard
* Copyright (C) 2005 Rajesh Shah (rajesh.shah@intel.com)
* Copyright (C) 2005 Intel Corporation


@ -35,7 +35,6 @@ static atomic_long_t erofs_global_shrink_cnt;
#ifdef CONFIG_EROFS_FS_ZIP
/* radix_tree and the future XArray both don't use tagptr_t yet */
struct erofs_workgroup *erofs_find_workgroup(
struct super_block *sb, pgoff_t index, bool *tag)
{
@ -47,9 +46,8 @@ struct erofs_workgroup *erofs_find_workgroup(
rcu_read_lock();
grp = radix_tree_lookup(&sbi->workstn_tree, index);
if (grp != NULL) {
*tag = radix_tree_exceptional_entry(grp);
grp = (void *)((unsigned long)grp &
~RADIX_TREE_EXCEPTIONAL_ENTRY);
*tag = xa_pointer_tag(grp);
grp = xa_untag_pointer(grp);
if (erofs_workgroup_get(grp, &oldcount)) {
/* prefer to relax rcu read side */
@ -83,9 +81,7 @@ int erofs_register_workgroup(struct super_block *sb,
sbi = EROFS_SB(sb);
erofs_workstn_lock(sbi);
if (tag)
grp = (void *)((unsigned long)grp |
1UL << RADIX_TREE_EXCEPTIONAL_SHIFT);
grp = xa_tag_pointer(grp, tag);
err = radix_tree_insert(&sbi->workstn_tree,
grp->index, grp);
@ -131,9 +127,7 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
for (i = 0; i < found; ++i) {
int cnt;
struct erofs_workgroup *grp = (void *)
((unsigned long)batch[i] &
~RADIX_TREE_EXCEPTIONAL_ENTRY);
struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
first_index = grp->index + 1;
@ -150,8 +144,8 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
#endif
continue;
if (radix_tree_delete(&sbi->workstn_tree,
grp->index) != grp) {
if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree,
grp->index)) != grp) {
#ifdef EROFS_FS_HAS_MANAGED_CACHE
skip:
erofs_workgroup_unfreeze(grp, 1);


@ -437,10 +437,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
if (pg_index > end_index)
break;
rcu_read_lock();
page = radix_tree_lookup(&mapping->i_pages, pg_index);
rcu_read_unlock();
if (page && !radix_tree_exceptional_entry(page)) {
page = xa_load(&mapping->i_pages, pg_index);
if (page && !xa_is_value(page)) {
misses++;
if (misses > 4)
break;


@ -3784,7 +3784,7 @@ int btree_write_cache_pages(struct address_space *mapping,
pgoff_t index;
pgoff_t end; /* Inclusive */
int scanned = 0;
int tag;
xa_mark_t tag;
pagevec_init(&pvec);
if (wbc->range_cyclic) {
@ -3909,7 +3909,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
pgoff_t done_index;
int range_whole = 0;
int scanned = 0;
int tag;
xa_mark_t tag;
/*
* We have to hold onto the inode so that ordered extents can do their
@ -5159,11 +5159,9 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb)
clear_page_dirty_for_io(page);
xa_lock_irq(&page->mapping->i_pages);
if (!PageDirty(page)) {
radix_tree_tag_clear(&page->mapping->i_pages,
page_index(page),
PAGECACHE_TAG_DIRTY);
}
if (!PageDirty(page))
__xa_clear_mark(&page->mapping->i_pages,
page_index(page), PAGECACHE_TAG_DIRTY);
xa_unlock_irq(&page->mapping->i_pages);
ClearPageError(page);
unlock_page(page);


@ -562,7 +562,7 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
EXPORT_SYMBOL(mark_buffer_dirty_inode);
/*
* Mark the page dirty, and set it dirty in the radix tree, and mark the inode
* Mark the page dirty, and set it dirty in the page cache, and mark the inode
* dirty.
*
* If warn is true, then emit a warning if the page is not uptodate and has
@ -579,8 +579,8 @@ void __set_page_dirty(struct page *page, struct address_space *mapping,
if (page->mapping) { /* Race with truncate? */
WARN_ON_ONCE(warn && !PageUptodate(page));
account_page_dirtied(page, mapping);
radix_tree_tag_set(&mapping->i_pages,
page_index(page), PAGECACHE_TAG_DIRTY);
__xa_set_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_DIRTY);
}
xa_unlock_irqrestore(&mapping->i_pages, flags);
}
@ -1050,7 +1050,7 @@ __getblk_slow(struct block_device *bdev, sector_t block,
* The relationship between dirty buffers and dirty pages:
*
* Whenever a page has any dirty buffers, the page's dirty bit is set, and
* the page is tagged dirty in its radix tree.
* the page is tagged dirty in the page cache.
*
* At all times, the dirtiness of the buffers represents the dirtiness of
* subsections of the page. If the page has buffers, the page dirty bit is
@ -1073,9 +1073,9 @@ __getblk_slow(struct block_device *bdev, sector_t block,
* mark_buffer_dirty - mark a buffer_head as needing writeout
* @bh: the buffer_head to mark dirty
*
* mark_buffer_dirty() will set the dirty bit against the buffer, then set its
* backing page dirty, then tag the page as dirty in its address_space's radix
* tree and then attach the address_space's inode to its superblock's dirty
* mark_buffer_dirty() will set the dirty bit against the buffer, then set
* its backing page dirty, then tag the page as dirty in the page cache
* and then attach the address_space's inode to its superblock's dirty
* inode list.
*
* mark_buffer_dirty() is atomic. It takes bh->b_page->mapping->private_lock,

fs/dax.c: file diff suppressed because it is too large (925 changes).


@ -2643,7 +2643,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
long left = mpd->wbc->nr_to_write;
pgoff_t index = mpd->first_page;
pgoff_t end = mpd->last_page;
int tag;
xa_mark_t tag;
int i, err = 0;
int blkbits = mpd->inode->i_blkbits;
ext4_lblk_t lblk;


@ -2071,7 +2071,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
pgoff_t done_index;
int cycled;
int range_whole = 0;
int tag;
xa_mark_t tag;
int nwritten = 0;
pagevec_init(&pvec);
@ -2787,13 +2787,13 @@ const struct address_space_operations f2fs_dblock_aops = {
#endif
};
void f2fs_clear_radix_tree_dirty_tag(struct page *page)
void f2fs_clear_page_cache_dirty_tag(struct page *page)
{
struct address_space *mapping = page_mapping(page);
unsigned long flags;
xa_lock_irqsave(&mapping->i_pages, flags);
radix_tree_tag_clear(&mapping->i_pages, page_index(page),
__xa_clear_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(&mapping->i_pages, flags);
}


@ -726,7 +726,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
if (bit_pos == NR_DENTRY_IN_BLOCK &&
!f2fs_truncate_hole(dir, page->index, page->index + 1)) {
f2fs_clear_radix_tree_dirty_tag(page);
f2fs_clear_page_cache_dirty_tag(page);
clear_page_dirty_for_io(page);
ClearPagePrivate(page);
ClearPageUptodate(page);


@ -3108,7 +3108,7 @@ int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
struct page *page, enum migrate_mode mode);
#endif
bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
void f2fs_clear_radix_tree_dirty_tag(struct page *page);
void f2fs_clear_page_cache_dirty_tag(struct page *page);
/*
* gc.c


@ -243,7 +243,7 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page)
kunmap_atomic(src_addr);
set_page_dirty(dn.inode_page);
f2fs_clear_radix_tree_dirty_tag(page);
f2fs_clear_page_cache_dirty_tag(page);
set_inode_flag(inode, FI_APPEND_WRITE);
set_inode_flag(inode, FI_DATA_EXIST);


@ -101,7 +101,7 @@ bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
static void clear_node_page_dirty(struct page *page)
{
if (PageDirty(page)) {
f2fs_clear_radix_tree_dirty_tag(page);
f2fs_clear_page_cache_dirty_tag(page);
clear_page_dirty_for_io(page);
dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES);
}
@ -1306,9 +1306,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
if (f2fs_check_nid_range(sbi, nid))
return;
rcu_read_lock();
apage = radix_tree_lookup(&NODE_MAPPING(sbi)->i_pages, nid);
rcu_read_unlock();
apage = xa_load(&NODE_MAPPING(sbi)->i_pages, nid);
if (apage)
return;


@ -339,9 +339,9 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
struct address_space *mapping = inode->i_mapping;
struct bdi_writeback *old_wb = inode->i_wb;
struct bdi_writeback *new_wb = isw->new_wb;
struct radix_tree_iter iter;
XA_STATE(xas, &mapping->i_pages, 0);
struct page *page;
bool switched = false;
void **slot;
/*
* By the time control reaches here, RCU grace period has passed
@ -375,25 +375,18 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
* to possibly dirty pages while PAGECACHE_TAG_WRITEBACK points to
* pages actually under writeback.
*/
radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, 0,
PAGECACHE_TAG_DIRTY) {
struct page *page = radix_tree_deref_slot_protected(slot,
&mapping->i_pages.xa_lock);
if (likely(page) && PageDirty(page)) {
xas_for_each_marked(&xas, page, ULONG_MAX, PAGECACHE_TAG_DIRTY) {
if (PageDirty(page)) {
dec_wb_stat(old_wb, WB_RECLAIMABLE);
inc_wb_stat(new_wb, WB_RECLAIMABLE);
}
}
radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, 0,
PAGECACHE_TAG_WRITEBACK) {
struct page *page = radix_tree_deref_slot_protected(slot,
&mapping->i_pages.xa_lock);
if (likely(page)) {
WARN_ON_ONCE(!PageWriteback(page));
dec_wb_stat(old_wb, WB_WRITEBACK);
inc_wb_stat(new_wb, WB_WRITEBACK);
}
xas_set(&xas, 0);
xas_for_each_marked(&xas, page, ULONG_MAX, PAGECACHE_TAG_WRITEBACK) {
WARN_ON_ONCE(!PageWriteback(page));
dec_wb_stat(old_wb, WB_WRITEBACK);
inc_wb_stat(new_wb, WB_WRITEBACK);
}
wb_get(new_wb);


@ -366,7 +366,7 @@ static int gfs2_write_cache_jdata(struct address_space *mapping,
pgoff_t done_index;
int cycled;
int range_whole = 0;
int tag;
xa_mark_t tag;
pagevec_init(&pvec);
if (wbc->range_cyclic) {


@ -349,7 +349,7 @@ EXPORT_SYMBOL(inc_nlink);
static void __address_space_init_once(struct address_space *mapping)
{
INIT_RADIX_TREE(&mapping->i_pages, GFP_ATOMIC | __GFP_ACCOUNT);
xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ);
init_rwsem(&mapping->i_mmap_rwsem);
INIT_LIST_HEAD(&mapping->private_list);
spin_lock_init(&mapping->private_lock);


@ -46,7 +46,7 @@ int isofs_name_translate(struct iso_directory_record *de, char *new, struct inod
return i;
}
/* Acorn extensions written by Matthew Wilcox <willy@bofh.ai> 1998 */
/* Acorn extensions written by Matthew Wilcox <willy@infradead.org> 1998 */
int get_acorn_filename(struct iso_directory_record *de,
char *retname, struct inode *inode)
{


@ -896,7 +896,7 @@ static u64 pnfs_num_cont_bytes(struct inode *inode, pgoff_t idx)
end = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
if (end != inode->i_mapping->nrpages) {
rcu_read_lock();
end = page_cache_next_hole(mapping, idx + 1, ULONG_MAX);
end = page_cache_next_miss(mapping, idx + 1, ULONG_MAX);
rcu_read_unlock();
}


@ -168,24 +168,18 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
ctxt->newbh = NULL;
if (inode->i_blkbits == PAGE_SHIFT) {
lock_page(obh->b_page);
/*
* We cannot call radix_tree_preload for the kernels older
* than 2.6.23, because it is not exported for modules.
*/
struct page *opage = obh->b_page;
lock_page(opage);
retry:
err = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
if (err)
goto failed_unlock;
/* BUG_ON(oldkey != obh->b_page->index); */
if (unlikely(oldkey != obh->b_page->index))
NILFS_PAGE_BUG(obh->b_page,
if (unlikely(oldkey != opage->index))
NILFS_PAGE_BUG(opage,
"invalid oldkey %lld (newkey=%lld)",
(unsigned long long)oldkey,
(unsigned long long)newkey);
xa_lock_irq(&btnc->i_pages);
err = radix_tree_insert(&btnc->i_pages, newkey, obh->b_page);
err = __xa_insert(&btnc->i_pages, newkey, opage, GFP_NOFS);
xa_unlock_irq(&btnc->i_pages);
/*
* Note: page->index will not change to newkey until
@ -193,7 +187,6 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
* To protect the page in intermediate state, the page lock
* is held.
*/
radix_tree_preload_end();
if (!err)
return 0;
else if (err != -EEXIST)
@ -203,7 +196,7 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
if (!err)
goto retry;
/* fallback to copy mode */
unlock_page(obh->b_page);
unlock_page(opage);
}
nbh = nilfs_btnode_create_block(btnc, newkey);
@ -243,9 +236,8 @@ void nilfs_btnode_commit_change_key(struct address_space *btnc,
mark_buffer_dirty(obh);
xa_lock_irq(&btnc->i_pages);
radix_tree_delete(&btnc->i_pages, oldkey);
radix_tree_tag_set(&btnc->i_pages, newkey,
PAGECACHE_TAG_DIRTY);
__xa_erase(&btnc->i_pages, oldkey);
__xa_set_mark(&btnc->i_pages, newkey, PAGECACHE_TAG_DIRTY);
xa_unlock_irq(&btnc->i_pages);
opage->index = obh->b_blocknr = newkey;
@ -275,7 +267,7 @@ void nilfs_btnode_abort_change_key(struct address_space *btnc,
if (nbh == NULL) { /* blocksize == pagesize */
xa_lock_irq(&btnc->i_pages);
radix_tree_delete(&btnc->i_pages, newkey);
__xa_erase(&btnc->i_pages, newkey);
xa_unlock_irq(&btnc->i_pages);
unlock_page(ctxt->bh->b_page);
} else


@ -289,7 +289,7 @@ int nilfs_copy_dirty_pages(struct address_space *dmap,
* @dmap: destination page cache
* @smap: source page cache
*
* No pages must no be added to the cache during this process.
* No pages must be added to the cache during this process.
* This must be ensured by the caller.
*/
void nilfs_copy_back_pages(struct address_space *dmap,
@ -298,7 +298,6 @@ void nilfs_copy_back_pages(struct address_space *dmap,
struct pagevec pvec;
unsigned int i, n;
pgoff_t index = 0;
int err;
pagevec_init(&pvec);
repeat:
@ -313,35 +312,34 @@ void nilfs_copy_back_pages(struct address_space *dmap,
lock_page(page);
dpage = find_lock_page(dmap, offset);
if (dpage) {
/* override existing page on the destination cache */
/* overwrite existing page in the destination cache */
WARN_ON(PageDirty(dpage));
nilfs_copy_page(dpage, page, 0);
unlock_page(dpage);
put_page(dpage);
/* Do we not need to remove page from smap here? */
} else {
struct page *page2;
struct page *p;
/* move the page to the destination cache */
xa_lock_irq(&smap->i_pages);
page2 = radix_tree_delete(&smap->i_pages, offset);
WARN_ON(page2 != page);
p = __xa_erase(&smap->i_pages, offset);
WARN_ON(page != p);
smap->nrpages--;
xa_unlock_irq(&smap->i_pages);
xa_lock_irq(&dmap->i_pages);
err = radix_tree_insert(&dmap->i_pages, offset, page);
if (unlikely(err < 0)) {
WARN_ON(err == -EEXIST);
p = __xa_store(&dmap->i_pages, offset, page, GFP_NOFS);
if (unlikely(p)) {
/* Probably -ENOMEM */
page->mapping = NULL;
put_page(page); /* for cache */
put_page(page);
} else {
page->mapping = dmap;
dmap->nrpages++;
if (PageDirty(page))
radix_tree_tag_set(&dmap->i_pages,
offset,
PAGECACHE_TAG_DIRTY);
__xa_set_mark(&dmap->i_pages, offset,
PAGECACHE_TAG_DIRTY);
}
xa_unlock_irq(&dmap->i_pages);
}
@ -467,8 +465,7 @@ int __nilfs_clear_page_dirty(struct page *page)
if (mapping) {
xa_lock_irq(&mapping->i_pages);
if (test_bit(PG_dirty, &page->flags)) {
radix_tree_tag_clear(&mapping->i_pages,
page_index(page),
__xa_clear_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_DIRTY);
xa_unlock_irq(&mapping->i_pages);
return clear_page_dirty_for_io(page);


@ -521,7 +521,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
if (!page)
return;
if (radix_tree_exceptional_entry(page))
if (xa_is_value(page))
mss->swap += PAGE_SIZE;
else
put_page(page);


@ -403,24 +403,40 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
/**
* struct address_space - Contents of a cacheable, mappable object.
* @host: Owner, either the inode or the block_device.
* @i_pages: Cached pages.
* @gfp_mask: Memory allocation flags to use for allocating pages.
* @i_mmap_writable: Number of VM_SHARED mappings.
* @i_mmap: Tree of private and shared mappings.
* @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
* @nrpages: Number of page entries, protected by the i_pages lock.
* @nrexceptional: Shadow or DAX entries, protected by the i_pages lock.
* @writeback_index: Writeback starts here.
* @a_ops: Methods.
* @flags: Error bits and flags (AS_*).
* @wb_err: The most recent error which has occurred.
* @private_lock: For use by the owner of the address_space.
* @private_list: For use by the owner of the address_space.
* @private_data: For use by the owner of the address_space.
*/
struct address_space {
struct inode *host; /* owner: inode, block_device */
struct radix_tree_root i_pages; /* cached pages */
atomic_t i_mmap_writable;/* count VM_SHARED mappings */
struct rb_root_cached i_mmap; /* tree of private and shared mappings */
struct rw_semaphore i_mmap_rwsem; /* protect tree, count, list */
/* Protected by the i_pages lock */
unsigned long nrpages; /* number of total pages */
/* number of shadow or DAX exceptional entries */
struct inode *host;
struct xarray i_pages;
gfp_t gfp_mask;
atomic_t i_mmap_writable;
struct rb_root_cached i_mmap;
struct rw_semaphore i_mmap_rwsem;
unsigned long nrpages;
unsigned long nrexceptional;
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods */
unsigned long flags; /* error bits */
spinlock_t private_lock; /* for use by the address_space */
gfp_t gfp_mask; /* implicit gfp mask for allocations */
struct list_head private_list; /* for use by the address_space */
void *private_data; /* ditto */
pgoff_t writeback_index;
const struct address_space_operations *a_ops;
unsigned long flags;
errseq_t wb_err;
spinlock_t private_lock;
struct list_head private_list;
void *private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
* On most architectures that alignment is already the case; but
@ -467,15 +483,18 @@ struct block_device {
struct mutex bd_fsfreeze_mutex;
} __randomize_layout;
/*
* Radix-tree tags, for tagging dirty and writeback pages within the pagecache
* radix trees
*/
#define PAGECACHE_TAG_DIRTY 0
#define PAGECACHE_TAG_WRITEBACK 1
#define PAGECACHE_TAG_TOWRITE 2
/* XArray tags, for tagging dirty and writeback pages in the pagecache. */
#define PAGECACHE_TAG_DIRTY XA_MARK_0
#define PAGECACHE_TAG_WRITEBACK XA_MARK_1
#define PAGECACHE_TAG_TOWRITE XA_MARK_2
int mapping_tagged(struct address_space *mapping, int tag);
/*
* Returns true if any of the pages in the mapping are marked with the tag.
*/
static inline bool mapping_tagged(struct address_space *mapping, xa_mark_t tag)
{
return xa_marked(&mapping->i_pages, tag);
}
static inline void i_mmap_lock_write(struct address_space *mapping)
{


@ -214,8 +214,7 @@ static inline void idr_preload_end(void)
++id, (entry) = idr_get_next((idr), &(id)))
/*
* IDA - IDR based id allocator, use when translation from id to
* pointer isn't necessary.
* IDA - ID Allocator, use when translation from id to pointer isn't necessary.
*/
#define IDA_CHUNK_SIZE 128 /* 128 bytes per chunk */
#define IDA_BITMAP_LONGS (IDA_CHUNK_SIZE / sizeof(long))
@ -225,14 +224,14 @@ struct ida_bitmap {
unsigned long bitmap[IDA_BITMAP_LONGS];
};
DECLARE_PER_CPU(struct ida_bitmap *, ida_bitmap);
struct ida {
struct radix_tree_root ida_rt;
struct xarray xa;
};
#define IDA_INIT_FLAGS (XA_FLAGS_LOCK_IRQ | XA_FLAGS_ALLOC)
#define IDA_INIT(name) { \
.ida_rt = RADIX_TREE_INIT(name, IDR_RT_MARKER | GFP_NOWAIT), \
.xa = XARRAY_INIT(name, IDA_INIT_FLAGS) \
}
#define DEFINE_IDA(name) struct ida name = IDA_INIT(name)
@ -292,7 +291,7 @@ static inline int ida_alloc_max(struct ida *ida, unsigned int max, gfp_t gfp)
static inline void ida_init(struct ida *ida)
{
INIT_RADIX_TREE(&ida->ida_rt, IDR_RT_MARKER | GFP_NOWAIT);
xa_init_flags(&ida->xa, IDA_INIT_FLAGS);
}
#define ida_simple_get(ida, start, end, gfp) \
@ -301,9 +300,6 @@ static inline void ida_init(struct ida *ida)
static inline bool ida_is_empty(const struct ida *ida)
{
return radix_tree_empty(&ida->ida_rt);
return xa_empty(&ida->xa);
}
/* in lib/radix-tree.c */
int ida_pre_get(struct ida *ida, gfp_t gfp_mask);
#endif /* __IDR_H__ */


@ -241,9 +241,9 @@ static inline gfp_t readahead_gfp_mask(struct address_space *x)
typedef int filler_t(void *, struct page *);
pgoff_t page_cache_next_hole(struct address_space *mapping,
pgoff_t page_cache_next_miss(struct address_space *mapping,
pgoff_t index, unsigned long max_scan);
pgoff_t page_cache_prev_hole(struct address_space *mapping,
pgoff_t page_cache_prev_miss(struct address_space *mapping,
pgoff_t index, unsigned long max_scan);
#define FGP_ACCESSED 0x00000001
@ -363,17 +363,17 @@ static inline unsigned find_get_pages(struct address_space *mapping,
unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
unsigned int nr_pages, struct page **pages);
unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
pgoff_t end, int tag, unsigned int nr_pages,
pgoff_t end, xa_mark_t tag, unsigned int nr_pages,
struct page **pages);
static inline unsigned find_get_pages_tag(struct address_space *mapping,
pgoff_t *index, int tag, unsigned int nr_pages,
pgoff_t *index, xa_mark_t tag, unsigned int nr_pages,
struct page **pages)
{
return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag,
nr_pages, pages);
}
unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
int tag, unsigned int nr_entries,
xa_mark_t tag, unsigned int nr_entries,
struct page **entries, pgoff_t *indices);
struct page *grab_cache_page_write_begin(struct address_space *mapping,


@ -9,6 +9,8 @@
#ifndef _LINUX_PAGEVEC_H
#define _LINUX_PAGEVEC_H
#include <linux/xarray.h>
/* 15 pointers + header align the pagevec structure to a power of two */
#define PAGEVEC_SIZE 15
@ -40,12 +42,12 @@ static inline unsigned pagevec_lookup(struct pagevec *pvec,
unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
int tag);
xa_mark_t tag);
unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
int tag, unsigned max_pages);
xa_mark_t tag, unsigned max_pages);
static inline unsigned pagevec_lookup_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, int tag)
struct address_space *mapping, pgoff_t *index, xa_mark_t tag)
{
return pagevec_lookup_range_tag(pvec, mapping, index, (pgoff_t)-1, tag);
}


@ -28,34 +28,30 @@
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/xarray.h>
/* Keep unconverted code working */
#define radix_tree_root xarray
#define radix_tree_node xa_node
/*
* The bottom two bits of the slot determine how the remaining bits in the
* slot are interpreted:
*
* 00 - data pointer
* 01 - internal entry
* 10 - exceptional entry
* 11 - this bit combination is currently unused/reserved
* 10 - internal entry
* x1 - value entry
*
* The internal entry may be a pointer to the next level in the tree, a
* sibling entry, or an indicator that the entry in this slot has been moved
* to another location in the tree and the lookup should be restarted. While
* NULL fits the 'data pointer' pattern, it means that there is no entry in
* the tree for this index (no matter what level of the tree it is found at).
* This means that you cannot store NULL in the tree as a value for the index.
* This means that storing a NULL entry in the tree is the same as deleting
* the entry from the tree.
*/
#define RADIX_TREE_ENTRY_MASK 3UL
#define RADIX_TREE_INTERNAL_NODE 1UL
/*
* Most users of the radix tree store pointers but shmem/tmpfs stores swap
* entries in the same tree. They are marked as exceptional entries to
* distinguish them from pointers to struct page.
* EXCEPTIONAL_ENTRY tests the bit, EXCEPTIONAL_SHIFT shifts content past it.
*/
#define RADIX_TREE_EXCEPTIONAL_ENTRY 2
#define RADIX_TREE_EXCEPTIONAL_SHIFT 2
#define RADIX_TREE_INTERNAL_NODE 2UL
static inline bool radix_tree_is_internal_node(void *ptr)
{
@ -65,75 +61,32 @@ static inline bool radix_tree_is_internal_node(void *ptr)
/*** radix-tree API starts here ***/
#define RADIX_TREE_MAX_TAGS 3
#ifndef RADIX_TREE_MAP_SHIFT
#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
#endif
#define RADIX_TREE_MAP_SHIFT XA_CHUNK_SHIFT
#define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT)
#define RADIX_TREE_MAP_MASK (RADIX_TREE_MAP_SIZE-1)
#define RADIX_TREE_TAG_LONGS \
((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define RADIX_TREE_MAX_TAGS XA_MAX_MARKS
#define RADIX_TREE_TAG_LONGS XA_MARK_LONGS
#define RADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long))
#define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
RADIX_TREE_MAP_SHIFT))
/*
* @count is the count of every non-NULL element in the ->slots array
* whether that is an exceptional entry, a retry entry, a user pointer,
* a sibling entry or a pointer to the next level of the tree.
* @exceptional is the count of every element in ->slots which is
* either radix_tree_exceptional_entry() or is a sibling entry for an
* exceptional entry.
*/
struct radix_tree_node {
unsigned char shift; /* Bits remaining in each slot */
unsigned char offset; /* Slot offset in parent */
unsigned char count; /* Total entry count */
unsigned char exceptional; /* Exceptional entry count */
struct radix_tree_node *parent; /* Used when ascending tree */
struct radix_tree_root *root; /* The tree we belong to */
union {
struct list_head private_list; /* For tree user */
struct rcu_head rcu_head; /* Used when freeing node */
};
void __rcu *slots[RADIX_TREE_MAP_SIZE];
unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};
/* The IDR tag is stored in the low bits of the GFP flags */
/* The IDR tag is stored in the low bits of xa_flags */
#define ROOT_IS_IDR ((__force gfp_t)4)
/* The top bits of gfp_mask are used to store the root tags */
/* The top bits of xa_flags are used to store the root tags */
#define ROOT_TAG_SHIFT (__GFP_BITS_SHIFT)
struct radix_tree_root {
spinlock_t xa_lock;
gfp_t gfp_mask;
struct radix_tree_node __rcu *rnode;
};
#define RADIX_TREE_INIT(name, mask) { \
.xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock), \
.gfp_mask = (mask), \
.rnode = NULL, \
}
#define RADIX_TREE_INIT(name, mask) XARRAY_INIT(name, mask)
#define RADIX_TREE(name, mask) \
struct radix_tree_root name = RADIX_TREE_INIT(name, mask)
#define INIT_RADIX_TREE(root, mask) \
do { \
spin_lock_init(&(root)->xa_lock); \
(root)->gfp_mask = (mask); \
(root)->rnode = NULL; \
} while (0)
#define INIT_RADIX_TREE(root, mask) xa_init_flags(root, mask)
static inline bool radix_tree_empty(const struct radix_tree_root *root)
{
return root->rnode == NULL;
return root->xa_head == NULL;
}
/**
@ -143,7 +96,6 @@ static inline bool radix_tree_empty(const struct radix_tree_root *root)
* @next_index: one beyond the last index for this chunk
* @tags: bit-mask for tag-iterating
* @node: node that contains current slot
* @shift: shift for the node that holds our slots
*
* This radix tree iterator works in terms of "chunks" of slots. A chunk is a
* subinterval of slots contained within one radix tree leaf node. It is
@ -157,20 +109,8 @@ struct radix_tree_iter {
unsigned long next_index;
unsigned long tags;
struct radix_tree_node *node;
#ifdef CONFIG_RADIX_TREE_MULTIORDER
unsigned int shift;
#endif
};
static inline unsigned int iter_shift(const struct radix_tree_iter *iter)
{
#ifdef CONFIG_RADIX_TREE_MULTIORDER
return iter->shift;
#else
return 0;
#endif
}
/**
* Radix-tree synchronization
*
@ -194,12 +134,11 @@ static inline unsigned int iter_shift(const struct radix_tree_iter *iter)
* radix_tree_lookup_slot
* radix_tree_tag_get
* radix_tree_gang_lookup
* radix_tree_gang_lookup_slot
* radix_tree_gang_lookup_tag
* radix_tree_gang_lookup_tag_slot
* radix_tree_tagged
*
* The first 8 functions are able to be called locklessly, using RCU. The
* The first 7 functions are able to be called locklessly, using RCU. The
* caller must ensure calls to these functions are made within rcu_read_lock()
* regions. Other readers (lock-free or otherwise) and modifications may be
* running concurrently.
@ -268,17 +207,6 @@ static inline int radix_tree_deref_retry(void *arg)
return unlikely(radix_tree_is_internal_node(arg));
}
/**
* radix_tree_exceptional_entry - radix_tree_deref_slot gave exceptional entry?
* @arg: value returned by radix_tree_deref_slot
* Returns: 0 if well-aligned pointer, non-0 if exceptional entry.
*/
static inline int radix_tree_exceptional_entry(void *arg)
{
/* Not unlikely because radix_tree_exception often tested first */
return (unsigned long)arg & RADIX_TREE_EXCEPTIONAL_ENTRY;
}
/**
* radix_tree_exception - radix_tree_deref_slot returned either exception?
* @arg: value returned by radix_tree_deref_slot
@ -289,47 +217,28 @@ static inline int radix_tree_exception(void *arg)
return unlikely((unsigned long)arg & RADIX_TREE_ENTRY_MASK);
}
int __radix_tree_create(struct radix_tree_root *, unsigned long index,
unsigned order, struct radix_tree_node **nodep,
void __rcu ***slotp);
int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
unsigned order, void *);
static inline int radix_tree_insert(struct radix_tree_root *root,
unsigned long index, void *entry)
{
return __radix_tree_insert(root, index, 0, entry);
}
int radix_tree_insert(struct radix_tree_root *, unsigned long index,
void *);
void *__radix_tree_lookup(const struct radix_tree_root *, unsigned long index,
struct radix_tree_node **nodep, void __rcu ***slotp);
void *radix_tree_lookup(const struct radix_tree_root *, unsigned long);
void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *,
unsigned long index);
typedef void (*radix_tree_update_node_t)(struct radix_tree_node *);
void __radix_tree_replace(struct radix_tree_root *, struct radix_tree_node *,
void __rcu **slot, void *entry,
radix_tree_update_node_t update_node);
void __rcu **slot, void *entry);
void radix_tree_iter_replace(struct radix_tree_root *,
const struct radix_tree_iter *, void __rcu **slot, void *entry);
void radix_tree_replace_slot(struct radix_tree_root *,
void __rcu **slot, void *entry);
void __radix_tree_delete_node(struct radix_tree_root *,
struct radix_tree_node *,
radix_tree_update_node_t update_node);
void radix_tree_iter_delete(struct radix_tree_root *,
struct radix_tree_iter *iter, void __rcu **slot);
void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
void *radix_tree_delete(struct radix_tree_root *, unsigned long);
void radix_tree_clear_tags(struct radix_tree_root *, struct radix_tree_node *,
void __rcu **slot);
unsigned int radix_tree_gang_lookup(const struct radix_tree_root *,
void **results, unsigned long first_index,
unsigned int max_items);
unsigned int radix_tree_gang_lookup_slot(const struct radix_tree_root *,
void __rcu ***results, unsigned long *indices,
unsigned long first_index, unsigned int max_items);
int radix_tree_preload(gfp_t gfp_mask);
int radix_tree_maybe_preload(gfp_t gfp_mask);
int radix_tree_maybe_preload_order(gfp_t gfp_mask, int order);
void radix_tree_init(void);
void *radix_tree_tag_set(struct radix_tree_root *,
unsigned long index, unsigned int tag);
@ -337,8 +246,6 @@ void *radix_tree_tag_clear(struct radix_tree_root *,
unsigned long index, unsigned int tag);
int radix_tree_tag_get(const struct radix_tree_root *,
unsigned long index, unsigned int tag);
void radix_tree_iter_tag_set(struct radix_tree_root *,
const struct radix_tree_iter *iter, unsigned int tag);
void radix_tree_iter_tag_clear(struct radix_tree_root *,
const struct radix_tree_iter *iter, unsigned int tag);
unsigned int radix_tree_gang_lookup_tag(const struct radix_tree_root *,
@ -354,12 +261,6 @@ static inline void radix_tree_preload_end(void)
preempt_enable();
}
int radix_tree_split_preload(unsigned old_order, unsigned new_order, gfp_t);
int radix_tree_split(struct radix_tree_root *, unsigned long index,
unsigned new_order);
int radix_tree_join(struct radix_tree_root *, unsigned long index,
unsigned new_order, void *);
void __rcu **idr_get_free(struct radix_tree_root *root,
struct radix_tree_iter *iter, gfp_t gfp,
unsigned long max);
@ -465,7 +366,7 @@ void __rcu **radix_tree_iter_retry(struct radix_tree_iter *iter)
static inline unsigned long
__radix_tree_iter_add(struct radix_tree_iter *iter, unsigned long slots)
{
return iter->index + (slots << iter_shift(iter));
return iter->index + slots;
}
/**
@ -490,21 +391,9 @@ void __rcu **__must_check radix_tree_iter_resume(void __rcu **slot,
static __always_inline long
radix_tree_chunk_size(struct radix_tree_iter *iter)
{
return (iter->next_index - iter->index) >> iter_shift(iter);
return iter->next_index - iter->index;
}
#ifdef CONFIG_RADIX_TREE_MULTIORDER
void __rcu **__radix_tree_next_slot(void __rcu **slot,
struct radix_tree_iter *iter, unsigned flags);
#else
/* Can't happen without sibling entries, but the compiler can't tell that */
static inline void __rcu **__radix_tree_next_slot(void __rcu **slot,
struct radix_tree_iter *iter, unsigned flags)
{
return slot;
}
#endif
/**
* radix_tree_next_slot - find next slot in chunk
*
@ -563,8 +452,6 @@ static __always_inline void __rcu **radix_tree_next_slot(void __rcu **slot,
return NULL;
found:
if (unlikely(radix_tree_is_internal_node(rcu_dereference_raw(*slot))))
return __radix_tree_next_slot(slot, iter, flags);
return slot;
}
@ -583,23 +470,6 @@ static __always_inline void __rcu **radix_tree_next_slot(void __rcu **slot,
slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
slot = radix_tree_next_slot(slot, iter, 0))
/**
* radix_tree_for_each_contig - iterate over contiguous slots
*
* @slot: the void** variable for pointer to slot
* @root: the struct radix_tree_root pointer
* @iter: the struct radix_tree_iter pointer
* @start: iteration starting index
*
* @slot points to radix tree slot, @iter->index contains its index.
*/
#define radix_tree_for_each_contig(slot, root, iter, start) \
for (slot = radix_tree_iter_init(iter, start) ; \
slot || (slot = radix_tree_next_chunk(root, iter, \
RADIX_TREE_ITER_CONTIG)) ; \
slot = radix_tree_next_slot(slot, iter, \
RADIX_TREE_ITER_CONTIG))
/**
* radix_tree_for_each_tagged - iterate over tagged slots
*

View File

@ -300,17 +300,12 @@ void *workingset_eviction(struct address_space *mapping, struct page *page);
void workingset_refault(struct page *page, void *shadow);
void workingset_activation(struct page *page);
/* Do not use directly, use workingset_lookup_update */
void workingset_update_node(struct radix_tree_node *node);
/* Returns workingset_update_node() if the mapping has shadow entries. */
#define workingset_lookup_update(mapping) \
({ \
radix_tree_update_node_t __helper = workingset_update_node; \
if (dax_mapping(mapping) || shmem_mapping(mapping)) \
__helper = NULL; \
__helper; \
})
/* Only track the nodes of mappings with shadow entries */
void workingset_update_node(struct xa_node *node);
#define mapping_set_update(xas, mapping) do { \
if (!dax_mapping(mapping) && !shmem_mapping(mapping)) \
xas_set_update(xas, workingset_update_node); \
} while (0)
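
A hedged sketch of how the helper above is intended to be used around a page-cache update; the function name and the operation it performs are invented for illustration:

/* Hedged sketch: register workingset node tracking before an XArray change. */
static void example_drop_entry(struct address_space *mapping, pgoff_t index)
{
        XA_STATE(xas, &mapping->i_pages, index);

        mapping_set_update(&xas, mapping);      /* no-op for DAX and shmem mappings */

        xas_lock_irq(&xas);
        xas_store(&xas, NULL);                  /* erase the entry at @index */
        xas_unlock_irq(&xas);
}
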
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
@ -409,7 +404,7 @@ extern void show_swap_cache_info(void);
extern int add_to_swap(struct page *page);
extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
extern void __delete_from_swap_cache(struct page *);
extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
extern void delete_from_swap_cache(struct page *);
extern void free_page_and_swap_cache(struct page *);
extern void free_pages_and_swap_cache(struct page **, int);
@ -563,7 +558,8 @@ static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
return -1;
}
static inline void __delete_from_swap_cache(struct page *page)
static inline void __delete_from_swap_cache(struct page *page,
swp_entry_t entry)
{
}

View File

@ -18,9 +18,8 @@
*
* swp_entry_t's are *never* stored anywhere in their arch-dependent format.
*/
#define SWP_TYPE_SHIFT(e) ((sizeof(e.val) * 8) - \
(MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT))
#define SWP_OFFSET_MASK(e) ((1UL << SWP_TYPE_SHIFT(e)) - 1)
#define SWP_TYPE_SHIFT (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)
#define SWP_OFFSET_MASK ((1UL << SWP_TYPE_SHIFT) - 1)
/*
* Store a type+offset into a swp_entry_t in an arch-independent format
@ -29,8 +28,7 @@ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
{
swp_entry_t ret;
ret.val = (type << SWP_TYPE_SHIFT(ret)) |
(offset & SWP_OFFSET_MASK(ret));
ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK);
return ret;
}
@ -40,7 +38,7 @@ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
*/
static inline unsigned swp_type(swp_entry_t entry)
{
return (entry.val >> SWP_TYPE_SHIFT(entry));
return (entry.val >> SWP_TYPE_SHIFT);
}
/*
@ -49,7 +47,7 @@ static inline unsigned swp_type(swp_entry_t entry)
*/
static inline pgoff_t swp_offset(swp_entry_t entry)
{
return entry.val & SWP_OFFSET_MASK(entry);
return entry.val & SWP_OFFSET_MASK;
}
#ifdef CONFIG_MMU
@ -90,16 +88,13 @@ static inline swp_entry_t radix_to_swp_entry(void *arg)
{
swp_entry_t entry;
entry.val = (unsigned long)arg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
entry.val = xa_to_value(arg);
return entry;
}
static inline void *swp_to_radix_entry(swp_entry_t entry)
{
unsigned long value;
value = entry.val << RADIX_TREE_EXCEPTIONAL_SHIFT;
return (void *)(value | RADIX_TREE_EXCEPTIONAL_ENTRY);
return xa_mk_value(entry.val);
}
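
A hedged round trip through the converted helpers above; the type and offset values are made up:

/* Hedged sketch: a swap entry survives the trip through a value entry. */
static inline bool example_swap_round_trip(void)
{
        swp_entry_t entry = swp_entry(1, 0x1234);
        void *shadow = swp_to_radix_entry(entry);       /* i.e. xa_mk_value(entry.val) */
        swp_entry_t back;

        if (!xa_is_value(shadow))       /* value entries are never mistaken for pages */
                return false;
        back = radix_to_swp_entry(shadow);
        return swp_type(back) == 1 && swp_offset(back) == 0x1234;
}
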
#if IS_ENABLED(CONFIG_DEVICE_PRIVATE)

File diff suppressed because it is too large.

View File

@ -1,47 +1,21 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2015 Intel Corporation. All rights reserved. */
#include <linux/radix-tree.h>
#include <linux/device.h>
#include <linux/types.h>
#include <linux/pfn_t.h>
#include <linux/io.h>
#include <linux/kasan.h>
#include <linux/mm.h>
#include <linux/memory_hotplug.h>
#include <linux/mm.h>
#include <linux/pfn_t.h>
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/types.h>
#include <linux/wait_bit.h>
#include <linux/xarray.h>
static DEFINE_MUTEX(pgmap_lock);
static RADIX_TREE(pgmap_radix, GFP_KERNEL);
static DEFINE_XARRAY(pgmap_array);
#define SECTION_MASK ~((1UL << PA_SECTION_SHIFT) - 1)
#define SECTION_SIZE (1UL << PA_SECTION_SHIFT)
static unsigned long order_at(struct resource *res, unsigned long pgoff)
{
unsigned long phys_pgoff = PHYS_PFN(res->start) + pgoff;
unsigned long nr_pages, mask;
nr_pages = PHYS_PFN(resource_size(res));
if (nr_pages == pgoff)
return ULONG_MAX;
/*
* What is the largest aligned power-of-2 range available from
* this resource pgoff to the end of the resource range,
* considering the alignment of the current pgoff?
*/
mask = phys_pgoff | rounddown_pow_of_two(nr_pages - pgoff);
if (!mask)
return ULONG_MAX;
return find_first_bit(&mask, BITS_PER_LONG);
}
#define foreach_order_pgoff(res, order, pgoff) \
for (pgoff = 0, order = order_at((res), pgoff); order < ULONG_MAX; \
pgoff += 1UL << order, order = order_at((res), pgoff))
#if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
unsigned long addr,
@ -70,18 +44,10 @@ vm_fault_t device_private_entry_fault(struct vm_area_struct *vma,
EXPORT_SYMBOL(device_private_entry_fault);
#endif /* CONFIG_DEVICE_PRIVATE */
static void pgmap_radix_release(struct resource *res, unsigned long end_pgoff)
static void pgmap_array_delete(struct resource *res)
{
unsigned long pgoff, order;
mutex_lock(&pgmap_lock);
foreach_order_pgoff(res, order, pgoff) {
if (pgoff >= end_pgoff)
break;
radix_tree_delete(&pgmap_radix, PHYS_PFN(res->start) + pgoff);
}
mutex_unlock(&pgmap_lock);
xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end),
NULL, GFP_KERNEL);
synchronize_rcu();
}
@ -142,7 +108,7 @@ static void devm_memremap_pages_release(void *data)
mem_hotplug_done();
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
pgmap_radix_release(res, -1);
pgmap_array_delete(res);
dev_WARN_ONCE(dev, pgmap->altmap.alloc,
"%s: failed to free all reserved pages\n", __func__);
}
@ -177,7 +143,6 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
struct resource *res = &pgmap->res;
struct dev_pagemap *conflict_pgmap;
pgprot_t pgprot = PAGE_KERNEL;
unsigned long pgoff, order;
int error, nid, is_ram;
align_start = res->start & ~(SECTION_SIZE - 1);
@ -216,20 +181,10 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
pgmap->dev = dev;
mutex_lock(&pgmap_lock);
error = 0;
foreach_order_pgoff(res, order, pgoff) {
error = __radix_tree_insert(&pgmap_radix,
PHYS_PFN(res->start) + pgoff, order, pgmap);
if (error) {
dev_err(dev, "%s: failed: %d\n", __func__, error);
break;
}
}
mutex_unlock(&pgmap_lock);
error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start),
PHYS_PFN(res->end), pgmap, GFP_KERNEL));
if (error)
goto err_radix;
goto err_array;
nid = dev_to_node(dev);
if (nid < 0)
@ -274,8 +229,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
err_kasan:
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
err_pfn_remap:
err_radix:
pgmap_radix_release(res, pgoff);
pgmap_array_delete(res);
err_array:
return ERR_PTR(error);
}
EXPORT_SYMBOL(devm_memremap_pages);
@ -315,7 +270,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
/* fall back to slow path lookup */
rcu_read_lock();
pgmap = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
pgmap = xa_load(&pgmap_array, PHYS_PFN(phys));
if (pgmap && !percpu_ref_tryget_live(pgmap->ref))
pgmap = NULL;
rcu_read_unlock();
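
The conversion above trades the per-order radix tree inserts for a single multi-index store. A hedged sketch of the same pattern with invented names; it assumes CONFIG_XARRAY_MULTI, as selected by ZONE_DEVICE later in this series:

/* Hedged sketch: map a whole PFN range to one owner pointer. */
static DEFINE_XARRAY(example_pfn_map);

static int example_register(unsigned long first_pfn, unsigned long last_pfn,
                            void *owner)
{
        return xa_err(xa_store_range(&example_pfn_map, first_pfn, last_pfn,
                                     owner, GFP_KERNEL));
}

static void *example_lookup(unsigned long pfn)
{
        return xa_load(&example_pfn_map, pfn);  /* hits for any pfn in the range */
}
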

View File

@ -399,8 +399,11 @@ config INTERVAL_TREE
for more information.
config RADIX_TREE_MULTIORDER
config XARRAY_MULTI
bool
help
Support entries which occupy multiple consecutive indices in the
XArray.
config ASSOCIATIVE_ARRAY
bool

View File

@ -1813,6 +1813,9 @@ config TEST_BITFIELD
config TEST_UUID
tristate "Test functions located in the uuid module at runtime"
config TEST_XARRAY
tristate "Test the XArray code at runtime"
config TEST_OVERFLOW
tristate "Test check_*_overflow() functions at runtime"

View File

@ -18,7 +18,7 @@ KCOV_INSTRUMENT_debugobjects.o := n
KCOV_INSTRUMENT_dynamic_debug.o := n
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o timerqueue.o\
rbtree.o radix-tree.o timerqueue.o xarray.o \
idr.o int_sqrt.o extable.o \
sha1.o chacha20.o irq_regs.o argv_split.o \
flex_proportions.o ratelimit.o show_mem.o \
@ -68,6 +68,7 @@ obj-$(CONFIG_TEST_PRINTF) += test_printf.o
obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
obj-$(CONFIG_TEST_BITFIELD) += test_bitfield.o
obj-$(CONFIG_TEST_UUID) += test_uuid.o
obj-$(CONFIG_TEST_XARRAY) += test_xarray.o
obj-$(CONFIG_TEST_PARMAN) += test_parman.o
obj-$(CONFIG_TEST_KMOD) += test_kmod.o
obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o

lib/idr.c (411 lines changed)

@ -6,8 +6,6 @@
#include <linux/spinlock.h>
#include <linux/xarray.h>
DEFINE_PER_CPU(struct ida_bitmap *, ida_bitmap);
/**
* idr_alloc_u32() - Allocate an ID.
* @idr: IDR handle.
@ -39,10 +37,8 @@ int idr_alloc_u32(struct idr *idr, void *ptr, u32 *nextid,
unsigned int base = idr->idr_base;
unsigned int id = *nextid;
if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
return -EINVAL;
if (WARN_ON_ONCE(!(idr->idr_rt.gfp_mask & ROOT_IS_IDR)))
idr->idr_rt.gfp_mask |= IDR_RT_MARKER;
if (WARN_ON_ONCE(!(idr->idr_rt.xa_flags & ROOT_IS_IDR)))
idr->idr_rt.xa_flags |= IDR_RT_MARKER;
id = (id < base) ? 0 : id - base;
radix_tree_iter_init(&iter, id);
@ -295,15 +291,13 @@ void *idr_replace(struct idr *idr, void *ptr, unsigned long id)
void __rcu **slot = NULL;
void *entry;
if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
return ERR_PTR(-EINVAL);
id -= idr->idr_base;
entry = __radix_tree_lookup(&idr->idr_rt, id, &node, &slot);
if (!slot || radix_tree_tag_get(&idr->idr_rt, id, IDR_FREE))
return ERR_PTR(-ENOENT);
__radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL);
__radix_tree_replace(&idr->idr_rt, node, slot, ptr);
return entry;
}
@ -324,6 +318,9 @@ EXPORT_SYMBOL(idr_replace);
* free the individual IDs in it. You can use ida_is_empty() to find
* out whether the IDA has any IDs currently allocated.
*
* The IDA handles its own locking. It is safe to call any of the IDA
* functions without synchronisation in your code.
*
* IDs are currently limited to the range [0-INT_MAX]. If this is an awkward
* limitation, it should be quite straightforward to raise the maximum.
*/
@ -331,190 +328,38 @@ EXPORT_SYMBOL(idr_replace);
/*
* Developer's notes:
*
* The IDA uses the functionality provided by the IDR & radix tree to store
* bitmaps in each entry. The IDR_FREE tag means there is at least one bit
* free, unlike the IDR where it means at least one entry is free.
* The IDA uses the functionality provided by the XArray to store bitmaps in
* each entry. The XA_FREE_MARK is only cleared when all bits in the bitmap
* have been set.
*
* I considered telling the radix tree that each slot is an order-10 node
* and storing the bit numbers in the radix tree, but the radix tree can't
* allow a single multiorder entry at index 0, which would significantly
* increase memory consumption for the IDA. So instead we divide the index
* by the number of bits in the leaf bitmap before doing a radix tree lookup.
* I considered telling the XArray that each slot is an order-10 node
* and indexing by bit number, but the XArray can't allow a single multi-index
* entry in the head, which would significantly increase memory consumption
* for the IDA. So instead we divide the index by the number of bits in the
* leaf bitmap before doing a radix tree lookup.
*
* As an optimisation, if there are only a few low bits set in any given
* leaf, instead of allocating a 128-byte bitmap, we use the 'exceptional
* entry' functionality of the radix tree to store BITS_PER_LONG - 2 bits
* directly in the entry. By being really tricksy, we could store
* BITS_PER_LONG - 1 bits, but there're diminishing returns after optimising
* for 0-3 allocated IDs.
* leaf, instead of allocating a 128-byte bitmap, we store the bits
* as a value entry. Value entries never have the XA_FREE_MARK cleared
* because we can always convert them into a bitmap entry.
*
* We allow the radix tree 'exceptional' count to get out of date. Nothing
* in the IDA nor the radix tree code checks it. If it becomes important
* to maintain an accurate exceptional count, switch the rcu_assign_pointer()
* calls to radix_tree_iter_replace() which will correct the exceptional
* count.
* It would be possible to optimise further; once we've run out of a
* single 128-byte bitmap, we currently switch to a 576-byte node, put
* the 128-byte bitmap in the first entry and then start allocating extra
* 128-byte entries. We could instead use the 512 bytes of the node's
* data as a bitmap before moving to that scheme. I do not believe this
* is a worthwhile optimisation; Rasmus Villemoes surveyed the current
* users of the IDA and almost none of them use more than 1024 entries.
* Those that do use more than the 8192 IDs that the 512 bytes would
* provide.
*
* The IDA always requires a lock to alloc/free. If we add a 'test_bit'
* The IDA always uses a lock to alloc/free. If we add a 'test_bit'
* equivalent, it will still need locking. Going to RCU lookup would require
* using RCU to free bitmaps, and that's not trivial without embedding an
* RCU head in the bitmap, which adds a 2-pointer overhead to each 128-byte
* bitmap, which is excessive.
*/
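
A hedged, stand-alone illustration of the value-entry optimisation described in the notes above; it is not part of the IDA code itself:

/* Hedged sketch: a handful of low IDs packed straight into the slot. */
static bool example_small_ida_slot(void)
{
        unsigned long bits = 0;
        void *entry;

        bits |= 1UL << 0;               /* IDs 0 and 5 allocated */
        bits |= 1UL << 5;
        entry = xa_mk_value(bits);      /* no 128-byte struct ida_bitmap needed */

        /* Unpack and test without dereferencing any allocation. */
        return xa_is_value(entry) && (xa_to_value(entry) & (1UL << 5));
}
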
#define IDA_MAX (0x80000000U / IDA_BITMAP_BITS - 1)
static int ida_get_new_above(struct ida *ida, int start)
{
struct radix_tree_root *root = &ida->ida_rt;
void __rcu **slot;
struct radix_tree_iter iter;
struct ida_bitmap *bitmap;
unsigned long index;
unsigned bit, ebit;
int new;
index = start / IDA_BITMAP_BITS;
bit = start % IDA_BITMAP_BITS;
ebit = bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
slot = radix_tree_iter_init(&iter, index);
for (;;) {
if (slot)
slot = radix_tree_next_slot(slot, &iter,
RADIX_TREE_ITER_TAGGED);
if (!slot) {
slot = idr_get_free(root, &iter, GFP_NOWAIT, IDA_MAX);
if (IS_ERR(slot)) {
if (slot == ERR_PTR(-ENOMEM))
return -EAGAIN;
return PTR_ERR(slot);
}
}
if (iter.index > index) {
bit = 0;
ebit = RADIX_TREE_EXCEPTIONAL_SHIFT;
}
new = iter.index * IDA_BITMAP_BITS;
bitmap = rcu_dereference_raw(*slot);
if (radix_tree_exception(bitmap)) {
unsigned long tmp = (unsigned long)bitmap;
ebit = find_next_zero_bit(&tmp, BITS_PER_LONG, ebit);
if (ebit < BITS_PER_LONG) {
tmp |= 1UL << ebit;
rcu_assign_pointer(*slot, (void *)tmp);
return new + ebit -
RADIX_TREE_EXCEPTIONAL_SHIFT;
}
bitmap = this_cpu_xchg(ida_bitmap, NULL);
if (!bitmap)
return -EAGAIN;
bitmap->bitmap[0] = tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT;
rcu_assign_pointer(*slot, bitmap);
}
if (bitmap) {
bit = find_next_zero_bit(bitmap->bitmap,
IDA_BITMAP_BITS, bit);
new += bit;
if (new < 0)
return -ENOSPC;
if (bit == IDA_BITMAP_BITS)
continue;
__set_bit(bit, bitmap->bitmap);
if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS))
radix_tree_iter_tag_clear(root, &iter,
IDR_FREE);
} else {
new += bit;
if (new < 0)
return -ENOSPC;
if (ebit < BITS_PER_LONG) {
bitmap = (void *)((1UL << ebit) |
RADIX_TREE_EXCEPTIONAL_ENTRY);
radix_tree_iter_replace(root, &iter, slot,
bitmap);
return new;
}
bitmap = this_cpu_xchg(ida_bitmap, NULL);
if (!bitmap)
return -EAGAIN;
__set_bit(bit, bitmap->bitmap);
radix_tree_iter_replace(root, &iter, slot, bitmap);
}
return new;
}
}
static void ida_remove(struct ida *ida, int id)
{
unsigned long index = id / IDA_BITMAP_BITS;
unsigned offset = id % IDA_BITMAP_BITS;
struct ida_bitmap *bitmap;
unsigned long *btmp;
struct radix_tree_iter iter;
void __rcu **slot;
slot = radix_tree_iter_lookup(&ida->ida_rt, &iter, index);
if (!slot)
goto err;
bitmap = rcu_dereference_raw(*slot);
if (radix_tree_exception(bitmap)) {
btmp = (unsigned long *)slot;
offset += RADIX_TREE_EXCEPTIONAL_SHIFT;
if (offset >= BITS_PER_LONG)
goto err;
} else {
btmp = bitmap->bitmap;
}
if (!test_bit(offset, btmp))
goto err;
__clear_bit(offset, btmp);
radix_tree_iter_tag_set(&ida->ida_rt, &iter, IDR_FREE);
if (radix_tree_exception(bitmap)) {
if (rcu_dereference_raw(*slot) ==
(void *)RADIX_TREE_EXCEPTIONAL_ENTRY)
radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
} else if (bitmap_empty(btmp, IDA_BITMAP_BITS)) {
kfree(bitmap);
radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
}
return;
err:
WARN(1, "ida_free called for id=%d which is not allocated.\n", id);
}
/**
* ida_destroy() - Free all IDs.
* @ida: IDA handle.
*
* Calling this function frees all IDs and releases all resources used
* by an IDA. When this call returns, the IDA is empty and can be reused
* or freed. If the IDA is already empty, there is no need to call this
* function.
*
* Context: Any context.
*/
void ida_destroy(struct ida *ida)
{
unsigned long flags;
struct radix_tree_iter iter;
void __rcu **slot;
xa_lock_irqsave(&ida->ida_rt, flags);
radix_tree_for_each_slot(slot, &ida->ida_rt, &iter, 0) {
struct ida_bitmap *bitmap = rcu_dereference_raw(*slot);
if (!radix_tree_exception(bitmap))
kfree(bitmap);
radix_tree_iter_delete(&ida->ida_rt, &iter, slot);
}
xa_unlock_irqrestore(&ida->ida_rt, flags);
}
EXPORT_SYMBOL(ida_destroy);
/**
* ida_alloc_range() - Allocate an unused ID.
* @ida: IDA handle.
@ -532,8 +377,10 @@ EXPORT_SYMBOL(ida_destroy);
int ida_alloc_range(struct ida *ida, unsigned int min, unsigned int max,
gfp_t gfp)
{
int id = 0;
XA_STATE(xas, &ida->xa, min / IDA_BITMAP_BITS);
unsigned bit = min % IDA_BITMAP_BITS;
unsigned long flags;
struct ida_bitmap *bitmap, *alloc = NULL;
if ((int)min < 0)
return -ENOSPC;
@ -541,22 +388,87 @@ int ida_alloc_range(struct ida *ida, unsigned int min, unsigned int max,
if ((int)max < 0)
max = INT_MAX;
again:
xa_lock_irqsave(&ida->ida_rt, flags);
id = ida_get_new_above(ida, min);
if (id > (int)max) {
ida_remove(ida, id);
id = -ENOSPC;
}
xa_unlock_irqrestore(&ida->ida_rt, flags);
retry:
xas_lock_irqsave(&xas, flags);
next:
bitmap = xas_find_marked(&xas, max / IDA_BITMAP_BITS, XA_FREE_MARK);
if (xas.xa_index > min / IDA_BITMAP_BITS)
bit = 0;
if (xas.xa_index * IDA_BITMAP_BITS + bit > max)
goto nospc;
if (unlikely(id == -EAGAIN)) {
if (!ida_pre_get(ida, gfp))
return -ENOMEM;
goto again;
if (xa_is_value(bitmap)) {
unsigned long tmp = xa_to_value(bitmap);
if (bit < BITS_PER_XA_VALUE) {
bit = find_next_zero_bit(&tmp, BITS_PER_XA_VALUE, bit);
if (xas.xa_index * IDA_BITMAP_BITS + bit > max)
goto nospc;
if (bit < BITS_PER_XA_VALUE) {
tmp |= 1UL << bit;
xas_store(&xas, xa_mk_value(tmp));
goto out;
}
}
bitmap = alloc;
if (!bitmap)
bitmap = kzalloc(sizeof(*bitmap), GFP_NOWAIT);
if (!bitmap)
goto alloc;
bitmap->bitmap[0] = tmp;
xas_store(&xas, bitmap);
if (xas_error(&xas)) {
bitmap->bitmap[0] = 0;
goto out;
}
}
return id;
if (bitmap) {
bit = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, bit);
if (xas.xa_index * IDA_BITMAP_BITS + bit > max)
goto nospc;
if (bit == IDA_BITMAP_BITS)
goto next;
__set_bit(bit, bitmap->bitmap);
if (bitmap_full(bitmap->bitmap, IDA_BITMAP_BITS))
xas_clear_mark(&xas, XA_FREE_MARK);
} else {
if (bit < BITS_PER_XA_VALUE) {
bitmap = xa_mk_value(1UL << bit);
} else {
bitmap = alloc;
if (!bitmap)
bitmap = kzalloc(sizeof(*bitmap), GFP_NOWAIT);
if (!bitmap)
goto alloc;
__set_bit(bit, bitmap->bitmap);
}
xas_store(&xas, bitmap);
}
out:
xas_unlock_irqrestore(&xas, flags);
if (xas_nomem(&xas, gfp)) {
xas.xa_index = min / IDA_BITMAP_BITS;
bit = min % IDA_BITMAP_BITS;
goto retry;
}
if (bitmap != alloc)
kfree(alloc);
if (xas_error(&xas))
return xas_error(&xas);
return xas.xa_index * IDA_BITMAP_BITS + bit;
alloc:
xas_unlock_irqrestore(&xas, flags);
alloc = kzalloc(sizeof(*bitmap), gfp);
if (!alloc)
return -ENOMEM;
xas_set(&xas, min / IDA_BITMAP_BITS);
bit = min % IDA_BITMAP_BITS;
goto retry;
nospc:
xas_unlock_irqrestore(&xas, flags);
return -ENOSPC;
}
EXPORT_SYMBOL(ida_alloc_range);
@ -569,11 +481,112 @@ EXPORT_SYMBOL(ida_alloc_range);
*/
void ida_free(struct ida *ida, unsigned int id)
{
XA_STATE(xas, &ida->xa, id / IDA_BITMAP_BITS);
unsigned bit = id % IDA_BITMAP_BITS;
struct ida_bitmap *bitmap;
unsigned long flags;
BUG_ON((int)id < 0);
xa_lock_irqsave(&ida->ida_rt, flags);
ida_remove(ida, id);
xa_unlock_irqrestore(&ida->ida_rt, flags);
xas_lock_irqsave(&xas, flags);
bitmap = xas_load(&xas);
if (xa_is_value(bitmap)) {
unsigned long v = xa_to_value(bitmap);
if (bit >= BITS_PER_XA_VALUE)
goto err;
if (!(v & (1UL << bit)))
goto err;
v &= ~(1UL << bit);
if (!v)
goto delete;
xas_store(&xas, xa_mk_value(v));
} else {
if (!test_bit(bit, bitmap->bitmap))
goto err;
__clear_bit(bit, bitmap->bitmap);
xas_set_mark(&xas, XA_FREE_MARK);
if (bitmap_empty(bitmap->bitmap, IDA_BITMAP_BITS)) {
kfree(bitmap);
delete:
xas_store(&xas, NULL);
}
}
xas_unlock_irqrestore(&xas, flags);
return;
err:
xas_unlock_irqrestore(&xas, flags);
WARN(1, "ida_free called for id=%d which is not allocated.\n", id);
}
EXPORT_SYMBOL(ida_free);
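
A hedged usage sketch for the allocation and free paths above; the driver-style wrappers are invented:

/* Hedged sketch: typical IDA usage, no external locking required. */
static DEFINE_IDA(example_ida);

static int example_probe(void)
{
        int id = ida_alloc_range(&example_ida, 0, 63, GFP_KERNEL);

        if (id < 0)
                return id;              /* -ENOMEM or -ENOSPC */
        /* ... use @id as a minor number, queue index, etc. ... */
        return 0;
}

static void example_remove(int id)
{
        ida_free(&example_ida, id);
}
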
/**
* ida_destroy() - Free all IDs.
* @ida: IDA handle.
*
* Calling this function frees all IDs and releases all resources used
* by an IDA. When this call returns, the IDA is empty and can be reused
* or freed. If the IDA is already empty, there is no need to call this
* function.
*
* Context: Any context.
*/
void ida_destroy(struct ida *ida)
{
XA_STATE(xas, &ida->xa, 0);
struct ida_bitmap *bitmap;
unsigned long flags;
xas_lock_irqsave(&xas, flags);
xas_for_each(&xas, bitmap, ULONG_MAX) {
if (!xa_is_value(bitmap))
kfree(bitmap);
xas_store(&xas, NULL);
}
xas_unlock_irqrestore(&xas, flags);
}
EXPORT_SYMBOL(ida_destroy);
#ifndef __KERNEL__
extern void xa_dump_index(unsigned long index, unsigned int shift);
#define IDA_CHUNK_SHIFT ilog2(IDA_BITMAP_BITS)
static void ida_dump_entry(void *entry, unsigned long index)
{
unsigned long i;
if (!entry)
return;
if (xa_is_node(entry)) {
struct xa_node *node = xa_to_node(entry);
unsigned int shift = node->shift + IDA_CHUNK_SHIFT +
XA_CHUNK_SHIFT;
xa_dump_index(index * IDA_BITMAP_BITS, shift);
xa_dump_node(node);
for (i = 0; i < XA_CHUNK_SIZE; i++)
ida_dump_entry(node->slots[i],
index | (i << node->shift));
} else if (xa_is_value(entry)) {
xa_dump_index(index * IDA_BITMAP_BITS, ilog2(BITS_PER_LONG));
pr_cont("value: data %lx [%px]\n", xa_to_value(entry), entry);
} else {
struct ida_bitmap *bitmap = entry;
xa_dump_index(index * IDA_BITMAP_BITS, IDA_CHUNK_SHIFT);
pr_cont("bitmap: %p data", bitmap);
for (i = 0; i < IDA_BITMAP_LONGS; i++)
pr_cont(" %lx", bitmap->bitmap[i]);
pr_cont("\n");
}
}
static void ida_dump(struct ida *ida)
{
struct xarray *xa = &ida->xa;
pr_debug("ida: %p node %p free %d\n", ida, xa->xa_head,
xa->xa_flags >> ROOT_TAG_SHIFT);
ida_dump_entry(xa->xa_head, 0);
}
#endif

File diff suppressed because it is too large.

lib/test_xarray.c (new file, 1238 lines; diff suppressed because it is too large)

lib/xarray.c (new file, 2036 lines; diff suppressed because it is too large)

View File

@ -379,7 +379,7 @@ config TRANSPARENT_HUGEPAGE
bool "Transparent Hugepage Support"
depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE
select COMPACTION
select RADIX_TREE_MULTIORDER
select XARRAY_MULTI
help
Transparent Hugepages allows the kernel to use huge pages and
huge tlb transparently to the applications whenever possible.
@ -671,7 +671,7 @@ config ZONE_DEVICE
depends on MEMORY_HOTREMOVE
depends on SPARSEMEM_VMEMMAP
depends on ARCH_HAS_ZONE_DEVICE
select RADIX_TREE_MULTIORDER
select XARRAY_MULTI
help
Device memory hotplug support allows for establishing pmem,

File diff suppressed because it is too large.

View File

@ -2450,13 +2450,13 @@ static void __split_huge_page(struct page *page, struct list_head *list,
ClearPageCompound(head);
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to radix tree of swap cache */
/* Additional pin to swap cache */
if (PageSwapCache(head))
page_ref_add(head, 2);
else
page_ref_inc(head);
} else {
/* Additional pin to radix tree */
/* Additional pin to page cache */
page_ref_add(head, 2);
xa_unlock(&head->mapping->i_pages);
}
@ -2568,7 +2568,7 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
{
int extra_pins;
/* Additional pins from radix tree */
/* Additional pins from page cache */
if (PageAnon(page))
extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
else
@ -2664,17 +2664,14 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
spin_lock_irqsave(zone_lru_lock(page_zone(head)), flags);
if (mapping) {
void **pslot;
XA_STATE(xas, &mapping->i_pages, page_index(head));
xa_lock(&mapping->i_pages);
pslot = radix_tree_lookup_slot(&mapping->i_pages,
page_index(head));
/*
* Check if the head page is present in radix tree.
* Check if the head page is present in page cache.
* We assume all tail are present too, if head is there.
*/
if (radix_tree_deref_slot_protected(pslot,
&mapping->i_pages.xa_lock) != head)
xa_lock(&mapping->i_pages);
if (xas_load(&xas) != head)
goto fail;
}

View File

@ -1288,17 +1288,17 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
*
* Basic scheme is simple, details are more complex:
* - allocate and freeze a new huge page;
* - scan over radix tree replacing old pages the new one
* - scan page cache replacing old pages with the new one
* + swap in pages if necessary;
* + fill in gaps;
* + keep old pages around in case if rollback is required;
* - if replacing succeed:
* + keep old pages around in case rollback is required;
* - if replacing succeeds:
* + copy data over;
* + free old pages;
* + unfreeze huge page;
* - if replacing failed;
* + put all pages back and unfreeze them;
* + restore gaps in the radix-tree;
* + restore gaps in the page cache;
* + free huge page;
*/
static void collapse_shmem(struct mm_struct *mm,
@ -1306,12 +1306,11 @@ static void collapse_shmem(struct mm_struct *mm,
struct page **hpage, int node)
{
gfp_t gfp;
struct page *page, *new_page, *tmp;
struct page *new_page;
struct mem_cgroup *memcg;
pgoff_t index, end = start + HPAGE_PMD_NR;
LIST_HEAD(pagelist);
struct radix_tree_iter iter;
void **slot;
XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
int nr_none = 0, result = SCAN_SUCCEED;
VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
@ -1336,48 +1335,49 @@ static void collapse_shmem(struct mm_struct *mm,
__SetPageLocked(new_page);
BUG_ON(!page_ref_freeze(new_page, 1));
/*
* At this point the new_page is 'frozen' (page_count() is zero), locked
* and not up-to-date. It's safe to insert it into radix tree, because
* nobody would be able to map it or use it in other way until we
* unfreeze it.
* At this point the new_page is 'frozen' (page_count() is zero),
* locked and not up-to-date. It's safe to insert it into the page
* cache, because nobody would be able to map it or use it in other
* way until we unfreeze it.
*/
index = start;
xa_lock_irq(&mapping->i_pages);
radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
int n = min(iter.index, end) - index;
/*
* Handle holes in the radix tree: charge it from shmem and
* insert relevant subpage of new_page into the radix-tree.
*/
if (n && !shmem_charge(mapping->host, n)) {
result = SCAN_FAIL;
/* This will be less messy when we use multi-index entries */
do {
xas_lock_irq(&xas);
xas_create_range(&xas);
if (!xas_error(&xas))
break;
}
nr_none += n;
for (; index < min(iter.index, end); index++) {
radix_tree_insert(&mapping->i_pages, index,
new_page + (index % HPAGE_PMD_NR));
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL))
goto out;
} while (1);
xas_set(&xas, start);
for (index = start; index < end; index++) {
struct page *page = xas_next(&xas);
VM_BUG_ON(index != xas.xa_index);
if (!page) {
if (!shmem_charge(mapping->host, 1)) {
result = SCAN_FAIL;
break;
}
xas_store(&xas, new_page + (index % HPAGE_PMD_NR));
nr_none++;
continue;
}
/* We are done. */
if (index >= end)
break;
page = radix_tree_deref_slot_protected(slot,
&mapping->i_pages.xa_lock);
if (radix_tree_exceptional_entry(page) || !PageUptodate(page)) {
xa_unlock_irq(&mapping->i_pages);
if (xa_is_value(page) || !PageUptodate(page)) {
xas_unlock_irq(&xas);
/* swap in or instantiate fallocated page */
if (shmem_getpage(mapping->host, index, &page,
SGP_NOHUGE)) {
result = SCAN_FAIL;
goto tree_unlocked;
goto xa_unlocked;
}
xa_lock_irq(&mapping->i_pages);
xas_lock_irq(&xas);
xas_set(&xas, index);
} else if (trylock_page(page)) {
get_page(page);
} else {
@ -1397,7 +1397,7 @@ static void collapse_shmem(struct mm_struct *mm,
result = SCAN_TRUNCATED;
goto out_unlock;
}
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
if (isolate_lru_page(page)) {
result = SCAN_DEL_PAGE_LRU;
@ -1407,17 +1407,16 @@ static void collapse_shmem(struct mm_struct *mm,
if (page_mapped(page))
unmap_mapping_pages(mapping, index, 1, false);
xa_lock_irq(&mapping->i_pages);
xas_lock_irq(&xas);
xas_set(&xas, index);
slot = radix_tree_lookup_slot(&mapping->i_pages, index);
VM_BUG_ON_PAGE(page != radix_tree_deref_slot_protected(slot,
&mapping->i_pages.xa_lock), page);
VM_BUG_ON_PAGE(page != xas_load(&xas), page);
VM_BUG_ON_PAGE(page_mapped(page), page);
/*
* The page is expected to have page_count() == 3:
* - we hold a pin on it;
* - one reference from radix tree;
* - one reference from page cache;
* - one from isolate_lru_page;
*/
if (!page_ref_freeze(page, 3)) {
@ -1432,56 +1431,30 @@ static void collapse_shmem(struct mm_struct *mm,
list_add_tail(&page->lru, &pagelist);
/* Finally, replace with the new page. */
radix_tree_replace_slot(&mapping->i_pages, slot,
new_page + (index % HPAGE_PMD_NR));
slot = radix_tree_iter_resume(slot, &iter);
index++;
xas_store(&xas, new_page + (index % HPAGE_PMD_NR));
continue;
out_lru:
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
putback_lru_page(page);
out_isolate_failed:
unlock_page(page);
put_page(page);
goto tree_unlocked;
goto xa_unlocked;
out_unlock:
unlock_page(page);
put_page(page);
break;
}
xas_unlock_irq(&xas);
/*
* Handle hole in radix tree at the end of the range.
* This code only triggers if there's nothing in radix tree
* beyond 'end'.
*/
if (result == SCAN_SUCCEED && index < end) {
int n = end - index;
if (!shmem_charge(mapping->host, n)) {
result = SCAN_FAIL;
goto tree_locked;
}
for (; index < end; index++) {
radix_tree_insert(&mapping->i_pages, index,
new_page + (index % HPAGE_PMD_NR));
}
nr_none += n;
}
tree_locked:
xa_unlock_irq(&mapping->i_pages);
tree_unlocked:
xa_unlocked:
if (result == SCAN_SUCCEED) {
unsigned long flags;
struct page *page, *tmp;
struct zone *zone = page_zone(new_page);
/*
* Replacing old pages with new one has succeed, now we need to
* copy the content and free old pages.
* Replacing old pages with new one has succeeded, now we
* need to copy the content and free the old pages.
*/
list_for_each_entry_safe(page, tmp, &pagelist, lru) {
copy_highpage(new_page + (page->index % HPAGE_PMD_NR),
@ -1495,16 +1468,16 @@ static void collapse_shmem(struct mm_struct *mm,
put_page(page);
}
local_irq_save(flags);
local_irq_disable();
__inc_node_page_state(new_page, NR_SHMEM_THPS);
if (nr_none) {
__mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
__mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none);
}
local_irq_restore(flags);
local_irq_enable();
/*
* Remove pte page tables, so we can re-faulti
* Remove pte page tables, so we can re-fault
* the page as huge.
*/
retract_page_tables(mapping, start);
@ -1521,37 +1494,37 @@ static void collapse_shmem(struct mm_struct *mm,
khugepaged_pages_collapsed++;
} else {
/* Something went wrong: rollback changes to the radix-tree */
struct page *page;
/* Something went wrong: roll back page cache changes */
shmem_uncharge(mapping->host, nr_none);
xa_lock_irq(&mapping->i_pages);
radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
if (iter.index >= end)
break;
xas_lock_irq(&xas);
xas_set(&xas, start);
xas_for_each(&xas, page, end - 1) {
page = list_first_entry_or_null(&pagelist,
struct page, lru);
if (!page || iter.index < page->index) {
if (!page || xas.xa_index < page->index) {
if (!nr_none)
break;
nr_none--;
/* Put holes back where they were */
radix_tree_delete(&mapping->i_pages, iter.index);
xas_store(&xas, NULL);
continue;
}
VM_BUG_ON_PAGE(page->index != iter.index, page);
VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
/* Unfreeze the page. */
list_del(&page->lru);
page_ref_unfreeze(page, 2);
radix_tree_replace_slot(&mapping->i_pages, slot, page);
slot = radix_tree_iter_resume(slot, &iter);
xa_unlock_irq(&mapping->i_pages);
xas_store(&xas, page);
xas_pause(&xas);
xas_unlock_irq(&xas);
putback_lru_page(page);
unlock_page(page);
xa_lock_irq(&mapping->i_pages);
xas_lock_irq(&xas);
}
VM_BUG_ON(nr_none);
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
/* Unfreeze new_page, caller would take care about freeing it */
page_ref_unfreeze(new_page, 1);
@ -1569,8 +1542,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
pgoff_t start, struct page **hpage)
{
struct page *page = NULL;
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, &mapping->i_pages, start);
int present, swap;
int node = NUMA_NO_NODE;
int result = SCAN_SUCCEED;
@ -1579,17 +1551,11 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
swap = 0;
memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
rcu_read_lock();
radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
if (iter.index >= start + HPAGE_PMD_NR)
break;
page = radix_tree_deref_slot(slot);
if (radix_tree_deref_retry(page)) {
slot = radix_tree_iter_retry(&iter);
xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
if (xas_retry(&xas, page))
continue;
}
if (radix_tree_exception(page)) {
if (xa_is_value(page)) {
if (++swap > khugepaged_max_ptes_swap) {
result = SCAN_EXCEED_SWAP_PTE;
break;
@ -1628,7 +1594,7 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
present++;
if (need_resched()) {
slot = radix_tree_iter_resume(slot, &iter);
xas_pause(&xas);
cond_resched_rcu();
}
}

View File

@ -251,7 +251,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
page = find_get_entry(mapping, index);
if (!radix_tree_exceptional_entry(page)) {
if (!xa_is_value(page)) {
if (page)
put_page(page);
continue;

View File

@ -4728,7 +4728,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
/* shmem/tmpfs may report page out on swap: account for that too. */
if (shmem_mapping(mapping)) {
page = find_get_entry(mapping, pgoff);
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
swp_entry_t swp = radix_to_swp_entry(page);
if (do_memsw_account())
*entry = swp;

View File

@ -21,44 +21,36 @@
#include <uapi/linux/memfd.h>
/*
* We need a tag: a new tag would expand every radix_tree_node by 8 bytes,
* We need a tag: a new tag would expand every xa_node by 8 bytes,
* so reuse a tag which we firmly believe is never set or cleared on tmpfs
* or hugetlbfs because they are memory only filesystems.
*/
#define MEMFD_TAG_PINNED PAGECACHE_TAG_TOWRITE
#define LAST_SCAN 4 /* about 150ms max */
static void memfd_tag_pins(struct address_space *mapping)
static void memfd_tag_pins(struct xa_state *xas)
{
struct radix_tree_iter iter;
void __rcu **slot;
pgoff_t start;
struct page *page;
unsigned int tagged = 0;
lru_add_drain();
start = 0;
rcu_read_lock();
radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
page = radix_tree_deref_slot(slot);
if (!page || radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
slot = radix_tree_iter_retry(&iter);
continue;
}
} else if (page_count(page) - page_mapcount(page) > 1) {
xa_lock_irq(&mapping->i_pages);
radix_tree_tag_set(&mapping->i_pages, iter.index,
MEMFD_TAG_PINNED);
xa_unlock_irq(&mapping->i_pages);
}
xas_lock_irq(xas);
xas_for_each(xas, page, ULONG_MAX) {
if (xa_is_value(page))
continue;
if (page_count(page) - page_mapcount(page) > 1)
xas_set_mark(xas, MEMFD_TAG_PINNED);
if (need_resched()) {
slot = radix_tree_iter_resume(slot, &iter);
cond_resched_rcu();
}
if (++tagged % XA_CHECK_SCHED)
continue;
xas_pause(xas);
xas_unlock_irq(xas);
cond_resched();
xas_lock_irq(xas);
}
rcu_read_unlock();
xas_unlock_irq(xas);
}
/*
@ -72,17 +64,17 @@ static void memfd_tag_pins(struct address_space *mapping)
*/
static int memfd_wait_for_pins(struct address_space *mapping)
{
struct radix_tree_iter iter;
void __rcu **slot;
pgoff_t start;
XA_STATE(xas, &mapping->i_pages, 0);
struct page *page;
int error, scan;
memfd_tag_pins(mapping);
memfd_tag_pins(&xas);
error = 0;
for (scan = 0; scan <= LAST_SCAN; scan++) {
if (!radix_tree_tagged(&mapping->i_pages, MEMFD_TAG_PINNED))
unsigned int tagged = 0;
if (!xas_marked(&xas, MEMFD_TAG_PINNED))
break;
if (!scan)
@ -90,45 +82,34 @@ static int memfd_wait_for_pins(struct address_space *mapping)
else if (schedule_timeout_killable((HZ << scan) / 200))
scan = LAST_SCAN;
start = 0;
rcu_read_lock();
radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter,
start, MEMFD_TAG_PINNED) {
page = radix_tree_deref_slot(slot);
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
slot = radix_tree_iter_retry(&iter);
continue;
}
page = NULL;
}
if (page &&
page_count(page) - page_mapcount(page) != 1) {
if (scan < LAST_SCAN)
goto continue_resched;
xas_set(&xas, 0);
xas_lock_irq(&xas);
xas_for_each_marked(&xas, page, ULONG_MAX, MEMFD_TAG_PINNED) {
bool clear = true;
if (xa_is_value(page))
continue;
if (page_count(page) - page_mapcount(page) != 1) {
/*
* On the last scan, we clean up all those tags
* we inserted; but make a note that we still
* found pages pinned.
*/
error = -EBUSY;
if (scan == LAST_SCAN)
error = -EBUSY;
else
clear = false;
}
if (clear)
xas_clear_mark(&xas, MEMFD_TAG_PINNED);
if (++tagged % XA_CHECK_SCHED)
continue;
xa_lock_irq(&mapping->i_pages);
radix_tree_tag_clear(&mapping->i_pages,
iter.index, MEMFD_TAG_PINNED);
xa_unlock_irq(&mapping->i_pages);
continue_resched:
if (need_resched()) {
slot = radix_tree_iter_resume(slot, &iter);
cond_resched_rcu();
}
xas_pause(&xas);
xas_unlock_irq(&xas);
cond_resched();
xas_lock_irq(&xas);
}
rcu_read_unlock();
xas_unlock_irq(&xas);
}
return error;

View File

@ -326,7 +326,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
page = migration_entry_to_page(entry);
/*
* Once radix-tree replacement of page migration started, page_count
* Once page cache replacement of page migration started, page_count
* *must* be zero. And, we don't want to call wait_on_page_locked()
* against a page without get_page().
* So, we use get_page_unless_zero(), here. Even failed, page fault
@ -441,10 +441,10 @@ int migrate_page_move_mapping(struct address_space *mapping,
struct buffer_head *head, enum migrate_mode mode,
int extra_count)
{
XA_STATE(xas, &mapping->i_pages, page_index(page));
struct zone *oldzone, *newzone;
int dirty;
int expected_count = 1 + extra_count;
void **pslot;
/*
* Device public or private pages have an extra refcount as they are
@ -470,21 +470,16 @@ int migrate_page_move_mapping(struct address_space *mapping,
oldzone = page_zone(page);
newzone = page_zone(newpage);
xa_lock_irq(&mapping->i_pages);
pslot = radix_tree_lookup_slot(&mapping->i_pages,
page_index(page));
xas_lock_irq(&xas);
expected_count += hpage_nr_pages(page) + page_has_private(page);
if (page_count(page) != expected_count ||
radix_tree_deref_slot_protected(pslot,
&mapping->i_pages.xa_lock) != page) {
xa_unlock_irq(&mapping->i_pages);
if (page_count(page) != expected_count || xas_load(&xas) != page) {
xas_unlock_irq(&xas);
return -EAGAIN;
}
if (!page_ref_freeze(page, expected_count)) {
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
return -EAGAIN;
}
@ -498,7 +493,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
if (mode == MIGRATE_ASYNC && head &&
!buffer_migrate_lock_buffers(head, mode)) {
page_ref_unfreeze(page, expected_count);
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
return -EAGAIN;
}
@ -526,16 +521,13 @@ int migrate_page_move_mapping(struct address_space *mapping,
SetPageDirty(newpage);
}
radix_tree_replace_slot(&mapping->i_pages, pslot, newpage);
xas_store(&xas, newpage);
if (PageTransHuge(page)) {
int i;
int index = page_index(page);
for (i = 1; i < HPAGE_PMD_NR; i++) {
pslot = radix_tree_lookup_slot(&mapping->i_pages,
index + i);
radix_tree_replace_slot(&mapping->i_pages, pslot,
newpage + i);
xas_next(&xas);
xas_store(&xas, newpage + i);
}
}
@ -546,7 +538,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
*/
page_ref_unfreeze(page, expected_count - hpage_nr_pages(page));
xa_unlock(&mapping->i_pages);
xas_unlock(&xas);
/* Leave irq disabled to prevent preemption while updating stats */
/*
@ -586,22 +578,18 @@ EXPORT_SYMBOL(migrate_page_move_mapping);
int migrate_huge_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page)
{
XA_STATE(xas, &mapping->i_pages, page_index(page));
int expected_count;
void **pslot;
xa_lock_irq(&mapping->i_pages);
pslot = radix_tree_lookup_slot(&mapping->i_pages, page_index(page));
xas_lock_irq(&xas);
expected_count = 2 + page_has_private(page);
if (page_count(page) != expected_count ||
radix_tree_deref_slot_protected(pslot, &mapping->i_pages.xa_lock) != page) {
xa_unlock_irq(&mapping->i_pages);
if (page_count(page) != expected_count || xas_load(&xas) != page) {
xas_unlock_irq(&xas);
return -EAGAIN;
}
if (!page_ref_freeze(page, expected_count)) {
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
return -EAGAIN;
}
@ -610,11 +598,11 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
get_page(newpage);
radix_tree_replace_slot(&mapping->i_pages, pslot, newpage);
xas_store(&xas, newpage);
page_ref_unfreeze(page, expected_count - 1);
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
return MIGRATEPAGE_SUCCESS;
}
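
The two migration paths above share the same shape; a hedged, condensed sketch of the load-compare-store step, with error handling trimmed and names invented:

/* Hedged sketch: replace one page-cache entry with another under xa_lock. */
static int example_replace_entry(struct address_space *mapping,
                                 struct page *old, struct page *new)
{
        XA_STATE(xas, &mapping->i_pages, page_index(old));

        xas_lock_irq(&xas);
        if (xas_load(&xas) != old) {    /* the slot changed under us */
                xas_unlock_irq(&xas);
                return -EAGAIN;
        }
        xas_store(&xas, new);           /* no allocation: the slot already exists */
        xas_unlock_irq(&xas);
        return 0;
}
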

View File

@ -66,7 +66,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
* shmem/tmpfs may return swap: account for swapcache
* page too.
*/
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
swp_entry_t swp = radix_to_swp_entry(page);
page = find_get_page(swap_address_space(swp),
swp_offset(swp));

View File

@ -2097,34 +2097,25 @@ void __init page_writeback_init(void)
* dirty pages in the file (thus it is important for this function to be quick
* so that it can tag pages faster than a dirtying process can create them).
*/
/*
* We tag pages in batches of WRITEBACK_TAG_BATCH to reduce the i_pages lock
* latency.
*/
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end)
{
#define WRITEBACK_TAG_BATCH 4096
unsigned long tagged = 0;
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, &mapping->i_pages, start);
unsigned int tagged = 0;
void *page;
xa_lock_irq(&mapping->i_pages);
radix_tree_for_each_tagged(slot, &mapping->i_pages, &iter, start,
PAGECACHE_TAG_DIRTY) {
if (iter.index > end)
break;
radix_tree_iter_tag_set(&mapping->i_pages, &iter,
PAGECACHE_TAG_TOWRITE);
tagged++;
if ((tagged % WRITEBACK_TAG_BATCH) != 0)
xas_lock_irq(&xas);
xas_for_each_marked(&xas, page, end, PAGECACHE_TAG_DIRTY) {
xas_set_mark(&xas, PAGECACHE_TAG_TOWRITE);
if (++tagged % XA_CHECK_SCHED)
continue;
slot = radix_tree_iter_resume(slot, &iter);
xa_unlock_irq(&mapping->i_pages);
xas_pause(&xas);
xas_unlock_irq(&xas);
cond_resched();
xa_lock_irq(&mapping->i_pages);
xas_lock_irq(&xas);
}
xa_unlock_irq(&mapping->i_pages);
xas_unlock_irq(&xas);
}
EXPORT_SYMBOL(tag_pages_for_writeback);
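
A hedged sketch of the batched marked walk used above, with the same XA_CHECK_SCHED back-off; the counting body is invented:

/* Hedged sketch: walk marked entries, dropping the lock every so often. */
static unsigned long example_count_dirty(struct address_space *mapping,
                                         pgoff_t start, pgoff_t end)
{
        XA_STATE(xas, &mapping->i_pages, start);
        unsigned long count = 0;
        void *entry;

        xas_lock_irq(&xas);
        xas_for_each_marked(&xas, entry, end, PAGECACHE_TAG_DIRTY) {
                if (++count % XA_CHECK_SCHED)
                        continue;
                xas_pause(&xas);        /* remember the position across the unlock */
                xas_unlock_irq(&xas);
                cond_resched();
                xas_lock_irq(&xas);
        }
        xas_unlock_irq(&xas);
        return count;
}
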
@ -2170,7 +2161,7 @@ int write_cache_pages(struct address_space *mapping,
pgoff_t end; /* Inclusive */
pgoff_t done_index;
int range_whole = 0;
int tag;
xa_mark_t tag;
pagevec_init(&pvec);
if (wbc->range_cyclic) {
@ -2442,7 +2433,7 @@ void account_page_cleaned(struct page *page, struct address_space *mapping,
/*
* For address_spaces which do not use buffers. Just tag the page as dirty in
* its radix tree.
* the xarray.
*
* This is also used when a single buffer is being dirtied: we want to set the
* page dirty in that case, but not all the buffers. This is a "bottom-up"
@ -2468,7 +2459,7 @@ int __set_page_dirty_nobuffers(struct page *page)
BUG_ON(page_mapping(page) != mapping);
WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
account_page_dirtied(page, mapping);
radix_tree_tag_set(&mapping->i_pages, page_index(page),
__xa_set_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_DIRTY);
xa_unlock_irqrestore(&mapping->i_pages, flags);
unlock_page_memcg(page);
@ -2631,13 +2622,13 @@ EXPORT_SYMBOL(__cancel_dirty_page);
* Returns true if the page was previously dirty.
*
* This is for preparing to put the page under writeout. We leave the page
* tagged as dirty in the radix tree so that a concurrent write-for-sync
* tagged as dirty in the xarray so that a concurrent write-for-sync
* can discover it via a PAGECACHE_TAG_DIRTY walk. The ->writepage
* implementation will run either set_page_writeback() or set_page_dirty(),
* at which stage we bring the page's dirty flag and radix-tree dirty tag
* at which stage we bring the page's dirty flag and xarray dirty tag
* back into sync.
*
* This incoherency between the page's dirty flag and radix-tree tag is
* This incoherency between the page's dirty flag and xarray tag is
* unfortunate, but it only exists while the page is locked.
*/
int clear_page_dirty_for_io(struct page *page)
@ -2718,7 +2709,7 @@ int test_clear_page_writeback(struct page *page)
xa_lock_irqsave(&mapping->i_pages, flags);
ret = TestClearPageWriteback(page);
if (ret) {
radix_tree_tag_clear(&mapping->i_pages, page_index(page),
__xa_clear_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_WRITEBACK);
if (bdi_cap_account_writeback(bdi)) {
struct bdi_writeback *wb = inode_to_wb(inode);
@ -2758,11 +2749,13 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
lock_page_memcg(page);
if (mapping && mapping_use_writeback_tags(mapping)) {
XA_STATE(xas, &mapping->i_pages, page_index(page));
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
unsigned long flags;
xa_lock_irqsave(&mapping->i_pages, flags);
xas_lock_irqsave(&xas, flags);
xas_load(&xas);
ret = TestSetPageWriteback(page);
if (!ret) {
bool on_wblist;
@ -2770,8 +2763,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
on_wblist = mapping_tagged(mapping,
PAGECACHE_TAG_WRITEBACK);
radix_tree_tag_set(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_WRITEBACK);
xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK);
if (bdi_cap_account_writeback(bdi))
inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
@ -2784,12 +2776,10 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
sb_mark_inode_writeback(mapping->host);
}
if (!PageDirty(page))
radix_tree_tag_clear(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_DIRTY);
xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
if (!keep_write)
radix_tree_tag_clear(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_TOWRITE);
xa_unlock_irqrestore(&mapping->i_pages, flags);
xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
xas_unlock_irqrestore(&xas, flags);
} else {
ret = TestSetPageWriteback(page);
}
@ -2803,16 +2793,6 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
}
EXPORT_SYMBOL(__test_set_page_writeback);
/*
* Return true if any of the pages in the mapping are marked with the
* passed tag.
*/
int mapping_tagged(struct address_space *mapping, int tag)
{
return radix_tree_tagged(&mapping->i_pages, tag);
}
EXPORT_SYMBOL(mapping_tagged);
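
The out-of-line helper above goes away; its replacement is, in effect, a direct check of the XArray's root mark bits. A hedged sketch of the equivalent; the actual inline lives in a header outside this excerpt:

/* Hedged sketch: "does any entry in this mapping carry @tag?" in O(1). */
static inline bool example_mapping_tagged(struct address_space *mapping,
                                          xa_mark_t tag)
{
        return xa_marked(&mapping->i_pages, tag);
}
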
/**
* wait_for_stable_page() - wait for writeback to finish, if necessary.
* @page: The page to wait on.

View File

@ -176,10 +176,8 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
if (page_offset > end_index)
break;
rcu_read_lock();
page = radix_tree_lookup(&mapping->i_pages, page_offset);
rcu_read_unlock();
if (page && !radix_tree_exceptional_entry(page)) {
page = xa_load(&mapping->i_pages, page_offset);
if (page && !xa_is_value(page)) {
/*
* Page already present? Kick off the current batch of
* contiguous pages before continuing with the next
@ -336,7 +334,7 @@ static pgoff_t count_history_pages(struct address_space *mapping,
pgoff_t head;
rcu_read_lock();
head = page_cache_prev_hole(mapping, offset - 1, max);
head = page_cache_prev_miss(mapping, offset - 1, max);
rcu_read_unlock();
return offset - 1 - head;
@ -425,7 +423,7 @@ ondemand_readahead(struct address_space *mapping,
pgoff_t start;
rcu_read_lock();
start = page_cache_next_hole(mapping, offset + 1, max_pages);
start = page_cache_next_miss(mapping, offset + 1, max_pages);
rcu_read_unlock();
if (!start || start - offset > max_pages)

View File

@ -322,24 +322,20 @@ void shmem_uncharge(struct inode *inode, long pages)
}
/*
* Replace item expected in radix tree by a new item, while holding tree lock.
* Replace item expected in xarray by a new item, while holding xa_lock.
*/
static int shmem_radix_tree_replace(struct address_space *mapping,
static int shmem_replace_entry(struct address_space *mapping,
pgoff_t index, void *expected, void *replacement)
{
struct radix_tree_node *node;
void __rcu **pslot;
XA_STATE(xas, &mapping->i_pages, index);
void *item;
VM_BUG_ON(!expected);
VM_BUG_ON(!replacement);
item = __radix_tree_lookup(&mapping->i_pages, index, &node, &pslot);
if (!item)
return -ENOENT;
item = xas_load(&xas);
if (item != expected)
return -ENOENT;
__radix_tree_replace(&mapping->i_pages, node, pslot,
replacement, NULL);
xas_store(&xas, replacement);
return 0;
}
@ -353,12 +349,7 @@ static int shmem_radix_tree_replace(struct address_space *mapping,
static bool shmem_confirm_swap(struct address_space *mapping,
pgoff_t index, swp_entry_t swap)
{
void *item;
rcu_read_lock();
item = radix_tree_lookup(&mapping->i_pages, index);
rcu_read_unlock();
return item == swp_to_radix_entry(swap);
return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
}
/*
@ -586,9 +577,11 @@ static inline bool is_huge_enabled(struct shmem_sb_info *sbinfo)
*/
static int shmem_add_to_page_cache(struct page *page,
struct address_space *mapping,
pgoff_t index, void *expected)
pgoff_t index, void *expected, gfp_t gfp)
{
int error, nr = hpage_nr_pages(page);
XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page));
unsigned long i = 0;
unsigned long nr = 1UL << compound_order(page);
VM_BUG_ON_PAGE(PageTail(page), page);
VM_BUG_ON_PAGE(index != round_down(index, nr), page);
@ -600,47 +593,39 @@ static int shmem_add_to_page_cache(struct page *page,
page->mapping = mapping;
page->index = index;
xa_lock_irq(&mapping->i_pages);
if (PageTransHuge(page)) {
void __rcu **results;
pgoff_t idx;
int i;
error = 0;
if (radix_tree_gang_lookup_slot(&mapping->i_pages,
&results, &idx, index, 1) &&
idx < index + HPAGE_PMD_NR) {
error = -EEXIST;
do {
void *entry;
xas_lock_irq(&xas);
entry = xas_find_conflict(&xas);
if (entry != expected)
xas_set_err(&xas, -EEXIST);
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
next:
xas_store(&xas, page + i);
if (++i < nr) {
xas_next(&xas);
goto next;
}
if (!error) {
for (i = 0; i < HPAGE_PMD_NR; i++) {
error = radix_tree_insert(&mapping->i_pages,
index + i, page + i);
VM_BUG_ON(error);
}
if (PageTransHuge(page)) {
count_vm_event(THP_FILE_ALLOC);
}
} else if (!expected) {
error = radix_tree_insert(&mapping->i_pages, index, page);
} else {
error = shmem_radix_tree_replace(mapping, index, expected,
page);
}
if (!error) {
mapping->nrpages += nr;
if (PageTransHuge(page))
__inc_node_page_state(page, NR_SHMEM_THPS);
}
mapping->nrpages += nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
__mod_node_page_state(page_pgdat(page), NR_SHMEM, nr);
xa_unlock_irq(&mapping->i_pages);
} else {
unlock:
xas_unlock_irq(&xas);
} while (xas_nomem(&xas, gfp));
if (xas_error(&xas)) {
page->mapping = NULL;
xa_unlock_irq(&mapping->i_pages);
page_ref_sub(page, nr);
return xas_error(&xas);
}
return error;
return 0;
}
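
A hedged, stripped-down sketch of the allocate-outside-the-lock retry loop used by shmem_add_to_page_cache() above (single index, no accounting):

/* Hedged sketch: store under a spinlock, let xas_nomem() allocate and retry. */
static int example_insert(struct address_space *mapping, pgoff_t index,
                          void *entry, gfp_t gfp)
{
        XA_STATE(xas, &mapping->i_pages, index);

        do {
                xas_lock_irq(&xas);
                if (xas_find_conflict(&xas))    /* the slot is already occupied */
                        xas_set_err(&xas, -EEXIST);
                else
                        xas_store(&xas, entry);
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, gfp));         /* preallocate nodes and retry */

        return xas_error(&xas);
}
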
/*
@ -654,7 +639,7 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap)
VM_BUG_ON_PAGE(PageCompound(page), page);
xa_lock_irq(&mapping->i_pages);
error = shmem_radix_tree_replace(mapping, page->index, page, radswap);
error = shmem_replace_entry(mapping, page->index, page, radswap);
page->mapping = NULL;
mapping->nrpages--;
__dec_node_page_state(page, NR_FILE_PAGES);
@ -665,7 +650,7 @@ static void shmem_delete_from_page_cache(struct page *page, void *radswap)
}
/*
* Remove swap entry from radix tree, free the swap and its page cache.
* Remove swap entry from page cache, free the swap and its page cache.
*/
static int shmem_free_swap(struct address_space *mapping,
pgoff_t index, void *radswap)
@ -673,7 +658,7 @@ static int shmem_free_swap(struct address_space *mapping,
void *old;
xa_lock_irq(&mapping->i_pages);
old = radix_tree_delete_item(&mapping->i_pages, index, radswap);
old = __xa_cmpxchg(&mapping->i_pages, index, radswap, NULL, 0);
xa_unlock_irq(&mapping->i_pages);
if (old != radswap)
return -ENOENT;
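
__xa_cmpxchg() collapses the old lookup/compare/delete into one call: it installs the new value (NULL here, which erases the entry) only if the current contents match the expected entry, and returns whatever was found. The double-underscore form is used above because the caller already holds xa_lock; with the plain variant the sketch looks like this (hypothetical helper):

#include <linux/xarray.h>

/* Sketch: erase @index only if it still holds @expected. */
static int erase_if_matches(struct xarray *xa, unsigned long index,
			    void *expected)
{
	void *old = xa_cmpxchg(xa, index, expected, NULL, 0);

	return old == expected ? 0 : -ENOENT;	/* storing NULL never allocates */
}
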
@ -691,29 +676,19 @@ static int shmem_free_swap(struct address_space *mapping,
unsigned long shmem_partial_swap_usage(struct address_space *mapping,
pgoff_t start, pgoff_t end)
{
struct radix_tree_iter iter;
void __rcu **slot;
XA_STATE(xas, &mapping->i_pages, start);
struct page *page;
unsigned long swapped = 0;
rcu_read_lock();
radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
if (iter.index >= end)
break;
page = radix_tree_deref_slot(slot);
if (radix_tree_deref_retry(page)) {
slot = radix_tree_iter_retry(&iter);
xas_for_each(&xas, page, end - 1) {
if (xas_retry(&xas, page))
continue;
}
if (radix_tree_exceptional_entry(page))
if (xa_is_value(page))
swapped++;
if (need_resched()) {
slot = radix_tree_iter_resume(slot, &iter);
xas_pause(&xas);
cond_resched_rcu();
}
}
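
This loop shows the general shape of an XArray walk under RCU: xas_for_each() yields entries up to a maximum index, xas_retry() restarts cleanly when the cursor lands in a node that was freed underneath it, and xas_pause() lets the walk drop out (here to reschedule) and resume at the next index. A generic version of the same loop, with a hypothetical function name:

#include <linux/sched.h>
#include <linux/xarray.h>

/* Sketch: count value entries in [start, end) under rcu_read_lock(). */
static unsigned long count_value_entries(struct xarray *xa,
					 unsigned long start, unsigned long end)
{
	XA_STATE(xas, xa, start);
	unsigned long count = 0;
	void *entry;

	rcu_read_lock();
	xas_for_each(&xas, entry, end - 1) {
		if (xas_retry(&xas, entry))
			continue;
		if (xa_is_value(entry))
			count++;
		if (need_resched()) {
			xas_pause(&xas);	/* safe to restart after sleeping */
			cond_resched_rcu();
		}
	}
	rcu_read_unlock();
	return count;
}
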
@ -788,7 +763,7 @@ void shmem_unlock_mapping(struct address_space *mapping)
}
/*
* Remove range of pages and swap entries from radix tree, and free them.
* Remove range of pages and swap entries from page cache, and free them.
* If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
*/
static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
@ -824,7 +799,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (index >= end)
break;
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
if (unfalloc)
continue;
nr_swaps_freed += !shmem_free_swap(mapping,
@ -921,7 +896,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
if (index >= end)
break;
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
if (unfalloc)
continue;
if (shmem_free_swap(mapping, index, page)) {
@ -1110,34 +1085,27 @@ static void shmem_evict_inode(struct inode *inode)
clear_inode(inode);
}
static unsigned long find_swap_entry(struct radix_tree_root *root, void *item)
static unsigned long find_swap_entry(struct xarray *xa, void *item)
{
struct radix_tree_iter iter;
void __rcu **slot;
unsigned long found = -1;
XA_STATE(xas, xa, 0);
unsigned int checked = 0;
void *entry;
rcu_read_lock();
radix_tree_for_each_slot(slot, root, &iter, 0) {
void *entry = radix_tree_deref_slot(slot);
if (radix_tree_deref_retry(entry)) {
slot = radix_tree_iter_retry(&iter);
xas_for_each(&xas, entry, ULONG_MAX) {
if (xas_retry(&xas, entry))
continue;
}
if (entry == item) {
found = iter.index;
if (entry == item)
break;
}
checked++;
if ((checked % 4096) != 0)
if ((checked % XA_CHECK_SCHED) != 0)
continue;
slot = radix_tree_iter_resume(slot, &iter);
xas_pause(&xas);
cond_resched_rcu();
}
rcu_read_unlock();
return found;
return entry ? xas.xa_index : -1;
}
/*
@ -1175,10 +1143,10 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
* We needed to drop mutex to make that restrictive page
* allocation, but the inode might have been freed while we
* dropped it: although a racing shmem_evict_inode() cannot
* complete without emptying the radix_tree, our page lock
* complete without emptying the page cache, our page lock
* on this swapcache page is not enough to prevent that -
* free_swap_and_cache() of our swap entry will only
* trylock_page(), removing swap from radix_tree whatever.
* trylock_page(), removing swap from page cache whatever.
*
* We must not proceed to shmem_add_to_page_cache() if the
* inode has been freed, but of course we cannot rely on
@ -1200,7 +1168,7 @@ static int shmem_unuse_inode(struct shmem_inode_info *info,
*/
if (!error)
error = shmem_add_to_page_cache(*pagep, mapping, index,
radswap);
radswap, gfp);
if (error != -ENOMEM) {
/*
* Truncation and eviction use free_swap_and_cache(), which
@ -1244,7 +1212,7 @@ int shmem_unuse(swp_entry_t swap, struct page *page)
&memcg, false);
if (error)
goto out;
/* No radix_tree_preload: swap entry keeps a place for page in tree */
/* No memory allocation: swap entry occupies the slot for the page */
error = -EAGAIN;
mutex_lock(&shmem_swaplist_mutex);
@ -1453,23 +1421,17 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
struct shmem_inode_info *info, pgoff_t index)
{
struct vm_area_struct pvma;
struct inode *inode = &info->vfs_inode;
struct address_space *mapping = inode->i_mapping;
pgoff_t idx, hindex;
void __rcu **results;
struct address_space *mapping = info->vfs_inode.i_mapping;
pgoff_t hindex;
struct page *page;
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))
return NULL;
hindex = round_down(index, HPAGE_PMD_NR);
rcu_read_lock();
if (radix_tree_gang_lookup_slot(&mapping->i_pages, &results, &idx,
hindex, 1) && idx < hindex + HPAGE_PMD_NR) {
rcu_read_unlock();
if (xa_find(&mapping->i_pages, &hindex, hindex + HPAGE_PMD_NR - 1,
XA_PRESENT))
return NULL;
}
rcu_read_unlock();
shmem_pseudo_vma_init(&pvma, info, hindex);
page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
@ -1578,8 +1540,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
* a nice clean interface for us to replace oldpage by newpage there.
*/
xa_lock_irq(&swap_mapping->i_pages);
error = shmem_radix_tree_replace(swap_mapping, swap_index, oldpage,
newpage);
error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
if (!error) {
__inc_node_page_state(newpage, NR_FILE_PAGES);
__dec_node_page_state(oldpage, NR_FILE_PAGES);
@ -1643,7 +1604,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
repeat:
swap.val = 0;
page = find_lock_entry(mapping, index);
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
swap = radix_to_swp_entry(page);
page = NULL;
}
@ -1718,7 +1679,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
false);
if (!error) {
error = shmem_add_to_page_cache(page, mapping, index,
swp_to_radix_entry(swap));
swp_to_radix_entry(swap), gfp);
/*
* We already confirmed swap under page lock, and make
* no memory allocation here, so usually no possibility
@ -1824,13 +1785,8 @@ alloc_nohuge: page = shmem_alloc_and_acct_page(gfp, inode,
PageTransHuge(page));
if (error)
goto unacct;
error = radix_tree_maybe_preload_order(gfp & GFP_RECLAIM_MASK,
compound_order(page));
if (!error) {
error = shmem_add_to_page_cache(page, mapping, hindex,
NULL);
radix_tree_preload_end();
}
error = shmem_add_to_page_cache(page, mapping, hindex,
NULL, gfp & GFP_RECLAIM_MASK);
if (error) {
mem_cgroup_cancel_charge(page, memcg,
PageTransHuge(page));
@ -1931,7 +1887,7 @@ alloc_nohuge: page = shmem_alloc_and_acct_page(gfp, inode,
spin_unlock_irq(&info->lock);
goto repeat;
}
if (error == -EEXIST) /* from above or from radix_tree_insert */
if (error == -EEXIST)
goto repeat;
return error;
}
@ -2299,11 +2255,8 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
if (ret)
goto out_release;
ret = radix_tree_maybe_preload(gfp & GFP_RECLAIM_MASK);
if (!ret) {
ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL);
radix_tree_preload_end();
}
ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
gfp & GFP_RECLAIM_MASK);
if (ret)
goto out_release_uncharge;
@ -2548,7 +2501,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
}
/*
* llseek SEEK_DATA or SEEK_HOLE through the radix_tree.
* llseek SEEK_DATA or SEEK_HOLE through the page cache.
*/
static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
pgoff_t index, pgoff_t end, int whence)
@ -2578,7 +2531,7 @@ static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
index = indices[i];
}
page = pvec.pages[i];
if (page && !radix_tree_exceptional_entry(page)) {
if (page && !xa_is_value(page)) {
if (!PageUptodate(page))
page = NULL;
}

View File

@ -964,7 +964,7 @@ void pagevec_remove_exceptionals(struct pagevec *pvec)
for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
struct page *page = pvec->pages[i];
if (!radix_tree_exceptional_entry(page))
if (!xa_is_value(page))
pvec->pages[j++] = page;
}
pvec->nr = j;
@ -1001,7 +1001,7 @@ EXPORT_SYMBOL(pagevec_lookup_range);
unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
int tag)
xa_mark_t tag)
{
pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
PAGEVEC_SIZE, pvec->pages);
@ -1011,7 +1011,7 @@ EXPORT_SYMBOL(pagevec_lookup_range_tag);
unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
struct address_space *mapping, pgoff_t *index, pgoff_t end,
int tag, unsigned max_pages)
xa_mark_t tag, unsigned max_pages)
{
pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
min_t(unsigned int, max_pages, PAGEVEC_SIZE), pvec->pages);

View File

@ -107,14 +107,15 @@ void show_swap_cache_info(void)
}
/*
* __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
* add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
* but sets SwapCache flag and private instead of mapping and index.
*/
int __add_to_swap_cache(struct page *page, swp_entry_t entry)
int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
{
int error, i, nr = hpage_nr_pages(page);
struct address_space *address_space;
struct address_space *address_space = swap_address_space(entry);
pgoff_t idx = swp_offset(entry);
XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
unsigned long i, nr = 1UL << compound_order(page);
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
@ -123,73 +124,52 @@ int __add_to_swap_cache(struct page *page, swp_entry_t entry)
page_ref_add(page, nr);
SetPageSwapCache(page);
address_space = swap_address_space(entry);
xa_lock_irq(&address_space->i_pages);
for (i = 0; i < nr; i++) {
set_page_private(page + i, entry.val + i);
error = radix_tree_insert(&address_space->i_pages,
idx + i, page + i);
if (unlikely(error))
break;
}
if (likely(!error)) {
do {
xas_lock_irq(&xas);
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
for (i = 0; i < nr; i++) {
VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
set_page_private(page + i, entry.val + i);
xas_store(&xas, page + i);
xas_next(&xas);
}
address_space->nrpages += nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
ADD_CACHE_INFO(add_total, nr);
} else {
/*
* Only the context which has set the SWAP_HAS_CACHE flag
* would call add_to_swap_cache().
* So add_to_swap_cache() doesn't return -EEXIST.
*/
VM_BUG_ON(error == -EEXIST);
set_page_private(page + i, 0UL);
while (i--) {
radix_tree_delete(&address_space->i_pages, idx + i);
set_page_private(page + i, 0UL);
}
ClearPageSwapCache(page);
page_ref_sub(page, nr);
}
xa_unlock_irq(&address_space->i_pages);
unlock:
xas_unlock_irq(&xas);
} while (xas_nomem(&xas, gfp));
return error;
}
if (!xas_error(&xas))
return 0;
int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
{
int error;
error = radix_tree_maybe_preload_order(gfp_mask, compound_order(page));
if (!error) {
error = __add_to_swap_cache(page, entry);
radix_tree_preload_end();
}
return error;
ClearPageSwapCache(page);
page_ref_sub(page, nr);
return xas_error(&xas);
}
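
For compound pages the cursor is set up with XA_STATE_ORDER(), xas_create_range() pre-builds every slot the range needs (this is where allocation, and therefore the xas_nomem() retry, happens), and xas_store()/xas_next() then fill the slots one by one. A stripped-down sketch of that shape, storing the same pointer in each slot rather than page + i; names are hypothetical:

#include <linux/xarray.h>

/* Sketch: store @entry in all 2^order slots starting at @index. */
static int store_range(struct xarray *xa, unsigned long index,
		       unsigned int order, void *entry, gfp_t gfp)
{
	XA_STATE_ORDER(xas, xa, index, order);
	unsigned long i, nr = 1UL << order;

	do {
		xas_lock(&xas);
		xas_create_range(&xas);		/* allocate the whole range */
		if (xas_error(&xas))
			goto unlock;
		for (i = 0; i < nr; i++) {
			xas_store(&xas, entry);
			xas_next(&xas);
		}
unlock:
		xas_unlock(&xas);
	} while (xas_nomem(&xas, gfp));

	return xas_error(&xas);
}
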
/*
* This must be called only on pages that have
* been verified to be in the swap cache.
*/
void __delete_from_swap_cache(struct page *page)
void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
{
struct address_space *address_space;
struct address_space *address_space = swap_address_space(entry);
int i, nr = hpage_nr_pages(page);
swp_entry_t entry;
pgoff_t idx;
pgoff_t idx = swp_offset(entry);
XA_STATE(xas, &address_space->i_pages, idx);
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
VM_BUG_ON_PAGE(PageWriteback(page), page);
entry.val = page_private(page);
address_space = swap_address_space(entry);
idx = swp_offset(entry);
for (i = 0; i < nr; i++) {
radix_tree_delete(&address_space->i_pages, idx + i);
void *entry = xas_store(&xas, NULL);
VM_BUG_ON_PAGE(entry != page + i, entry);
set_page_private(page + i, 0);
xas_next(&xas);
}
ClearPageSwapCache(page);
address_space->nrpages -= nr;
@ -217,7 +197,7 @@ int add_to_swap(struct page *page)
return 0;
/*
* Radix-tree node allocations from PF_MEMALLOC contexts could
* XArray node allocations from PF_MEMALLOC contexts could
* completely exhaust the page allocator. __GFP_NOMEMALLOC
* stops emergency reserves from being allocated.
*
@ -229,7 +209,6 @@ int add_to_swap(struct page *page)
*/
err = add_to_swap_cache(page, entry,
__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
/* -ENOMEM radix-tree allocation failure */
if (err)
/*
* add_to_swap_cache() doesn't return -EEXIST, so we can safely
@ -263,14 +242,11 @@ int add_to_swap(struct page *page)
*/
void delete_from_swap_cache(struct page *page)
{
swp_entry_t entry;
struct address_space *address_space;
swp_entry_t entry = { .val = page_private(page) };
struct address_space *address_space = swap_address_space(entry);
entry.val = page_private(page);
address_space = swap_address_space(entry);
xa_lock_irq(&address_space->i_pages);
__delete_from_swap_cache(page);
__delete_from_swap_cache(page, entry);
xa_unlock_irq(&address_space->i_pages);
put_swap_page(page, entry);
@ -413,19 +389,11 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
break; /* Out of memory */
}
/*
* call radix_tree_preload() while we can wait.
*/
err = radix_tree_maybe_preload(gfp_mask & GFP_KERNEL);
if (err)
break;
/*
* Swap entry may have been freed since our caller observed it.
*/
err = swapcache_prepare(entry);
if (err == -EEXIST) {
radix_tree_preload_end();
/*
* We might race against get_swap_page() and stumble
* across a SWAP_HAS_CACHE swap_map entry whose page
@ -433,27 +401,20 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
*/
cond_resched();
continue;
}
if (err) { /* swp entry is obsolete ? */
radix_tree_preload_end();
} else if (err) /* swp entry is obsolete ? */
break;
}
/* May fail (-ENOMEM) if radix-tree node allocation failed. */
/* May fail (-ENOMEM) if XArray node allocation failed. */
__SetPageLocked(new_page);
__SetPageSwapBacked(new_page);
err = __add_to_swap_cache(new_page, entry);
err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
if (likely(!err)) {
radix_tree_preload_end();
/*
* Initiate read into locked page and return.
*/
/* Initiate read into locked page */
SetPageWorkingset(new_page);
lru_cache_add_anon(new_page);
*new_page_allocated = true;
return new_page;
}
radix_tree_preload_end();
__ClearPageLocked(new_page);
/*
* add_to_swap_cache() doesn't return -EEXIST, so we can safely
@ -626,7 +587,7 @@ int init_swap_address_space(unsigned int type, unsigned long nr_pages)
return -ENOMEM;
for (i = 0; i < nr; i++) {
space = spaces + i;
INIT_RADIX_TREE(&space->i_pages, GFP_ATOMIC|__GFP_NOWARN);
xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
atomic_set(&space->i_mmap_writable, 0);
space->a_ops = &swap_aops;
/* swap cache doesn't use writeback related tags */
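
Initialisation changes as well: instead of INIT_RADIX_TREE() with a GFP mask baked into the tree, the array is set up with xa_init_flags(), the locking class (XA_FLAGS_LOCK_IRQ here, since the i_pages lock is taken with interrupts disabled) is declared up front, and GFP flags move to the individual store calls. Statically allocated arrays use DEFINE_XARRAY(). A hedged sketch:

#include <linux/xarray.h>

static DEFINE_XARRAY(example_array);	/* static definition, default flags */

static void example_init(struct xarray *xa)
{
	/* Callers of this array use xa_lock_irq()/xa_unlock_irq(). */
	xa_init_flags(xa, XA_FLAGS_LOCK_IRQ);
}
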

View File

@ -33,15 +33,12 @@
static inline void __clear_shadow_entry(struct address_space *mapping,
pgoff_t index, void *entry)
{
struct radix_tree_node *node;
void **slot;
XA_STATE(xas, &mapping->i_pages, index);
if (!__radix_tree_lookup(&mapping->i_pages, index, &node, &slot))
xas_set_update(&xas, workingset_update_node);
if (xas_load(&xas) != entry)
return;
if (*slot != entry)
return;
__radix_tree_replace(&mapping->i_pages, node, slot, NULL,
workingset_update_node);
xas_store(&xas, NULL);
mapping->nrexceptional--;
}
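
xas_set_update() registers a callback that the XArray invokes whenever this operation changes a node, which is how the shadow-node accounting below keeps running without passing an update function into every call (the workingset_update_node argument in the removed __radix_tree_replace() line). A sketch of wiring such a hook into a store; the callback here is hypothetical but has the same shape as workingset_update_node() further down:

#include <linux/xarray.h>

/* Hypothetical bookkeeping hook, same signature as workingset_update_node(). */
static void example_node_update(struct xa_node *node)
{
	/* e.g. move the node on or off an LRU based on node->nr_values */
}

static void clear_entry_tracked(struct xarray *xa, unsigned long index,
				void *expected)
{
	XA_STATE(xas, xa, index);

	xas_set_update(&xas, example_node_update);
	xas_lock(&xas);
	if (xas_load(&xas) == expected)
		xas_store(&xas, NULL);	/* hook runs if the node is modified */
	xas_unlock(&xas);
}
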
@ -70,7 +67,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
return;
for (j = 0; j < pagevec_count(pvec); j++)
if (radix_tree_exceptional_entry(pvec->pages[j]))
if (xa_is_value(pvec->pages[j]))
break;
if (j == pagevec_count(pvec))
@ -85,7 +82,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
struct page *page = pvec->pages[i];
pgoff_t index = indices[i];
if (!radix_tree_exceptional_entry(page)) {
if (!xa_is_value(page)) {
pvec->pages[j++] = page;
continue;
}
@ -347,7 +344,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
if (index >= end)
break;
if (radix_tree_exceptional_entry(page))
if (xa_is_value(page))
continue;
if (!trylock_page(page))
@ -442,7 +439,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
break;
}
if (radix_tree_exceptional_entry(page))
if (xa_is_value(page))
continue;
lock_page(page);
@ -561,7 +558,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
if (index > end)
break;
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
invalidate_exceptional_entry(mapping, index,
page);
continue;
@ -692,7 +689,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
if (index > end)
break;
if (radix_tree_exceptional_entry(page)) {
if (xa_is_value(page)) {
if (!invalidate_exceptional_entry2(mapping,
index, page))
ret = -EBUSY;
@ -738,10 +735,10 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
index++;
}
/*
* For DAX we invalidate page tables after invalidating radix tree. We
* For DAX we invalidate page tables after invalidating page cache. We
* could invalidate page tables while invalidating each entry however
* that would be expensive. And doing range unmapping before doesn't
* work as we have no cheap way to find whether radix tree entry didn't
* work as we have no cheap way to find whether page cache entry didn't
* get remapped later.
*/
if (dax_mapping(mapping)) {

View File

@ -751,12 +751,12 @@ static inline int is_page_cache_freeable(struct page *page)
{
/*
* A freeable page cache page is referenced only by the caller
* that isolated the page, the page cache radix tree and
* optional buffer heads at page->private.
* that isolated the page, the page cache and optional buffer
* heads at page->private.
*/
int radix_pins = PageTransHuge(page) && PageSwapCache(page) ?
int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
HPAGE_PMD_NR : 1;
return page_count(page) - page_has_private(page) == 1 + radix_pins;
return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
}
static int may_write_to_inode(struct inode *inode, struct scan_control *sc)
@ -932,7 +932,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
mem_cgroup_swapout(page, swap);
__delete_from_swap_cache(page);
__delete_from_swap_cache(page, swap);
xa_unlock_irqrestore(&mapping->i_pages, flags);
put_swap_page(page, swap);
} else {

View File

@ -160,20 +160,20 @@
* and activations is maintained (node->inactive_age).
*
* On eviction, a snapshot of this counter (along with some bits to
* identify the node) is stored in the now empty page cache radix tree
* identify the node) is stored in the now empty page cache
* slot of the evicted page. This is called a shadow entry.
*
* On cache misses for which there are shadow entries, an eligible
* refault distance will immediately activate the refaulting page.
*/
#define EVICTION_SHIFT (RADIX_TREE_EXCEPTIONAL_ENTRY + \
#define EVICTION_SHIFT ((BITS_PER_LONG - BITS_PER_XA_VALUE) + \
1 + NODES_SHIFT + MEM_CGROUP_ID_SHIFT)
#define EVICTION_MASK (~0UL >> EVICTION_SHIFT)
/*
* Eviction timestamps need to be able to cover the full range of
* actionable refaults. However, bits are tight in the radix tree
* actionable refaults. However, bits are tight in the xarray
* entry, and after storing the identifier for the lruvec there might
* not be enough left to represent every single actionable refault. In
* that case, we have to sacrifice granularity for distance, and group
@ -185,22 +185,21 @@ static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
bool workingset)
{
eviction >>= bucket_order;
eviction &= EVICTION_MASK;
eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
eviction = (eviction << 1) | workingset;
eviction = (eviction << RADIX_TREE_EXCEPTIONAL_SHIFT);
return (void *)(eviction | RADIX_TREE_EXCEPTIONAL_ENTRY);
return xa_mk_value(eviction);
}
static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
unsigned long *evictionp, bool *workingsetp)
{
unsigned long entry = (unsigned long)shadow;
unsigned long entry = xa_to_value(shadow);
int memcgid, nid;
bool workingset;
entry >>= RADIX_TREE_EXCEPTIONAL_SHIFT;
workingset = entry & 1;
entry >>= 1;
nid = entry & ((1UL << NODES_SHIFT) - 1);
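
Shadow entries are now stored as XArray value entries: xa_mk_value() tags an unsigned long so it can never be mistaken for a pointer, xa_is_value() recognises such an entry, and xa_to_value() recovers the integer, which is why only BITS_PER_XA_VALUE bits (one fewer than BITS_PER_LONG) are available for the packed eviction data. A tiny round-trip sketch with a made-up layout, not the real shadow encoding:

#include <linux/xarray.h>

/* Sketch: pack a small id and a counter into one value entry and back. */
static void *pack_example(unsigned int id, unsigned long counter)
{
	return xa_mk_value((counter << 8) | (id & 0xff));
}

static void unpack_example(void *shadow, unsigned int *id,
			   unsigned long *counter)
{
	unsigned long v = xa_to_value(shadow);

	*id = v & 0xff;
	*counter = v >> 8;
}
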
@ -367,7 +366,7 @@ void workingset_activation(struct page *page)
static struct list_lru shadow_nodes;
void workingset_update_node(struct radix_tree_node *node)
void workingset_update_node(struct xa_node *node)
{
/*
* Track non-empty nodes that contain only shadow entries;
@ -379,7 +378,7 @@ void workingset_update_node(struct radix_tree_node *node)
*/
VM_WARN_ON_ONCE(!irqs_disabled()); /* For __inc_lruvec_page_state */
if (node->count && node->count == node->exceptional) {
if (node->count && node->count == node->nr_values) {
if (list_empty(&node->private_list)) {
list_lru_add(&shadow_nodes, &node->private_list);
__inc_lruvec_page_state(virt_to_page(node),
@ -404,7 +403,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
nodes = list_lru_shrink_count(&shadow_nodes, sc);
/*
* Approximate a reasonable limit for the radix tree nodes
* Approximate a reasonable limit for the nodes
* containing shadow entries. We don't need to keep more
* shadow entries than possible pages on the active list,
* since refault distances bigger than that are dismissed.
@ -419,11 +418,11 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
* worst-case density of 1/8th. Below that, not all eligible
* refaults can be detected anymore.
*
* On 64-bit with 7 radix_tree_nodes per page and 64 slots
* On 64-bit with 7 xa_nodes per page and 64 slots
* each, this will reclaim shadow entries when they consume
* ~1.8% of available memory:
*
* PAGE_SIZE / radix_tree_nodes / node_entries * 8 / PAGE_SIZE
* PAGE_SIZE / xa_nodes / node_entries * 8 / PAGE_SIZE
*/
#ifdef CONFIG_MEMCG
if (sc->memcg) {
@ -438,7 +437,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
#endif
pages = node_present_pages(sc->nid);
max_nodes = pages >> (RADIX_TREE_MAP_SHIFT - 3);
max_nodes = pages >> (XA_CHUNK_SHIFT - 3);
if (!nodes)
return SHRINK_EMPTY;
@ -451,11 +450,11 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
static enum lru_status shadow_lru_isolate(struct list_head *item,
struct list_lru_one *lru,
spinlock_t *lru_lock,
void *arg)
void *arg) __must_hold(lru_lock)
{
struct xa_node *node = container_of(item, struct xa_node, private_list);
XA_STATE(xas, node->array, 0);
struct address_space *mapping;
struct radix_tree_node *node;
unsigned int i;
int ret;
/*
@ -463,15 +462,14 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
* the shadow node LRU under the i_pages lock and the
* lru_lock. Because the page cache tree is emptied before
* the inode can be destroyed, holding the lru_lock pins any
* address_space that has radix tree nodes on the LRU.
* address_space that has nodes on the LRU.
*
* We can then safely transition to the i_pages lock to
* pin only the address_space of the particular node we want
* to reclaim, take the node off-LRU, and drop the lru_lock.
*/
node = container_of(item, struct radix_tree_node, private_list);
mapping = container_of(node->root, struct address_space, i_pages);
mapping = container_of(node->array, struct address_space, i_pages);
/* Coming from the list, invert the lock order */
if (!xa_trylock(&mapping->i_pages)) {
@ -490,29 +488,21 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
* no pages, so we expect to be able to remove them all and
* delete and free the empty node afterwards.
*/
if (WARN_ON_ONCE(!node->exceptional))
if (WARN_ON_ONCE(!node->nr_values))
goto out_invalid;
if (WARN_ON_ONCE(node->count != node->exceptional))
goto out_invalid;
for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
if (node->slots[i]) {
if (WARN_ON_ONCE(!radix_tree_exceptional_entry(node->slots[i])))
goto out_invalid;
if (WARN_ON_ONCE(!node->exceptional))
goto out_invalid;
if (WARN_ON_ONCE(!mapping->nrexceptional))
goto out_invalid;
node->slots[i] = NULL;
node->exceptional--;
node->count--;
mapping->nrexceptional--;
}
}
if (WARN_ON_ONCE(node->exceptional))
if (WARN_ON_ONCE(node->count != node->nr_values))
goto out_invalid;
mapping->nrexceptional -= node->nr_values;
xas.xa_node = xa_parent_locked(&mapping->i_pages, node);
xas.xa_offset = node->offset;
xas.xa_shift = node->shift + XA_CHUNK_SHIFT;
xas_set_update(&xas, workingset_update_node);
/*
* We could store a shadow entry here which was the minimum of the
* shadow entries we were tracking ...
*/
xas_store(&xas, NULL);
__inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
__radix_tree_delete_node(&mapping->i_pages, node,
workingset_lookup_update(mapping));
out_invalid:
xa_unlock_irq(&mapping->i_pages);

View File

@ -27,5 +27,6 @@
#include <asm-generic/bitops/hweight.h>
#include <asm-generic/bitops/atomic.h>
#include <asm-generic/bitops/non-atomic.h>
#endif /* __TOOLS_ASM_GENERIC_BITOPS_H */

View File

@ -15,13 +15,4 @@ static inline void clear_bit(int nr, unsigned long *addr)
addr[nr / __BITS_PER_LONG] &= ~(1UL << (nr % __BITS_PER_LONG));
}
static __always_inline int test_bit(unsigned int nr, const unsigned long *addr)
{
return ((1UL << (nr % __BITS_PER_LONG)) &
(((unsigned long *)addr)[nr / __BITS_PER_LONG])) != 0;
}
#define __set_bit(nr, addr) set_bit(nr, addr)
#define __clear_bit(nr, addr) clear_bit(nr, addr)
#endif /* _TOOLS_LINUX_ASM_GENERIC_BITOPS_ATOMIC_H_ */

View File

@ -0,0 +1,109 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
#include <asm/types.h>
/**
* __set_bit - Set a bit in memory
* @nr: the bit to set
* @addr: the address to start counting from
*
* Unlike set_bit(), this function is non-atomic and may be reordered.
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
static inline void __set_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
*p |= mask;
}
static inline void __clear_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
*p &= ~mask;
}
/**
* __change_bit - Toggle a bit in memory
* @nr: the bit to change
* @addr: the address to start counting from
*
* Unlike change_bit(), this function is non-atomic and may be reordered.
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
static inline void __change_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
*p ^= mask;
}
/**
* __test_and_set_bit - Set a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
*
* This operation is non-atomic and can be reordered.
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
*p = old | mask;
return (old & mask) != 0;
}
/**
* __test_and_clear_bit - Clear a bit and return its old value
* @nr: Bit to clear
* @addr: Address to count from
*
* This operation is non-atomic and can be reordered.
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
*p = old & ~mask;
return (old & mask) != 0;
}
/* WARNING: non atomic and it can be reordered! */
static inline int __test_and_change_bit(int nr,
volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
unsigned long old = *p;
*p = old ^ mask;
return (old & mask) != 0;
}
/**
* test_bit - Determine whether a bit is set
* @nr: bit number to test
* @addr: Address to start counting from
*/
static inline int test_bit(int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
#endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */

View File

@ -15,6 +15,7 @@ void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, int bits);
int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1,
const unsigned long *bitmap2, unsigned int bits);
void bitmap_clear(unsigned long *map, unsigned int start, int len);
#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))

View File

@ -70,6 +70,7 @@
#define BUG_ON(cond) assert(!(cond))
#endif
#endif
#define BUG() BUG_ON(1)
#if __BYTE_ORDER == __BIG_ENDIAN
#define cpu_to_le16 bswap_16

View File

@ -8,8 +8,14 @@
#define spinlock_t pthread_mutex_t
#define DEFINE_SPINLOCK(x) pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER
#define __SPIN_LOCK_UNLOCKED(x) (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER
#define spin_lock_init(x) pthread_mutex_init(x, NULL)
#define spin_lock(x) pthread_mutex_lock(x)
#define spin_unlock(x) pthread_mutex_unlock(x)
#define spin_lock_bh(x) pthread_mutex_lock(x)
#define spin_unlock_bh(x) pthread_mutex_unlock(x)
#define spin_lock_irq(x) pthread_mutex_lock(x)
#define spin_unlock_irq(x) pthread_mutex_unlock(x)
#define spin_lock_irqsave(x, f) (void)f, pthread_mutex_lock(x)
#define spin_unlock_irqrestore(x, f) (void)f, pthread_mutex_unlock(x)
@ -31,4 +37,6 @@ static inline bool arch_spin_is_locked(arch_spinlock_t *mutex)
return true;
}
#include <linux/lockdep.h>
#endif

View File

@ -4,3 +4,4 @@ idr-test
main
multiorder
radix-tree.c
xarray

View File

@ -4,8 +4,8 @@ CFLAGS += -I. -I../../include -g -Og -Wall -D_LGPL_SOURCE -fsanitize=address \
-fsanitize=undefined
LDFLAGS += -fsanitize=address -fsanitize=undefined
LDLIBS+= -lpthread -lurcu
TARGETS = main idr-test multiorder
CORE_OFILES := radix-tree.o idr.o linux.o test.o find_bit.o
TARGETS = main idr-test multiorder xarray
CORE_OFILES := xarray.o radix-tree.o idr.o linux.o test.o find_bit.o bitmap.o
OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \
tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o
@ -25,6 +25,8 @@ main: $(OFILES)
idr-test.o: ../../../lib/test_ida.c
idr-test: idr-test.o $(CORE_OFILES)
xarray: $(CORE_OFILES)
multiorder: multiorder.o $(CORE_OFILES)
clean:
@ -35,6 +37,7 @@ vpath %.c ../../lib
$(OFILES): Makefile *.h */*.h generated/map-shift.h \
../../include/linux/*.h \
../../include/asm/*.h \
../../../include/linux/xarray.h \
../../../include/linux/radix-tree.h \
../../../include/linux/idr.h
@ -44,8 +47,10 @@ radix-tree.c: ../../../lib/radix-tree.c
idr.c: ../../../lib/idr.c
sed -e 's/^static //' -e 's/__always_inline //' -e 's/inline //' < $< > $@
xarray.o: ../../../lib/xarray.c ../../../lib/test_xarray.c
generated/map-shift.h:
@if ! grep -qws $(SHIFT) generated/map-shift.h; then \
echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" > \
echo "#define XA_CHUNK_SHIFT $(SHIFT)" > \
generated/map-shift.h; \
fi

View File

@ -17,9 +17,6 @@
#include <time.h>
#include "test.h"
#define for_each_index(i, base, order) \
for (i = base; i < base + (1 << order); i++)
#define NSEC_PER_SEC 1000000000L
static long long benchmark_iter(struct radix_tree_root *root, bool tagged)
@ -61,7 +58,7 @@ static long long benchmark_iter(struct radix_tree_root *root, bool tagged)
}
static void benchmark_insert(struct radix_tree_root *root,
unsigned long size, unsigned long step, int order)
unsigned long size, unsigned long step)
{
struct timespec start, finish;
unsigned long index;
@ -70,19 +67,19 @@ static void benchmark_insert(struct radix_tree_root *root,
clock_gettime(CLOCK_MONOTONIC, &start);
for (index = 0 ; index < size ; index += step)
item_insert_order(root, index, order);
item_insert(root, index);
clock_gettime(CLOCK_MONOTONIC, &finish);
nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
(finish.tv_nsec - start.tv_nsec);
printv(2, "Size: %8ld, step: %8ld, order: %d, insertion: %15lld ns\n",
size, step, order, nsec);
printv(2, "Size: %8ld, step: %8ld, insertion: %15lld ns\n",
size, step, nsec);
}
static void benchmark_tagging(struct radix_tree_root *root,
unsigned long size, unsigned long step, int order)
unsigned long size, unsigned long step)
{
struct timespec start, finish;
unsigned long index;
@ -98,138 +95,53 @@ static void benchmark_tagging(struct radix_tree_root *root,
nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
(finish.tv_nsec - start.tv_nsec);
printv(2, "Size: %8ld, step: %8ld, order: %d, tagging: %17lld ns\n",
size, step, order, nsec);
printv(2, "Size: %8ld, step: %8ld, tagging: %17lld ns\n",
size, step, nsec);
}
static void benchmark_delete(struct radix_tree_root *root,
unsigned long size, unsigned long step, int order)
unsigned long size, unsigned long step)
{
struct timespec start, finish;
unsigned long index, i;
unsigned long index;
long long nsec;
clock_gettime(CLOCK_MONOTONIC, &start);
for (index = 0 ; index < size ; index += step)
for_each_index(i, index, order)
item_delete(root, i);
item_delete(root, index);
clock_gettime(CLOCK_MONOTONIC, &finish);
nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
(finish.tv_nsec - start.tv_nsec);
printv(2, "Size: %8ld, step: %8ld, order: %d, deletion: %16lld ns\n",
size, step, order, nsec);
printv(2, "Size: %8ld, step: %8ld, deletion: %16lld ns\n",
size, step, nsec);
}
static void benchmark_size(unsigned long size, unsigned long step, int order)
static void benchmark_size(unsigned long size, unsigned long step)
{
RADIX_TREE(tree, GFP_KERNEL);
long long normal, tagged;
benchmark_insert(&tree, size, step, order);
benchmark_tagging(&tree, size, step, order);
benchmark_insert(&tree, size, step);
benchmark_tagging(&tree, size, step);
tagged = benchmark_iter(&tree, true);
normal = benchmark_iter(&tree, false);
printv(2, "Size: %8ld, step: %8ld, order: %d, tagged iteration: %8lld ns\n",
size, step, order, tagged);
printv(2, "Size: %8ld, step: %8ld, order: %d, normal iteration: %8lld ns\n",
size, step, order, normal);
printv(2, "Size: %8ld, step: %8ld, tagged iteration: %8lld ns\n",
size, step, tagged);
printv(2, "Size: %8ld, step: %8ld, normal iteration: %8lld ns\n",
size, step, normal);
benchmark_delete(&tree, size, step, order);
benchmark_delete(&tree, size, step);
item_kill_tree(&tree);
rcu_barrier();
}
static long long __benchmark_split(unsigned long index,
int old_order, int new_order)
{
struct timespec start, finish;
long long nsec;
RADIX_TREE(tree, GFP_ATOMIC);
item_insert_order(&tree, index, old_order);
clock_gettime(CLOCK_MONOTONIC, &start);
radix_tree_split(&tree, index, new_order);
clock_gettime(CLOCK_MONOTONIC, &finish);
nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
(finish.tv_nsec - start.tv_nsec);
item_kill_tree(&tree);
return nsec;
}
static void benchmark_split(unsigned long size, unsigned long step)
{
int i, j, idx;
long long nsec = 0;
for (idx = 0; idx < size; idx += step) {
for (i = 3; i < 11; i++) {
for (j = 0; j < i; j++) {
nsec += __benchmark_split(idx, i, j);
}
}
}
printv(2, "Size %8ld, step %8ld, split time %10lld ns\n",
size, step, nsec);
}
static long long __benchmark_join(unsigned long index,
unsigned order1, unsigned order2)
{
unsigned long loc;
struct timespec start, finish;
long long nsec;
void *item, *item2 = item_create(index + 1, order1);
RADIX_TREE(tree, GFP_KERNEL);
item_insert_order(&tree, index, order2);
item = radix_tree_lookup(&tree, index);
clock_gettime(CLOCK_MONOTONIC, &start);
radix_tree_join(&tree, index + 1, order1, item2);
clock_gettime(CLOCK_MONOTONIC, &finish);
nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
(finish.tv_nsec - start.tv_nsec);
loc = find_item(&tree, item);
if (loc == -1)
free(item);
item_kill_tree(&tree);
return nsec;
}
static void benchmark_join(unsigned long step)
{
int i, j, idx;
long long nsec = 0;
for (idx = 0; idx < 1 << 10; idx += step) {
for (i = 1; i < 15; i++) {
for (j = 0; j < i; j++) {
nsec += __benchmark_join(idx, i, j);
}
}
}
printv(2, "Size %8d, step %8ld, join time %10lld ns\n",
1 << 10, step, nsec);
}
void benchmark(void)
{
unsigned long size[] = {1 << 10, 1 << 20, 0};
@ -242,16 +154,5 @@ void benchmark(void)
for (c = 0; size[c]; c++)
for (s = 0; step[s]; s++)
benchmark_size(size[c], step[s], 0);
for (c = 0; size[c]; c++)
for (s = 0; step[s]; s++)
benchmark_size(size[c], step[s] << 9, 9);
for (c = 0; size[c]; c++)
for (s = 0; step[s]; s++)
benchmark_split(size[c], step[s]);
for (s = 0; step[s]; s++)
benchmark_join(step[s]);
benchmark_size(size[c], step[s]);
}

View File

@ -0,0 +1,23 @@
/* lib/bitmap.c pulls in at least two other files. */
#include <linux/bitmap.h>
void bitmap_clear(unsigned long *map, unsigned int start, int len)
{
unsigned long *p = map + BIT_WORD(start);
const unsigned int size = start + len;
int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
while (len - bits_to_clear >= 0) {
*p &= ~mask_to_clear;
len -= bits_to_clear;
bits_to_clear = BITS_PER_LONG;
mask_to_clear = ~0UL;
p++;
}
if (len) {
mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
*p &= ~mask_to_clear;
}
}

View File

@ -1 +1 @@
#define CONFIG_RADIX_TREE_MULTIORDER 1
#define CONFIG_XARRAY_MULTI 1

View File

@ -19,7 +19,7 @@
#include "test.h"
#define DUMMY_PTR ((void *)0x12)
#define DUMMY_PTR ((void *)0x10)
int item_idr_free(int id, void *p, void *data)
{
@ -227,6 +227,66 @@ void idr_u32_test(int base)
idr_u32_test1(&idr, 0xffffffff);
}
static void idr_align_test(struct idr *idr)
{
char name[] = "Motorola 68000";
int i, id;
void *entry;
for (i = 0; i < 9; i++) {
BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i);
idr_for_each_entry(idr, entry, id);
}
idr_destroy(idr);
for (i = 1; i < 10; i++) {
BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i - 1);
idr_for_each_entry(idr, entry, id);
}
idr_destroy(idr);
for (i = 2; i < 11; i++) {
BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i - 2);
idr_for_each_entry(idr, entry, id);
}
idr_destroy(idr);
for (i = 3; i < 12; i++) {
BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != i - 3);
idr_for_each_entry(idr, entry, id);
}
idr_destroy(idr);
for (i = 0; i < 8; i++) {
BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != 0);
BUG_ON(idr_alloc(idr, &name[i + 1], 0, 0, GFP_KERNEL) != 1);
idr_for_each_entry(idr, entry, id);
idr_remove(idr, 1);
idr_for_each_entry(idr, entry, id);
idr_remove(idr, 0);
BUG_ON(!idr_is_empty(idr));
}
for (i = 0; i < 8; i++) {
BUG_ON(idr_alloc(idr, NULL, 0, 0, GFP_KERNEL) != 0);
idr_for_each_entry(idr, entry, id);
idr_replace(idr, &name[i], 0);
idr_for_each_entry(idr, entry, id);
BUG_ON(idr_find(idr, 0) != &name[i]);
idr_remove(idr, 0);
}
for (i = 0; i < 8; i++) {
BUG_ON(idr_alloc(idr, &name[i], 0, 0, GFP_KERNEL) != 0);
BUG_ON(idr_alloc(idr, NULL, 0, 0, GFP_KERNEL) != 1);
idr_remove(idr, 1);
idr_for_each_entry(idr, entry, id);
idr_replace(idr, &name[i + 1], 0);
idr_for_each_entry(idr, entry, id);
idr_remove(idr, 0);
}
}
void idr_checks(void)
{
unsigned long i;
@ -307,6 +367,7 @@ void idr_checks(void)
idr_u32_test(4);
idr_u32_test(1);
idr_u32_test(0);
idr_align_test(&idr);
}
#define module_init(x)
@ -344,16 +405,16 @@ void ida_check_conv_user(void)
DEFINE_IDA(ida);
unsigned long i;
radix_tree_cpu_dead(1);
for (i = 0; i < 1000000; i++) {
int id = ida_alloc(&ida, GFP_NOWAIT);
if (id == -ENOMEM) {
IDA_BUG_ON(&ida, (i % IDA_BITMAP_BITS) !=
BITS_PER_LONG - 2);
IDA_BUG_ON(&ida, ((i % IDA_BITMAP_BITS) !=
BITS_PER_XA_VALUE) &&
((i % IDA_BITMAP_BITS) != 0));
id = ida_alloc(&ida, GFP_KERNEL);
} else {
IDA_BUG_ON(&ida, (i % IDA_BITMAP_BITS) ==
BITS_PER_LONG - 2);
BITS_PER_XA_VALUE);
}
IDA_BUG_ON(&ida, id != i);
}

View File

@ -1,5 +1,5 @@
/*
* iteration_check.c: test races having to do with radix tree iteration
* iteration_check.c: test races having to do with xarray iteration
* Copyright (c) 2016 Intel Corporation
* Author: Ross Zwisler <ross.zwisler@linux.intel.com>
*
@ -12,41 +12,54 @@
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#include <linux/radix-tree.h>
#include <pthread.h>
#include "test.h"
#define NUM_THREADS 5
#define MAX_IDX 100
#define TAG 0
#define NEW_TAG 1
#define TAG XA_MARK_0
#define NEW_TAG XA_MARK_1
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t threads[NUM_THREADS];
static unsigned int seeds[3];
static RADIX_TREE(tree, GFP_KERNEL);
static DEFINE_XARRAY(array);
static bool test_complete;
static int max_order;
/* relentlessly fill the tree with tagged entries */
void my_item_insert(struct xarray *xa, unsigned long index)
{
XA_STATE(xas, xa, index);
struct item *item = item_create(index, 0);
int order;
retry:
xas_lock(&xas);
for (order = max_order; order >= 0; order--) {
xas_set_order(&xas, index, order);
item->order = order;
if (xas_find_conflict(&xas))
continue;
xas_store(&xas, item);
xas_set_mark(&xas, TAG);
break;
}
xas_unlock(&xas);
if (xas_nomem(&xas, GFP_KERNEL))
goto retry;
if (order < 0)
free(item);
}
/* relentlessly fill the array with tagged entries */
static void *add_entries_fn(void *arg)
{
rcu_register_thread();
while (!test_complete) {
unsigned long pgoff;
int order;
for (pgoff = 0; pgoff < MAX_IDX; pgoff++) {
pthread_mutex_lock(&tree_lock);
for (order = max_order; order >= 0; order--) {
if (item_insert_order(&tree, pgoff, order)
== 0) {
item_tag_set(&tree, pgoff, TAG);
break;
}
}
pthread_mutex_unlock(&tree_lock);
my_item_insert(&array, pgoff);
}
}
@ -56,33 +69,25 @@ static void *add_entries_fn(void *arg)
}
/*
* Iterate over the tagged entries, doing a radix_tree_iter_retry() as we find
* things that have been removed and randomly resetting our iteration to the
* next chunk with radix_tree_iter_resume(). Both radix_tree_iter_retry() and
* radix_tree_iter_resume() cause radix_tree_next_slot() to be called with a
* NULL 'slot' variable.
* Iterate over tagged entries, retrying when we find ourselves in a deleted
* node and randomly pausing the iteration.
*/
static void *tagged_iteration_fn(void *arg)
{
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, &array, 0);
void *entry;
rcu_register_thread();
while (!test_complete) {
xas_set(&xas, 0);
rcu_read_lock();
radix_tree_for_each_tagged(slot, &tree, &iter, 0, TAG) {
void *entry = radix_tree_deref_slot(slot);
if (unlikely(!entry))
xas_for_each_marked(&xas, entry, ULONG_MAX, TAG) {
if (xas_retry(&xas, entry))
continue;
if (radix_tree_deref_retry(entry)) {
slot = radix_tree_iter_retry(&iter);
continue;
}
if (rand_r(&seeds[0]) % 50 == 0) {
slot = radix_tree_iter_resume(slot, &iter);
xas_pause(&xas);
rcu_read_unlock();
rcu_barrier();
rcu_read_lock();
@ -97,33 +102,25 @@ static void *tagged_iteration_fn(void *arg)
}
/*
* Iterate over the entries, doing a radix_tree_iter_retry() as we find things
* that have been removed and randomly resetting our iteration to the next
* chunk with radix_tree_iter_resume(). Both radix_tree_iter_retry() and
* radix_tree_iter_resume() cause radix_tree_next_slot() to be called with a
* NULL 'slot' variable.
* Iterate over the entries, retrying when we find ourselves in a deleted
* node and randomly pausing the iteration.
*/
static void *untagged_iteration_fn(void *arg)
{
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, &array, 0);
void *entry;
rcu_register_thread();
while (!test_complete) {
xas_set(&xas, 0);
rcu_read_lock();
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
void *entry = radix_tree_deref_slot(slot);
if (unlikely(!entry))
xas_for_each(&xas, entry, ULONG_MAX) {
if (xas_retry(&xas, entry))
continue;
if (radix_tree_deref_retry(entry)) {
slot = radix_tree_iter_retry(&iter);
continue;
}
if (rand_r(&seeds[1]) % 50 == 0) {
slot = radix_tree_iter_resume(slot, &iter);
xas_pause(&xas);
rcu_read_unlock();
rcu_barrier();
rcu_read_lock();
@ -138,7 +135,7 @@ static void *untagged_iteration_fn(void *arg)
}
/*
* Randomly remove entries to help induce radix_tree_iter_retry() calls in the
* Randomly remove entries to help induce retries in the
* two iteration functions.
*/
static void *remove_entries_fn(void *arg)
@ -147,12 +144,13 @@ static void *remove_entries_fn(void *arg)
while (!test_complete) {
int pgoff;
struct item *item;
pgoff = rand_r(&seeds[2]) % MAX_IDX;
pthread_mutex_lock(&tree_lock);
item_delete(&tree, pgoff);
pthread_mutex_unlock(&tree_lock);
item = xa_erase(&array, pgoff);
if (item)
item_free(item, pgoff);
}
rcu_unregister_thread();
@ -165,8 +163,7 @@ static void *tag_entries_fn(void *arg)
rcu_register_thread();
while (!test_complete) {
tag_tagged_items(&tree, &tree_lock, 0, MAX_IDX, 10, TAG,
NEW_TAG);
tag_tagged_items(&array, 0, MAX_IDX, 10, TAG, NEW_TAG);
}
rcu_unregister_thread();
return NULL;
@ -217,5 +214,5 @@ void iteration_test(unsigned order, unsigned test_duration)
}
}
item_kill_tree(&tree);
item_kill_tree(&array);
}
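
The threads above also exercise marks, the XArray's replacement for radix-tree tags: three per array (XA_MARK_0/1/2), set with xa_set_mark() or xas_set_mark() under the lock, queried with xa_marked(), and walked with xas_for_each_marked(). A small self-contained sketch of that API with a hypothetical array:

#include <linux/bug.h>
#include <linux/xarray.h>

static DEFINE_XARRAY(mark_demo_array);

/* Sketch: count entries carrying @mark. */
static unsigned long count_marked(struct xarray *xa, xa_mark_t mark)
{
	XA_STATE(xas, xa, 0);
	unsigned long count = 0;
	void *entry;

	rcu_read_lock();
	xas_for_each_marked(&xas, entry, ULONG_MAX, mark) {
		if (xas_retry(&xas, entry))
			continue;
		count++;
	}
	rcu_read_unlock();
	return count;
}

static void mark_demo(void)
{
	xa_store(&mark_demo_array, 1, xa_mk_value(1), GFP_KERNEL);
	xa_set_mark(&mark_demo_array, 1, XA_MARK_1);
	WARN_ON(!xa_marked(&mark_demo_array, XA_MARK_1));
	WARN_ON(count_marked(&mark_demo_array, XA_MARK_1) != 1);
}
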

View File

@ -1 +1,2 @@
#include <stdio.h>
#include "asm/bug.h"

View File

@ -0,0 +1 @@
#include "../../../../include/linux/kconfig.h"

View File

@ -14,7 +14,12 @@
#include "../../../include/linux/kconfig.h"
#define printk printf
#define pr_info printk
#define pr_debug printk
#define pr_cont printk
#define __acquires(x)
#define __releases(x)
#define __must_hold(x)
#endif /* _KERNEL_H */

View File

@ -0,0 +1,11 @@
#ifndef _LINUX_LOCKDEP_H
#define _LINUX_LOCKDEP_H
struct lock_class_key {
unsigned int a;
};
static inline void lockdep_set_class(spinlock_t *lock,
struct lock_class_key *key)
{
}
#endif /* _LINUX_LOCKDEP_H */

View File

@ -2,7 +2,6 @@
#ifndef _TEST_RADIX_TREE_H
#define _TEST_RADIX_TREE_H
#include "generated/map-shift.h"
#include "../../../../include/linux/radix-tree.h"
extern int kmalloc_verbose;

View File

@ -6,5 +6,7 @@
#define rcu_dereference_raw(p) rcu_dereference(p)
#define rcu_dereference_protected(p, cond) rcu_dereference(p)
#define rcu_dereference_check(p, cond) rcu_dereference(p)
#define RCU_INIT_POINTER(p, v) (p) = (v)
#endif

View File

@ -214,7 +214,7 @@ void copy_tag_check(void)
}
// printf("\ncopying tags...\n");
tagged = tag_tagged_items(&tree, NULL, start, end, ITEMS, 0, 1);
tagged = tag_tagged_items(&tree, start, end, ITEMS, XA_MARK_0, XA_MARK_1);
// printf("checking copied tags\n");
assert(tagged == count);
@ -223,7 +223,7 @@ void copy_tag_check(void)
/* Copy tags in several rounds */
// printf("\ncopying tags...\n");
tmp = rand() % (count / 10 + 2);
tagged = tag_tagged_items(&tree, NULL, start, end, tmp, 0, 2);
tagged = tag_tagged_items(&tree, start, end, tmp, XA_MARK_0, XA_MARK_2);
assert(tagged == count);
// printf("%lu %lu %lu\n", tagged, tmp, count);
@ -236,63 +236,6 @@ void copy_tag_check(void)
item_kill_tree(&tree);
}
static void __locate_check(struct radix_tree_root *tree, unsigned long index,
unsigned order)
{
struct item *item;
unsigned long index2;
item_insert_order(tree, index, order);
item = item_lookup(tree, index);
index2 = find_item(tree, item);
if (index != index2) {
printv(2, "index %ld order %d inserted; found %ld\n",
index, order, index2);
abort();
}
}
static void __order_0_locate_check(void)
{
RADIX_TREE(tree, GFP_KERNEL);
int i;
for (i = 0; i < 50; i++)
__locate_check(&tree, rand() % INT_MAX, 0);
item_kill_tree(&tree);
}
static void locate_check(void)
{
RADIX_TREE(tree, GFP_KERNEL);
unsigned order;
unsigned long offset, index;
__order_0_locate_check();
for (order = 0; order < 20; order++) {
for (offset = 0; offset < (1 << (order + 3));
offset += (1UL << order)) {
for (index = 0; index < (1UL << (order + 5));
index += (1UL << order)) {
__locate_check(&tree, index + offset, order);
}
if (find_item(&tree, &tree) != -1)
abort();
item_kill_tree(&tree);
}
}
if (find_item(&tree, &tree) != -1)
abort();
__locate_check(&tree, -1, 0);
if (find_item(&tree, &tree) != -1)
abort();
item_kill_tree(&tree);
}
static void single_thread_tests(bool long_run)
{
int i;
@ -303,10 +246,6 @@ static void single_thread_tests(bool long_run)
rcu_barrier();
printv(2, "after multiorder_check: %d allocated, preempt %d\n",
nr_allocated, preempt_count);
locate_check();
rcu_barrier();
printv(2, "after locate_check: %d allocated, preempt %d\n",
nr_allocated, preempt_count);
tag_check();
rcu_barrier();
printv(2, "after tag_check: %d allocated, preempt %d\n",
@ -365,6 +304,7 @@ int main(int argc, char **argv)
rcu_register_thread();
radix_tree_init();
xarray_tests();
regression1_test();
regression2_test();
regression3_test();

View File

@ -20,230 +20,39 @@
#include "test.h"
#define for_each_index(i, base, order) \
for (i = base; i < base + (1 << order); i++)
static void __multiorder_tag_test(int index, int order)
static int item_insert_order(struct xarray *xa, unsigned long index,
unsigned order)
{
RADIX_TREE(tree, GFP_KERNEL);
int base, err, i;
XA_STATE_ORDER(xas, xa, index, order);
struct item *item = item_create(index, order);
/* our canonical entry */
base = index & ~((1 << order) - 1);
do {
xas_lock(&xas);
xas_store(&xas, item);
xas_unlock(&xas);
} while (xas_nomem(&xas, GFP_KERNEL));
printv(2, "Multiorder tag test with index %d, canonical entry %d\n",
index, base);
if (!xas_error(&xas))
return 0;
err = item_insert_order(&tree, index, order);
assert(!err);
/*
* Verify we get collisions for covered indices. We try and fail to
* insert an exceptional entry so we don't leak memory via
* item_insert_order().
*/
for_each_index(i, base, order) {
err = __radix_tree_insert(&tree, i, order,
(void *)(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY));
assert(err == -EEXIST);
}
for_each_index(i, base, order) {
assert(!radix_tree_tag_get(&tree, i, 0));
assert(!radix_tree_tag_get(&tree, i, 1));
}
assert(radix_tree_tag_set(&tree, index, 0));
for_each_index(i, base, order) {
assert(radix_tree_tag_get(&tree, i, 0));
assert(!radix_tree_tag_get(&tree, i, 1));
}
assert(tag_tagged_items(&tree, NULL, 0, ~0UL, 10, 0, 1) == 1);
assert(radix_tree_tag_clear(&tree, index, 0));
for_each_index(i, base, order) {
assert(!radix_tree_tag_get(&tree, i, 0));
assert(radix_tree_tag_get(&tree, i, 1));
}
assert(radix_tree_tag_clear(&tree, index, 1));
assert(!radix_tree_tagged(&tree, 0));
assert(!radix_tree_tagged(&tree, 1));
item_kill_tree(&tree);
free(item);
return xas_error(&xas);
}
static void __multiorder_tag_test2(unsigned order, unsigned long index2)
void multiorder_iteration(struct xarray *xa)
{
RADIX_TREE(tree, GFP_KERNEL);
unsigned long index = (1 << order);
index2 += index;
assert(item_insert_order(&tree, 0, order) == 0);
assert(item_insert(&tree, index2) == 0);
assert(radix_tree_tag_set(&tree, 0, 0));
assert(radix_tree_tag_set(&tree, index2, 0));
assert(tag_tagged_items(&tree, NULL, 0, ~0UL, 10, 0, 1) == 2);
item_kill_tree(&tree);
}
static void multiorder_tag_tests(void)
{
int i, j;
/* test multi-order entry for indices 0-7 with no sibling pointers */
__multiorder_tag_test(0, 3);
__multiorder_tag_test(5, 3);
/* test multi-order entry for indices 8-15 with no sibling pointers */
__multiorder_tag_test(8, 3);
__multiorder_tag_test(15, 3);
/*
* Our order 5 entry covers indices 0-31 in a tree with height=2.
* This is broken up as follows:
* 0-7: canonical entry
* 8-15: sibling 1
* 16-23: sibling 2
* 24-31: sibling 3
*/
__multiorder_tag_test(0, 5);
__multiorder_tag_test(29, 5);
/* same test, but with indices 32-63 */
__multiorder_tag_test(32, 5);
__multiorder_tag_test(44, 5);
/*
* Our order 8 entry covers indices 0-255 in a tree with height=3.
* This is broken up as follows:
* 0-63: canonical entry
* 64-127: sibling 1
* 128-191: sibling 2
* 192-255: sibling 3
*/
__multiorder_tag_test(0, 8);
__multiorder_tag_test(190, 8);
/* same test, but with indices 256-511 */
__multiorder_tag_test(256, 8);
__multiorder_tag_test(300, 8);
__multiorder_tag_test(0x12345678UL, 8);
for (i = 1; i < 10; i++)
for (j = 0; j < (10 << i); j++)
__multiorder_tag_test2(i, j);
}
static void multiorder_check(unsigned long index, int order)
{
unsigned long i;
unsigned long min = index & ~((1UL << order) - 1);
unsigned long max = min + (1UL << order);
void **slot;
struct item *item2 = item_create(min, order);
RADIX_TREE(tree, GFP_KERNEL);
printv(2, "Multiorder index %ld, order %d\n", index, order);
assert(item_insert_order(&tree, index, order) == 0);
for (i = min; i < max; i++) {
struct item *item = item_lookup(&tree, i);
assert(item != 0);
assert(item->index == index);
}
for (i = 0; i < min; i++)
item_check_absent(&tree, i);
for (i = max; i < 2*max; i++)
item_check_absent(&tree, i);
for (i = min; i < max; i++)
assert(radix_tree_insert(&tree, i, item2) == -EEXIST);
slot = radix_tree_lookup_slot(&tree, index);
free(*slot);
radix_tree_replace_slot(&tree, slot, item2);
for (i = min; i < max; i++) {
struct item *item = item_lookup(&tree, i);
assert(item != 0);
assert(item->index == min);
}
assert(item_delete(&tree, min) != 0);
for (i = 0; i < 2*max; i++)
item_check_absent(&tree, i);
}
static void multiorder_shrink(unsigned long index, int order)
{
unsigned long i;
unsigned long max = 1 << order;
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
printv(2, "Multiorder shrink index %ld, order %d\n", index, order);
assert(item_insert_order(&tree, 0, order) == 0);
node = tree.rnode;
assert(item_insert(&tree, index) == 0);
assert(node != tree.rnode);
assert(item_delete(&tree, index) != 0);
assert(node == tree.rnode);
for (i = 0; i < max; i++) {
struct item *item = item_lookup(&tree, i);
assert(item != 0);
assert(item->index == 0);
}
for (i = max; i < 2*max; i++)
item_check_absent(&tree, i);
if (!item_delete(&tree, 0)) {
printv(2, "failed to delete index %ld (order %d)\n", index, order);
abort();
}
for (i = 0; i < 2*max; i++)
item_check_absent(&tree, i);
}
static void multiorder_insert_bug(void)
{
RADIX_TREE(tree, GFP_KERNEL);
item_insert(&tree, 0);
radix_tree_tag_set(&tree, 0, 0);
item_insert_order(&tree, 3 << 6, 6);
item_kill_tree(&tree);
}
void multiorder_iteration(void)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, xa, 0);
struct item *item;
int i, j, err;
printv(1, "Multiorder iteration test\n");
#define NUM_ENTRIES 11
int index[NUM_ENTRIES] = {0, 2, 4, 8, 16, 32, 34, 36, 64, 72, 128};
int order[NUM_ENTRIES] = {1, 1, 2, 3, 4, 1, 0, 1, 3, 0, 7};
printv(1, "Multiorder iteration test\n");
for (i = 0; i < NUM_ENTRIES; i++) {
err = item_insert_order(&tree, index[i], order[i]);
err = item_insert_order(xa, index[i], order[i]);
assert(!err);
}
@ -252,14 +61,14 @@ void multiorder_iteration(void)
if (j <= (index[i] | ((1 << order[i]) - 1)))
break;
radix_tree_for_each_slot(slot, &tree, &iter, j) {
int height = order[i] / RADIX_TREE_MAP_SHIFT;
int shift = height * RADIX_TREE_MAP_SHIFT;
xas_set(&xas, j);
xas_for_each(&xas, item, ULONG_MAX) {
int height = order[i] / XA_CHUNK_SHIFT;
int shift = height * XA_CHUNK_SHIFT;
unsigned long mask = (1UL << order[i]) - 1;
struct item *item = *slot;
assert((iter.index | mask) == (index[i] | mask));
assert(iter.shift == shift);
assert((xas.xa_index | mask) == (index[i] | mask));
assert(xas.xa_node->shift == shift);
assert(!radix_tree_is_internal_node(item));
assert((item->index | mask) == (index[i] | mask));
assert(item->order == order[i]);
@ -267,18 +76,15 @@ void multiorder_iteration(void)
}
}
item_kill_tree(&tree);
item_kill_tree(xa);
}
void multiorder_tagged_iteration(void)
void multiorder_tagged_iteration(struct xarray *xa)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, xa, 0);
struct item *item;
int i, j;
printv(1, "Multiorder tagged iteration test\n");
#define MT_NUM_ENTRIES 9
int index[MT_NUM_ENTRIES] = {0, 2, 4, 16, 32, 40, 64, 72, 128};
int order[MT_NUM_ENTRIES] = {1, 0, 2, 4, 3, 1, 3, 0, 7};
@ -286,13 +92,15 @@ void multiorder_tagged_iteration(void)
#define TAG_ENTRIES 7
int tag_index[TAG_ENTRIES] = {0, 4, 16, 40, 64, 72, 128};
for (i = 0; i < MT_NUM_ENTRIES; i++)
assert(!item_insert_order(&tree, index[i], order[i]));
printv(1, "Multiorder tagged iteration test\n");
assert(!radix_tree_tagged(&tree, 1));
for (i = 0; i < MT_NUM_ENTRIES; i++)
assert(!item_insert_order(xa, index[i], order[i]));
assert(!xa_marked(xa, XA_MARK_1));
for (i = 0; i < TAG_ENTRIES; i++)
assert(radix_tree_tag_set(&tree, tag_index[i], 1));
xa_set_mark(xa, tag_index[i], XA_MARK_1);
for (j = 0; j < 256; j++) {
int k;
@ -304,23 +112,23 @@ void multiorder_tagged_iteration(void)
break;
}
radix_tree_for_each_tagged(slot, &tree, &iter, j, 1) {
xas_set(&xas, j);
xas_for_each_marked(&xas, item, ULONG_MAX, XA_MARK_1) {
unsigned long mask;
struct item *item = *slot;
for (k = i; index[k] < tag_index[i]; k++)
;
mask = (1UL << order[k]) - 1;
assert((iter.index | mask) == (tag_index[i] | mask));
assert(!radix_tree_is_internal_node(item));
assert((xas.xa_index | mask) == (tag_index[i] | mask));
assert(!xa_is_internal(item));
assert((item->index | mask) == (tag_index[i] | mask));
assert(item->order == order[k]);
i++;
}
}
assert(tag_tagged_items(&tree, NULL, 0, ~0UL, TAG_ENTRIES, 1, 2) ==
TAG_ENTRIES);
assert(tag_tagged_items(xa, 0, ULONG_MAX, TAG_ENTRIES, XA_MARK_1,
XA_MARK_2) == TAG_ENTRIES);
for (j = 0; j < 256; j++) {
int mask, k;
@ -332,297 +140,31 @@ void multiorder_tagged_iteration(void)
break;
}
radix_tree_for_each_tagged(slot, &tree, &iter, j, 2) {
struct item *item = *slot;
xas_set(&xas, j);
xas_for_each_marked(&xas, item, ULONG_MAX, XA_MARK_2) {
for (k = i; index[k] < tag_index[i]; k++)
;
mask = (1 << order[k]) - 1;
assert((iter.index | mask) == (tag_index[i] | mask));
assert(!radix_tree_is_internal_node(item));
assert((xas.xa_index | mask) == (tag_index[i] | mask));
assert(!xa_is_internal(item));
assert((item->index | mask) == (tag_index[i] | mask));
assert(item->order == order[k]);
i++;
}
}
assert(tag_tagged_items(&tree, NULL, 1, ~0UL, MT_NUM_ENTRIES * 2, 1, 0)
== TAG_ENTRIES);
assert(tag_tagged_items(xa, 1, ULONG_MAX, MT_NUM_ENTRIES * 2, XA_MARK_1,
XA_MARK_0) == TAG_ENTRIES);
i = 0;
radix_tree_for_each_tagged(slot, &tree, &iter, 0, 0) {
assert(iter.index == tag_index[i]);
xas_set(&xas, 0);
xas_for_each_marked(&xas, item, ULONG_MAX, XA_MARK_0) {
assert(xas.xa_index == tag_index[i]);
i++;
}
assert(i == TAG_ENTRIES);
item_kill_tree(&tree);
}
/*
* Basic join checks: make sure we can't find an entry in the tree after
* a larger entry has replaced it
*/
static void multiorder_join1(unsigned long index,
unsigned order1, unsigned order2)
{
unsigned long loc;
void *item, *item2 = item_create(index + 1, order1);
RADIX_TREE(tree, GFP_KERNEL);
item_insert_order(&tree, index, order2);
item = radix_tree_lookup(&tree, index);
radix_tree_join(&tree, index + 1, order1, item2);
loc = find_item(&tree, item);
if (loc == -1)
free(item);
item = radix_tree_lookup(&tree, index + 1);
assert(item == item2);
item_kill_tree(&tree);
}
/*
* Check that the accounting of exceptional entries is handled correctly
* by joining an exceptional entry to a normal pointer.
*/
static void multiorder_join2(unsigned order1, unsigned order2)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
void *item1 = item_create(0, order1);
void *item2;
item_insert_order(&tree, 0, order2);
radix_tree_insert(&tree, 1 << order2, (void *)0x12UL);
item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL);
assert(item2 == (void *)0x12UL);
assert(node->exceptional == 1);
item2 = radix_tree_lookup(&tree, 0);
free(item2);
radix_tree_join(&tree, 0, order1, item1);
item2 = __radix_tree_lookup(&tree, 1 << order2, &node, NULL);
assert(item2 == item1);
assert(node->exceptional == 0);
item_kill_tree(&tree);
}
/*
* This test revealed an accounting bug for exceptional entries at one point.
* Nodes were being freed back into the pool with an elevated exception count
* by radix_tree_join() and then radix_tree_split() was failing to zero the
* count of exceptional entries.
*/
static void multiorder_join3(unsigned int order)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
void **slot;
struct radix_tree_iter iter;
unsigned long i;
for (i = 0; i < (1 << order); i++) {
radix_tree_insert(&tree, i, (void *)0x12UL);
}
radix_tree_join(&tree, 0, order, (void *)0x16UL);
rcu_barrier();
radix_tree_split(&tree, 0, 0);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot, (void *)0x12UL);
}
__radix_tree_lookup(&tree, 0, &node, NULL);
assert(node->exceptional == node->count);
item_kill_tree(&tree);
}
static void multiorder_join(void)
{
int i, j, idx;
for (idx = 0; idx < 1024; idx = idx * 2 + 3) {
for (i = 1; i < 15; i++) {
for (j = 0; j < i; j++) {
multiorder_join1(idx, i, j);
}
}
}
for (i = 1; i < 15; i++) {
for (j = 0; j < i; j++) {
multiorder_join2(i, j);
}
}
for (i = 3; i < 10; i++) {
multiorder_join3(i);
}
}
static void check_mem(unsigned old_order, unsigned new_order, unsigned alloc)
{
struct radix_tree_preload *rtp = &radix_tree_preloads;
if (rtp->nr != 0)
printv(2, "split(%u %u) remaining %u\n", old_order, new_order,
rtp->nr);
/*
* Can't check for equality here as some nodes may have been
* RCU-freed while we ran. But we should never finish with more
* nodes allocated since they should have all been preloaded.
*/
if (nr_allocated > alloc)
printv(2, "split(%u %u) allocated %u %u\n", old_order, new_order,
alloc, nr_allocated);
}
static void __multiorder_split(int old_order, int new_order)
{
RADIX_TREE(tree, GFP_ATOMIC);
void **slot;
struct radix_tree_iter iter;
unsigned alloc;
struct item *item;
radix_tree_preload(GFP_KERNEL);
assert(item_insert_order(&tree, 0, old_order) == 0);
radix_tree_preload_end();
/* Wipe out the preloaded cache or it'll confuse check_mem() */
radix_tree_cpu_dead(0);
item = radix_tree_tag_set(&tree, 0, 2);
radix_tree_split_preload(old_order, new_order, GFP_KERNEL);
alloc = nr_allocated;
radix_tree_split(&tree, 0, new_order);
check_mem(old_order, new_order, alloc);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot,
item_create(iter.index, new_order));
}
radix_tree_preload_end();
item_kill_tree(&tree);
free(item);
}
static void __multiorder_split2(int old_order, int new_order)
{
RADIX_TREE(tree, GFP_KERNEL);
void **slot;
struct radix_tree_iter iter;
struct radix_tree_node *node;
void *item;
__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == (void *)0x12);
assert(node->exceptional > 0);
radix_tree_split(&tree, 0, new_order);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot,
item_create(iter.index, new_order));
}
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item != (void *)0x12);
assert(node->exceptional == 0);
item_kill_tree(&tree);
}
static void __multiorder_split3(int old_order, int new_order)
{
RADIX_TREE(tree, GFP_KERNEL);
void **slot;
struct radix_tree_iter iter;
struct radix_tree_node *node;
void *item;
__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == (void *)0x12);
assert(node->exceptional > 0);
radix_tree_split(&tree, 0, new_order);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_iter_replace(&tree, &iter, slot, (void *)0x16);
}
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == (void *)0x16);
assert(node->exceptional > 0);
item_kill_tree(&tree);
__radix_tree_insert(&tree, 0, old_order, (void *)0x12);
item = __radix_tree_lookup(&tree, 0, &node, NULL);
assert(item == (void *)0x12);
assert(node->exceptional > 0);
radix_tree_split(&tree, 0, new_order);
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
if (iter.index == (1 << new_order))
radix_tree_iter_replace(&tree, &iter, slot,
(void *)0x16);
else
radix_tree_iter_replace(&tree, &iter, slot, NULL);
}
item = __radix_tree_lookup(&tree, 1 << new_order, &node, NULL);
assert(item == (void *)0x16);
assert(node->count == node->exceptional);
do {
node = node->parent;
if (!node)
break;
assert(node->count == 1);
assert(node->exceptional == 0);
} while (1);
item_kill_tree(&tree);
}
static void multiorder_split(void)
{
int i, j;
for (i = 3; i < 11; i++)
for (j = 0; j < i; j++) {
__multiorder_split(i, j);
__multiorder_split2(i, j);
__multiorder_split3(i, j);
}
}
static void multiorder_account(void)
{
RADIX_TREE(tree, GFP_KERNEL);
struct radix_tree_node *node;
void **slot;
item_insert_order(&tree, 0, 5);
__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
__radix_tree_lookup(&tree, 0, &node, NULL);
assert(node->count == node->exceptional * 2);
radix_tree_delete(&tree, 1 << 5);
assert(node->exceptional == 0);
__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
__radix_tree_lookup(&tree, 1 << 5, &node, &slot);
assert(node->count == node->exceptional * 2);
__radix_tree_replace(&tree, node, slot, NULL, NULL);
assert(node->exceptional == 0);
item_kill_tree(&tree);
item_kill_tree(xa);
}
bool stop_iteration = false;
@ -645,68 +187,45 @@ static void *creator_func(void *ptr)
static void *iterator_func(void *ptr)
{
struct radix_tree_root *tree = ptr;
struct radix_tree_iter iter;
XA_STATE(xas, ptr, 0);
struct item *item;
void **slot;
while (!stop_iteration) {
rcu_read_lock();
radix_tree_for_each_slot(slot, tree, &iter, 0) {
item = radix_tree_deref_slot(slot);
if (!item)
xas_for_each(&xas, item, ULONG_MAX) {
if (xas_retry(&xas, item))
continue;
if (radix_tree_deref_retry(item)) {
slot = radix_tree_iter_retry(&iter);
continue;
}
item_sanity(item, iter.index);
item_sanity(item, xas.xa_index);
}
rcu_read_unlock();
}
return NULL;
}
static void multiorder_iteration_race(void)
static void multiorder_iteration_race(struct xarray *xa)
{
const int num_threads = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t worker_thread[num_threads];
RADIX_TREE(tree, GFP_KERNEL);
int i;
pthread_create(&worker_thread[0], NULL, &creator_func, &tree);
pthread_create(&worker_thread[0], NULL, &creator_func, xa);
for (i = 1; i < num_threads; i++)
pthread_create(&worker_thread[i], NULL, &iterator_func, &tree);
pthread_create(&worker_thread[i], NULL, &iterator_func, xa);
for (i = 0; i < num_threads; i++)
pthread_join(worker_thread[i], NULL);
item_kill_tree(&tree);
item_kill_tree(xa);
}
static DEFINE_XARRAY(array);
void multiorder_checks(void)
{
int i;
for (i = 0; i < 20; i++) {
multiorder_check(200, i);
multiorder_check(0, i);
multiorder_check((1UL << i) + 1, i);
}
for (i = 0; i < 15; i++)
multiorder_shrink((1UL << (i + RADIX_TREE_MAP_SHIFT)), i);
multiorder_insert_bug();
multiorder_tag_tests();
multiorder_iteration();
multiorder_tagged_iteration();
multiorder_join();
multiorder_split();
multiorder_account();
multiorder_iteration_race();
multiorder_iteration(&array);
multiorder_tagged_iteration(&array);
multiorder_iteration_race(&array);
radix_tree_cpu_dead(0);
}
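
For readers following the conversion: the recurring pattern in the multiorder tests above is that a RADIX_TREE root walked with radix_tree_for_each_slot() or radix_tree_for_each_tagged() becomes an XArray walked through an XA_STATE() cursor with xas_for_each() or xas_for_each_marked(), repositioned with xas_set(). A minimal illustrative sketch of that shape, not code from the commit (count_present() is a hypothetical helper):

/* Count entries at or above @start, the XA_STATE()/xas_for_each() way. */
static unsigned long count_present(struct xarray *xa, unsigned long start)
{
	XA_STATE(xas, xa, start);	/* cursor, replaces struct radix_tree_iter */
	void *entry;
	unsigned long n = 0;

	rcu_read_lock();
	xas_for_each(&xas, entry, ULONG_MAX) {
		if (xas_retry(&xas, entry))	/* transient internal entry, skip it */
			continue;
		n++;
	}
	rcu_read_unlock();
	return n;
}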

View File

@ -44,7 +44,6 @@
#include "regression.h"
static RADIX_TREE(mt_tree, GFP_KERNEL);
static pthread_mutex_t mt_lock = PTHREAD_MUTEX_INITIALIZER;
struct page {
pthread_mutex_t lock;
@ -53,12 +52,12 @@ struct page {
unsigned long index;
};
static struct page *page_alloc(void)
static struct page *page_alloc(int index)
{
struct page *p;
p = malloc(sizeof(struct page));
p->count = 1;
p->index = 1;
p->index = index;
pthread_mutex_init(&p->lock, NULL);
return p;
@ -80,53 +79,33 @@ static void page_free(struct page *p)
static unsigned find_get_pages(unsigned long start,
unsigned int nr_pages, struct page **pages)
{
unsigned int i;
unsigned int ret;
unsigned int nr_found;
XA_STATE(xas, &mt_tree, start);
struct page *page;
unsigned int ret = 0;
rcu_read_lock();
restart:
nr_found = radix_tree_gang_lookup_slot(&mt_tree,
(void ***)pages, NULL, start, nr_pages);
ret = 0;
for (i = 0; i < nr_found; i++) {
struct page *page;
repeat:
page = radix_tree_deref_slot((void **)pages[i]);
if (unlikely(!page))
xas_for_each(&xas, page, ULONG_MAX) {
if (xas_retry(&xas, page))
continue;
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page)) {
/*
* Transient condition which can only trigger
* when entry at index 0 moves out of or back
* to root: none yet gotten, safe to restart.
*/
assert((start | i) == 0);
goto restart;
}
/*
* No exceptional entries are inserted in this test.
*/
assert(0);
}
pthread_mutex_lock(&page->lock);
if (!page->count) {
pthread_mutex_unlock(&page->lock);
goto repeat;
}
if (!page->count)
goto unlock;
/* don't actually update page refcount */
pthread_mutex_unlock(&page->lock);
/* Has the page moved? */
if (unlikely(page != *((void **)pages[i]))) {
goto repeat;
}
if (unlikely(page != xas_reload(&xas)))
goto put_page;
pages[ret] = page;
ret++;
continue;
unlock:
pthread_mutex_unlock(&page->lock);
put_page:
xas_reset(&xas);
}
rcu_read_unlock();
return ret;
@ -145,30 +124,30 @@ static void *regression1_fn(void *arg)
for (j = 0; j < 1000000; j++) {
struct page *p;
p = page_alloc();
pthread_mutex_lock(&mt_lock);
p = page_alloc(0);
xa_lock(&mt_tree);
radix_tree_insert(&mt_tree, 0, p);
pthread_mutex_unlock(&mt_lock);
xa_unlock(&mt_tree);
p = page_alloc();
pthread_mutex_lock(&mt_lock);
p = page_alloc(1);
xa_lock(&mt_tree);
radix_tree_insert(&mt_tree, 1, p);
pthread_mutex_unlock(&mt_lock);
xa_unlock(&mt_tree);
pthread_mutex_lock(&mt_lock);
xa_lock(&mt_tree);
p = radix_tree_delete(&mt_tree, 1);
pthread_mutex_lock(&p->lock);
p->count--;
pthread_mutex_unlock(&p->lock);
pthread_mutex_unlock(&mt_lock);
xa_unlock(&mt_tree);
page_free(p);
pthread_mutex_lock(&mt_lock);
xa_lock(&mt_tree);
p = radix_tree_delete(&mt_tree, 0);
pthread_mutex_lock(&p->lock);
p->count--;
pthread_mutex_unlock(&p->lock);
pthread_mutex_unlock(&mt_lock);
xa_unlock(&mt_tree);
page_free(p);
}
} else {
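
The find_get_pages() and regression1_fn() changes above show the two halves of the page-cache idiom: readers iterate under rcu_read_lock(), skip transient entries with xas_retry(), and use xas_reload() to confirm the entry is still in its slot after taking the object's own lock, while writers serialise on the array's built-in xa_lock() instead of an external mutex. A compressed single-entry sketch of the reader side, with the reference-taking step elided (lookup_stable() is a hypothetical name, not code from the commit):

static void *lookup_stable(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	void *entry;

	rcu_read_lock();
repeat:
	xas_reset(&xas);
	entry = xas_load(&xas);
	if (xas_retry(&xas, entry))
		goto repeat;
	if (entry) {
		/* ...lock or take a reference on @entry here... */
		if (entry != xas_reload(&xas)) {
			/* raced with removal or replacement: drop it and retry */
			goto repeat;
		}
	}
	rcu_read_unlock();
	return entry;
}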

View File

@ -53,9 +53,9 @@
#include "regression.h"
#include "test.h"
#define PAGECACHE_TAG_DIRTY 0
#define PAGECACHE_TAG_WRITEBACK 1
#define PAGECACHE_TAG_TOWRITE 2
#define PAGECACHE_TAG_DIRTY XA_MARK_0
#define PAGECACHE_TAG_WRITEBACK XA_MARK_1
#define PAGECACHE_TAG_TOWRITE XA_MARK_2
static RADIX_TREE(mt_tree, GFP_KERNEL);
unsigned long page_count = 0;
@ -92,7 +92,7 @@ void regression2_test(void)
/* 1. */
start = 0;
end = max_slots - 2;
tag_tagged_items(&mt_tree, NULL, start, end, 1,
tag_tagged_items(&mt_tree, start, end, 1,
PAGECACHE_TAG_DIRTY, PAGECACHE_TAG_TOWRITE);
/* 2. */
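
regression2 needs little more than retyping: the numeric page-cache tags become the typed xa_mark_t constants XA_MARK_0, XA_MARK_1 and XA_MARK_2, and tag_tagged_items() loses its lock argument because the XArray carries its own lock. Setting and querying a mark on an existing entry looks roughly like this (illustrative only, mark_entry() is hypothetical):

static void mark_entry(struct xarray *xa, unsigned long index)
{
	xa_set_mark(xa, index, XA_MARK_0);	/* e.g. PAGECACHE_TAG_DIRTY */
	WARN_ON(!xa_get_mark(xa, index, XA_MARK_0));
}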

View File

@ -69,21 +69,6 @@ void regression3_test(void)
continue;
}
}
radix_tree_delete(&root, 1);
first = true;
radix_tree_for_each_contig(slot, &root, &iter, 0) {
printv(2, "contig %ld %p\n", iter.index, *slot);
if (first) {
radix_tree_insert(&root, 1, ptr);
first = false;
}
if (radix_tree_deref_retry(*slot)) {
printv(2, "retry at %ld\n", iter.index);
slot = radix_tree_iter_retry(&iter);
continue;
}
}
radix_tree_for_each_slot(slot, &root, &iter, 0) {
printv(2, "slot %ld %p\n", iter.index, *slot);
@ -93,14 +78,6 @@ void regression3_test(void)
}
}
radix_tree_for_each_contig(slot, &root, &iter, 0) {
printv(2, "contig %ld %p\n", iter.index, *slot);
if (!iter.index) {
printv(2, "next at %ld\n", iter.index);
slot = radix_tree_iter_resume(slot, &iter);
}
}
radix_tree_tag_set(&root, 0, 0);
radix_tree_tag_set(&root, 1, 0);
radix_tree_for_each_tagged(slot, &root, &iter, 0, 0) {
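
regression3 drops its radix_tree_for_each_contig() cases because the XArray offers no contiguous-iteration primitive; a caller that wants a contiguous run iterates normally and stops at the first gap. One possible replacement shape, not taken from the test (count_contig() is hypothetical and assumes order-0 entries):

static unsigned long count_contig(struct xarray *xa, unsigned long start)
{
	XA_STATE(xas, xa, start);
	unsigned long next = start;
	void *entry;

	rcu_read_lock();
	xas_for_each(&xas, entry, ULONG_MAX) {
		if (xas_retry(&xas, entry))
			continue;
		if (xas.xa_index != next)	/* hole: the contiguous run ends */
			break;
		next++;
	}
	rcu_read_unlock();
	return next - start;
}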

View File

@ -24,7 +24,7 @@ __simple_checks(struct radix_tree_root *tree, unsigned long index, int tag)
item_tag_set(tree, index, tag);
ret = item_tag_get(tree, index, tag);
assert(ret != 0);
ret = tag_tagged_items(tree, NULL, first, ~0UL, 10, tag, !tag);
ret = tag_tagged_items(tree, first, ~0UL, 10, tag, !tag);
assert(ret == 1);
ret = item_tag_get(tree, index, !tag);
assert(ret != 0);
@ -321,7 +321,7 @@ static void single_check(void)
assert(ret == 0);
verify_tag_consistency(&tree, 0);
verify_tag_consistency(&tree, 1);
ret = tag_tagged_items(&tree, NULL, first, 10, 10, 0, 1);
ret = tag_tagged_items(&tree, first, 10, 10, XA_MARK_0, XA_MARK_1);
assert(ret == 1);
ret = radix_tree_gang_lookup_tag(&tree, (void **)items, 0, BATCH, 1);
assert(ret == 1);
@ -331,34 +331,6 @@ static void single_check(void)
item_kill_tree(&tree);
}
void radix_tree_clear_tags_test(void)
{
unsigned long index;
struct radix_tree_node *node;
struct radix_tree_iter iter;
void **slot;
RADIX_TREE(tree, GFP_KERNEL);
item_insert(&tree, 0);
item_tag_set(&tree, 0, 0);
__radix_tree_lookup(&tree, 0, &node, &slot);
radix_tree_clear_tags(&tree, node, slot);
assert(item_tag_get(&tree, 0, 0) == 0);
for (index = 0; index < 1000; index++) {
item_insert(&tree, index);
item_tag_set(&tree, index, 0);
}
radix_tree_for_each_slot(slot, &tree, &iter, 0) {
radix_tree_clear_tags(&tree, iter.node, slot);
assert(item_tag_get(&tree, iter.index, 0) == 0);
}
item_kill_tree(&tree);
}
void tag_check(void)
{
single_check();
@ -376,5 +348,4 @@ void tag_check(void)
thrash_tags();
rcu_barrier();
printv(2, "after thrash_tags: %d allocated\n", nr_allocated);
radix_tree_clear_tags_test();
}

View File

@ -25,11 +25,6 @@ int item_tag_get(struct radix_tree_root *root, unsigned long index, int tag)
return radix_tree_tag_get(root, index, tag);
}
int __item_insert(struct radix_tree_root *root, struct item *item)
{
return __radix_tree_insert(root, item->index, item->order, item);
}
struct item *item_create(unsigned long index, unsigned int order)
{
struct item *ret = malloc(sizeof(*ret));
@ -39,21 +34,15 @@ struct item *item_create(unsigned long index, unsigned int order)
return ret;
}
int item_insert_order(struct radix_tree_root *root, unsigned long index,
unsigned order)
int item_insert(struct radix_tree_root *root, unsigned long index)
{
struct item *item = item_create(index, order);
int err = __item_insert(root, item);
struct item *item = item_create(index, 0);
int err = radix_tree_insert(root, item->index, item);
if (err)
free(item);
return err;
}
int item_insert(struct radix_tree_root *root, unsigned long index)
{
return item_insert_order(root, index, 0);
}
void item_sanity(struct item *item, unsigned long index)
{
unsigned long mask;
@ -63,16 +52,21 @@ void item_sanity(struct item *item, unsigned long index)
assert((item->index | mask) == (index | mask));
}
void item_free(struct item *item, unsigned long index)
{
item_sanity(item, index);
free(item);
}
int item_delete(struct radix_tree_root *root, unsigned long index)
{
struct item *item = radix_tree_delete(root, index);
if (item) {
item_sanity(item, index);
free(item);
return 1;
}
return 0;
if (!item)
return 0;
item_free(item, index);
return 1;
}
static void item_free_rcu(struct rcu_head *head)
@ -82,9 +76,9 @@ static void item_free_rcu(struct rcu_head *head)
free(item);
}
int item_delete_rcu(struct radix_tree_root *root, unsigned long index)
int item_delete_rcu(struct xarray *xa, unsigned long index)
{
struct item *item = radix_tree_delete(root, index);
struct item *item = xa_erase(xa, index);
if (item) {
item_sanity(item, index);
@ -176,61 +170,32 @@ void item_full_scan(struct radix_tree_root *root, unsigned long start,
}
/* Use the same pattern as tag_pages_for_writeback() in mm/page-writeback.c */
int tag_tagged_items(struct radix_tree_root *root, pthread_mutex_t *lock,
unsigned long start, unsigned long end, unsigned batch,
unsigned iftag, unsigned thentag)
int tag_tagged_items(struct xarray *xa, unsigned long start, unsigned long end,
unsigned batch, xa_mark_t iftag, xa_mark_t thentag)
{
unsigned long tagged = 0;
struct radix_tree_iter iter;
void **slot;
XA_STATE(xas, xa, start);
unsigned int tagged = 0;
struct item *item;
if (batch == 0)
batch = 1;
if (lock)
pthread_mutex_lock(lock);
radix_tree_for_each_tagged(slot, root, &iter, start, iftag) {
if (iter.index > end)
break;
radix_tree_iter_tag_set(root, &iter, thentag);
tagged++;
if ((tagged % batch) != 0)
xas_lock_irq(&xas);
xas_for_each_marked(&xas, item, end, iftag) {
xas_set_mark(&xas, thentag);
if (++tagged % batch)
continue;
slot = radix_tree_iter_resume(slot, &iter);
if (lock) {
pthread_mutex_unlock(lock);
rcu_barrier();
pthread_mutex_lock(lock);
}
xas_pause(&xas);
xas_unlock_irq(&xas);
rcu_barrier();
xas_lock_irq(&xas);
}
if (lock)
pthread_mutex_unlock(lock);
xas_unlock_irq(&xas);
return tagged;
}
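
The rewrite of tag_tagged_items() above keeps the tag_pages_for_writeback() shape: walk the source mark under xas_lock_irq(), and after every batch of entries call xas_pause() so the lock can be dropped and retaken without losing or revisiting positions. Reduced to its skeleton, with the test suite's rcu_barrier() and batch-size handling left out (propagate_marks() is a hypothetical name):

static void propagate_marks(struct xarray *xa, xa_mark_t from, xa_mark_t to)
{
	XA_STATE(xas, xa, 0);
	unsigned int done = 0;
	void *entry;

	xas_lock_irq(&xas);
	xas_for_each_marked(&xas, entry, ULONG_MAX, from) {
		xas_set_mark(&xas, to);
		if (++done % 64)	/* take a breather every 64 entries */
			continue;
		xas_pause(&xas);	/* keep the cursor valid across the unlock */
		xas_unlock_irq(&xas);
		/* writers and RCU callbacks may run here */
		xas_lock_irq(&xas);
	}
	xas_unlock_irq(&xas);
}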
/* Use the same pattern as find_swap_entry() in mm/shmem.c */
unsigned long find_item(struct radix_tree_root *root, void *item)
{
struct radix_tree_iter iter;
void **slot;
unsigned long found = -1;
unsigned long checked = 0;
radix_tree_for_each_slot(slot, root, &iter, 0) {
if (*slot == item) {
found = iter.index;
break;
}
checked++;
if ((checked % 4) != 0)
continue;
slot = radix_tree_iter_resume(slot, &iter);
}
return found;
}
static int verify_node(struct radix_tree_node *slot, unsigned int tag,
int tagged)
{
@ -281,43 +246,31 @@ static int verify_node(struct radix_tree_node *slot, unsigned int tag,
void verify_tag_consistency(struct radix_tree_root *root, unsigned int tag)
{
struct radix_tree_node *node = root->rnode;
struct radix_tree_node *node = root->xa_head;
if (!radix_tree_is_internal_node(node))
return;
verify_node(node, tag, !!root_tag_get(root, tag));
}
void item_kill_tree(struct radix_tree_root *root)
void item_kill_tree(struct xarray *xa)
{
struct radix_tree_iter iter;
void **slot;
struct item *items[32];
int nfound;
XA_STATE(xas, xa, 0);
void *entry;
radix_tree_for_each_slot(slot, root, &iter, 0) {
if (radix_tree_exceptional_entry(*slot))
radix_tree_delete(root, iter.index);
}
while ((nfound = radix_tree_gang_lookup(root, (void **)items, 0, 32))) {
int i;
for (i = 0; i < nfound; i++) {
void *ret;
ret = radix_tree_delete(root, items[i]->index);
assert(ret == items[i]);
free(items[i]);
xas_for_each(&xas, entry, ULONG_MAX) {
if (!xa_is_value(entry)) {
item_free(entry, xas.xa_index);
}
xas_store(&xas, NULL);
}
assert(radix_tree_gang_lookup(root, (void **)items, 0, 32) == 0);
assert(root->rnode == NULL);
assert(xa_empty(xa));
}
void tree_verify_min_height(struct radix_tree_root *root, int maxindex)
{
unsigned shift;
struct radix_tree_node *node = root->rnode;
struct radix_tree_node *node = root->xa_head;
if (!radix_tree_is_internal_node(node)) {
assert(maxindex == 0);
return;

View File

@ -11,13 +11,11 @@ struct item {
};
struct item *item_create(unsigned long index, unsigned int order);
int __item_insert(struct radix_tree_root *root, struct item *item);
int item_insert(struct radix_tree_root *root, unsigned long index);
void item_sanity(struct item *item, unsigned long index);
int item_insert_order(struct radix_tree_root *root, unsigned long index,
unsigned order);
void item_free(struct item *item, unsigned long index);
int item_delete(struct radix_tree_root *root, unsigned long index);
int item_delete_rcu(struct radix_tree_root *root, unsigned long index);
int item_delete_rcu(struct xarray *xa, unsigned long index);
struct item *item_lookup(struct radix_tree_root *root, unsigned long index);
void item_check_present(struct radix_tree_root *root, unsigned long index);
@ -29,11 +27,10 @@ void item_full_scan(struct radix_tree_root *root, unsigned long start,
unsigned long nr, int chunk);
void item_kill_tree(struct radix_tree_root *root);
int tag_tagged_items(struct radix_tree_root *, pthread_mutex_t *,
unsigned long start, unsigned long end, unsigned batch,
unsigned iftag, unsigned thentag);
unsigned long find_item(struct radix_tree_root *, void *item);
int tag_tagged_items(struct xarray *, unsigned long start, unsigned long end,
unsigned batch, xa_mark_t iftag, xa_mark_t thentag);
void xarray_tests(void);
void tag_check(void);
void multiorder_checks(void);
void iteration_test(unsigned order, unsigned duration);

View File

@ -0,0 +1,35 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* xarray.c: Userspace shim for XArray test-suite
* Copyright (c) 2018 Matthew Wilcox <willy@infradead.org>
*/
#define XA_DEBUG
#include "test.h"
#define module_init(x)
#define module_exit(x)
#define MODULE_AUTHOR(x)
#define MODULE_LICENSE(x)
#define dump_stack() assert(0)
#include "../../../lib/xarray.c"
#undef XA_DEBUG
#include "../../../lib/test_xarray.c"
void xarray_tests(void)
{
xarray_checks();
xarray_exit();
}
int __weak main(void)
{
radix_tree_init();
xarray_tests();
radix_tree_cpu_dead(1);
rcu_barrier();
if (nr_allocated)
printf("nr_allocated = %d\n", nr_allocated);
return 0;
}
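
The new shim above builds the real lib/xarray.c and lib/test_xarray.c into userspace: module_init()/module_exit() and the MODULE_* macros are stubbed out, dump_stack() turns into assert(0), and main() is __weak, presumably so that the full radix-tree test binary can supply its own entry point while a standalone build falls back to this one. The in-kernel checks it pulls in are written in roughly this style (illustrative only, not copied from test_xarray.c):

static noinline void check_store_erase(struct xarray *xa)
{
	XA_BUG_ON(xa, !xa_empty(xa));
	XA_BUG_ON(xa, xa_store(xa, 5, xa_mk_value(5), GFP_KERNEL) != NULL);
	XA_BUG_ON(xa, xa_load(xa, 5) != xa_mk_value(5));
	XA_BUG_ON(xa, xa_erase(xa, 5) != xa_mk_value(5));
	XA_BUG_ON(xa, !xa_empty(xa));
}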