mirror of https://gitee.com/openkylin/linux.git
Merge tag 'docs-5.3' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
 "It's been a relatively busy cycle for docs:

   - A fair pile of RST conversions, many from Mauro. These create more
     than the usual number of simple but annoying merge conflicts with
     other trees, unfortunately. He has a lot more of these waiting on
     the wings that, I think, will go to you directly later on.

   - A new document on how to use merges and rebases in kernel repos,
     and one on Spectre vulnerabilities.

   - Various improvements to the build system, including automatic
     markup of function() references because some people, for reasons I
     will never understand, were of the opinion that
     :c:func:``function()`` is unattractive and not fun to type.

   - We now recommend using sphinx 1.7, but still support back to 1.4.

   - Lots of smaller improvements, warning fixes, typo fixes, etc"

* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
  docs: automarkup.py: ignore exceptions when seeking for xrefs
  docs: Move binderfs to admin-guide
  Disable Sphinx SmartyPants in HTML output
  doc: RCU callback locks need only _bh, not necessarily _irq
  docs: format kernel-parameters -- as code
  Doc : doc-guide : Fix a typo
  platform: x86: get rid of a non-existent document
  Add the RCU docs to the core-api manual
  Documentation: RCU: Add TOC tree hooks
  Documentation: RCU: Rename txt files to rst
  Documentation: RCU: Convert RCU UP systems to reST
  Documentation: RCU: Convert RCU linked list to reST
  Documentation: RCU: Convert RCU basic concepts to reST
  docs: filesystems: Remove uneeded .rst extension on toctables
  scripts/sphinx-pre-install: fix out-of-tree build
  docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
  Documentation: PGP: update for newer HW devices
  Documentation: Add section about CPU vulnerabilities for Spectre
  Documentation: platform: Delete x86-laptop-drivers.txt
  docs: Note that :c:func: should no longer be used
  ...
commit e9a83bd232
|
@ -137,7 +137,8 @@ Description: Discover cpuidle policy and mechanism
|
||||||
current_governor: (RW) displays current idle policy. Users can
|
current_governor: (RW) displays current idle policy. Users can
|
||||||
switch the governor at runtime by writing to this file.
|
switch the governor at runtime by writing to this file.
|
||||||
|
|
||||||
See files in Documentation/cpuidle/ for more information.
|
See Documentation/admin-guide/pm/cpuidle.rst and
|
||||||
|
Documentation/driver-api/pm/cpuidle.rst for more information.
|
||||||
|
|
||||||
|
|
||||||
What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/name
|
What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/name
|
||||||
|
|
|
@ -11,4 +11,4 @@ Description:
|
||||||
example would be, if User A has shares = 1024 and user
|
example would be, if User A has shares = 1024 and user
|
||||||
B has shares = 2048, User B will get twice the CPU
|
B has shares = 2048, User B will get twice the CPU
|
||||||
bandwidth user A will. For more details refer
|
bandwidth user A will. For more details refer
|
||||||
Documentation/scheduler/sched-design-CFS.txt
|
Documentation/scheduler/sched-design-CFS.rst
|
||||||
|
|
|
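Review note: the shares example in the hunk above is plain proportional arithmetic. Here is a minimal sketch of it in Python, assuming that only the two users from the text are competing and that both stay busy. The function name `cpu_fractions` is made up for illustration and is not a kernel interface.

```python
from fractions import Fraction


def cpu_fractions(shares):
    """Return each user's fraction of CPU bandwidth, proportional to its shares."""
    total = sum(shares.values())
    return {user: Fraction(value, total) for user, value in shares.items()}


if __name__ == "__main__":
    # Values taken from the text: user A has 1024 shares, user B has 2048.
    for user, fraction in cpu_fractions({"A": 1024, "B": 2048}).items():
        print(user, fraction)
```

With these inputs user B ends up with two thirds of the bandwidth and user A with one third, which matches the "twice the CPU bandwidth" wording in the description.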
@ -198,7 +198,7 @@ call to set the mask to the value returned.
|
||||||
::
|
::
|
||||||
|
|
||||||
size_t
|
size_t
|
||||||
dma_direct_max_mapping_size(struct device *dev);
|
dma_max_mapping_size(struct device *dev);
|
||||||
|
|
||||||
Returns the maximum size of a mapping for the device. The size parameter
|
Returns the maximum size of a mapping for the device. The size parameter
|
||||||
of the mapping functions like dma_map_single(), dma_map_page() and
|
of the mapping functions like dma_map_single(), dma_map_page() and
|
||||||
|
|
|
@ -1,3 +1,9 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
====
|
||||||
|
EDID
|
||||||
|
====
|
||||||
|
|
||||||
In the good old days when graphics parameters were configured explicitly
|
In the good old days when graphics parameters were configured explicitly
|
||||||
in a file called xorg.conf, even broken hardware could be managed.
|
in a file called xorg.conf, even broken hardware could be managed.
|
||||||
|
|
||||||
|
@ -34,16 +40,19 @@ Makefile. Please note that the EDID data structure expects the timing
|
||||||
values in a different way as compared to the standard X11 format.
|
values in a different way as compared to the standard X11 format.
|
||||||
|
|
||||||
X11:
|
X11:
|
||||||
HTimings: hdisp hsyncstart hsyncend htotal
|
HTimings:
|
||||||
VTimings: vdisp vsyncstart vsyncend vtotal
|
hdisp hsyncstart hsyncend htotal
|
||||||
|
VTimings:
|
||||||
|
vdisp vsyncstart vsyncend vtotal
|
||||||
|
|
||||||
EDID:
|
EDID::
|
||||||
#define XPIX hdisp
|
|
||||||
#define XBLANK htotal-hdisp
|
|
||||||
#define XOFFSET hsyncstart-hdisp
|
|
||||||
#define XPULSE hsyncend-hsyncstart
|
|
||||||
|
|
||||||
#define YPIX vdisp
|
#define XPIX hdisp
|
||||||
#define YBLANK vtotal-vdisp
|
#define XBLANK htotal-hdisp
|
||||||
#define YOFFSET vsyncstart-vdisp
|
#define XOFFSET hsyncstart-hdisp
|
||||||
#define YPULSE vsyncend-vsyncstart
|
#define XPULSE hsyncend-hsyncstart
|
||||||
|
|
||||||
|
#define YPIX vdisp
|
||||||
|
#define YBLANK vtotal-vdisp
|
||||||
|
#define YOFFSET vsyncstart-vdisp
|
||||||
|
#define YPULSE vsyncend-vsyncstart
|
|
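Review note: the X11-to-EDID mapping in the hunk above is a handful of subtractions, so a worked example may help when checking a new EDID source file. The sketch below is Python rather than the assembler-preprocessor defines the kernel uses, the helper name `x11_to_edid` is hypothetical, and the sample timings are the common 1024x768 at 60 Hz values, chosen here only as an assumed input.

```python
def x11_to_edid(hdisp, hsyncstart, hsyncend, htotal,
                vdisp, vsyncstart, vsyncend, vtotal):
    """Map X11 HTimings/VTimings onto the EDID-style values from the document."""
    return {
        "XPIX": hdisp,
        "XBLANK": htotal - hdisp,
        "XOFFSET": hsyncstart - hdisp,
        "XPULSE": hsyncend - hsyncstart,
        "YPIX": vdisp,
        "YBLANK": vtotal - vdisp,
        "YOFFSET": vsyncstart - vdisp,
        "YPULSE": vsyncend - vsyncstart,
    }


if __name__ == "__main__":
    # Assumed sample input: HTimings 1024 1048 1184 1344, VTimings 768 771 777 806.
    for name, value in x11_to_edid(1024, 1048, 1184, 1344,
                                   768, 771, 777, 806).items():
        print(f"#define {name} {value}")
```

For that input the horizontal values come out as 1024, 320, 24 and 136, and the vertical values as 768, 38, 3 and 6.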
@ -0,0 +1,13 @@
|
||||||
|
config WARN_MISSING_DOCUMENTS
|
||||||
|
|
||||||
|
bool "Warn if there's a missing documentation file"
|
||||||
|
depends on COMPILE_TEST
|
||||||
|
help
|
||||||
|
It is not uncommon that a document gets renamed.
|
||||||
|
This option makes the Kernel to check for missing dependencies,
|
||||||
|
warning when something is missing. Works only if the Kernel
|
||||||
|
is built from a git tree.
|
||||||
|
|
||||||
|
If unsure, select 'N'.
|
||||||
|
|
||||||
|
|
|
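Review note: the help text above says the option warns about references to documentation files that no longer exist, which is the failure mode a rename from .txt to .rst produces. The toy model below illustrates that idea in Python. It is not the logic of scripts/documentation-file-ref-check; the regular expression, the function name `missing_doc_refs`, and the use of an in-memory set instead of the source tree are all assumptions made for the sketch.

```python
import re

# Assumed pattern: anything that looks like a path under Documentation/.
DOC_REF = re.compile(r"Documentation/[A-Za-z0-9_./-]+")


def missing_doc_refs(text, existing_paths):
    """Return referenced Documentation/ paths that are not in existing_paths."""
    missing = []
    for match in DOC_REF.finditer(text):
        # Drop sentence punctuation that the pattern may have swallowed.
        path = match.group(0).rstrip(".,;:")
        if path not in existing_paths and path not in missing:
            missing.append(path)
    return missing


if __name__ == "__main__":
    tree = {"Documentation/scheduler/sched-design-CFS.rst"}
    stale = "For more details refer Documentation/scheduler/sched-design-CFS.txt"
    print(missing_doc_refs(stale, tree))
```

A real check also has to cope with wildcards, directories and generated files, which this sketch deliberately ignores.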
@ -4,6 +4,11 @@
|
||||||
|
|
||||||
subdir-y := devicetree/bindings/
|
subdir-y := devicetree/bindings/
|
||||||
|
|
||||||
|
# Check for broken documentation file references
|
||||||
|
ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y)
|
||||||
|
$(shell $(srctree)/scripts/documentation-file-ref-check --warn)
|
||||||
|
endif
|
||||||
|
|
||||||
# You can set these variables from the command line.
|
# You can set these variables from the command line.
|
||||||
SPHINXBUILD = sphinx-build
|
SPHINXBUILD = sphinx-build
|
||||||
SPHINXOPTS =
|
SPHINXOPTS =
|
||||||
|
@ -23,11 +28,13 @@ ifeq ($(HAVE_SPHINX),0)
|
||||||
.DEFAULT:
|
.DEFAULT:
|
||||||
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
|
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
|
||||||
@echo
|
@echo
|
||||||
@./scripts/sphinx-pre-install
|
@$(srctree)/scripts/sphinx-pre-install
|
||||||
@echo " SKIP Sphinx $@ target."
|
@echo " SKIP Sphinx $@ target."
|
||||||
|
|
||||||
else # HAVE_SPHINX
|
else # HAVE_SPHINX
|
||||||
|
|
||||||
|
export SPHINXOPTS = $(shell perl -e 'open IN,"sphinx-build --version 2>&1 |"; while (<IN>) { if (m/([\d\.]+)/) { print "-jauto" if ($$1 >= "1.7") } ;} close IN')
|
||||||
|
|
||||||
# User-friendly check for pdflatex and latexmk
|
# User-friendly check for pdflatex and latexmk
|
||||||
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||||
HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
|
||||||
|
@ -70,12 +77,14 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
|
||||||
$(abspath $(BUILDDIR)/$3/$4)
|
$(abspath $(BUILDDIR)/$3/$4)
|
||||||
|
|
||||||
htmldocs:
|
htmldocs:
|
||||||
|
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||||
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
|
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
|
||||||
|
|
||||||
linkcheckdocs:
|
linkcheckdocs:
|
||||||
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
|
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
|
||||||
|
|
||||||
latexdocs:
|
latexdocs:
|
||||||
|
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||||
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
|
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
|
||||||
|
|
||||||
ifeq ($(HAVE_PDFLATEX),0)
|
ifeq ($(HAVE_PDFLATEX),0)
|
||||||
|
@ -87,14 +96,17 @@ pdfdocs:
|
||||||
else # HAVE_PDFLATEX
|
else # HAVE_PDFLATEX
|
||||||
|
|
||||||
pdfdocs: latexdocs
|
pdfdocs: latexdocs
|
||||||
|
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||||
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
|
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
|
||||||
|
|
||||||
endif # HAVE_PDFLATEX
|
endif # HAVE_PDFLATEX
|
||||||
|
|
||||||
epubdocs:
|
epubdocs:
|
||||||
|
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||||
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
|
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
|
||||||
|
|
||||||
xmldocs:
|
xmldocs:
|
||||||
|
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||||
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
|
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
|
||||||
|
|
||||||
endif # HAVE_SPHINX
|
endif # HAVE_SPHINX
|
||||||
|
|
|
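Review note: the new SPHINXOPTS line in this Makefile pulls the first dotted number out of the `sphinx-build --version` output and adds `-jauto` when that number is at least 1.7. The Python sketch below restates the rule so it can be checked against sample strings. The function name `sphinx_jobs_option` and the sample version strings are assumptions. One deliberate difference: the sketch compares major and minor as integers, whereas Perl's `>=` compares numerically and would read a string such as "1.10" as 1.1.

```python
import re


def sphinx_jobs_option(version_output):
    """Return "-jauto" when the reported Sphinx version is 1.7 or newer, else ""."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        # No version number found, for example when the command is missing.
        return ""
    major, minor = int(match.group(1)), int(match.group(2))
    return "-jauto" if (major, minor) >= (1, 7) else ""


if __name__ == "__main__":
    for sample in ("sphinx-build 1.4.9", "sphinx-build 1.7.9"):
        print(sample, "->", repr(sphinx_jobs_option(sample)))
```

Comparing tuples of integers is a design choice of this sketch, not a description of what the kernel Makefile does.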
@ -1,17 +1,19 @@
|
||||||
RCU on Uniprocessor Systems
|
.. _up_doc:
|
||||||
|
|
||||||
|
RCU on Uniprocessor Systems
|
||||||
|
===========================
|
||||||
|
|
||||||
A common misconception is that, on UP systems, the call_rcu() primitive
|
A common misconception is that, on UP systems, the call_rcu() primitive
|
||||||
may immediately invoke its function. The basis of this misconception
|
may immediately invoke its function. The basis of this misconception
|
||||||
is that since there is only one CPU, it should not be necessary to
|
is that since there is only one CPU, it should not be necessary to
|
||||||
wait for anything else to get done, since there are no other CPUs for
|
wait for anything else to get done, since there are no other CPUs for
|
||||||
anything else to be happening on. Although this approach will -sort- -of-
|
anything else to be happening on. Although this approach will *sort of*
|
||||||
work a surprising amount of the time, it is a very bad idea in general.
|
work a surprising amount of the time, it is a very bad idea in general.
|
||||||
This document presents three examples that demonstrate exactly how bad
|
This document presents three examples that demonstrate exactly how bad
|
||||||
an idea this is.
|
an idea this is.
|
||||||
|
|
||||||
|
|
||||||
Example 1: softirq Suicide
|
Example 1: softirq Suicide
|
||||||
|
--------------------------
|
||||||
|
|
||||||
Suppose that an RCU-based algorithm scans a linked list containing
|
Suppose that an RCU-based algorithm scans a linked list containing
|
||||||
elements A, B, and C in process context, and can delete elements from
|
elements A, B, and C in process context, and can delete elements from
|
||||||
|
@ -28,8 +30,8 @@ your kernel.
|
||||||
This same problem can occur if call_rcu() is invoked from a hardware
|
This same problem can occur if call_rcu() is invoked from a hardware
|
||||||
interrupt handler.
|
interrupt handler.
|
||||||
|
|
||||||
|
|
||||||
Example 2: Function-Call Fatality
|
Example 2: Function-Call Fatality
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
Of course, one could avert the suicide described in the preceding example
|
Of course, one could avert the suicide described in the preceding example
|
||||||
by having call_rcu() directly invoke its arguments only if it was called
|
by having call_rcu() directly invoke its arguments only if it was called
|
||||||
|
@ -46,11 +48,13 @@ its arguments would cause it to fail to make the fundamental guarantee
|
||||||
underlying RCU, namely that call_rcu() defers invoking its arguments until
|
underlying RCU, namely that call_rcu() defers invoking its arguments until
|
||||||
all RCU read-side critical sections currently executing have completed.
|
all RCU read-side critical sections currently executing have completed.
|
||||||
|
|
||||||
Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in
|
Quick Quiz #1:
|
||||||
this case?
|
Why is it *not* legal to invoke synchronize_rcu() in this case?
|
||||||
|
|
||||||
|
:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
|
||||||
|
|
||||||
Example 3: Death by Deadlock
|
Example 3: Death by Deadlock
|
||||||
|
----------------------------
|
||||||
|
|
||||||
Suppose that call_rcu() is invoked while holding a lock, and that the
|
Suppose that call_rcu() is invoked while holding a lock, and that the
|
||||||
callback function must acquire this same lock. In this case, if
|
callback function must acquire this same lock. In this case, if
|
||||||
|
@ -76,25 +80,30 @@ there are cases where this can be quite ugly:
|
||||||
If call_rcu() directly invokes the callback, painful locking restrictions
|
If call_rcu() directly invokes the callback, painful locking restrictions
|
||||||
or API changes would be required.
|
or API changes would be required.
|
||||||
|
|
||||||
Quick Quiz #2: What locking restriction must RCU callbacks respect?
|
Quick Quiz #2:
|
||||||
|
What locking restriction must RCU callbacks respect?
|
||||||
|
|
||||||
|
:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
|
||||||
|
|
||||||
Summary
|
Summary
|
||||||
|
-------
|
||||||
|
|
||||||
Permitting call_rcu() to immediately invoke its arguments breaks RCU,
|
Permitting call_rcu() to immediately invoke its arguments breaks RCU,
|
||||||
even on a UP system. So do not do it! Even on a UP system, the RCU
|
even on a UP system. So do not do it! Even on a UP system, the RCU
|
||||||
infrastructure -must- respect grace periods, and -must- invoke callbacks
|
infrastructure *must* respect grace periods, and *must* invoke callbacks
|
||||||
from a known environment in which no locks are held.
|
from a known environment in which no locks are held.
|
||||||
|
|
||||||
Note that it -is- safe for synchronize_rcu() to return immediately on
|
Note that it *is* safe for synchronize_rcu() to return immediately on
|
||||||
UP systems, including !PREEMPT SMP builds running on UP systems.
|
UP systems, including PREEMPT SMP builds running on UP systems.
|
||||||
|
|
||||||
Quick Quiz #3: Why can't synchronize_rcu() return immediately on
|
Quick Quiz #3:
|
||||||
UP systems running preemptable RCU?
|
Why can't synchronize_rcu() return immediately on UP systems running
|
||||||
|
preemptable RCU?
|
||||||
|
|
||||||
|
.. _answer_quick_quiz_up:
|
||||||
|
|
||||||
Answer to Quick Quiz #1:
|
Answer to Quick Quiz #1:
|
||||||
Why is it -not- legal to invoke synchronize_rcu() in this case?
|
Why is it *not* legal to invoke synchronize_rcu() in this case?
|
||||||
|
|
||||||
Because the calling function is scanning an RCU-protected linked
|
Because the calling function is scanning an RCU-protected linked
|
||||||
list, and is therefore within an RCU read-side critical section.
|
list, and is therefore within an RCU read-side critical section.
|
||||||
|
@ -104,12 +113,13 @@ Answer to Quick Quiz #1:
|
||||||
Answer to Quick Quiz #2:
|
Answer to Quick Quiz #2:
|
||||||
What locking restriction must RCU callbacks respect?
|
What locking restriction must RCU callbacks respect?
|
||||||
|
|
||||||
Any lock that is acquired within an RCU callback must be
|
Any lock that is acquired within an RCU callback must be acquired
|
||||||
acquired elsewhere using an _irq variant of the spinlock
|
elsewhere using an _bh variant of the spinlock primitive.
|
||||||
primitive. For example, if "mylock" is acquired by an
|
For example, if "mylock" is acquired by an RCU callback, then
|
||||||
RCU callback, then a process-context acquisition of this
|
a process-context acquisition of this lock must use something
|
||||||
lock must use something like spin_lock_irqsave() to
|
like spin_lock_bh() to acquire the lock. Please note that
|
||||||
acquire the lock.
|
it is also OK to use _irq variants of spinlocks, for example,
|
||||||
|
spin_lock_irqsave().
|
||||||
|
|
||||||
If the process-context code were to simply use spin_lock(),
|
If the process-context code were to simply use spin_lock(),
|
||||||
then, since RCU callbacks can be invoked from softirq context,
|
then, since RCU callbacks can be invoked from softirq context,
|
||||||
|
@ -119,7 +129,7 @@ Answer to Quick Quiz #2:
|
||||||
|
|
||||||
This restriction might seem gratuitous, since very few RCU
|
This restriction might seem gratuitous, since very few RCU
|
||||||
callbacks acquire locks directly. However, a great many RCU
|
callbacks acquire locks directly. However, a great many RCU
|
||||||
callbacks do acquire locks -indirectly-, for example, via
|
callbacks do acquire locks *indirectly*, for example, via
|
||||||
the kfree() primitive.
|
the kfree() primitive.
|
||||||
|
|
||||||
Answer to Quick Quiz #3:
|
Answer to Quick Quiz #3:
|
|
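Review note: the "Death by Deadlock" example in this file can be modelled in a few lines of user-space Python. This is a toy, not kernel code. `ToyRCU`, its `immediate` switch and `run_grace_period()` are invented for the sketch, and a non-blocking `acquire()` stands in for the deadlock so that the program reports the problem instead of hanging.

```python
import threading
from collections import deque


class ToyRCU:
    """Toy model: call_rcu() either runs the callback at once or defers it."""

    def __init__(self, immediate):
        self.immediate = immediate
        self.pending = deque()

    def call_rcu(self, callback):
        if self.immediate:
            callback()  # the "very bad idea" from the document
        else:
            self.pending.append(callback)  # deferred until the grace period ends

    def run_grace_period(self):
        # Callbacks run here, from a context in which no locks are held.
        while self.pending:
            self.pending.popleft()()


def demo(immediate):
    lock = threading.Lock()  # non-reentrant, like a spinlock
    rcu = ToyRCU(immediate)
    outcome = []

    def callback():
        # The callback needs the same lock that the caller of call_rcu() holds.
        if lock.acquire(blocking=False):
            outcome.append("ran")
            lock.release()
        else:
            outcome.append("would deadlock")

    with lock:
        rcu.call_rcu(callback)
    rcu.run_grace_period()
    return outcome


if __name__ == "__main__":
    print("immediate:", demo(True))
    print("deferred: ", demo(False))
```

With immediate invocation the callback finds the lock already held by its own caller. With deferral it runs after the lock has been dropped, which is the guarantee the Summary section insists on.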
@ -0,0 +1,19 @@
|
||||||
|
.. _rcu_concepts:
|
||||||
|
|
||||||
|
============
|
||||||
|
RCU concepts
|
||||||
|
============
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
rcu
|
||||||
|
listRCU
|
||||||
|
UP
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,5 +1,7 @@
|
||||||
Using RCU to Protect Read-Mostly Linked Lists
|
.. _list_rcu_doc:
|
||||||
|
|
||||||
|
Using RCU to Protect Read-Mostly Linked Lists
|
||||||
|
=============================================
|
||||||
|
|
||||||
One of the best applications of RCU is to protect read-mostly linked lists
|
One of the best applications of RCU is to protect read-mostly linked lists
|
||||||
("struct list_head" in list.h). One big advantage of this approach
|
("struct list_head" in list.h). One big advantage of this approach
|
||||||
|
@ -7,8 +9,8 @@ is that all of the required memory barriers are included for you in
|
||||||
the list macros. This document describes several applications of RCU,
|
the list macros. This document describes several applications of RCU,
|
||||||
with the best fits first.
|
with the best fits first.
|
||||||
|
|
||||||
|
|
||||||
Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
|
Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
The best applications are cases where, if reader-writer locking were
|
The best applications are cases where, if reader-writer locking were
|
||||||
used, the read-side lock would be dropped before taking any action
|
used, the read-side lock would be dropped before taking any action
|
||||||
|
@ -24,7 +26,7 @@ added or deleted, rather than being modified in place.
|
||||||
|
|
||||||
A straightforward example of this use of RCU may be found in the
|
A straightforward example of this use of RCU may be found in the
|
||||||
system-call auditing support. For example, a reader-writer locked
|
system-call auditing support. For example, a reader-writer locked
|
||||||
implementation of audit_filter_task() might be as follows:
|
implementation of audit_filter_task() might be as follows::
|
||||||
|
|
||||||
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
||||||
{
|
{
|
||||||
|
@ -48,7 +50,7 @@ the corresponding value is returned. By the time that this value is acted
|
||||||
on, the list may well have been modified. This makes sense, since if
|
on, the list may well have been modified. This makes sense, since if
|
||||||
you are turning auditing off, it is OK to audit a few extra system calls.
|
you are turning auditing off, it is OK to audit a few extra system calls.
|
||||||
|
|
||||||
This means that RCU can be easily applied to the read side, as follows:
|
This means that RCU can be easily applied to the read side, as follows::
|
||||||
|
|
||||||
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
||||||
{
|
{
|
||||||
|
@ -73,7 +75,7 @@ become list_for_each_entry_rcu(). The _rcu() list-traversal primitives
|
||||||
insert the read-side memory barriers that are required on DEC Alpha CPUs.
|
insert the read-side memory barriers that are required on DEC Alpha CPUs.
|
||||||
|
|
||||||
The changes to the update side are also straightforward. A reader-writer
|
The changes to the update side are also straightforward. A reader-writer
|
||||||
lock might be used as follows for deletion and insertion:
|
lock might be used as follows for deletion and insertion::
|
||||||
|
|
||||||
static inline int audit_del_rule(struct audit_rule *rule,
|
static inline int audit_del_rule(struct audit_rule *rule,
|
||||||
struct list_head *list)
|
struct list_head *list)
|
||||||
|
@ -106,7 +108,7 @@ lock might be used as follows for deletion and insertion:
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
Following are the RCU equivalents for these two functions:
|
Following are the RCU equivalents for these two functions::
|
||||||
|
|
||||||
static inline int audit_del_rule(struct audit_rule *rule,
|
static inline int audit_del_rule(struct audit_rule *rule,
|
||||||
struct list_head *list)
|
struct list_head *list)
|
||||||
|
@ -154,13 +156,13 @@ otherwise cause concurrent readers to fail spectacularly.
|
||||||
So, when readers can tolerate stale data and when entries are either added
|
So, when readers can tolerate stale data and when entries are either added
|
||||||
or deleted, without in-place modification, it is very easy to use RCU!
|
or deleted, without in-place modification, it is very easy to use RCU!
|
||||||
|
|
||||||
|
|
||||||
Example 2: Handling In-Place Updates
|
Example 2: Handling In-Place Updates
|
||||||
|
------------------------------------
|
||||||
|
|
||||||
The system-call auditing code does not update auditing rules in place.
|
The system-call auditing code does not update auditing rules in place.
|
||||||
However, if it did, reader-writer-locked code to do so might look as
|
However, if it did, reader-writer-locked code to do so might look as
|
||||||
follows (presumably, the field_count is only permitted to decrease,
|
follows (presumably, the field_count is only permitted to decrease,
|
||||||
otherwise, the added fields would need to be filled in):
|
otherwise, the added fields would need to be filled in)::
|
||||||
|
|
||||||
static inline int audit_upd_rule(struct audit_rule *rule,
|
static inline int audit_upd_rule(struct audit_rule *rule,
|
||||||
struct list_head *list,
|
struct list_head *list,
|
||||||
|
@ -187,7 +189,7 @@ otherwise, the added fields would need to be filled in):
|
||||||
The RCU version creates a copy, updates the copy, then replaces the old
|
The RCU version creates a copy, updates the copy, then replaces the old
|
||||||
entry with the newly updated entry. This sequence of actions, allowing
|
entry with the newly updated entry. This sequence of actions, allowing
|
||||||
concurrent reads while doing a copy to perform an update, is what gives
|
concurrent reads while doing a copy to perform an update, is what gives
|
||||||
RCU ("read-copy update") its name. The RCU code is as follows:
|
RCU ("read-copy update") its name. The RCU code is as follows::
|
||||||
|
|
||||||
static inline int audit_upd_rule(struct audit_rule *rule,
|
static inline int audit_upd_rule(struct audit_rule *rule,
|
||||||
struct list_head *list,
|
struct list_head *list,
|
||||||
|
@ -216,8 +218,8 @@ RCU ("read-copy update") its name. The RCU code is as follows:
|
||||||
Again, this assumes that the caller holds audit_netlink_sem. Normally,
|
Again, this assumes that the caller holds audit_netlink_sem. Normally,
|
||||||
the reader-writer lock would become a spinlock in this sort of code.
|
the reader-writer lock would become a spinlock in this sort of code.
|
||||||
|
|
||||||
|
|
||||||
Example 3: Eliminating Stale Data
|
Example 3: Eliminating Stale Data
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
The auditing examples above tolerate stale data, as do most algorithms
|
The auditing examples above tolerate stale data, as do most algorithms
|
||||||
that are tracking external state. Because there is a delay from the
|
that are tracking external state. Because there is a delay from the
|
||||||
|
@ -231,13 +233,16 @@ per-entry spinlock, and, if the "deleted" flag is set, pretends that the
|
||||||
entry does not exist. For this to be helpful, the search function must
|
entry does not exist. For this to be helpful, the search function must
|
||||||
return holding the per-entry spinlock, as ipc_lock() does in fact do.
|
return holding the per-entry spinlock, as ipc_lock() does in fact do.
|
||||||
|
|
||||||
Quick Quiz: Why does the search function need to return holding the
|
Quick Quiz:
|
||||||
per-entry lock for this deleted-flag technique to be helpful?
|
Why does the search function need to return holding the per-entry lock for
|
||||||
|
this deleted-flag technique to be helpful?
|
||||||
|
|
||||||
|
:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
|
||||||
|
|
||||||
If the system-call audit module were to ever need to reject stale data,
|
If the system-call audit module were to ever need to reject stale data,
|
||||||
one way to accomplish this would be to add a "deleted" flag and a "lock"
|
one way to accomplish this would be to add a "deleted" flag and a "lock"
|
||||||
spinlock to the audit_entry structure, and modify audit_filter_task()
|
spinlock to the audit_entry structure, and modify audit_filter_task()
|
||||||
as follows:
|
as follows::
|
||||||
|
|
||||||
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
||||||
{
|
{
|
||||||
|
@ -268,7 +273,7 @@ audit_upd_rule() would need additional memory barriers to ensure
|
||||||
that the list_add_rcu() was really executed before the list_del_rcu().
|
that the list_add_rcu() was really executed before the list_del_rcu().
|
||||||
|
|
||||||
The audit_del_rule() function would need to set the "deleted"
|
The audit_del_rule() function would need to set the "deleted"
|
||||||
flag under the spinlock as follows:
|
flag under the spinlock as follows::
|
||||||
|
|
||||||
static inline int audit_del_rule(struct audit_rule *rule,
|
static inline int audit_del_rule(struct audit_rule *rule,
|
||||||
struct list_head *list)
|
struct list_head *list)
|
||||||
|
@ -290,8 +295,8 @@ flag under the spinlock as follows:
|
||||||
return -EFAULT; /* No matching rule */
|
return -EFAULT; /* No matching rule */
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
Summary
|
Summary
|
||||||
|
-------
|
||||||
|
|
||||||
Read-mostly list-based data structures that can tolerate stale data are
|
Read-mostly list-based data structures that can tolerate stale data are
|
||||||
the most amenable to use of RCU. The simplest case is where entries are
|
the most amenable to use of RCU. The simplest case is where entries are
|
||||||
|
@ -302,8 +307,9 @@ If stale data cannot be tolerated, then a "deleted" flag may be used
|
||||||
in conjunction with a per-entry spinlock in order to allow the search
|
in conjunction with a per-entry spinlock in order to allow the search
|
||||||
function to reject newly deleted data.
|
function to reject newly deleted data.
|
||||||
|
|
||||||
|
.. _answer_quick_quiz_list:
|
||||||
|
|
||||||
Answer to Quick Quiz
|
Answer to Quick Quiz:
|
||||||
Why does the search function need to return holding the per-entry
|
Why does the search function need to return holding the per-entry
|
||||||
lock for this deleted-flag technique to be helpful?
|
lock for this deleted-flag technique to be helpful?
|
||||||
|
|
|
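Review note: Example 2 in this file describes the sequence that gives RCU its name: copy the entry, update the copy, then replace the old entry. The Python sketch below models only that sequence. The `Rule` class and `update_rule()` are hypothetical stand-ins for the audit structures, and Python's garbage collector plays the role of the grace period, so none of the memory-ordering or locking concerns of the kernel code apply here.

```python
import copy
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    field_count: int


def update_rule(rules, name, new_field_count):
    """Replace the matching rule with an updated copy and return the old entry."""
    for index, old in enumerate(rules):
        if old.name == name:
            new = copy.copy(old)               # read and copy
            new.field_count = new_field_count  # update the copy
            rules[index] = new                 # publish the replacement
            return old                         # old entry stays valid for readers
    return None


if __name__ == "__main__":
    rules = [Rule("demo", 3)]
    reader_view = rules[0]          # a reader that started before the update
    update_rule(rules, "demo", 2)
    print("reader still sees:", reader_view.field_count)
    print("new readers see:  ", rules[0].field_count)
```

A reader that picked up the entry before the update keeps seeing a consistent, if stale, rule, which is the stale-data tolerance that Examples 1 and 2 rely on.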
@ -0,0 +1,92 @@
|
||||||
|
.. _rcu_doc:
|
||||||
|
|
||||||
|
RCU Concepts
|
||||||
|
============
|
||||||
|
|
||||||
|
The basic idea behind RCU (read-copy update) is to split destructive
|
||||||
|
operations into two parts, one that prevents anyone from seeing the data
|
||||||
|
item being destroyed, and one that actually carries out the destruction.
|
||||||
|
A "grace period" must elapse between the two parts, and this grace period
|
||||||
|
must be long enough that any readers accessing the item being deleted have
|
||||||
|
since dropped their references. For example, an RCU-protected deletion
|
||||||
|
from a linked list would first remove the item from the list, wait for
|
||||||
|
a grace period to elapse, then free the element. See the
|
||||||
|
Documentation/RCU/listRCU.rst file for more information on using RCU with
|
||||||
|
linked lists.
|
||||||
|
|
||||||
|
Frequently Asked Questions
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
- Why would anyone want to use RCU?
|
||||||
|
|
||||||
|
The advantage of RCU's two-part approach is that RCU readers need
|
||||||
|
not acquire any locks, perform any atomic instructions, write to
|
||||||
|
shared memory, or (on CPUs other than Alpha) execute any memory
|
||||||
|
barriers. The fact that these operations are quite expensive
|
||||||
|
on modern CPUs is what gives RCU its performance advantages
|
||||||
|
in read-mostly situations. The fact that RCU readers need not
|
||||||
|
acquire locks can also greatly simplify deadlock-avoidance code.
|
||||||
|
|
||||||
|
- How can the updater tell when a grace period has completed
|
||||||
|
if the RCU readers give no indication when they are done?
|
||||||
|
|
||||||
|
Just as with spinlocks, RCU readers are not permitted to
|
||||||
|
block, switch to user-mode execution, or enter the idle loop.
|
||||||
|
Therefore, as soon as a CPU is seen passing through any of these
|
||||||
|
three states, we know that that CPU has exited any previous RCU
|
||||||
|
read-side critical sections. So, if we remove an item from a
|
||||||
|
linked list, and then wait until all CPUs have switched context,
|
||||||
|
executed in user mode, or executed in the idle loop, we can
|
||||||
|
safely free up that item.
|
||||||
|
|
||||||
|
Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
|
||||||
|
same effect, but require that the readers manipulate CPU-local
|
||||||
|
counters. These counters allow limited types of blocking within
|
||||||
|
RCU read-side critical sections. SRCU also uses CPU-local
|
||||||
|
counters, and permits general blocking within RCU read-side
|
||||||
|
critical sections. These variants of RCU detect grace periods
|
||||||
|
by sampling these counters.
|
||||||
|
|
||||||
|
- If I am running on a uniprocessor kernel, which can only do one
|
||||||
|
thing at a time, why should I wait for a grace period?
|
||||||
|
|
||||||
|
See the Documentation/RCU/UP.rst file for more information.
|
||||||
|
|
||||||
|
- How can I see where RCU is currently used in the Linux kernel?
|
||||||
|
|
||||||
|
Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
|
||||||
|
"rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
|
||||||
|
"srcu_read_unlock", "synchronize_rcu", "synchronize_net",
|
||||||
|
"synchronize_srcu", and the other RCU primitives. Or grab one
|
||||||
|
of the cscope databases from:
|
||||||
|
|
||||||
|
(http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
|
||||||
|
|
||||||
|
- What guidelines should I follow when writing code that uses RCU?
|
||||||
|
|
||||||
|
See the checklist.txt file in this directory.
|
||||||
|
|
||||||
|
- Why the name "RCU"?
|
||||||
|
|
||||||
|
"RCU" stands for "read-copy update". The file Documentation/RCU/listRCU.rst
|
||||||
|
has more information on where this name came from; search for
|
||||||
|
"read-copy update" to find it.
|
||||||
|
|
||||||
|
- I hear that RCU is patented? What is with that?
|
||||||
|
|
||||||
|
Yes, it is. There are several known patents related to RCU;
|
||||||
|
search for the string "Patent" in RTFP.txt to find them.
|
||||||
|
Of these, one was allowed to lapse by the assignee, and the
|
||||||
|
others have been contributed to the Linux kernel under GPL.
|
||||||
|
There are now also LGPL implementations of user-level RCU
|
||||||
|
available (http://liburcu.org/).
|
||||||
|
|
||||||
|
- I hear that RCU needs work in order to support realtime kernels?
|
||||||
|
|
||||||
|
Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
|
||||||
|
kernel configuration parameter.
|
||||||
|
|
||||||
|
- Where can I find more information on RCU?
|
||||||
|
|
||||||
|
See the RTFP.txt file in this directory.
|
||||||
|
Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
|
|
@ -1,89 +0,0 @@
|
||||||
RCU Concepts
|
|
||||||
|
|
||||||
|
|
||||||
The basic idea behind RCU (read-copy update) is to split destructive
|
|
||||||
operations into two parts, one that prevents anyone from seeing the data
|
|
||||||
item being destroyed, and one that actually carries out the destruction.
|
|
||||||
A "grace period" must elapse between the two parts, and this grace period
|
|
||||||
must be long enough that any readers accessing the item being deleted have
|
|
||||||
since dropped their references. For example, an RCU-protected deletion
|
|
||||||
from a linked list would first remove the item from the list, wait for
|
|
||||||
a grace period to elapse, then free the element. See the listRCU.txt
|
|
||||||
file for more information on using RCU with linked lists.
|
|
||||||
|
|
||||||
|
|
||||||
Frequently Asked Questions
|
|
||||||
|
|
||||||
o Why would anyone want to use RCU?
|
|
||||||
|
|
||||||
The advantage of RCU's two-part approach is that RCU readers need
|
|
||||||
not acquire any locks, perform any atomic instructions, write to
|
|
||||||
shared memory, or (on CPUs other than Alpha) execute any memory
|
|
||||||
barriers. The fact that these operations are quite expensive
|
|
||||||
on modern CPUs is what gives RCU its performance advantages
|
|
||||||
in read-mostly situations. The fact that RCU readers need not
|
|
||||||
acquire locks can also greatly simplify deadlock-avoidance code.
|
|
||||||
|
|
||||||
o How can the updater tell when a grace period has completed
|
|
||||||
if the RCU readers give no indication when they are done?
|
|
||||||
|
|
||||||
Just as with spinlocks, RCU readers are not permitted to
|
|
||||||
block, switch to user-mode execution, or enter the idle loop.
|
|
||||||
Therefore, as soon as a CPU is seen passing through any of these
|
|
||||||
three states, we know that that CPU has exited any previous RCU
|
|
||||||
read-side critical sections. So, if we remove an item from a
|
|
||||||
linked list, and then wait until all CPUs have switched context,
|
|
||||||
executed in user mode, or executed in the idle loop, we can
|
|
||||||
safely free up that item.
|
|
||||||
|
|
||||||
Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
|
|
||||||
same effect, but require that the readers manipulate CPU-local
|
|
||||||
counters. These counters allow limited types of blocking within
|
|
||||||
RCU read-side critical sections. SRCU also uses CPU-local
|
|
||||||
counters, and permits general blocking within RCU read-side
|
|
||||||
critical sections. These variants of RCU detect grace periods
|
|
||||||
by sampling these counters.
|
|
||||||
|
|
||||||
o If I am running on a uniprocessor kernel, which can only do one
|
|
||||||
thing at a time, why should I wait for a grace period?
|
|
||||||
|
|
||||||
See the UP.txt file in this directory.
|
|
||||||
|
|
||||||
o How can I see where RCU is currently used in the Linux kernel?
|
|
||||||
|
|
||||||
Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
|
|
||||||
"rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
|
|
||||||
"srcu_read_unlock", "synchronize_rcu", "synchronize_net",
|
|
||||||
"synchronize_srcu", and the other RCU primitives. Or grab one
|
|
||||||
of the cscope databases from:
|
|
||||||
|
|
||||||
http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html
|
|
||||||
|
|
||||||
o What guidelines should I follow when writing code that uses RCU?
|
|
||||||
|
|
||||||
See the checklist.txt file in this directory.
|
|
||||||
|
|
||||||
o Why the name "RCU"?
|
|
||||||
|
|
||||||
"RCU" stands for "read-copy update". The file listRCU.txt has
|
|
||||||
more information on where this name came from, search for
|
|
||||||
"read-copy update" to find it.
|
|
||||||
|
|
||||||
o I hear that RCU is patented? What is with that?
|
|
||||||
|
|
||||||
Yes, it is. There are several known patents related to RCU,
|
|
||||||
search for the string "Patent" in RTFP.txt to find them.
|
|
||||||
Of these, one was allowed to lapse by the assignee, and the
|
|
||||||
others have been contributed to the Linux kernel under GPL.
|
|
||||||
There are now also LGPL implementations of user-level RCU
|
|
||||||
available (http://liburcu.org/).
|
|
||||||
|
|
||||||
o I hear that RCU needs work in order to support realtime kernels?
|
|
||||||
|
|
||||||
Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
|
|
||||||
kernel configuration parameter.
|
|
||||||
|
|
||||||
o Where can I find more information on RCU?
|
|
||||||
|
|
||||||
See the RTFP.txt file in this directory.
|
|
||||||
Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
========================================================
|
========================================================
|
||||||
OpenCAPI (Open Coherent Accelerator Processor Interface)
|
OpenCAPI (Open Coherent Accelerator Processor Interface)
|
||||||
========================================================
|
========================================================
|
||||||
|
|
|
@ -96,4 +96,4 @@ where
|
||||||
<URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
|
<URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
|
||||||
referenced 2019-02-21.
|
referenced 2019-02-21.
|
||||||
|
|
||||||
[7] Documentation/acpi/dsd/data-node-reference.txt
|
[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
|
||||||
|
|
|
@ -227,7 +227,7 @@ Configuring the kernel
|
||||||
"make tinyconfig" Configure the tiniest possible kernel.
|
"make tinyconfig" Configure the tiniest possible kernel.
|
||||||
|
|
||||||
You can find more information on using the Linux kernel config tools
|
You can find more information on using the Linux kernel config tools
|
||||||
in Documentation/kbuild/kconfig.txt.
|
in Documentation/kbuild/kconfig.rst.
|
||||||
|
|
||||||
- NOTES on ``make config``:
|
- NOTES on ``make config``:
|
||||||
|
|
||||||
|
|
|
@ -90,7 +90,7 @@ the disk is not available then you have three options:
|
||||||
run a null modem to a second machine and capture the output there
|
run a null modem to a second machine and capture the output there
|
||||||
using your favourite communication program. Minicom works well.
|
using your favourite communication program. Minicom works well.
|
||||||
|
|
||||||
(3) Use Kdump (see Documentation/kdump/kdump.txt),
|
(3) Use Kdump (see Documentation/kdump/kdump.rst),
|
||||||
extract the kernel ring buffer from old memory with using dmesg
|
extract the kernel ring buffer from old memory with using dmesg
|
||||||
gdbmacro in Documentation/kdump/gdbmacros.txt.
|
gdbmacro in Documentation/kdump/gdbmacros.txt.
|
||||||
|
|
||||||
|
|
|
@ -9,5 +9,6 @@ are configurable at compile, boot or run time.
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
|
spectre
|
||||||
l1tf
|
l1tf
|
||||||
mds
|
mds
|
||||||
|
|
|
@ -0,0 +1,697 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
Spectre Side Channels
|
||||||
|
=====================
|
||||||
|
|
||||||
|
Spectre is a class of side channel attacks that exploit branch prediction
|
||||||
|
and speculative execution on modern CPUs to read memory, possibly
|
||||||
|
bypassing access controls. Speculative execution side channel exploits
|
||||||
|
do not modify memory but attempt to infer privileged data in the memory.
|
||||||
|
|
||||||
|
This document covers Spectre variant 1 and Spectre variant 2.
|
||||||
|
|
||||||
|
Affected processors
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
Speculative execution side channel methods affect a wide range of modern
|
||||||
|
high performance processors, since most modern high speed processors
|
||||||
|
use branch prediction and speculative execution.
|
||||||
|
|
||||||
|
The following CPUs are vulnerable:
|
||||||
|
|
||||||
|
- Intel Core, Atom, Pentium, and Xeon processors
|
||||||
|
|
||||||
|
- AMD Phenom, EPYC, and Zen processors
|
||||||
|
|
||||||
|
- IBM POWER and zSeries processors
|
||||||
|
|
||||||
|
- Higher end ARM processors
|
||||||
|
|
||||||
|
- Apple CPUs
|
||||||
|
|
||||||
|
- Higher end MIPS CPUs
|
||||||
|
|
||||||
|
- Likely most other high performance CPUs. Contact your CPU vendor for details.
|
||||||
|
|
||||||
|
Whether a processor is affected or not can be read out from the Spectre
|
||||||
|
vulnerability files in sysfs. See :ref:`spectre_sys_info`.
|
||||||
|
|
||||||
|
Related CVEs
|
||||||
|
------------
|
||||||
|
|
||||||
|
The following CVE entries describe Spectre variants:
|
||||||
|
|
||||||
|
============= ======================= =================
|
||||||
|
CVE-2017-5753 Bounds check bypass Spectre variant 1
|
||||||
|
CVE-2017-5715 Branch target injection Spectre variant 2
|
||||||
|
============= ======================= =================
|
||||||
|
|
||||||
|
Problem
|
||||||
|
-------
|
||||||
|
|
||||||
|
CPUs use speculative operations to improve performance. That may leave
|
||||||
|
traces of memory accesses or computations in the processor's caches,
|
||||||
|
buffers, and branch predictors. Malicious software may be able to
|
||||||
|
influence the speculative execution paths, and then use the side effects
|
||||||
|
of the speculative execution in the CPUs' caches and buffers to infer
|
||||||
|
privileged data touched during the speculative execution.
|
||||||
|
|
||||||
|
Spectre variant 1 attacks take advantage of speculative execution of
|
||||||
|
conditional branches, while Spectre variant 2 attacks use speculative
|
||||||
|
execution of indirect branches to leak privileged memory.
|
||||||
|
See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>`
|
||||||
|
:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
|
||||||
|
|
||||||
|
Spectre variant 1 (Bounds Check Bypass)
|
||||||
|
---------------------------------------
|
||||||
|
|
||||||
|
The bounds check bypass attack :ref:`[2] <spec_ref2>` takes advantage
|
||||||
|
of speculative execution that bypasses conditional branch instructions
|
||||||
|
used for memory access bounds checks (e.g. checking if the index of an
|
||||||
|
array results in memory access within a valid range). This results in
|
||||||
|
memory accesses to invalid memory (with an out-of-bounds index) that are
|
||||||
|
done speculatively before validation checks resolve. Such speculative
|
||||||
|
memory accesses can leave side effects, creating side channels which
|
||||||
|
leak information to the attacker.
|
||||||
|
|
||||||
|
There are some extensions of Spectre variant 1 attacks for reading data
|
||||||
|
over the network; see :ref:`[12] <spec_ref12>`. However, such attacks
|
||||||
|
are difficult, low bandwidth, fragile, and are considered low risk.
|
||||||
|
|
||||||
|
Spectre variant 2 (Branch Target Injection)
|
||||||
|
-------------------------------------------
|
||||||
|
|
||||||
|
The branch target injection attack takes advantage of speculative
|
||||||
|
execution of indirect branches :ref:`[3] <spec_ref3>`. The indirect
|
||||||
|
branch predictors inside the processor used to guess the target of
|
||||||
|
indirect branches can be influenced by an attacker, causing gadget code
|
||||||
|
to be speculatively executed, thus exposing sensitive data touched by
|
||||||
|
the victim. The side effects left in the CPU's caches during speculative
|
||||||
|
execution can be measured to infer data values.
|
||||||
|
|
||||||
|
.. _poison_btb:
|
||||||
|
|
||||||
|
In Spectre variant 2 attacks, the attacker can steer speculative indirect
|
||||||
|
branches in the victim to gadget code by poisoning the branch target
|
||||||
|
buffer of a CPU used for predicting indirect branch addresses. Such
|
||||||
|
poisoning could be done by indirect branching into existing code,
|
||||||
|
with the address offset of the indirect branch under the attacker's
|
||||||
|
control. Since the branch prediction on impacted hardware does not
|
||||||
|
fully disambiguate branch address and uses the offset for prediction,
|
||||||
|
this could cause privileged code's indirect branch to jump to gadget
|
||||||
|
code with the same offset.
|
||||||
|
|
||||||
|
The most useful gadgets take an attacker-controlled input parameter (such
|
||||||
|
as a register value) so that the memory read can be controlled. Gadgets
|
||||||
|
without input parameters might be possible, but the attacker would have
|
||||||
|
very little control over what memory can be read, reducing the risk of
|
||||||
|
the attack revealing useful data.
|
||||||
|
|
||||||
|
One other variant 2 attack vector is for the attacker to poison the
|
||||||
|
return stack buffer (RSB) :ref:`[13] <spec_ref13>` to cause speculative
|
||||||
|
subroutine return instruction execution to go to a gadget. An attacker's
|
||||||
|
imbalanced subroutine call instructions might "poison" entries in the
|
||||||
|
return stack buffer which are later consumed by a victim's subroutine
|
||||||
|
return instructions. This attack can be mitigated by flushing the return
|
||||||
|
stack buffer on context switch, or virtual machine (VM) exit.
|
||||||
|
|
||||||
|
On systems with simultaneous multi-threading (SMT), attacks are possible
|
||||||
|
from the sibling thread, as level 1 cache and branch target buffer
|
||||||
|
(BTB) may be shared between hardware threads in a CPU core. A malicious
|
||||||
|
program running on the sibling thread may influence its peer's BTB to
|
||||||
|
steer its indirect branch speculations to gadget code, and measure the
|
||||||
|
speculative execution's side effects left in level 1 cache to infer the
|
||||||
|
victim's data.
|
||||||
|
|
||||||
|
Attack scenarios
|
||||||
|
----------------
|
||||||
|
|
||||||
|
The following list of attack scenarios has been anticipated, but may
|
||||||
|
not cover all possible attack vectors.
|
||||||
|
|
||||||
|
1. A user process attacking the kernel
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
The attacker passes a parameter to the kernel via a register or
|
||||||
|
via a known address in memory during a syscall. Such a parameter may
|
||||||
|
be used later by the kernel as an index to an array or to derive
|
||||||
|
a pointer for a Spectre variant 1 attack. The index or pointer
|
||||||
|
is invalid, but bounds checks are bypassed in the code branch taken
|
||||||
|
for speculative execution. This could cause privileged memory to be
|
||||||
|
accessed and leaked.
|
||||||
|
|
||||||
|
For kernel code that has been identified where data pointers could
|
||||||
|
potentially be influenced for Spectre attacks, new "nospec" accessor
|
||||||
|
macros are used to prevent speculative loading of data.
|
||||||
|
|
||||||
|
A Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
|
||||||
|
target buffer (BTB) before issuing a syscall to launch an attack.
|
||||||
|
After entering the kernel, the kernel could use the poisoned branch
|
||||||
|
target buffer on an indirect jump and jump to gadget code in speculative
|
||||||
|
execution.
|
||||||
|
|
||||||
|
If an attacker tries to control the memory addresses leaked during
|
||||||
|
speculative execution, the attacker would also need to pass a parameter to the
|
||||||
|
gadget, either through a register or a known address in memory. After
|
||||||
|
the gadget has executed, the attacker can measure the side effect.
|
||||||
|
|
||||||
|
The kernel can protect itself against consuming poisoned branch
|
||||||
|
target buffer entries by using return trampolines (also known as
|
||||||
|
"retpoline") :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` for all
|
||||||
|
indirect branches. Return trampolines trap speculative execution paths
|
||||||
|
to prevent jumping to gadget code during speculative execution.
|
||||||
|
x86 CPUs with Enhanced Indirect Branch Restricted Speculation
|
||||||
|
(Enhanced IBRS) available in hardware should use the feature to
|
||||||
|
mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is
|
||||||
|
more efficient than retpoline.
|
||||||
|
|
||||||
|
There may be gadget code in firmware which could be exploited with
|
||||||
|
a Spectre variant 2 attack by a rogue user process. To mitigate such
|
||||||
|
attacks on x86, the Indirect Branch Restricted Speculation (IBRS) feature
|
||||||
|
is turned on before the kernel invokes any firmware code.
|
||||||
|
|
||||||
|
2. A user process attacking another user process
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
A malicious user process can try to attack another user process,
|
||||||
|
either via a context switch on the same hardware thread, or from the
|
||||||
|
sibling hyperthread sharing a physical processor core on a simultaneous
|
||||||
|
multi-threading (SMT) system.
|
||||||
|
|
||||||
|
Spectre variant 1 attacks generally require passing parameters
|
||||||
|
between the processes, which needs a data passing relationship, such
|
||||||
|
as remote procedure calls (RPC). Those parameters are used in gadget
|
||||||
|
code to derive invalid data pointers accessing privileged memory in
|
||||||
|
the attacked process.
|
||||||
|
|
||||||
|
Spectre variant 2 attacks can be launched from a rogue process by
|
||||||
|
:ref:`poisoning <poison_btb>` the branch target buffer. This can
|
||||||
|
influence the indirect branch targets for a victim process that either
|
||||||
|
runs later on the same hardware thread, or runs concurrently on
|
||||||
|
a sibling hardware thread sharing the same physical core.
|
||||||
|
|
||||||
|
A user process can protect itself against Spectre variant 2 attacks
|
||||||
|
by using the prctl() syscall to disable indirect branch speculation
|
||||||
|
for itself. An administrator can also cordon off an unsafe process
|
||||||
|
from polluting the branch target buffer by disabling the process's
|
||||||
|
indirect branch speculation. This comes with a performance cost
|
||||||
|
from not using indirect branch speculation and clearing the branch
|
||||||
|
target buffer. When SMT is enabled on x86, for a process that has
|
||||||
|
indirect branch speculation disabled, Single Thread Indirect Branch
|
||||||
|
Predictors (STIBP) :ref:`[4] <spec_ref4>` are turned on to prevent the
|
||||||
|
sibling thread from controlling the branch target buffer. In addition,
|
||||||
|
the Indirect Branch Prediction Barrier (IBPB) is issued to clear the
|
||||||
|
branch target buffer when context switching to and from such a process.
|
||||||
|
|
||||||
|
On x86, the return stack buffer is stuffed on context switch.
|
||||||
|
This prevents the branch target buffer from being used for branch
|
||||||
|
prediction when the return stack buffer underflows while switching to
|
||||||
|
a deeper call stack. Any poisoned entries in the return stack buffer
|
||||||
|
left by the previous process will also be cleared.
|
||||||
|
|
||||||
|
User programs should use address space randomization to make attacks
|
||||||
|
more difficult (set /proc/sys/kernel/randomize_va_space = 1 or 2).
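To check the setting mentioned above from a program, the proc file can simply be read back. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names `read_randomize_va_space()` and `describe_randomize_va_space()` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/*
 * Return the current randomize_va_space value (0, 1 or 2), or -1 if the
 * file cannot be opened or parsed.
 */
int read_randomize_va_space(void)
{
	FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Map the value to a short description of what is randomized. */
const char *describe_randomize_va_space(int value)
{
	switch (value) {
	case 0:
		return "disabled";
	case 1:
		return "stack, mmap base and VDSO randomized";
	case 2:
		return "as 1, plus heap (brk) randomized";
	default:
		return "unknown";
	}
}
```

A value of 1 or 2 matches the recommendation in the text; 0 means user address space randomization is turned off.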
|
||||||
|
|
||||||
|
3. A virtualized guest attacking the host
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
The attack mechanism is similar to how user processes attack the
|
||||||
|
kernel. The kernel is entered via hyper-calls or other virtualization
|
||||||
|
exit paths.
|
||||||
|
|
||||||
|
For Spectre variant 1 attacks, rogue guests can pass parameters
|
||||||
|
(e.g. in registers) via hyper-calls to derive invalid pointers to
|
||||||
|
speculate into privileged memory after entering the kernel. For places
|
||||||
|
where such kernel code has been identified, nospec accessor macros
|
||||||
|
are used to stop speculative memory access.
|
||||||
|
|
||||||
|
For Spectre variant 2 attacks, rogue guests can :ref:`poison
|
||||||
|
<poison_btb>` the branch target buffer or return stack buffer, causing
|
||||||
|
the kernel to jump to gadget code in the speculative execution paths.
|
||||||
|
|
||||||
|
To mitigate variant 2, the host kernel can use return trampolines
|
||||||
|
for indirect branches to bypass the poisoned branch target buffer,
|
||||||
|
and flush the return stack buffer on VM exit. This prevents rogue
|
||||||
|
guests from affecting indirect branching in the host kernel.
|
||||||
|
|
||||||
|
To protect host processes from rogue guests, host processes can have
|
||||||
|
indirect branch speculation disabled via prctl(). The branch target
|
||||||
|
buffer is cleared before context switching to such processes.
|
||||||
|
|
||||||
|
4. A virtualized guest attacking another guest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
A rogue guest may attack another guest to get data accessible by the
|
||||||
|
other guest.
|
||||||
|
|
||||||
|
Spectre variant 1 attacks are possible if parameters can be passed
|
||||||
|
between guests. This may be done via mechanisms such as shared memory
|
||||||
|
or message passing. Such parameters could be used to derive data
|
||||||
|
pointers to privileged data in the guest. The privileged data could be
|
||||||
|
accessed by gadget code in the victim's speculation paths.
|
||||||
|
|
||||||
|
Spectre variant 2 attacks can be launched from a rogue guest by
|
||||||
|
:ref:`poisoning <poison_btb>` the branch target buffer or the return
|
||||||
|
stack buffer. Such poisoned entries could be used to influence
|
||||||
|
speculative execution paths in the victim guest.
|
||||||
|
|
||||||
|
The Linux kernel mitigates attacks to other guests running in the same
|
||||||
|
CPU hardware thread by flushing the return stack buffer on VM exit,
|
||||||
|
and clearing the branch target buffer before switching to a new guest.
|
||||||
|
|
||||||
|
If SMT is used, Spectre variant 2 attacks from an untrusted guest
|
||||||
|
in the sibling hyperthread can be mitigated by the administrator,
|
||||||
|
by turning off the unsafe guest's indirect branch speculation via
|
||||||
|
prctl(). A guest can also protect itself by turning on microcode
|
||||||
|
based mitigations (such as IBPB or STIBP on x86) within the guest.
|
||||||
|
|
||||||
|
.. _spectre_sys_info:
|
||||||
|
|
||||||
|
Spectre system information
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
The Linux kernel provides a sysfs interface to enumerate the current
|
||||||
|
mitigation status of the system for Spectre: whether the system is
|
||||||
|
vulnerable, and which mitigations are active.
|
||||||
|
|
||||||
|
The sysfs file showing Spectre variant 1 mitigation status is:
|
||||||
|
|
||||||
|
/sys/devices/system/cpu/vulnerabilities/spectre_v1
|
||||||
|
|
||||||
|
The possible values in this file are:
|
||||||
|
|
||||||
|
=========================================  =================================
'Mitigation: __user pointer sanitization'  Protection in kernel on a case by
                                           case basis with explicit pointer
                                           sanitization.
=========================================  =================================
|
||||||
|
|
||||||
|
However, the protections are put in place on a case by case basis,
|
||||||
|
and there is no guarantee that all possible attack vectors for Spectre
|
||||||
|
variant 1 are covered.
|
||||||
|
|
||||||
|
The spectre_v2 kernel file reports if the kernel has been compiled with
|
||||||
|
retpoline mitigation or if the CPU has hardware mitigation, and if the
|
||||||
|
CPU has support for additional process-specific mitigation.
|
||||||
|
|
||||||
|
This file also reports CPU features enabled by microcode to mitigate
|
||||||
|
attacks between user processes:
|
||||||
|
|
||||||
|
1. Indirect Branch Prediction Barrier (IBPB) to add additional
|
||||||
|
isolation between processes of different users.
|
||||||
|
2. Single Thread Indirect Branch Predictors (STIBP) to add additional
|
||||||
|
isolation between CPU threads running on the same core.
|
||||||
|
|
||||||
|
These CPU features may impact performance when used and can be enabled
|
||||||
|
per process on a case-by-case basis.
|
||||||
|
|
||||||
|
The sysfs file showing Spectre variant 2 mitigation status is:
|
||||||
|
|
||||||
|
/sys/devices/system/cpu/vulnerabilities/spectre_v2
|
||||||
|
|
||||||
|
The possible values in this file are:
|
||||||
|
|
||||||
|
- Kernel status:
|
||||||
|
|
||||||
|
==================================== =================================
|
||||||
|
'Not affected' The processor is not vulnerable
|
||||||
|
'Vulnerable' Vulnerable, no mitigation
|
||||||
|
'Mitigation: Full generic retpoline' Software-focused mitigation
|
||||||
|
'Mitigation: Full AMD retpoline' AMD-specific software mitigation
|
||||||
|
'Mitigation: Enhanced IBRS' Hardware-focused mitigation
|
||||||
|
==================================== =================================
|
||||||
|
|
||||||
|
- Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
|
||||||
|
used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
|
||||||
|
|
||||||
|
========== =============================================================
|
||||||
|
'IBRS_FW' Protection against user program attacks when calling firmware
|
||||||
|
========== =============================================================
|
||||||
|
|
||||||
|
- Indirect branch prediction barrier (IBPB) status for protection between
|
||||||
|
processes of different users. This feature can be controlled through
|
||||||
|
prctl() per process, or through kernel command line options. This is
|
||||||
|
an x86 only feature. For more details see below.
|
||||||
|
|
||||||
|
=================== ========================================================
|
||||||
|
'IBPB: disabled' IBPB unused
|
||||||
|
'IBPB: always-on' Use IBPB on all tasks
|
||||||
|
'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks
|
||||||
|
=================== ========================================================
|
||||||
|
|
||||||
|
- Single thread indirect branch predictors (STIBP) status for protection
|
||||||
|
between different hyper threads. This feature can be controlled through
|
||||||
|
prctl() per process, or through kernel command line options. This is an x86
|
||||||
|
only feature. For more details see below.
|
||||||
|
|
||||||
|
==================== ========================================================
|
||||||
|
'STIBP: disabled' STIBP unused
|
||||||
|
'STIBP: forced' Use STIBP on all tasks
|
||||||
|
'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks
|
||||||
|
==================== ========================================================
|
||||||
|
|
||||||
|
- Return stack buffer (RSB) protection status:
|
||||||
|
|
||||||
|
============= ===========================================
|
||||||
|
'RSB filling' Protection of RSB on context switch enabled
|
||||||
|
============= ===========================================
|
||||||
|
|
||||||
|
Full mitigation might require a microcode update from the CPU
|
||||||
|
vendor. When the necessary microcode is not available, the kernel will
|
||||||
|
report vulnerability.
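The sysfs files described in this section are plain text, so a monitoring tool can read them directly. The sketch below is a minimal example under stated assumptions: the helper names `read_vulnerability_status()` and `status_is_vulnerable()` are hypothetical, and treating any line that starts with "Vulnerable" as unmitigated is an assumption based on the value table above, not a documented interface guarantee.

```c
#include <stdio.h>
#include <string.h>

#define VULN_DIR "/sys/devices/system/cpu/vulnerabilities/"

/*
 * Read the first line of a vulnerabilities file (for example "spectre_v2")
 * into buf, without the trailing newline. Returns 0 on success and -1 if
 * the file cannot be read.
 */
int read_vulnerability_status(const char *name, char *buf, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	if (len == 0)
		return -1;
	n = snprintf(path, sizeof(path), VULN_DIR "%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Nonzero if the status line reports the CPU as vulnerable (assumed prefix). */
int status_is_vulnerable(const char *status)
{
	return strncmp(status, "Vulnerable", strlen("Vulnerable")) == 0;
}
```

Calling read_vulnerability_status("spectre_v1", ...) and read_vulnerability_status("spectre_v2", ...) covers the two files named in this document; on kernels or architectures without these files the function simply returns -1.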
|
||||||
|
|
||||||
|
Turning on mitigation for Spectre variant 1 and Spectre variant 2
|
||||||
|
-----------------------------------------------------------------
|
||||||
|
|
||||||
|
1. Kernel mitigation
|
||||||
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
For Spectre variant 1, vulnerable kernel code (as determined
|
||||||
|
by code audit or scanning tools) is annotated on a case by case
|
||||||
|
basis to use nospec accessor macros for bounds clipping :ref:`[2]
|
||||||
|
<spec_ref2>` to avoid any usable disclosure gadgets. However, it may
|
||||||
|
not cover all attack vectors for Spectre variant 1.
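The idea behind bounds clipping can be shown in portable C. This is a user-space sketch modelled on the generic fallback behind the kernel's array_index_nospec() in include/linux/nospec.h, not the kernel code itself: `index_mask_nospec()`, `clamp_index_nospec()` and `read_entry()` are hypothetical names, and the sketch assumes `size <= LONG_MAX` and an arithmetic right shift of negative values (as gcc and clang provide). An optimizing compiler may in principle fold the mask away; the kernel guards against that with compiler barriers or architecture-specific assembly, which this sketch does not attempt.

```c
#include <limits.h>
#include <stddef.h>

/*
 * All ones when index < size, zero otherwise, computed without a branch
 * so that a mispredicted bounds check cannot bypass it.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clip index to 0 when it is out of bounds, even under speculation. */
unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Bounds-checked table read that also clips the index after the check. */
int read_entry(const int *table, unsigned long table_size, unsigned long idx)
{
	if (idx >= table_size)
		return -1;

	/* Architecturally a no-op; speculatively it forces idx to 0. */
	idx = clamp_index_nospec(idx, table_size);
	return table[idx];
}
```

The clipped index depends on the comparison through data flow rather than control flow, so a speculative load past the branch can only touch element 0 instead of attacker-chosen memory.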
|
||||||
|
|
||||||
|
For Spectre variant 2 mitigation, the compiler turns indirect calls or
|
||||||
|
jumps in the kernel into equivalent return trampolines (retpolines)
|
||||||
|
:ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
|
||||||
|
addresses. Speculative execution paths under retpolines are trapped
|
||||||
|
in an infinite loop to prevent any speculative execution jumping to
|
||||||
|
a gadget.
|
||||||
|
|
||||||
|
To turn on retpoline mitigation on a vulnerable CPU, the kernel
|
||||||
|
needs to be compiled with a gcc compiler that supports the
|
||||||
|
-mindirect-branch=thunk-extern -mindirect-branch-register options.
|
||||||
|
If the kernel is compiled with a Clang compiler, the compiler needs
|
||||||
|
to support the -mretpoline-external-thunk option. The kernel config
|
||||||
|
CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with
|
||||||
|
the latest updated microcode.
|
||||||
|
|
||||||
|
On Intel Skylake-era systems the mitigation covers most, but not all,
|
||||||
|
cases. See :ref:`[3] <spec_ref3>` for more details.
|
||||||
|
|
||||||
|
On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced
|
||||||
|
IBRS on x86), retpoline is automatically disabled at run time.
|
||||||
|
|
||||||
|
The retpoline mitigation is turned on by default on vulnerable
|
||||||
|
CPUs. It can be forced on or off by the administrator
|
||||||
|
via the kernel command line and sysfs control files. See
|
||||||
|
:ref:`spectre_mitigation_control_command_line`.

On x86, indirect branch restricted speculation is turned on by default
before invoking any firmware code to prevent Spectre variant 2 exploits
using the firmware.

Using kernel address space randomization (CONFIG_RANDOMIZE_BASE=y
and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
attacks on the kernel generally more difficult.

2. User program mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^^

User programs can mitigate Spectre variant 1 using LFENCE or "bounds
clipping". For more details see :ref:`[2] <spec_ref2>`.
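
As an illustration of bounds clipping, the sketch below masks an index
so that it becomes zero whenever it is out of range, even on a
mispredicted path. It is modeled on the branch-free mask idea behind
the kernel's nospec accessors, but all names here are hypothetical. It
assumes that size does not exceed LONG_MAX and that the compiler
implements right shift of a negative signed value as an arithmetic
shift (as gcc and Clang do). Production code must also keep the
compiler from turning the mask back into a branch.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns all ones when index < size and zero
 * otherwise, without using a conditional branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift for signed long.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    /* The top bit of the OR is set exactly when index >= size. */
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/* Clip index to 0 when it is out of bounds, otherwise leave it unchanged. */
size_t clip_index(size_t index, size_t size)
{
    return index & index_mask_nospec(index, size);
}

/* Bounds-checked table read that uses the clipped index for the load. */
int lookup_clipped(const int *table, size_t size, size_t index, int fallback)
{
    if (index >= size)
        return fallback;
    /* If the branch above is mispredicted, the load still uses index 0. */
    return table[clip_index(index, size)];
}
```

The architectural bounds check stays in place. The mask only ensures
that a speculative load cannot use an attacker-controlled out-of-range
index.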

For Spectre variant 2 mitigation, individual user programs
can be compiled with return trampolines for indirect branches.
This protects them from consuming poisoned entries in the branch
target buffer left by malicious software. Alternatively, the
programs can disable their indirect branch speculation via prctl()
(see :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
On x86, this will turn on STIBP to guard against attacks from the
sibling thread when the user program is running, and use IBPB to
flush the branch target buffer when switching to/from the program.
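
A program could request this restriction for itself roughly as
follows. This is a sketch under stated assumptions: the function names
are hypothetical, and the fallback constant values are taken from the
kernel's prctl ABI for the case where older userspace headers do not
define them. The call is expected to fail on kernels or CPUs that do
not offer per-task control, so the return value must be checked.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers (values from the prctl ABI). */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_DISABLE
#define PR_SPEC_DISABLE (1UL << 2)
#endif

/*
 * Ask the kernel to disable indirect branch speculation for the
 * calling thread. Returns 0 on success, -1 on failure (errno is set).
 */
int disable_indirect_branch_speculation(void)
{
    return prctl(PR_SET_SPECULATION_CTRL,
                 (unsigned long)PR_SPEC_INDIRECT_BRANCH,
                 (unsigned long)PR_SPEC_DISABLE, 0UL, 0UL);
}

/*
 * Query the current setting. Returns a bit mask of PR_SPEC_* flags
 * on success, -1 on failure (errno is set).
 */
int query_indirect_branch_speculation(void)
{
    return prctl(PR_GET_SPECULATION_CTRL,
                 (unsigned long)PR_SPEC_INDIRECT_BRANCH, 0UL, 0UL, 0UL);
}
```

A security-sensitive program would typically call the first function
early in main(), before it handles any secrets, and fall back to other
defenses if the call fails.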

Restricting indirect branch speculation on a user program will
also prevent the program from launching a variant 2 attack
on x86. All sandboxed SECCOMP programs have indirect branch
speculation restricted by default. Administrators can change
that behavior via the kernel command line and sysfs control files.
See :ref:`spectre_mitigation_control_command_line`.

Programs that disable their indirect branch speculation will have
more overhead and run slower.

User programs should use address space randomization
(/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
difficult.
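
Whether randomization is active can be checked by reading the file
named above. The helper names in this sketch are hypothetical, and it
is only an illustration of the check, not an official interface.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the value stored in
 * /proc/sys/kernel/randomize_va_space, or -1 if it cannot be read.
 */
int read_randomize_va_space(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    int value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Non-zero when the setting is 1 or 2, i.e. randomization is enabled. */
int aslr_enabled(void)
{
    return read_randomize_va_space() > 0;
}
```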

3. VM mitigation
^^^^^^^^^^^^^^^^

Within the kernel, Spectre variant 1 attacks from rogue guests are
mitigated on a case-by-case basis in VM exit paths. Vulnerable code
uses nospec accessor macros for "bounds clipping", to avoid any
usable disclosure gadgets. However, this may not cover all variant
1 attack vectors.

For Spectre variant 2 attacks from rogue guests to the kernel, the
Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of
poisoned entries in the branch target buffer left by rogue guests. It
also flushes the return stack buffer on every VM exit. This prevents a
return stack buffer underflow, which would allow poisoned branch target
buffer entries to be used, and removes any poisoned entries that
attacker guests left in the return stack buffer.

To mitigate guest-to-guest attacks in the same CPU hardware thread,
the branch target buffer is sanitized by flushing before switching
to a new guest on a CPU.

The above mitigations are turned on by default on vulnerable CPUs.

To mitigate guest-to-guest attacks from a sibling thread when SMT is
in use, an untrusted guest running in the sibling thread can have
its indirect branch speculation disabled by the administrator via prctl().

The kernel also allows guests to use any microcode-based mitigation
they choose to use (such as IBPB or STIBP on x86) to protect themselves.

.. _spectre_mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

Spectre variant 2 mitigation can be disabled or force enabled at the
kernel command line.

    nospectre_v2

        [X86] Disable all mitigations for the Spectre variant 2
        (indirect branch prediction) vulnerability. System may
        allow data leaks with this option, which is equivalent
        to spectre_v2=off.


    spectre_v2=

        [X86] Control mitigation of Spectre variant 2
        (indirect branch speculation) vulnerability.
        The default operation protects the kernel from
        user space attacks.

        on
            unconditionally enable, implies
            spectre_v2_user=on
        off
            unconditionally disable, implies
            spectre_v2_user=off
        auto
            kernel detects whether your CPU model is
            vulnerable

        Selecting 'on' will, and 'auto' may, choose a
        mitigation method at run time according to the
        CPU, the available microcode, the setting of the
        CONFIG_RETPOLINE configuration option, and the
        compiler with which the kernel was built.

        Selecting 'on' will also enable the mitigation
        against user space to user space task attacks.

        Selecting 'off' will disable both the kernel and
        the user space protections.

        Specific mitigations can also be selected manually:

        retpoline
            replace indirect branches
        retpoline,generic
            Google's original retpoline
        retpoline,amd
            AMD-specific minimal thunk

        Not specifying this option is equivalent to
        spectre_v2=auto.

For user space mitigation:

    spectre_v2_user=

        [X86] Control mitigation of Spectre variant 2
        (indirect branch speculation) vulnerability between
        user space tasks

        on
            Unconditionally enable mitigations. Is
            enforced by spectre_v2=on

        off
            Unconditionally disable mitigations. Is
            enforced by spectre_v2=off

        prctl
            Indirect branch speculation is enabled,
            but mitigation can be enabled via prctl
            per thread. The mitigation control state
            is inherited on fork.

        prctl,ibpb
            Like "prctl" above, but only STIBP is
            controlled per thread. IBPB is issued
            always when switching between different user
            space processes.

        seccomp
            Same as "prctl" above, but all seccomp
            threads will enable the mitigation unless
            they explicitly opt out.

        seccomp,ibpb
            Like "seccomp" above, but only STIBP is
            controlled per thread. IBPB is issued
            always when switching between different
            user space processes.

        auto
            Kernel selects the mitigation depending on
            the available CPU features and vulnerability.

        Default mitigation:
        If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"

        Not specifying this option is equivalent to
        spectre_v2_user=auto.

In general the kernel by default selects
reasonable mitigations for the current CPU. To
disable Spectre variant 2 mitigations, boot with
spectre_v2=off. Spectre variant 1 mitigations
cannot be disabled.
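
To see which of these options a system was booted with, a tool can
look for them in the kernel command line string (for example, the
content of /proc/cmdline). The parser below is a simplified sketch with
a hypothetical name, cmdline_value(). It assumes options are separated
by white space, ignores quoting, and returns only the first match. It
compares whole option names, so spectre_v2= is not confused with
spectre_v2_user=.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find "name=value" in a white-space separated
 * command line and copy value into out (truncated to fit).
 * Returns 1 if the option was found, 0 otherwise.
 */
int cmdline_value(const char *cmdline, const char *name,
                  char *out, size_t outlen)
{
    size_t namelen = strlen(name);
    const char *p = cmdline;

    if (outlen == 0)
        return 0;

    while (*p) {
        size_t toklen;

        p += strspn(p, " \t\n");        /* skip separators */
        toklen = strcspn(p, " \t\n");   /* length of the next token */
        if (toklen == 0)
            break;

        /* Match only when the token is exactly "name=" plus a value. */
        if (toklen > namelen && strncmp(p, name, namelen) == 0 &&
            p[namelen] == '=') {
            size_t vallen = toklen - namelen - 1;

            if (vallen >= outlen)
                vallen = outlen - 1;
            memcpy(out, p + namelen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        p += toklen;
    }
    return 0;
}
```

When neither option is found, the defaults described above apply, that
is, spectre_v2=auto and spectre_v2_user=auto.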

Mitigation selection guide
--------------------------

1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^

If all userspace applications are from trusted sources and do not
execute externally supplied untrusted code, then the mitigations can
be disabled.

2. Protect sensitive programs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For security-sensitive programs that have secrets (e.g. crypto
keys), protection against Spectre variant 2 can be put in place by
disabling indirect branch speculation when the program is running
(see :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).

3. Sandbox untrusted programs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Untrusted programs that could be a source of attacks can be cordoned
off by disabling their indirect branch speculation when they are run
(see :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
This prevents untrusted programs from polluting the branch target
buffer. All programs running in SECCOMP sandboxes have indirect
branch speculation restricted by default. This behavior can be
changed via the kernel command line and sysfs control files. See
:ref:`spectre_mitigation_control_command_line`.

4. High security mode
^^^^^^^^^^^^^^^^^^^^^

All Spectre variant 2 mitigations can be forced on
at boot time for all programs (see the "on" option in
:ref:`spectre_mitigation_control_command_line`). This will add
overhead as indirect branch speculation for all programs will be
restricted.

On x86, the branch target buffer will be flushed with IBPB when switching
to a new program. STIBP is left on all the time to protect programs
against variant 2 attacks originating from programs running on
sibling threads.

Alternatively, STIBP can be used only when running programs
whose indirect branch speculation is explicitly disabled,
while IBPB is still used all the time when switching to a new
program to clear the branch target buffer (see the "ibpb" option in
:ref:`spectre_mitigation_control_command_line`). This "ibpb" option
has less performance cost than the "on" option, which leaves STIBP
on all the time.

References on Spectre
---------------------

Intel white papers:

.. _spec_ref1:

[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_.

.. _spec_ref2:

[2] `Bounds check bypass <https://software.intel.com/security-software-guidance/software-guidance/bounds-check-bypass>`_.

.. _spec_ref3:

[3] `Deep dive: Retpoline: A branch target injection mitigation <https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-injection-mitigation>`_.

.. _spec_ref4:

[4] `Deep Dive: Single Thread Indirect Branch Predictors <https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors>`_.

AMD white papers:

.. _spec_ref5:

[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_.

.. _spec_ref6:

[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.

ARM white papers:

.. _spec_ref7:

[7] `Cache speculation side-channels <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper>`_.

.. _spec_ref8:

[8] `Cache speculation issues update <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update>`_.

Google white paper:

.. _spec_ref9:

[9] `Retpoline: a software construct for preventing branch-target-injection <https://support.google.com/faqs/answer/7625886>`_.

MIPS white paper:

.. _spec_ref10:

[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.

Academic papers:

.. _spec_ref11:

[11] `Spectre Attacks: Exploiting Speculative Execution <https://spectreattack.com/spectre.pdf>`_.

.. _spec_ref12:

[12] `NetSpectre: Read Arbitrary Memory over Network <https://arxiv.org/abs/1807.10535>`_.

.. _spec_ref13:

[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://www.usenix.org/system/files/conference/woot18/woot18-paper-koruyeh.pdf>`_.
|
|
@ -70,6 +70,7 @@ configure specific aspects of kernel behavior to your liking.
|
||||||
ras
|
ras
|
||||||
bcache
|
bcache
|
||||||
ext4
|
ext4
|
||||||
|
binderfs
|
||||||
pm/index
|
pm/index
|
||||||
thunderbolt
|
thunderbolt
|
||||||
LSM/index
|
LSM/index
|
||||||
|
|
|
@ -9,11 +9,11 @@ and sorted into English Dictionary order (defined as ignoring all
|
||||||
punctuation and sorting digits before letters in a case insensitive
|
punctuation and sorting digits before letters in a case insensitive
|
||||||
manner), and with descriptions where known.
|
manner), and with descriptions where known.
|
||||||
|
|
||||||
The kernel parses parameters from the kernel command line up to "--";
|
The kernel parses parameters from the kernel command line up to "``--``";
|
||||||
if it doesn't recognize a parameter and it doesn't contain a '.', the
|
if it doesn't recognize a parameter and it doesn't contain a '.', the
|
||||||
parameter gets passed to init: parameters with '=' go into init's
|
parameter gets passed to init: parameters with '=' go into init's
|
||||||
environment, others are passed as command line arguments to init.
|
environment, others are passed as command line arguments to init.
|
||||||
Everything after "--" is passed as an argument to init.
|
Everything after "``--``" is passed as an argument to init.
|
||||||
|
|
||||||
Module parameters can be specified in two ways: via the kernel command
|
Module parameters can be specified in two ways: via the kernel command
|
||||||
line with a module name prefix, or via modprobe, e.g.::
|
line with a module name prefix, or via modprobe, e.g.::
|
||||||
|
@ -167,7 +167,7 @@ parameter is applicable::
|
||||||
X86-32 X86-32, aka i386 architecture is enabled.
|
X86-32 X86-32, aka i386 architecture is enabled.
|
||||||
X86-64 X86-64 architecture is enabled.
|
X86-64 X86-64 architecture is enabled.
|
||||||
More X86-64 boot options can be found in
|
More X86-64 boot options can be found in
|
||||||
Documentation/x86/x86_64/boot-options.txt .
|
Documentation/x86/x86_64/boot-options.rst.
|
||||||
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
|
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
|
||||||
X86_UV SGI UV support is enabled.
|
X86_UV SGI UV support is enabled.
|
||||||
XEN Xen support is enabled
|
XEN Xen support is enabled
|
||||||
|
@ -181,10 +181,10 @@ In addition, the following text indicates that the option::
|
||||||
Parameters denoted with BOOT are actually interpreted by the boot
|
Parameters denoted with BOOT are actually interpreted by the boot
|
||||||
loader, and have no meaning to the kernel directly.
|
loader, and have no meaning to the kernel directly.
|
||||||
Do not modify the syntax of boot loader parameters without extreme
|
Do not modify the syntax of boot loader parameters without extreme
|
||||||
need or coordination with <Documentation/x86/boot.txt>.
|
need or coordination with <Documentation/x86/boot.rst>.
|
||||||
|
|
||||||
There are also arch-specific kernel-parameters not documented here.
|
There are also arch-specific kernel-parameters not documented here.
|
||||||
See for example <Documentation/x86/x86_64/boot-options.txt>.
|
See for example <Documentation/x86/x86_64/boot-options.rst>.
|
||||||
|
|
||||||
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
|
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
|
||||||
a trailing = on the name of any parameter states that that parameter will
|
a trailing = on the name of any parameter states that that parameter will
|
||||||
|
|
|
@ -53,7 +53,7 @@
|
||||||
ACPI_DEBUG_PRINT statements, e.g.,
|
ACPI_DEBUG_PRINT statements, e.g.,
|
||||||
ACPI_DEBUG_PRINT((ACPI_DB_INFO, ...
|
ACPI_DEBUG_PRINT((ACPI_DB_INFO, ...
|
||||||
The debug_level mask defaults to "info". See
|
The debug_level mask defaults to "info". See
|
||||||
Documentation/acpi/debug.txt for more information about
|
Documentation/firmware-guide/acpi/debug.rst for more information about
|
||||||
debug layers and levels.
|
debug layers and levels.
|
||||||
|
|
||||||
Enable processor driver info messages:
|
Enable processor driver info messages:
|
||||||
|
@ -708,14 +708,14 @@
|
||||||
[KNL, x86_64] select a region under 4G first, and
|
[KNL, x86_64] select a region under 4G first, and
|
||||||
fall back to reserve region above 4G when '@offset'
|
fall back to reserve region above 4G when '@offset'
|
||||||
hasn't been specified.
|
hasn't been specified.
|
||||||
See Documentation/kdump/kdump.txt for further details.
|
See Documentation/kdump/kdump.rst for further details.
|
||||||
|
|
||||||
crashkernel=range1:size1[,range2:size2,...][@offset]
|
crashkernel=range1:size1[,range2:size2,...][@offset]
|
||||||
[KNL] Same as above, but depends on the memory
|
[KNL] Same as above, but depends on the memory
|
||||||
in the running system. The syntax of range is
|
in the running system. The syntax of range is
|
||||||
start-[end] where start and end are both
|
start-[end] where start and end are both
|
||||||
a memory unit (amount[KMG]). See also
|
a memory unit (amount[KMG]). See also
|
||||||
Documentation/kdump/kdump.txt for an example.
|
Documentation/kdump/kdump.rst for an example.
|
||||||
|
|
||||||
crashkernel=size[KMG],high
|
crashkernel=size[KMG],high
|
||||||
[KNL, x86_64] range could be above 4G. Allow kernel
|
[KNL, x86_64] range could be above 4G. Allow kernel
|
||||||
|
@ -932,7 +932,7 @@
|
||||||
edid/1680x1050.bin, or edid/1920x1080.bin is given
|
edid/1680x1050.bin, or edid/1920x1080.bin is given
|
||||||
and no file with the same name exists. Details and
|
and no file with the same name exists. Details and
|
||||||
instructions how to build your own EDID data are
|
instructions how to build your own EDID data are
|
||||||
available in Documentation/EDID/HOWTO.txt. An EDID
|
available in Documentation/EDID/howto.rst. An EDID
|
||||||
data set will only be used for a particular connector,
|
data set will only be used for a particular connector,
|
||||||
if its name and a colon are prepended to the EDID
|
if its name and a colon are prepended to the EDID
|
||||||
name. Each connector may use a unique EDID data
|
name. Each connector may use a unique EDID data
|
||||||
|
@ -963,7 +963,7 @@
|
||||||
for details.
|
for details.
|
||||||
|
|
||||||
nompx [X86] Disables Intel Memory Protection Extensions.
|
nompx [X86] Disables Intel Memory Protection Extensions.
|
||||||
See Documentation/x86/intel_mpx.txt for more
|
See Documentation/x86/intel_mpx.rst for more
|
||||||
information about the feature.
|
information about the feature.
|
||||||
|
|
||||||
nopku [X86] Disable Memory Protection Keys CPU feature found
|
nopku [X86] Disable Memory Protection Keys CPU feature found
|
||||||
|
@ -1189,7 +1189,7 @@
|
||||||
that is to be dynamically loaded by Linux. If there are
|
that is to be dynamically loaded by Linux. If there are
|
||||||
multiple variables with the same name but with different
|
multiple variables with the same name but with different
|
||||||
vendor GUIDs, all of them will be loaded. See
|
vendor GUIDs, all of them will be loaded. See
|
||||||
Documentation/acpi/ssdt-overlays.txt for details.
|
Documentation/admin-guide/acpi/ssdt-overlays.rst for details.
|
||||||
|
|
||||||
|
|
||||||
eisa_irq_edge= [PARISC,HW]
|
eisa_irq_edge= [PARISC,HW]
|
||||||
|
@ -1209,7 +1209,7 @@
|
||||||
Specifies physical address of start of kernel core
|
Specifies physical address of start of kernel core
|
||||||
image elf header and optionally the size. Generally
|
image elf header and optionally the size. Generally
|
||||||
kexec loader will pass this option to capture kernel.
|
kexec loader will pass this option to capture kernel.
|
||||||
See Documentation/kdump/kdump.txt for details.
|
See Documentation/kdump/kdump.rst for details.
|
||||||
|
|
||||||
enable_mtrr_cleanup [X86]
|
enable_mtrr_cleanup [X86]
|
||||||
The kernel tries to adjust MTRR layout from continuous
|
The kernel tries to adjust MTRR layout from continuous
|
||||||
|
@ -1388,9 +1388,6 @@
|
||||||
Valid parameters: "on", "off"
|
Valid parameters: "on", "off"
|
||||||
Default: "on"
|
Default: "on"
|
||||||
|
|
||||||
hisax= [HW,ISDN]
|
|
||||||
See Documentation/isdn/README.HiSax.
|
|
||||||
|
|
||||||
hlt [BUGS=ARM,SH]
|
hlt [BUGS=ARM,SH]
|
||||||
|
|
||||||
hpet= [X86-32,HPET] option to control HPET usage
|
hpet= [X86-32,HPET] option to control HPET usage
|
||||||
|
@ -1507,7 +1504,7 @@
|
||||||
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
|
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
|
||||||
.vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
|
.vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
|
||||||
.cdrom .chs .ignore_cable are additional options
|
.cdrom .chs .ignore_cable are additional options
|
||||||
See Documentation/ide/ide.txt.
|
See Documentation/ide/ide.rst.
|
||||||
|
|
||||||
ide-generic.probe-mask= [HW] (E)IDE subsystem
|
ide-generic.probe-mask= [HW] (E)IDE subsystem
|
||||||
Format: <int>
|
Format: <int>
|
||||||
|
@ -2383,7 +2380,7 @@
|
||||||
|
|
||||||
mce [X86-32] Machine Check Exception
|
mce [X86-32] Machine Check Exception
|
||||||
|
|
||||||
mce=option [X86-64] See Documentation/x86/x86_64/boot-options.txt
|
mce=option [X86-64] See Documentation/x86/x86_64/boot-options.rst
|
||||||
|
|
||||||
md= [HW] RAID subsystems devices and level
|
md= [HW] RAID subsystems devices and level
|
||||||
See Documentation/admin-guide/md.rst.
|
See Documentation/admin-guide/md.rst.
|
||||||
|
@ -2439,7 +2436,7 @@
|
||||||
set according to the
|
set according to the
|
||||||
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
|
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
|
||||||
option.
|
option.
|
||||||
See Documentation/memory-hotplug.txt.
|
See Documentation/admin-guide/mm/memory-hotplug.rst.
|
||||||
|
|
||||||
memmap=exactmap [KNL,X86] Enable setting of an exact
|
memmap=exactmap [KNL,X86] Enable setting of an exact
|
||||||
E820 memory map, as specified by the user.
|
E820 memory map, as specified by the user.
|
||||||
|
@ -2528,7 +2525,7 @@
|
||||||
mem_encrypt=on: Activate SME
|
mem_encrypt=on: Activate SME
|
||||||
mem_encrypt=off: Do not activate SME
|
mem_encrypt=off: Do not activate SME
|
||||||
|
|
||||||
Refer to Documentation/x86/amd-memory-encryption.txt
|
Refer to Documentation/virtual/kvm/amd-memory-encryption.rst
|
||||||
for details on when memory encryption can be activated.
|
for details on when memory encryption can be activated.
|
||||||
|
|
||||||
mem_sleep_default= [SUSPEND] Default system suspend mode:
|
mem_sleep_default= [SUSPEND] Default system suspend mode:
|
||||||
|
@ -2836,8 +2833,9 @@
|
||||||
0 - turn hardlockup detector in nmi_watchdog off
|
0 - turn hardlockup detector in nmi_watchdog off
|
||||||
1 - turn hardlockup detector in nmi_watchdog on
|
1 - turn hardlockup detector in nmi_watchdog on
|
||||||
When panic is specified, panic when an NMI watchdog
|
When panic is specified, panic when an NMI watchdog
|
||||||
timeout occurs (or 'nopanic' to override the opposite
|
timeout occurs (or 'nopanic' to not panic on an NMI
|
||||||
default). To disable both hard and soft lockup detectors,
|
watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
|
||||||
|
To disable both hard and soft lockup detectors,
|
||||||
please see 'nowatchdog'.
|
please see 'nowatchdog'.
|
||||||
This is useful when you use a panic=... timeout and
|
This is useful when you use a panic=... timeout and
|
||||||
need the box quickly up again.
|
need the box quickly up again.
|
||||||
|
@ -3528,7 +3526,7 @@
|
||||||
See Documentation/blockdev/paride.txt.
|
See Documentation/blockdev/paride.txt.
|
||||||
|
|
||||||
pirq= [SMP,APIC] Manual mp-table setup
|
pirq= [SMP,APIC] Manual mp-table setup
|
||||||
See Documentation/x86/i386/IO-APIC.txt.
|
See Documentation/x86/i386/IO-APIC.rst.
|
||||||
|
|
||||||
plip= [PPT,NET] Parallel port network link
|
plip= [PPT,NET] Parallel port network link
|
||||||
Format: { parport<nr> | timid | 0 }
|
Format: { parport<nr> | timid | 0 }
|
||||||
|
@ -5032,7 +5030,7 @@
|
||||||
vector=percpu: enable percpu vector domain
|
vector=percpu: enable percpu vector domain
|
||||||
|
|
||||||
video= [FB] Frame buffer configuration
|
video= [FB] Frame buffer configuration
|
||||||
See Documentation/fb/modedb.txt.
|
See Documentation/fb/modedb.rst.
|
||||||
|
|
||||||
video.brightness_switch_enabled= [0,1]
|
video.brightness_switch_enabled= [0,1]
|
||||||
If set to 1, on receiving an ACPI notify event
|
If set to 1, on receiving an ACPI notify event
|
||||||
|
@ -5060,7 +5058,7 @@
|
||||||
Can be used multiple times for multiple devices.
|
Can be used multiple times for multiple devices.
|
||||||
|
|
||||||
vga= [BOOT,X86-32] Select a particular video mode
|
vga= [BOOT,X86-32] Select a particular video mode
|
||||||
See Documentation/x86/boot.txt and
|
See Documentation/x86/boot.rst and
|
||||||
Documentation/svga.txt.
|
Documentation/svga.txt.
|
||||||
Use vga=ask for menu.
|
Use vga=ask for menu.
|
||||||
This is actually a boot loader parameter; the value is
|
This is actually a boot loader parameter; the value is
|
||||||
|
@ -5167,7 +5165,7 @@
|
||||||
Default: 3 = cyan.
|
Default: 3 = cyan.
|
||||||
|
|
||||||
watchdog timers [HW,WDT] For information on watchdog timers,
|
watchdog timers [HW,WDT] For information on watchdog timers,
|
||||||
see Documentation/watchdog/watchdog-parameters.txt
|
see Documentation/watchdog/watchdog-parameters.rst
|
||||||
or other driver-specific files in the
|
or other driver-specific files in the
|
||||||
Documentation/watchdog/ directory.
|
Documentation/watchdog/ directory.
|
||||||
|
|
||||||
|
|
|
@ -165,5 +165,6 @@ write-through caching.
|
||||||
========
|
========
|
||||||
See Also
|
See Also
|
||||||
========
|
========
|
||||||
.. [1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
|
|
||||||
Section 5.2.27
|
[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
|
||||||
|
- Section 5.2.27
|
||||||
|
|
|
@ -199,7 +199,7 @@ Architecture (MCA)\ [#f3]_.
|
||||||
mode).
|
mode).
|
||||||
|
|
||||||
.. [#f3] For more details about the Machine Check Architecture (MCA),
|
.. [#f3] For more details about the Machine Check Architecture (MCA),
|
||||||
please read Documentation/x86/x86_64/machinecheck at the Kernel tree.
|
please read Documentation/x86/x86_64/machinecheck.rst at the Kernel tree.
|
||||||
|
|
||||||
EDAC - Error Detection And Correction
|
EDAC - Error Detection And Correction
|
||||||
*************************************
|
*************************************
|
||||||
|
|
|
@ -1,3 +1,6 @@
|
||||||
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
ATA over Ethernet is a network protocol that provides simple access to
|
ATA over Ethernet is a network protocol that provides simple access to
|
||||||
block storage on the LAN.
|
block storage on the LAN.
|
||||||
|
|
||||||
|
@ -22,7 +25,8 @@ document the use of the driver and are not necessary if you install
|
||||||
the aoetools.
|
the aoetools.
|
||||||
|
|
||||||
|
|
||||||
CREATING DEVICE NODES
|
Creating Device Nodes
|
||||||
|
=====================
|
||||||
|
|
||||||
Users of udev should find the block device nodes created
|
Users of udev should find the block device nodes created
|
||||||
automatically, but to create all the necessary device nodes, use the
|
automatically, but to create all the necessary device nodes, use the
|
||||||
|
@ -38,7 +42,8 @@ CREATING DEVICE NODES
|
||||||
confusing when an AoE device is not present the first time a
|
confusing when an AoE device is not present the first time a
|
||||||
command is run but appears a second later.
|
command is run but appears a second later.
|
||||||
|
|
||||||
USING DEVICE NODES
|
Using Device Nodes
|
||||||
|
==================
|
||||||
|
|
||||||
"cat /dev/etherd/err" blocks, waiting for error diagnostic output,
|
"cat /dev/etherd/err" blocks, waiting for error diagnostic output,
|
||||||
like any retransmitted packets.
|
like any retransmitted packets.
|
||||||
|
@ -55,7 +60,7 @@ USING DEVICE NODES
|
||||||
by sysfs counterparts. Using the commands in aoetools insulates
|
by sysfs counterparts. Using the commands in aoetools insulates
|
||||||
users from these implementation details.
|
users from these implementation details.
|
||||||
|
|
||||||
The block devices are named like this:
|
The block devices are named like this::
|
||||||
|
|
||||||
e{shelf}.{slot}
|
e{shelf}.{slot}
|
||||||
e{shelf}.{slot}p{part}
|
e{shelf}.{slot}p{part}
|
||||||
|
@ -64,7 +69,8 @@ USING DEVICE NODES
|
||||||
first shelf (shelf address zero). That's the whole disk. The first
|
first shelf (shelf address zero). That's the whole disk. The first
|
||||||
partition on that disk would be "e0.2p1".
|
partition on that disk would be "e0.2p1".
|
||||||
|
|
||||||
USING SYSFS
|
Using sysfs
|
||||||
|
===========
|
||||||
|
|
||||||
Each aoe block device in /sys/block has the extra attributes of
|
Each aoe block device in /sys/block has the extra attributes of
|
||||||
state, mac, and netif. The state attribute is "up" when the device
|
state, mac, and netif. The state attribute is "up" when the device
|
||||||
|
@ -78,29 +84,29 @@ USING SYSFS
|
||||||
|
|
||||||
There is a script in this directory that formats this information in
|
There is a script in this directory that formats this information in
|
||||||
a convenient way. Users with aoetools should use the aoe-stat
|
a convenient way. Users with aoetools should use the aoe-stat
|
||||||
command.
|
command::
|
||||||
|
|
||||||
root@makki root# sh Documentation/aoe/status.sh
|
root@makki root# sh Documentation/aoe/status.sh
|
||||||
e10.0 eth3 up
|
e10.0 eth3 up
|
||||||
e10.1 eth3 up
|
e10.1 eth3 up
|
||||||
e10.2 eth3 up
|
e10.2 eth3 up
|
||||||
e10.3 eth3 up
|
e10.3 eth3 up
|
||||||
e10.4 eth3 up
|
e10.4 eth3 up
|
||||||
e10.5 eth3 up
|
e10.5 eth3 up
|
||||||
e10.6 eth3 up
|
e10.6 eth3 up
|
||||||
e10.7 eth3 up
|
e10.7 eth3 up
|
||||||
e10.8 eth3 up
|
e10.8 eth3 up
|
||||||
e10.9 eth3 up
|
e10.9 eth3 up
|
||||||
e4.0 eth1 up
|
e4.0 eth1 up
|
||||||
e4.1 eth1 up
|
e4.1 eth1 up
|
||||||
e4.2 eth1 up
|
e4.2 eth1 up
|
||||||
e4.3 eth1 up
|
e4.3 eth1 up
|
||||||
e4.4 eth1 up
|
e4.4 eth1 up
|
||||||
e4.5 eth1 up
|
e4.5 eth1 up
|
||||||
e4.6 eth1 up
|
e4.6 eth1 up
|
||||||
e4.7 eth1 up
|
e4.7 eth1 up
|
||||||
e4.8 eth1 up
|
e4.8 eth1 up
|
||||||
e4.9 eth1 up
|
e4.9 eth1 up
|
||||||
|
|
||||||
Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
|
Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
|
||||||
option discussed below) instead of /dev/etherd/interfaces to limit
|
option discussed below) instead of /dev/etherd/interfaces to limit
|
||||||
|
@ -113,12 +119,13 @@ USING SYSFS
|
||||||
for this purpose. You can also directly use the
|
for this purpose. You can also directly use the
|
||||||
/dev/etherd/discover special file described above.
|
/dev/etherd/discover special file described above.
|
||||||
|
|
||||||
DRIVER OPTIONS
|
Driver Options
|
||||||
|
==============
|
||||||
|
|
||||||
There is a boot option for the built-in aoe driver and a
|
There is a boot option for the built-in aoe driver and a
|
||||||
corresponding module parameter, aoe_iflist. Without this option,
|
corresponding module parameter, aoe_iflist. Without this option,
|
||||||
all network interfaces may be used for ATA over Ethernet. Here is a
|
all network interfaces may be used for ATA over Ethernet. Here is a
|
||||||
usage example for the module parameter.
|
usage example for the module parameter::
|
||||||
|
|
||||||
modprobe aoe aoe_iflist="eth1 eth3"
|
modprobe aoe aoe_iflist="eth1 eth3"
|
||||||
|
|
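The same interface list can also be set on a loaded driver through the /sys/module/aoe/parameters/aoe_iflist file mentioned in the sysfs section. The sketch below assumes that file is writable by root on the running kernel; the helper name `aoe_set_iflist` and the `AOE_IFLIST_FILE` override are hypothetical and exist only to keep the example testable.

```shell
#!/bin/sh
# Sketch: restrict AoE traffic to the given interfaces at run time.
# AOE_IFLIST_FILE is a hypothetical override; by default the sysfs
# path named in the text is used (writing to it needs root).
aoe_set_iflist() {
    param_file="${AOE_IFLIST_FILE:-/sys/module/aoe/parameters/aoe_iflist}"
    # "$*" joins the interface names with single spaces.
    printf '%s\n' "$*" > "$param_file" || return 1
    # Read the value back so the caller can confirm what was stored.
    cat "$param_file"
}
```

For example, `aoe_set_iflist eth1 eth3` corresponds to the module parameter value "eth1 eth3" used above.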
|
@ -0,0 +1,23 @@
|
||||||
|
Example of udev rules
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
.. include:: udev.txt
|
||||||
|
:literal:
|
||||||
|
|
||||||
|
Example of udev install rules script
|
||||||
|
------------------------------------
|
||||||
|
|
||||||
|
.. literalinclude:: udev-install.sh
|
||||||
|
:language: shell
|
||||||
|
|
||||||
|
Example script to get status
|
||||||
|
----------------------------
|
||||||
|
|
||||||
|
.. literalinclude:: status.sh
|
||||||
|
:language: shell
|
||||||
|
|
||||||
|
Example of AoE autoload script
|
||||||
|
------------------------------
|
||||||
|
|
||||||
|
.. literalinclude:: autoload.sh
|
||||||
|
:language: shell
|
|
@ -0,0 +1,19 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
=======================
|
||||||
|
ATA over Ethernet (AoE)
|
||||||
|
=======================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
aoe
|
||||||
|
todo
|
||||||
|
examples
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,3 +1,6 @@
|
||||||
|
TODO
|
||||||
|
====
|
||||||
|
|
||||||
There is a potential for deadlock when allocating a struct sk_buff for
|
There is a potential for deadlock when allocating a struct sk_buff for
|
||||||
data that needs to be written out to aoe storage. If the data is
|
data that needs to be written out to aoe storage. If the data is
|
||||||
being written from a dirty page in order to free that page, and if
|
being written from a dirty page in order to free that page, and if
|
|
@ -11,7 +11,7 @@
|
||||||
# udev_rules="/etc/udev/rules.d/"
|
# udev_rules="/etc/udev/rules.d/"
|
||||||
# bash# ls /etc/udev/rules.d/
|
# bash# ls /etc/udev/rules.d/
|
||||||
# 10-wacom.rules 50-udev.rules
|
# 10-wacom.rules 50-udev.rules
|
||||||
# bash# cp /path/to/linux-2.6.xx/Documentation/aoe/udev.txt \
|
# bash# cp /path/to/linux/Documentation/aoe/udev.txt \
|
||||||
# /etc/udev/rules.d/60-aoe.rules
|
# /etc/udev/rules.d/60-aoe.rules
|
||||||
#
|
#
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
Too many problems poped up because of unnoticed misaligned memory access in
|
Too many problems popped up because of unnoticed misaligned memory access in
|
||||||
kernel code lately. Therefore the alignment fixup is now unconditionally
|
kernel code lately. Therefore the alignment fixup is now unconditionally
|
||||||
configured in for SA11x0 based targets. According to Alan Cox, it is a
|
configured in for SA11x0 based targets. According to Alan Cox, it is a
|
||||||
bad idea to configure it out, but Russell King has some good reasons for
|
bad idea to configure it out, but Russell King has some good reasons for
|
||||||
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
========================
|
========================
|
||||||
STM32 ARM Linux Overview
|
STM32 ARM Linux Overview
|
||||||
========================
|
========================
|
||||||
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
STM32F429 Overview
|
STM32F429 Overview
|
||||||
==================
|
==================
|
||||||
|
|
||||||
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
STM32F746 Overview
|
STM32F746 Overview
|
||||||
==================
|
==================
|
||||||
|
|
||||||
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
STM32F769 Overview
|
STM32F769 Overview
|
||||||
==================
|
==================
|
||||||
|
|
||||||
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
STM32H743 Overview
|
STM32H743 Overview
|
||||||
==================
|
==================
|
||||||
|
|
||||||
|
|
|
@ -1,3 +1,5 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
STM32MP157 Overview
|
STM32MP157 Overview
|
||||||
===================
|
===================
|
||||||
|
|
||||||
|
|
|
@ -1,5 +1,7 @@
|
||||||
|
===========
|
||||||
ACPI Tables
|
ACPI Tables
|
||||||
-----------
|
===========
|
||||||
|
|
||||||
The expectations of individual ACPI tables are discussed in the list that
|
The expectations of individual ACPI tables are discussed in the list that
|
||||||
follows.
|
follows.
|
||||||
|
|
||||||
|
@ -11,54 +13,71 @@ outside of the UEFI Forum (see Section 5.2.6 of the specification).
|
||||||
|
|
||||||
For ACPI on arm64, tables also fall into the following categories:
|
For ACPI on arm64, tables also fall into the following categories:
|
||||||
|
|
||||||
-- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT
|
- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT
|
||||||
|
|
||||||
-- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
|
- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
|
||||||
|
|
||||||
-- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT,
|
- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT,
|
||||||
MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO,
|
MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO,
|
||||||
TCPA, TPM2, UEFI, XENV
|
TCPA, TPM2, UEFI, XENV
|
||||||
|
|
||||||
-- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT,
|
- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT,
|
||||||
MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
|
MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
|
||||||
|
|
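As an illustration of how the "Required" category might be checked on a running system, the sketch below reports which signatures are absent from a directory of exported tables. The function name `missing_acpi_tables` is hypothetical, and using /sys/firmware/acpi/tables as the directory is an assumption about the platform. Note that the lookup is by signature, so FADT is searched as "FACP" and MADT as "APIC", as given in the per-table entries.

```shell
#!/bin/sh
# Sketch: print each requested ACPI table signature that has no
# matching file in tables_dir. The directory is an assumption,
# for example /sys/firmware/acpi/tables on a running system.
missing_acpi_tables() {
    tables_dir="$1"
    shift
    for sig in "$@"; do
        found=no
        # Tables with several instances may carry a numeric suffix
        # (for example SSDT1), so match on the signature prefix.
        for f in "$tables_dir/$sig"*; do
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$sig"
    done
}
# Example call (not executed here):
#   missing_acpi_tables /sys/firmware/acpi/tables DSDT FACP GTDT APIC MCFG SPCR
```

RSDP and XSDT are left out of the example call because they may not be exported as files in that directory; treat that as an assumption to verify on the target platform.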
||||||
|
====== ========================================================================
|
||||||
Table Usage for ARMv8 Linux
|
Table Usage for ARMv8 Linux
|
||||||
----- ----------------------------------------------------------------
|
====== ========================================================================
|
||||||
BERT Section 18.3 (signature == "BERT")
|
BERT Section 18.3 (signature == "BERT")
|
||||||
== Boot Error Record Table ==
|
|
||||||
|
**Boot Error Record Table**
|
||||||
|
|
||||||
Must be supplied if RAS support is provided by the platform. It
|
Must be supplied if RAS support is provided by the platform. It
|
||||||
is recommended this table be supplied.
|
is recommended this table be supplied.
|
||||||
|
|
||||||
BOOT Signature Reserved (signature == "BOOT")
|
BOOT Signature Reserved (signature == "BOOT")
|
||||||
== simple BOOT flag table ==
|
|
||||||
|
**simple BOOT flag table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
BGRT Section 5.2.22 (signature == "BGRT")
|
BGRT Section 5.2.22 (signature == "BGRT")
|
||||||
== Boot Graphics Resource Table ==
|
|
||||||
|
**Boot Graphics Resource Table**
|
||||||
|
|
||||||
Optional, not currently supported, with no real use-case for an
|
Optional, not currently supported, with no real use-case for an
|
||||||
ARM server.
|
ARM server.
|
||||||
|
|
||||||
CPEP Section 5.2.18 (signature == "CPEP")
|
CPEP Section 5.2.18 (signature == "CPEP")
|
||||||
== Corrected Platform Error Polling table ==
|
|
||||||
|
**Corrected Platform Error Polling table**
|
||||||
|
|
||||||
Optional, not currently supported, and not recommended until such
|
Optional, not currently supported, and not recommended until such
|
||||||
time as ARM-compatible hardware is available, and the specification
|
time as ARM-compatible hardware is available, and the specification
|
||||||
suitably modified.
|
suitably modified.
|
||||||
|
|
||||||
CSRT Signature Reserved (signature == "CSRT")
|
CSRT Signature Reserved (signature == "CSRT")
|
||||||
== Core System Resources Table ==
|
|
||||||
|
**Core System Resources Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
DBG2 Signature Reserved (signature == "DBG2")
|
DBG2 Signature Reserved (signature == "DBG2")
|
||||||
== DeBuG port table 2 ==
|
|
||||||
|
**DeBuG port table 2**
|
||||||
|
|
||||||
License has changed and should be usable. Optional if used instead
|
License has changed and should be usable. Optional if used instead
|
||||||
of earlycon=<device> on the command line.
|
of earlycon=<device> on the command line.
|
||||||
|
|
||||||
DBGP Signature Reserved (signature == "DBGP")
|
DBGP Signature Reserved (signature == "DBGP")
|
||||||
== DeBuG Port table ==
|
|
||||||
|
**DeBuG Port table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
DSDT Section 5.2.11.1 (signature == "DSDT")
|
DSDT Section 5.2.11.1 (signature == "DSDT")
|
||||||
== Differentiated System Description Table ==
|
|
||||||
|
**Differentiated System Description Table**
|
||||||
|
|
||||||
A DSDT is required; see also SSDT.
|
A DSDT is required; see also SSDT.
|
||||||
|
|
||||||
ACPI tables contain only one DSDT but can contain one or more SSDTs,
|
ACPI tables contain only one DSDT but can contain one or more SSDTs,
|
||||||
|
@ -66,22 +85,30 @@ DSDT Section 5.2.11.1 (signature == "DSDT")
|
||||||
but cannot modify or replace anything in the DSDT.
|
but cannot modify or replace anything in the DSDT.
|
||||||
|
|
||||||
DMAR Signature Reserved (signature == "DMAR")
|
DMAR Signature Reserved (signature == "DMAR")
|
||||||
== DMA Remapping table ==
|
|
||||||
|
**DMA Remapping table**
|
||||||
|
|
||||||
x86 only table, will not be supported.
|
x86 only table, will not be supported.
|
||||||
|
|
||||||
DRTM Signature Reserved (signature == "DRTM")
|
DRTM Signature Reserved (signature == "DRTM")
|
||||||
== Dynamic Root of Trust for Measurement table ==
|
|
||||||
|
**Dynamic Root of Trust for Measurement table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
ECDT Section 5.2.16 (signature == "ECDT")
|
ECDT Section 5.2.16 (signature == "ECDT")
|
||||||
== Embedded Controller Description Table ==
|
|
||||||
|
**Embedded Controller Description Table**
|
||||||
|
|
||||||
Optional, not currently supported, but could be used on ARM if and
|
Optional, not currently supported, but could be used on ARM if and
|
||||||
only if one uses the GPE_BIT field to represent an IRQ number, since
|
only if one uses the GPE_BIT field to represent an IRQ number, since
|
||||||
there are no GPE blocks defined in hardware reduced mode. This would
|
there are no GPE blocks defined in hardware reduced mode. This would
|
||||||
need to be modified in the ACPI specification.
|
need to be modified in the ACPI specification.
|
||||||
|
|
||||||
EINJ Section 18.6 (signature == "EINJ")
|
EINJ Section 18.6 (signature == "EINJ")
|
||||||
== Error Injection table ==
|
|
||||||
|
**Error Injection table**
|
||||||
|
|
||||||
This table is very useful for testing platform response to error
|
This table is very useful for testing platform response to error
|
||||||
conditions; it allows one to inject an error into the system as
|
conditions; it allows one to inject an error into the system as
|
||||||
if it had actually occurred. However, this table should not be
|
if it had actually occurred. However, this table should not be
|
||||||
|
@ -89,27 +116,35 @@ EINJ Section 18.6 (signature == "EINJ")
|
||||||
and executed with the ACPICA tools only during testing.
|
and executed with the ACPICA tools only during testing.
|
||||||
|
|
||||||
ERST Section 18.5 (signature == "ERST")
|
ERST Section 18.5 (signature == "ERST")
|
||||||
== Error Record Serialization Table ==
|
|
||||||
|
**Error Record Serialization Table**
|
||||||
|
|
||||||
On a platform that supports RAS, this table must be supplied if it is not
|
On a platform that supports RAS, this table must be supplied if it is not
|
||||||
UEFI-based; if it is UEFI-based, this table may be supplied. When this
|
UEFI-based; if it is UEFI-based, this table may be supplied. When this
|
||||||
table is not present, UEFI run time service will be utilized to save
|
table is not present, UEFI run time service will be utilized to save
|
||||||
and retrieve hardware error information to and from a persistent store.
|
and retrieve hardware error information to and from a persistent store.
|
||||||
|
|
||||||
ETDT Signature Reserved (signature == "ETDT")
|
ETDT Signature Reserved (signature == "ETDT")
|
||||||
== Event Timer Description Table ==
|
|
||||||
|
**Event Timer Description Table**
|
||||||
|
|
||||||
Obsolete table, will not be supported.
|
Obsolete table, will not be supported.
|
||||||
|
|
||||||
FACS Section 5.2.10 (signature == "FACS")
|
FACS Section 5.2.10 (signature == "FACS")
|
||||||
== Firmware ACPI Control Structure ==
|
|
||||||
|
**Firmware ACPI Control Structure**
|
||||||
|
|
||||||
It is unlikely that this table will be terribly useful. If it is
|
It is unlikely that this table will be terribly useful. If it is
|
||||||
provided, the Global Lock will NOT be used since it is not part of
|
provided, the Global Lock will NOT be used since it is not part of
|
||||||
the hardware reduced profile, and only 64-bit address fields will
|
the hardware reduced profile, and only 64-bit address fields will
|
||||||
be considered valid.
|
be considered valid.
|
||||||
|
|
||||||
FADT Section 5.2.9 (signature == "FACP")
|
FADT Section 5.2.9 (signature == "FACP")
|
||||||
== Fixed ACPI Description Table ==
|
|
||||||
|
**Fixed ACPI Description Table**
|
||||||
Required for arm64.
|
Required for arm64.
|
||||||
|
|
||||||
|
|
||||||
The HW_REDUCED_ACPI flag must be set. All of the fields that are
|
The HW_REDUCED_ACPI flag must be set. All of the fields that are
|
||||||
to be ignored when HW_REDUCED_ACPI is set are expected to be set to
|
to be ignored when HW_REDUCED_ACPI is set are expected to be set to
|
||||||
zero.
|
zero.
|
||||||
|
@ -118,22 +153,28 @@ FADT Section 5.2.9 (signature == "FACP")
|
||||||
used, not FIRMWARE_CTRL.
|
used, not FIRMWARE_CTRL.
|
||||||
|
|
||||||
If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is
|
If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is
|
||||||
filled in properly -- that the PSCI_COMPLIANT flag is set and that
|
filled in properly - that the PSCI_COMPLIANT flag is set and that
|
||||||
PSCI_USE_HVC is set or unset as needed (see table 5-37).
|
PSCI_USE_HVC is set or unset as needed (see table 5-37).
|
||||||
|
|
||||||
For the DSDT that is also required, the X_DSDT field is to be used,
|
For the DSDT that is also required, the X_DSDT field is to be used,
|
||||||
not the DSDT field.
|
not the DSDT field.
|
||||||
|
|
||||||
FPDT Section 5.2.23 (signature == "FPDT")
|
FPDT Section 5.2.23 (signature == "FPDT")
|
||||||
== Firmware Performance Data Table ==
|
|
||||||
|
**Firmware Performance Data Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
GTDT Section 5.2.24 (signature == "GTDT")
|
GTDT Section 5.2.24 (signature == "GTDT")
|
||||||
== Generic Timer Description Table ==
|
|
||||||
|
**Generic Timer Description Table**
|
||||||
|
|
||||||
Required for arm64.
|
Required for arm64.
|
||||||
|
|
||||||
HEST Section 18.3.2 (signature == "HEST")
|
HEST Section 18.3.2 (signature == "HEST")
|
||||||
== Hardware Error Source Table ==
|
|
||||||
|
**Hardware Error Source Table**
|
||||||
|
|
||||||
ARM-specific error sources have been defined; please use those or the
|
ARM-specific error sources have been defined; please use those or the
|
||||||
PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER
|
PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER
|
||||||
Bridge), or use type 9 (Generic Hardware Error Source). Firmware first
|
Bridge), or use type 9 (Generic Hardware Error Source). Firmware first
|
||||||
|
@ -144,122 +185,174 @@ HEST Section 18.3.2 (signature == "HEST")
|
||||||
is recommended this table be supplied.
|
is recommended this table be supplied.
|
||||||
|
|
||||||
HPET Signature Reserved (signature == "HPET")
|
HPET Signature Reserved (signature == "HPET")
|
||||||
== High Precision Event timer Table ==
|
|
||||||
|
**High Precision Event timer Table**
|
||||||
|
|
||||||
x86 only table, will not be supported.
|
x86 only table, will not be supported.
|
||||||
|
|
||||||
IBFT Signature Reserved (signature == "IBFT")
|
IBFT Signature Reserved (signature == "IBFT")
|
||||||
== iSCSI Boot Firmware Table ==
|
|
||||||
|
**iSCSI Boot Firmware Table**
|
||||||
|
|
||||||
Microsoft defined table, support TBD.
|
Microsoft defined table, support TBD.
|
||||||
|
|
||||||
IORT Signature Reserved (signature == "IORT")
|
IORT Signature Reserved (signature == "IORT")
|
||||||
== Input Output Remapping Table ==
|
|
||||||
|
**Input Output Remapping Table**
|
||||||
|
|
||||||
arm64 only table, required in order to describe IO topology, SMMUs,
|
arm64 only table, required in order to describe IO topology, SMMUs,
|
||||||
and GIC ITSs, and how those various components are connected together,
|
and GIC ITSs, and how those various components are connected together,
|
||||||
such as identifying which components are behind which SMMUs/ITSs.
|
such as identifying which components are behind which SMMUs/ITSs.
|
||||||
This table will only be required on certain SBSA platforms (e.g.,
|
This table will only be required on certain SBSA platforms (e.g.,
|
||||||
when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it
|
when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it
|
||||||
remains optional.
|
remains optional.
|
||||||
|
|
||||||
IVRS Signature Reserved (signature == "IVRS")
|
IVRS Signature Reserved (signature == "IVRS")
|
||||||
== I/O Virtualization Reporting Structure ==
|
|
||||||
|
**I/O Virtualization Reporting Structure**
|
||||||
|
|
||||||
x86_64 (AMD) only table, will not be supported.
|
x86_64 (AMD) only table, will not be supported.
|
||||||
|
|
||||||
LPIT Signature Reserved (signature == "LPIT")
|
LPIT Signature Reserved (signature == "LPIT")
|
||||||
== Low Power Idle Table ==
|
|
||||||
|
**Low Power Idle Table**
|
||||||
|
|
||||||
x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor
|
x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor
|
||||||
descriptions and power states on ARM platforms should use the DSDT
|
descriptions and power states on ARM platforms should use the DSDT
|
||||||
and define processor container devices (_HID ACPI0010, Section 8.4,
|
and define processor container devices (_HID ACPI0010, Section 8.4,
|
||||||
and more specifically 8.4.3 and 8.4.4).
|
and more specifically 8.4.3 and 8.4.4).
|
||||||
|
|
||||||
MADT Section 5.2.12 (signature == "APIC")
|
MADT Section 5.2.12 (signature == "APIC")
|
||||||
== Multiple APIC Description Table ==
|
|
||||||
|
**Multiple APIC Description Table**
|
||||||
|
|
||||||
Required for arm64. Only the GIC interrupt controller structures
|
Required for arm64. Only the GIC interrupt controller structures
|
||||||
should be used (types 0xA - 0xF).
|
should be used (types 0xA - 0xF).
|
||||||
|
|
||||||
MCFG Signature Reserved (signature == "MCFG")
|
MCFG Signature Reserved (signature == "MCFG")
|
||||||
== Memory-mapped ConFiGuration space ==
|
|
||||||
|
**Memory-mapped ConFiGuration space**
|
||||||
|
|
||||||
If the platform supports PCI/PCIe, an MCFG table is required.
|
If the platform supports PCI/PCIe, an MCFG table is required.
|
||||||
|
|
||||||
MCHI Signature Reserved (signature == "MCHI")
|
MCHI Signature Reserved (signature == "MCHI")
|
||||||
== Management Controller Host Interface table ==
|
|
||||||
|
**Management Controller Host Interface table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
MPST Section 5.2.21 (signature == "MPST")
|
MPST Section 5.2.21 (signature == "MPST")
|
||||||
== Memory Power State Table ==
|
|
||||||
|
**Memory Power State Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
MSCT Section 5.2.19 (signature == "MSCT")
|
MSCT Section 5.2.19 (signature == "MSCT")
|
||||||
== Maximum System Characteristic Table ==
|
|
||||||
|
**Maximum System Characteristic Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
MSDM Signature Reserved (signature == "MSDM")
|
MSDM Signature Reserved (signature == "MSDM")
|
||||||
== Microsoft Data Management table ==
|
|
||||||
|
**Microsoft Data Management table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
NFIT Section 5.2.25 (signature == "NFIT")
|
NFIT Section 5.2.25 (signature == "NFIT")
|
||||||
== NVDIMM Firmware Interface Table ==
|
|
||||||
|
**NVDIMM Firmware Interface Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
OEMx Signature of "OEMx" only
|
OEMx Signature of "OEMx" only
|
||||||
== OEM Specific Tables ==
|
|
||||||
|
**OEM Specific Tables**
|
||||||
|
|
||||||
All tables starting with a signature of "OEM" are reserved for OEM
|
All tables starting with a signature of "OEM" are reserved for OEM
|
||||||
use. Since these are not meant to be of general use but are limited
|
use. Since these are not meant to be of general use but are limited
|
||||||
to very specific end users, they are not recommended for use and are
|
to very specific end users, they are not recommended for use and are
|
||||||
not supported by the kernel for arm64.
|
not supported by the kernel for arm64.
|
||||||
|
|
||||||
PCCT Section 14.1 (signature == "PCCT")
|
PCCT Section 14.1 (signature == "PCCT")
|
||||||
== Platform Communications Channel Table ==
|
|
||||||
|
**Platform Communications Channel Table**
|
||||||
|
|
||||||
Recommended for use on arm64; use of PCC is recommended when using CPPC
|
Recommended for use on arm64; use of PCC is recommended when using CPPC
|
||||||
to control performance and power for platform processors.
|
to control performance and power for platform processors.
|
||||||
|
|
||||||
PMTT Section 5.2.21.12 (signature == "PMTT")
|
PMTT Section 5.2.21.12 (signature == "PMTT")
|
||||||
== Platform Memory Topology Table ==
|
|
||||||
|
**Platform Memory Topology Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
PSDT Section 5.2.11.3 (signature == "PSDT")
|
PSDT Section 5.2.11.3 (signature == "PSDT")
|
||||||
== Persistent System Description Table ==
|
|
||||||
|
**Persistent System Description Table**
|
||||||
|
|
||||||
Obsolete table, will not be supported.
|
Obsolete table, will not be supported.
|
||||||
|
|
||||||
RASF Section 5.2.20 (signature == "RASF")
|
RASF Section 5.2.20 (signature == "RASF")
|
||||||
== RAS Feature table ==
|
|
||||||
|
**RAS Feature table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
RSDP Section 5.2.5 (signature == "RSD PTR")
|
RSDP Section 5.2.5 (signature == "RSD PTR")
|
||||||
== Root System Description PoinTeR ==
|
|
||||||
|
**Root System Description PoinTeR**
|
||||||
|
|
||||||
Required for arm64.
|
Required for arm64.
|
||||||
|
|
||||||
RSDT Section 5.2.7 (signature == "RSDT")
|
RSDT Section 5.2.7 (signature == "RSDT")
|
||||||
== Root System Description Table ==
|
|
||||||
|
**Root System Description Table**
|
||||||
|
|
||||||
Since this table can only provide 32-bit addresses, it is deprecated
|
Since this table can only provide 32-bit addresses, it is deprecated
|
||||||
on arm64, and will not be used. If provided, it will be ignored.
|
on arm64, and will not be used. If provided, it will be ignored.
|
||||||
|
|
||||||
SBST Section 5.2.14 (signature == "SBST")
|
SBST Section 5.2.14 (signature == "SBST")
|
||||||
== Smart Battery Subsystem Table ==
|
|
||||||
|
**Smart Battery Subsystem Table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
SLIC Signature Reserved (signature == "SLIC")
|
SLIC Signature Reserved (signature == "SLIC")
|
||||||
== Software LIcensing table ==
|
|
||||||
|
**Software LIcensing table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
SLIT Section 5.2.17 (signature == "SLIT")
|
SLIT Section 5.2.17 (signature == "SLIT")
|
||||||
== System Locality distance Information Table ==
|
|
||||||
|
**System Locality distance Information Table**
|
||||||
|
|
||||||
Optional in general, but required for NUMA systems.
|
Optional in general, but required for NUMA systems.
|
||||||
|
|
||||||
SPCR Signature Reserved (signature == "SPCR")
|
SPCR Signature Reserved (signature == "SPCR")
|
||||||
== Serial Port Console Redirection table ==
|
|
||||||
|
**Serial Port Console Redirection table**
|
||||||
|
|
||||||
Required for arm64.
|
Required for arm64.
|
||||||
|
|
||||||
SPMI Signature Reserved (signature == "SPMI")
|
SPMI Signature Reserved (signature == "SPMI")
|
||||||
== Server Platform Management Interface table ==
|
|
||||||
|
**Server Platform Management Interface table**
|
||||||
|
|
||||||
Optional, not currently supported.
|
Optional, not currently supported.
|
||||||
|
|
||||||
SRAT Section 5.2.16 (signature == "SRAT")
|
SRAT Section 5.2.16 (signature == "SRAT")
|
||||||
== System Resource Affinity Table ==
|
|
||||||
|
**System Resource Affinity Table**
|
||||||
|
|
||||||
Optional, but if used, only the GICC Affinity structures are read.
|
Optional, but if used, only the GICC Affinity structures are read.
|
||||||
To support arm64 NUMA, this table is required.
|
To support arm64 NUMA, this table is required.
|
||||||
|
|
||||||
SSDT Section 5.2.11.2 (signature == "SSDT")
|
SSDT Section 5.2.11.2 (signature == "SSDT")
|
||||||
== Secondary System Description Table ==
|
|
||||||
|
**Secondary System Description Table**
|
||||||
|
|
||||||
These tables are a continuation of the DSDT; these are recommended
|
These tables are a continuation of the DSDT; these are recommended
|
||||||
for use with devices that can be added to a running system, but can
|
for use with devices that can be added to a running system, but can
|
||||||
also serve the purpose of dividing up device descriptions into more
|
also serve the purpose of dividing up device descriptions into more
|
||||||
|
@ -272,49 +365,69 @@ SSDT Section 5.2.11.2 (signature == "SSDT")
|
||||||
one DSDT but can contain many SSDTs.
|
one DSDT but can contain many SSDTs.
|
||||||
|
|
||||||
STAO Signature Reserved (signature == "STAO")
|
STAO Signature Reserved (signature == "STAO")
|
||||||
== _STA Override table ==
|
|
||||||
|
**_STA Override table**
|
||||||
|
|
||||||
Optional, but only necessary in virtualized environments in order to
|
Optional, but only necessary in virtualized environments in order to
|
||||||
hide devices from guest OSs.
|
hide devices from guest OSs.
|
||||||
|
|
||||||
TCPA Signature Reserved (signature == "TCPA")
|
TCPA Signature Reserved (signature == "TCPA")
|
||||||
== Trusted Computing Platform Alliance table ==
|
|
||||||
|
**Trusted Computing Platform Alliance table**
|
||||||
|
|
||||||
Optional, not currently supported, and may need changes to fully
|
Optional, not currently supported, and may need changes to fully
|
||||||
interoperate with arm64.
|
interoperate with arm64.
|
||||||
|
|
||||||
TPM2 Signature Reserved (signature == "TPM2")
|
TPM2 Signature Reserved (signature == "TPM2")
|
||||||
== Trusted Platform Module 2 table ==
|
|
||||||
|
**Trusted Platform Module 2 table**
|
||||||
|
|
||||||
Optional, not currently supported, and may need changes to fully
|
Optional, not currently supported, and may need changes to fully
|
||||||
interoperate with arm64.
|
interoperate with arm64.
|
||||||
|
|
||||||
UEFI Signature Reserved (signature == "UEFI")
|
UEFI Signature Reserved (signature == "UEFI")
|
||||||
== UEFI ACPI data table ==
|
|
||||||
|
**UEFI ACPI data table**
|
||||||
|
|
||||||
Optional, not currently supported. No known use case for arm64,
|
Optional, not currently supported. No known use case for arm64,
|
||||||
at present.
|
at present.
|
||||||
|
|
||||||
WAET Signature Reserved (signature == "WAET")
|
WAET Signature Reserved (signature == "WAET")
|
||||||
== Windows ACPI Emulated devices Table ==
|
|
||||||
|
**Windows ACPI Emulated devices Table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
WDAT Signature Reserved (signature == "WDAT")
|
WDAT Signature Reserved (signature == "WDAT")
|
||||||
== Watch Dog Action Table ==
|
|
||||||
|
**Watch Dog Action Table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
WDRT Signature Reserved (signature == "WDRT")
|
WDRT Signature Reserved (signature == "WDRT")
|
||||||
== Watch Dog Resource Table ==
|
|
||||||
|
**Watch Dog Resource Table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
WPBT Signature Reserved (signature == "WPBT")
|
WPBT Signature Reserved (signature == "WPBT")
|
||||||
== Windows Platform Binary Table ==
|
|
||||||
|
**Windows Platform Binary Table**
|
||||||
|
|
||||||
Microsoft only table, will not be supported.
|
Microsoft only table, will not be supported.
|
||||||
|
|
||||||
XENV Signature Reserved (signature == "XENV")
|
XENV Signature Reserved (signature == "XENV")
|
||||||
== Xen project table ==
|
|
||||||
|
**Xen project table**
|
||||||
|
|
||||||
Optional, used only by Xen at present.
|
Optional, used only by Xen at present.
|
||||||
|
|
||||||
XSDT Section 5.2.8 (signature == "XSDT")
|
XSDT Section 5.2.8 (signature == "XSDT")
|
||||||
== eXtended System Description Table ==
|
|
||||||
Required for arm64.
|
|
||||||
|
|
||||||
|
**eXtended System Description Table**
|
||||||
|
|
||||||
|
Required for arm64.
|
||||||
|
====== ========================================================================
|
||||||
|
|
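The FADT entry in the table above requires HW_REDUCED_ACPI to be set and, when PSCI is used, PSCI_COMPLIANT to be set with PSCI_USE_HVC set or unset as needed. The sketch below decodes those bits from two already-extracted field values. The helper name `fadt_check` is hypothetical, and the bit positions are assumptions taken from the ACPI specification's FADT layout rather than from this document: bit 20 of Flags for HW_REDUCED_ACPI, and bits 0 and 1 of ARM_BOOT_ARCH for PSCI_COMPLIANT and PSCI_USE_HVC.

```shell
#!/bin/sh
# Sketch: report the FADT bits discussed in the FADT entry.
# $1 is the FADT Flags value, $2 the ARM_BOOT_ARCH value; both may be
# given in decimal or as 0x-prefixed hex. Bit positions are assumed
# from the ACPI specification and should be checked against it.
fadt_check() {
    flags=$(( $1 ))
    arm_boot_arch=$(( $2 ))
    if [ $(( (flags >> 20) & 1 )) -eq 1 ]; then
        echo "HW_REDUCED_ACPI: set"
    else
        echo "HW_REDUCED_ACPI: NOT set (required on arm64)"
    fi
    if [ $(( arm_boot_arch & 1 )) -eq 1 ]; then
        echo "PSCI_COMPLIANT: set"
    else
        echo "PSCI_COMPLIANT: not set"
    fi
    if [ $(( (arm_boot_arch >> 1) & 1 )) -eq 1 ]; then
        echo "PSCI_USE_HVC: set"
    else
        echo "PSCI_USE_HVC: not set"
    fi
}
```

Extracting the two field values from a dumped FACP table is left out on purpose, since the field offsets depend on the FADT revision.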
||||||
ACPI Objects
|
ACPI Objects
|
||||||
------------
|
------------
|
||||||
|
@ -323,10 +436,11 @@ shown in the list that follows; any object not explicitly mentioned below
|
||||||
should be used as needed for a particular platform or particular subsystem,
|
should be used as needed for a particular platform or particular subsystem,
|
||||||
such as power management or PCI.
|
such as power management or PCI.
|
||||||
|
|
||||||
|
===== ================ ========================================================
|
||||||
Name Section Usage for ARMv8 Linux
|
Name Section Usage for ARMv8 Linux
|
||||||
---- ------------ -------------------------------------------------
|
===== ================ ========================================================
|
||||||
_CCA 6.2.17 This method must be defined for all bus masters
|
_CCA 6.2.17 This method must be defined for all bus masters
|
||||||
on arm64 -- there are no assumptions made about
|
on arm64 - there are no assumptions made about
|
||||||
whether such devices are cache coherent or not.
|
whether such devices are cache coherent or not.
|
||||||
The _CCA value is inherited by all descendants of
|
The _CCA value is inherited by all descendants of
|
||||||
these devices so it does not need to be repeated.
|
these devices so it does not need to be repeated.
|
||||||
|
@ -422,8 +536,8 @@ _OSC 6.2.11 This method can be a global method in ACPI (i.e.,
|
||||||
by the kernel community, then register it with the
|
by the kernel community, then register it with the
|
||||||
UEFI Forum.
|
UEFI Forum.
|
||||||
|
|
||||||
\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is
|
\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is
|
||||||
concerned, _OSI is not to be used to determine what
|
concerned, _OSI is not to be used to determine what
|
||||||
sort of system is being used or what functionality
|
sort of system is being used or what functionality
|
||||||
is provided. The _OSC method is to be used instead.
|
is provided. The _OSC method is to be used instead.
|
||||||
|
|
||||||
|
@ -447,7 +561,7 @@ _PSx 7.3.2-5 Use as needed; power management specific. If _PS0 is
|
||||||
usage, change them in these methods.
|
usage, change them in these methods.
|
||||||
|
|
||||||
_RDI 8.4.4.4 Recommended for use with processor definitions (_HID
|
_RDI 8.4.4.4 Recommended for use with processor definitions (_HID
|
||||||
ACPI0010) on arm64. This should only be used in
|
ACPI0010) on arm64. This should only be used in
|
||||||
conjunction with _LPI.
|
conjunction with _LPI.
|
||||||
|
|
||||||
\_REV 5.7.4 Always returns the latest version of ACPI supported.
|
\_REV 5.7.4 Always returns the latest version of ACPI supported.
|
||||||
|
@ -476,6 +590,7 @@ _SWS 7.4.3 Use as needed; power management specific; this may
|
||||||
|
|
||||||
_UID 6.1.12 Recommended for distinguishing devices of the same
|
_UID 6.1.12 Recommended for distinguishing devices of the same
|
||||||
class; define it if at all possible.
|
class; define it if at all possible.
|
||||||
|
===== ================ ========================================================
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -488,7 +603,7 @@ platforms, ACPI events must be signaled differently.
|
||||||
|
|
||||||
There are two options: GPIO-signaled interrupts (Section 5.6.5), and
|
There are two options: GPIO-signaled interrupts (Section 5.6.5), and
|
||||||
interrupt-signaled events (Section 5.6.9). Interrupt-signaled events are a
|
interrupt-signaled events (Section 5.6.9). Interrupt-signaled events are a
|
||||||
new feature in the ACPI 6.1 specification. Either -- or both -- can be used
|
new feature in the ACPI 6.1 specification. Either - or both - can be used
|
||||||
on a given platform, and which to use may depend on limitations in any
|
on a given platform, and which to use may depend on limitations in any
|
||||||
given SoC. If possible, interrupt-signaled events are recommended.
|
given SoC. If possible, interrupt-signaled events are recommended.
|
||||||
|
|
||||||
|
@ -564,39 +679,40 @@ supported.
|
||||||
|
|
||||||
The following classes of objects are not supported:
|
The following classes of objects are not supported:
|
||||||
|
|
||||||
-- Section 9.2: ambient light sensor devices
|
- Section 9.2: ambient light sensor devices
|
||||||
|
|
||||||
-- Section 9.3: battery devices
|
- Section 9.3: battery devices
|
||||||
|
|
||||||
-- Section 9.4: lids (e.g., laptop lids)
|
- Section 9.4: lids (e.g., laptop lids)
|
||||||
|
|
||||||
-- Section 9.8.2: IDE controllers
|
- Section 9.8.2: IDE controllers
|
||||||
|
|
||||||
-- Section 9.9: floppy controllers
|
- Section 9.9: floppy controllers
|
||||||
|
|
||||||
-- Section 9.10: GPE block devices
|
- Section 9.10: GPE block devices
|
||||||
|
|
||||||
-- Section 9.15: PC/AT RTC/CMOS devices
|
- Section 9.15: PC/AT RTC/CMOS devices
|
||||||
|
|
||||||
-- Section 9.16: user presence detection devices
|
- Section 9.16: user presence detection devices
|
||||||
|
|
||||||
-- Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT
|
- Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT
|
||||||
|
|
||||||
-- Section 9.18: time and alarm devices (see 9.15)
|
- Section 9.18: time and alarm devices (see 9.15)
|
||||||
|
|
||||||
-- Section 10: power source and power meter devices
|
- Section 10: power source and power meter devices
|
||||||
|
|
||||||
-- Section 11: thermal management
|
- Section 11: thermal management
|
||||||
|
|
||||||
-- Section 12: embedded controllers interface
|
- Section 12: embedded controllers interface
|
||||||
|
|
||||||
-- Section 13: SMBus interfaces
|
- Section 13: SMBus interfaces
|
||||||
|
|
||||||
|
|
||||||
This also means that there is no support for the following objects:
|
This also means that there is no support for the following objects:
|
||||||
|
|
||||||
|
==== =========================== ==== ==========
|
||||||
Name Section Name Section
|
Name Section Name Section
|
||||||
---- ------------ ---- ------------
|
==== =========================== ==== ==========
|
||||||
_ALC 9.3.4 _FDM 9.10.3
|
_ALC 9.3.4 _FDM 9.10.3
|
||||||
_ALI 9.3.2 _FIX 6.2.7
|
_ALI 9.3.2 _FIX 6.2.7
|
||||||
_ALP 9.3.6 _GAI 10.4.5
|
_ALP 9.3.6 _GAI 10.4.5
|
||||||
|
@ -619,4 +735,4 @@ _DCK 6.5.2 _UPD 9.16.1
|
||||||
_EC 12.12 _UPP 9.16.2
|
_EC 12.12 _UPP 9.16.2
|
||||||
_FDE 9.10.1 _WPC 10.5.2
|
_FDE 9.10.1 _WPC 10.5.2
|
||||||
_FDI 9.10.2 _WPP 10.5.3
|
_FDI 9.10.2 _WPP 10.5.3
|
||||||
|
==== =========================== ==== ==========
|
|
@ -1,5 +1,7 @@
|
||||||
|
=====================
|
||||||
ACPI on ARMv8 Servers
|
ACPI on ARMv8 Servers
|
||||||
---------------------
|
=====================
|
||||||
|
|
||||||
ACPI can be used for ARMv8 general purpose servers designed to follow
|
ACPI can be used for ARMv8 general purpose servers designed to follow
|
||||||
the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
|
the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
|
||||||
Base Boot Requirements) [1] specifications. Please note that the SBBR
|
Base Boot Requirements) [1] specifications. Please note that the SBBR
|
||||||
|
@ -34,28 +36,28 @@ of the summary text almost directly, to be honest.
|
||||||
|
|
||||||
The short form of the rationale for ACPI on ARM is:
|
The short form of the rationale for ACPI on ARM is:
|
||||||
|
|
||||||
-- ACPI’s byte code (AML) allows the platform to encode hardware behavior,
|
- ACPI’s byte code (AML) allows the platform to encode hardware behavior,
|
||||||
while DT explicitly does not support this. For hardware vendors, being
|
while DT explicitly does not support this. For hardware vendors, being
|
||||||
able to encode behavior is a key tool used in supporting operating
|
able to encode behavior is a key tool used in supporting operating
|
||||||
system releases on new hardware.
|
system releases on new hardware.
|
||||||
|
|
||||||
-- ACPI’s OSPM defines a power management model that constrains what the
|
- ACPI’s OSPM defines a power management model that constrains what the
|
||||||
platform is allowed to do into a specific model, while still providing
|
platform is allowed to do into a specific model, while still providing
|
||||||
flexibility in hardware design.
|
flexibility in hardware design.
|
||||||
|
|
||||||
-- In the enterprise server environment, ACPI has established bindings (such
|
- In the enterprise server environment, ACPI has established bindings (such
|
||||||
as for RAS) which are currently used in production systems. DT does not.
|
as for RAS) which are currently used in production systems. DT does not.
|
||||||
Such bindings could be defined in DT at some point, but doing so means ARM
|
Such bindings could be defined in DT at some point, but doing so means ARM
|
||||||
and x86 would end up using completely different code paths in both firmware
|
and x86 would end up using completely different code paths in both firmware
|
||||||
and the kernel.
|
and the kernel.
|
||||||
|
|
||||||
-- Choosing a single interface to describe the abstraction between a platform
|
- Choosing a single interface to describe the abstraction between a platform
|
||||||
and an OS is important. Hardware vendors would not be required to implement
|
and an OS is important. Hardware vendors would not be required to implement
|
||||||
both DT and ACPI if they want to support multiple operating systems. And,
|
both DT and ACPI if they want to support multiple operating systems. And,
|
||||||
agreeing on a single interface instead of being fragmented into per OS
|
agreeing on a single interface instead of being fragmented into per OS
|
||||||
interfaces makes for better interoperability overall.
|
interfaces makes for better interoperability overall.
|
||||||
|
|
||||||
-- The new ACPI governance process works well and Linux is now at the same
|
- The new ACPI governance process works well and Linux is now at the same
|
||||||
table as hardware vendors and other OS vendors. In fact, there is no
|
table as hardware vendors and other OS vendors. In fact, there is no
|
||||||
longer any reason to feel that ACPI only belongs to Windows or that
|
longer any reason to feel that ACPI only belongs to Windows or that
|
||||||
Linux is in any way secondary to Microsoft in this arena. The move of
|
Linux is in any way secondary to Microsoft in this arena. The move of
|
||||||
|
@ -169,31 +171,31 @@ For the ACPI core to operate properly, and in turn provide the information
|
||||||
the kernel needs to configure devices, it expects to find the following
|
the kernel needs to configure devices, it expects to find the following
|
||||||
tables (all section numbers refer to the ACPI 6.1 specification):
|
tables (all section numbers refer to the ACPI 6.1 specification):
|
||||||
|
|
||||||
-- RSDP (Root System Description Pointer), section 5.2.5
|
- RSDP (Root System Description Pointer), section 5.2.5
|
||||||
|
|
||||||
-- XSDT (eXtended System Description Table), section 5.2.8
|
- XSDT (eXtended System Description Table), section 5.2.8
|
||||||
|
|
||||||
-- FADT (Fixed ACPI Description Table), section 5.2.9
|
- FADT (Fixed ACPI Description Table), section 5.2.9
|
||||||
|
|
||||||
-- DSDT (Differentiated System Description Table), section
|
- DSDT (Differentiated System Description Table), section
|
||||||
5.2.11.1
|
5.2.11.1
|
||||||
|
|
||||||
-- MADT (Multiple APIC Description Table), section 5.2.12
|
- MADT (Multiple APIC Description Table), section 5.2.12
|
||||||
|
|
||||||
-- GTDT (Generic Timer Description Table), section 5.2.24
|
- GTDT (Generic Timer Description Table), section 5.2.24
|
||||||
|
|
||||||
-- If PCI is supported, the MCFG (Memory mapped ConFiGuration
|
- If PCI is supported, the MCFG (Memory mapped ConFiGuration
|
||||||
Table), section 5.2.6, specifically Table 5-31.
|
Table), section 5.2.6, specifically Table 5-31.
|
||||||
|
|
||||||
-- If booting without a console=<device> kernel parameter is
|
- If booting without a console=<device> kernel parameter is
|
||||||
supported, the SPCR (Serial Port Console Redirection table),
|
supported, the SPCR (Serial Port Console Redirection table),
|
||||||
section 5.2.6, specifically Table 5-31.
|
section 5.2.6, specifically Table 5-31.
|
||||||
|
|
||||||
-- If necessary to describe the I/O topology, SMMUs and GIC ITSs,
|
- If necessary to describe the I/O topology, SMMUs and GIC ITSs,
|
||||||
the IORT (Input Output Remapping Table, section 5.2.6, specifically
|
the IORT (Input Output Remapping Table, section 5.2.6, specifically
|
||||||
Table 5-31).
|
Table 5-31).
|
||||||
|
|
||||||
-- If NUMA is supported, the SRAT (System Resource Affinity Table)
|
- If NUMA is supported, the SRAT (System Resource Affinity Table)
|
||||||
and SLIT (System Locality distance Information Table), sections
|
and SLIT (System Locality distance Information Table), sections
|
||||||
5.2.16 and 5.2.17, respectively.
|
5.2.16 and 5.2.17, respectively.
|
||||||
|
|
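As a rough illustration of the table list above, the following user-space C sketch checks a list of discovered table signatures against the ones that are always expected. The names `required_sigs`, `have_table` and `first_missing_table` are hypothetical and are not kernel APIs; the conditional tables (MCFG, SPCR, IORT, SRAT, SLIT) are deliberately left out because whether they are needed depends on the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Signatures of the tables that are always expected, per the list above.
 * Note that the FADT carries the signature "FACP" and the MADT carries
 * "APIC". The RSDP is omitted because it is the root pointer from which
 * the other tables are found.
 */
static const char *const required_sigs[] = {
	"XSDT", "FACP", "DSDT", "APIC", "GTDT",
};

/* Return true if sig appears among the nfound signatures in found[]. */
static bool have_table(const char *const *found, size_t nfound, const char *sig)
{
	assert(sig != NULL);
	for (size_t i = 0; i < nfound; i++)
		if (strncmp(found[i], sig, 4) == 0)
			return true;
	return false;
}

/* Return the first missing required signature, or NULL if none is missing. */
static const char *first_missing_table(const char *const *found, size_t nfound)
{
	size_t n = sizeof(required_sigs) / sizeof(required_sigs[0]);

	for (size_t i = 0; i < n; i++)
		if (!have_table(found, nfound, required_sigs[i]))
			return required_sigs[i];
	return NULL;
}
```

A caller would pass the signatures it has enumerated and report the returned name, if any, as the missing table.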
||||||
|
@ -269,9 +271,9 @@ describes how to define the structure of an object returned via _DSD, and
|
||||||
how specific data structures are defined by specific UUIDs. Linux should
|
how specific data structures are defined by specific UUIDs. Linux should
|
||||||
only use the _DSD Device Properties UUID [5]:
|
only use the _DSD Device Properties UUID [5]:
|
||||||
|
|
||||||
-- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
|
- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
|
||||||
|
|
||||||
-- http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
|
- http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
|
||||||
|
|
||||||
The UEFI Forum provides a mechanism for registering device properties [4]
|
The UEFI Forum provides a mechanism for registering device properties [4]
|
||||||
so that they may be used across all operating systems supporting ACPI.
|
so that they may be used across all operating systems supporting ACPI.
|
||||||
|
@ -327,10 +329,10 @@ turning a device full off.
|
||||||
|
|
||||||
There are two options for using those Power Resources. They can:
|
There are two options for using those Power Resources. They can:
|
||||||
|
|
||||||
-- be managed in a _PSx method which gets called on entry to power
|
- be managed in a _PSx method which gets called on entry to power
|
||||||
state Dx.
|
state Dx.
|
||||||
|
|
||||||
-- be declared separately as power resources with their own _ON and _OFF
|
- be declared separately as power resources with their own _ON and _OFF
|
||||||
methods. They are then tied back to D-states for a particular device
|
methods. They are then tied back to D-states for a particular device
|
||||||
via _PRx which specifies which power resources a device needs to be on
|
via _PRx which specifies which power resources a device needs to be on
|
||||||
while in Dx. Kernel then tracks number of devices using a power resource
|
while in Dx. Kernel then tracks number of devices using a power resource
|
||||||
|
@ -339,16 +341,16 @@ There are two options for using those Power Resources. They can:
|
||||||
The kernel ACPI code will also assume that the _PSx methods follow the normal
|
The kernel ACPI code will also assume that the _PSx methods follow the normal
|
||||||
ACPI rules for such methods:
|
ACPI rules for such methods:
|
||||||
|
|
||||||
-- If either _PS0 or _PS3 is implemented, then the other method must also
|
- If either _PS0 or _PS3 is implemented, then the other method must also
|
||||||
be implemented.
|
be implemented.
|
||||||
|
|
||||||
-- If a device requires usage or setup of a power resource when on, the ASL
|
- If a device requires usage or setup of a power resource when on, the ASL
|
||||||
should organize that it is allocated/enabled using the _PS0 method.
|
should organize that it is allocated/enabled using the _PS0 method.
|
||||||
|
|
||||||
-- Resources allocated or enabled in the _PS0 method should be disabled
|
- Resources allocated or enabled in the _PS0 method should be disabled
|
||||||
or de-allocated in the _PS3 method.
|
or de-allocated in the _PS3 method.
|
||||||
|
|
||||||
-- Firmware will leave the resources in a reasonable state before handing
|
- Firmware will leave the resources in a reasonable state before handing
|
||||||
over control to the kernel.
|
over control to the kernel.
|
||||||
|
|
||||||
Such code in _PSx methods will of course be very platform specific. But,
|
Such code in _PSx methods will of course be very platform specific. But,
|
||||||
|
@ -394,52 +396,52 @@ else must be discovered by the driver probe function. Then, have the rest
|
||||||
of the driver operate off of the contents of that struct. Doing so should
|
of the driver operate off of the contents of that struct. Doing so should
|
||||||
allow most divergence between ACPI and DT functionality to be kept local to
|
allow most divergence between ACPI and DT functionality to be kept local to
|
||||||
the probe function instead of being scattered throughout the driver. For
|
the probe function instead of being scattered throughout the driver. For
|
||||||
example:
|
example::
|
||||||
|
|
||||||
static int device_probe_dt(struct platform_device *pdev)
|
static int device_probe_dt(struct platform_device *pdev)
|
||||||
{
|
{
|
||||||
/* DT specific functionality */
|
/* DT specific functionality */
|
||||||
...
|
...
|
||||||
}
|
}
|
||||||
|
|
||||||
static int device_probe_acpi(struct platform_device *pdev)
|
static int device_probe_acpi(struct platform_device *pdev)
|
||||||
{
|
{
|
||||||
/* ACPI specific functionality */
|
/* ACPI specific functionality */
|
||||||
...
|
...
|
||||||
}
|
}
|
||||||
|
|
||||||
static int device_probe(struct platform_device *pdev)
|
static int device_probe(struct platform_device *pdev)
|
||||||
{
|
{
|
||||||
...
|
...
|
||||||
struct device_node *node = pdev->dev.of_node;
|
struct device_node *node = pdev->dev.of_node;
|
||||||
...
|
...
|
||||||
|
|
||||||
if (node)
|
if (node)
|
||||||
ret = device_probe_dt(pdev);
|
ret = device_probe_dt(pdev);
|
||||||
else if (ACPI_HANDLE(&pdev->dev))
|
else if (ACPI_HANDLE(&pdev->dev))
|
||||||
ret = device_probe_acpi(pdev);
|
ret = device_probe_acpi(pdev);
|
||||||
else
|
else
|
||||||
/* other initialization */
|
/* other initialization */
|
||||||
...
|
...
|
||||||
/* Continue with any generic probe operations */
|
/* Continue with any generic probe operations */
|
||||||
...
|
...
|
||||||
}
|
}
|
||||||
|
|
||||||
DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it
|
DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it
|
||||||
clear the different names the driver is probed for, both from DT and from
|
clear the different names the driver is probed for, both from DT and from
|
||||||
ACPI:
|
ACPI::
|
||||||
|
|
||||||
static struct of_device_id virtio_mmio_match[] = {
|
static struct of_device_id virtio_mmio_match[] = {
|
||||||
{ .compatible = "virtio,mmio", },
|
{ .compatible = "virtio,mmio", },
|
||||||
{ }
|
{ }
|
||||||
};
|
};
|
||||||
MODULE_DEVICE_TABLE(of, virtio_mmio_match);
|
MODULE_DEVICE_TABLE(of, virtio_mmio_match);
|
||||||
|
|
||||||
static const struct acpi_device_id virtio_mmio_acpi_match[] = {
|
static const struct acpi_device_id virtio_mmio_acpi_match[] = {
|
||||||
{ "LNRO0005", },
|
{ "LNRO0005", },
|
||||||
{ }
|
{ }
|
||||||
};
|
};
|
||||||
MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match);
|
MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match);
|
||||||
|
|
||||||
|
|
||||||
ASWG
|
ASWG
|
||||||
|
@ -471,7 +473,8 @@ Linux Code
|
||||||
Individual items specific to Linux on ARM, contained in the Linux
|
Individual items specific to Linux on ARM, contained in the Linux
|
||||||
source code, are in the list that follows:
|
source code, are in the list that follows:
|
||||||
|
|
||||||
ACPI_OS_NAME This macro defines the string to be returned when
|
ACPI_OS_NAME
|
||||||
|
This macro defines the string to be returned when
|
||||||
an ACPI method invokes the _OS method. On ARM64
|
an ACPI method invokes the _OS method. On ARM64
|
||||||
systems, this macro will be "Linux" by default.
|
systems, this macro will be "Linux" by default.
|
||||||
The command line parameter acpi_os=<string>
|
The command line parameter acpi_os=<string>
|
||||||
|
@ -482,38 +485,44 @@ ACPI_OS_NAME This macro defines the string to be returned when
|
||||||
ACPI Objects
|
ACPI Objects
|
||||||
------------
|
------------
|
||||||
Detailed expectations for ACPI tables and objects are listed in the file
|
Detailed expectations for ACPI tables and objects are listed in the file
|
||||||
Documentation/arm64/acpi_object_usage.txt.
|
Documentation/arm64/acpi_object_usage.rst.
|
||||||
|
|
||||||
|
|
||||||
References
|
References
|
||||||
----------
|
----------
|
||||||
[0] http://silver.arm.com -- document ARM-DEN-0029, or newer
|
[0] http://silver.arm.com
|
||||||
|
document ARM-DEN-0029, or newer:
|
||||||
"Server Base System Architecture", version 2.3, dated 27 Mar 2014
|
"Server Base System Architecture", version 2.3, dated 27 Mar 2014
|
||||||
|
|
||||||
[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
|
[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
|
||||||
Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
|
Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
|
||||||
Software on ARM Platforms", dated 16 Aug 2014
|
Software on ARM Platforms", dated 16 Aug 2014
|
||||||
|
|
||||||
[2] http://www.secretlab.ca/archives/151, 10 Jan 2015, Copyright (c) 2015,
|
[2] http://www.secretlab.ca/archives/151,
|
||||||
|
10 Jan 2015, Copyright (c) 2015,
|
||||||
Linaro Ltd., written by Grant Likely.
|
Linaro Ltd., written by Grant Likely.
|
||||||
|
|
||||||
[3] AMD ACPI for Seattle platform documentation:
|
[3] AMD ACPI for Seattle platform documentation
|
||||||
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
|
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
|
||||||
|
|
||||||
[4] http://www.uefi.org/acpi -- please see the link for the "ACPI _DSD Device
|
|
||||||
|
[4] http://www.uefi.org/acpi
|
||||||
|
please see the link for the "ACPI _DSD Device
|
||||||
Property Registry Instructions"
|
Property Registry Instructions"
|
||||||
|
|
||||||
[5] http://www.uefi.org/acpi -- please see the link for the "_DSD (Device
|
[5] http://www.uefi.org/acpi
|
||||||
|
please see the link for the "_DSD (Device
|
||||||
Specific Data) Implementation Guide"
|
Specific Data) Implementation Guide"
|
||||||
|
|
||||||
[6] Kernel code for the unified device property interface can be found in
|
[6] Kernel code for the unified device
|
||||||
|
property interface can be found in
|
||||||
include/linux/property.h and drivers/base/property.c.
|
include/linux/property.h and drivers/base/property.c.
|
||||||
|
|
||||||
|
|
||||||
Authors
|
Authors
|
||||||
-------
|
-------
|
||||||
Al Stone <al.stone@linaro.org>
|
- Al Stone <al.stone@linaro.org>
|
||||||
Graeme Gregory <graeme.gregory@linaro.org>
|
- Graeme Gregory <graeme.gregory@linaro.org>
|
||||||
Hanjun Guo <hanjun.guo@linaro.org>
|
- Hanjun Guo <hanjun.guo@linaro.org>
|
||||||
|
|
||||||
Grant Likely <grant.likely@linaro.org>, for the "Why ACPI on ARM?" section
|
- Grant Likely <grant.likely@linaro.org>, for the "Why ACPI on ARM?" section
|
|
@ -1,7 +1,9 @@
|
||||||
Booting AArch64 Linux
|
=====================
|
||||||
=====================
|
Booting AArch64 Linux
|
||||||
|
=====================
|
||||||
|
|
||||||
Author: Will Deacon <will.deacon@arm.com>
|
Author: Will Deacon <will.deacon@arm.com>
|
||||||
|
|
||||||
Date : 07 September 2012
|
Date : 07 September 2012
|
||||||
|
|
||||||
This document is based on the ARM booting document by Russell King and
|
This document is based on the ARM booting document by Russell King and
|
||||||
|
@ -12,7 +14,7 @@ The AArch64 exception model is made up of a number of exception levels
|
||||||
counterpart. EL2 is the hypervisor level and exists only in non-secure
|
counterpart. EL2 is the hypervisor level and exists only in non-secure
|
||||||
mode. EL3 is the highest priority level and exists only in secure mode.
|
mode. EL3 is the highest priority level and exists only in secure mode.
|
||||||
|
|
||||||
For the purposes of this document, we will use the term `boot loader'
|
For the purposes of this document, we will use the term `boot loader`
|
||||||
simply to define all software that executes on the CPU(s) before control
|
simply to define all software that executes on the CPU(s) before control
|
||||||
is passed to the Linux kernel. This may include secure monitor and
|
is passed to the Linux kernel. This may include secure monitor and
|
||||||
hypervisor code, or it may just be a handful of instructions for
|
hypervisor code, or it may just be a handful of instructions for
|
||||||
|
@ -70,7 +72,7 @@ Image target is available instead.
|
||||||
|
|
||||||
Requirement: MANDATORY
|
Requirement: MANDATORY
|
||||||
|
|
||||||
The decompressed kernel image contains a 64-byte header as follows:
|
The decompressed kernel image contains a 64-byte header as follows::
|
||||||
|
|
||||||
u32 code0; /* Executable code */
|
u32 code0; /* Executable code */
|
||||||
u32 code1; /* Executable code */
|
u32 code1; /* Executable code */
|
||||||
|
@ -103,19 +105,26 @@ Header notes:
|
||||||
|
|
||||||
- The flags field (introduced in v3.17) is a little-endian 64-bit field
|
- The flags field (introduced in v3.17) is a little-endian 64-bit field
|
||||||
composed as follows:
|
composed as follows:
|
||||||
Bit 0: Kernel endianness. 1 if BE, 0 if LE.
|
|
||||||
Bit 1-2: Kernel Page size.
|
============= ===============================================================
|
||||||
0 - Unspecified.
|
Bit 0 Kernel endianness. 1 if BE, 0 if LE.
|
||||||
1 - 4K
|
Bit 1-2 Kernel Page size.
|
||||||
2 - 16K
|
|
||||||
3 - 64K
|
* 0 - Unspecified.
|
||||||
Bit 3: Kernel physical placement
|
* 1 - 4K
|
||||||
0 - 2MB aligned base should be as close as possible
|
* 2 - 16K
|
||||||
to the base of DRAM, since memory below it is not
|
* 3 - 64K
|
||||||
accessible via the linear mapping
|
Bit 3 Kernel physical placement
|
||||||
1 - 2MB aligned base may be anywhere in physical
|
|
||||||
memory
|
0
|
||||||
Bits 4-63: Reserved.
|
2MB aligned base should be as close as possible
|
||||||
|
to the base of DRAM, since memory below it is not
|
||||||
|
accessible via the linear mapping
|
||||||
|
1
|
||||||
|
2MB aligned base may be anywhere in physical
|
||||||
|
memory
|
||||||
|
Bits 4-63 Reserved.
|
||||||
|
============= ===============================================================
|
||||||
|
|
||||||
- When image_size is zero, a bootloader should attempt to keep as much
|
- When image_size is zero, a bootloader should attempt to keep as much
|
||||||
memory as possible free for use by the kernel immediately after the
|
memory as possible free for use by the kernel immediately after the
|
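The flags layout described in this hunk can be decoded with a few shifts and masks. The sketch below is a user-space illustration only: `struct image_flags`, `le64_to_host` and `decode_image_flags` are hypothetical names, not kernel interfaces, and the caller is assumed to have already located the 8 raw bytes of the flags field in the image header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a kernel type. */
struct image_flags {
	bool big_endian;           /* bit 0: 1 if BE, 0 if LE */
	unsigned int page_size_kb; /* bits 1-2: 0 means unspecified */
	bool place_anywhere;       /* bit 3: 1 if any 2MB aligned base is fine */
};

/* Assemble a 64-bit value from 8 bytes stored in little-endian order. */
static uint64_t le64_to_host(const uint8_t b[8])
{
	uint64_t v = 0;

	assert(b != NULL);
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Split the flags value into the fields listed in the table above. */
static struct image_flags decode_image_flags(uint64_t flags)
{
	static const unsigned int page_kb[4] = { 0, 4, 16, 64 };
	struct image_flags out;

	out.big_endian = (flags & 1) != 0;
	out.page_size_kb = page_kb[(flags >> 1) & 0x3];
	out.place_anywhere = ((flags >> 3) & 1) != 0;
	return out;
}
```

Bits 4-63 are reserved, so the sketch ignores them rather than rejecting values that have them set.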
||||||
|
@ -147,19 +156,22 @@ Before jumping into the kernel, the following conditions must be met:
|
||||||
corrupted by bogus network packets or disk data. This will save
|
corrupted by bogus network packets or disk data. This will save
|
||||||
you many hours of debug.
|
you many hours of debug.
|
||||||
|
|
||||||
- Primary CPU general-purpose register settings
|
- Primary CPU general-purpose register settings:
|
||||||
x0 = physical address of device tree blob (dtb) in system RAM.
|
|
||||||
x1 = 0 (reserved for future use)
|
- x0 = physical address of device tree blob (dtb) in system RAM.
|
||||||
x2 = 0 (reserved for future use)
|
- x1 = 0 (reserved for future use)
|
||||||
x3 = 0 (reserved for future use)
|
- x2 = 0 (reserved for future use)
|
||||||
|
- x3 = 0 (reserved for future use)
|
||||||
|
|
||||||
- CPU mode
|
- CPU mode
|
||||||
|
|
||||||
All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
|
All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
|
||||||
IRQ and FIQ).
|
IRQ and FIQ).
|
||||||
The CPU must be in either EL2 (RECOMMENDED in order to have access to
|
The CPU must be in either EL2 (RECOMMENDED in order to have access to
|
||||||
the virtualisation extensions) or non-secure EL1.
|
the virtualisation extensions) or non-secure EL1.
|
||||||
|
|
||||||
- Caches, MMUs
|
- Caches, MMUs
|
||||||
|
|
||||||
The MMU must be off.
|
The MMU must be off.
|
||||||
Instruction cache may be on or off.
|
Instruction cache may be on or off.
|
||||||
The address range corresponding to the loaded kernel image must be
|
The address range corresponding to the loaded kernel image must be
|
||||||
|
@ -172,18 +184,21 @@ Before jumping into the kernel, the following conditions must be met:
|
||||||
operations (not recommended) must be configured and disabled.
|
operations (not recommended) must be configured and disabled.
|
||||||
|
|
||||||
- Architected timers
|
- Architected timers
|
||||||
|
|
||||||
CNTFRQ must be programmed with the timer frequency and CNTVOFF must
|
CNTFRQ must be programmed with the timer frequency and CNTVOFF must
|
||||||
be programmed with a consistent value on all CPUs. If entering the
|
be programmed with a consistent value on all CPUs. If entering the
|
||||||
kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
|
kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
|
||||||
available.
|
available.
|
||||||
|
|
||||||
- Coherency
|
- Coherency
|
||||||
|
|
||||||
All CPUs to be booted by the kernel must be part of the same coherency
|
All CPUs to be booted by the kernel must be part of the same coherency
|
||||||
domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
|
domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
|
||||||
initialisation to enable the receiving of maintenance operations on
|
initialisation to enable the receiving of maintenance operations on
|
||||||
each CPU.
|
each CPU.
|
||||||
|
|
||||||
- System registers
|
- System registers
|
||||||
|
|
||||||
All writable architected system registers at the exception level where
|
All writable architected system registers at the exception level where
|
||||||
the kernel image will be entered must be initialised by software at a
|
the kernel image will be entered must be initialised by software at a
|
||||||
higher exception level to prevent execution in an UNKNOWN state.
|
higher exception level to prevent execution in an UNKNOWN state.
|
||||||
|
@ -195,28 +210,40 @@ Before jumping into the kernel, the following conditions must be met:
|
||||||
|
|
||||||
For systems with a GICv3 interrupt controller to be used in v3 mode:
|
For systems with a GICv3 interrupt controller to be used in v3 mode:
|
||||||
- If EL3 is present:
|
- If EL3 is present:
|
||||||
ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
|
|
||||||
ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
|
- ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
|
||||||
|
- ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
|
||||||
|
|
||||||
- If the kernel is entered at EL1:
|
- If the kernel is entered at EL1:
|
||||||
ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1
|
|
||||||
ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.
|
- ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1
|
||||||
|
- ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.
|
||||||
|
|
||||||
- The DT or ACPI tables must describe a GICv3 interrupt controller.
|
- The DT or ACPI tables must describe a GICv3 interrupt controller.
|
||||||
|
|
||||||
For systems with a GICv3 interrupt controller to be used in
|
For systems with a GICv3 interrupt controller to be used in
|
||||||
compatibility (v2) mode:
|
compatibility (v2) mode:
|
||||||
|
|
||||||
- If EL3 is present:
|
- If EL3 is present:
|
||||||
ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0.
|
|
||||||
|
ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0.
|
||||||
|
|
||||||
- If the kernel is entered at EL1:
|
- If the kernel is entered at EL1:
|
||||||
ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0.
|
|
||||||
|
ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0.
|
||||||
|
|
||||||
- The DT or ACPI tables must describe a GICv2 interrupt controller.
|
- The DT or ACPI tables must describe a GICv2 interrupt controller.
|
||||||
|
|
||||||
For CPUs with pointer authentication functionality:
|
For CPUs with pointer authentication functionality:
|
||||||
- If EL3 is present:
|
- If EL3 is present:
|
||||||
SCR_EL3.APK (bit 16) must be initialised to 0b1
|
|
||||||
SCR_EL3.API (bit 17) must be initialised to 0b1
|
- SCR_EL3.APK (bit 16) must be initialised to 0b1
|
||||||
|
- SCR_EL3.API (bit 17) must be initialised to 0b1
|
||||||
|
|
||||||
- If the kernel is entered at EL1:
|
- If the kernel is entered at EL1:
|
||||||
HCR_EL2.APK (bit 40) must be initialised to 0b1
|
|
||||||
HCR_EL2.API (bit 41) must be initialised to 0b1
|
- HCR_EL2.APK (bit 40) must be initialised to 0b1
|
||||||
|
- HCR_EL2.API (bit 41) must be initialised to 0b1
|
||||||
|
|
||||||
The requirements described above for CPU mode, caches, MMUs, architected
|
The requirements described above for CPU mode, caches, MMUs, architected
|
||||||
timers, coherency and system registers apply to all CPUs. All CPUs must
|
timers, coherency and system registers apply to all CPUs. All CPUs must
|
|
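The pointer authentication requirements above come down to two bits per register. The following is a minimal sketch of how firmware-side code might check them. The bit positions are taken from the text; the macro and function names are hypothetical and do not come from the kernel or any firmware project.

```c
#include <stdint.h>

/* Bit positions as listed in the boot requirements above. */
#define SCR_EL3_APK (UINT64_C(1) << 16)
#define SCR_EL3_API (UINT64_C(1) << 17)
#define HCR_EL2_APK (UINT64_C(1) << 40)
#define HCR_EL2_API (UINT64_C(1) << 41)

/*
 * Hypothetical helper: returns non-zero only if both the APK and API
 * bits are set in the given register value.
 */
static inline int ptrauth_bits_set(uint64_t reg, uint64_t apk, uint64_t api)
{
	uint64_t required = apk | api;

	return (reg & required) == required;
}
```

A caller would pass the value read from SCR_EL3 (when EL3 is present) or HCR_EL2 (when the kernel is entered at EL1) together with the matching pair of masks.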
@ -1,5 +1,6 @@
|
||||||
ARM64 CPU Feature Registers
|
===========================
|
||||||
===========================
|
ARM64 CPU Feature Registers
|
||||||
|
===========================
|
||||||
|
|
||||||
Author: Suzuki K Poulose <suzuki.poulose@arm.com>
|
Author: Suzuki K Poulose <suzuki.poulose@arm.com>
|
||||||
|
|
||||||
|
@ -9,7 +10,7 @@ registers to userspace. The availability of this ABI is advertised
|
||||||
via the HWCAP_CPUID in HWCAPs.
|
via the HWCAP_CPUID in HWCAPs.
|
||||||
|
|
||||||
1. Motivation
|
1. Motivation
|
||||||
---------------
|
-------------
|
||||||
|
|
||||||
The ARM architecture defines a set of feature registers, which describe
|
The ARM architecture defines a set of feature registers, which describe
|
||||||
the capabilities of the CPU/system. Access to these system registers is
|
the capabilities of the CPU/system. Access to these system registers is
|
||||||
|
@ -33,9 +34,10 @@ there are some issues with their usage.
|
||||||
|
|
||||||
|
|
||||||
2. Requirements
|
2. Requirements
|
||||||
-----------------
|
---------------
|
||||||
|
|
||||||
|
a) Safety:
|
||||||
|
|
||||||
a) Safety :
|
|
||||||
Applications should be able to use the information provided by the
|
Applications should be able to use the information provided by the
|
||||||
infrastructure to run safely across the system. This has greater
|
infrastructure to run safely across the system. This has greater
|
||||||
implications on a system with heterogeneous CPUs.
|
implications on a system with heterogeneous CPUs.
|
||||||
|
@ -47,7 +49,8 @@ there are some issues with their usage.
|
||||||
Otherwise an application could crash when scheduled on the CPU
|
Otherwise an application could crash when scheduled on the CPU
|
||||||
which doesn't support CRC32.
|
which doesn't support CRC32.
|
||||||
|
|
||||||
b) Security :
|
b) Security:
|
||||||
|
|
||||||
Applications should only be able to receive information that is
|
Applications should only be able to receive information that is
|
||||||
relevant to the normal operation in userspace. Hence, some of the
|
relevant to the normal operation in userspace. Hence, some of the
|
||||||
fields are masked out (i.e., made invisible) and their values are set to
|
fields are masked out (i.e., made invisible) and their values are set to
|
||||||
|
@ -58,10 +61,12 @@ there are some issues with their usage.
|
||||||
(even when the CPU provides it).
|
(even when the CPU provides it).
|
||||||
|
|
||||||
c) Implementation Defined Features
|
c) Implementation Defined Features
|
||||||
|
|
||||||
The infrastructure doesn't expose any register which is
|
The infrastructure doesn't expose any register which is
|
||||||
IMPLEMENTATION DEFINED as per ARMv8-A Architecture.
|
IMPLEMENTATION DEFINED as per ARMv8-A Architecture.
|
||||||
|
|
||||||
d) CPU Identification :
|
d) CPU Identification:
|
||||||
|
|
||||||
MIDR_EL1 is exposed to help identify the processor. On a
|
MIDR_EL1 is exposed to help identify the processor. On a
|
||||||
heterogeneous system, this could be racy (just like getcpu()). The
|
heterogeneous system, this could be racy (just like getcpu()). The
|
||||||
process could be migrated to another CPU by the time it uses the
|
process could be migrated to another CPU by the time it uses the
|
||||||
|
@ -70,7 +75,7 @@ there are some issues with their usage.
|
||||||
currently executing on. The REVIDR is not exposed due to this
|
currently executing on. The REVIDR is not exposed due to this
|
||||||
constraint, as REVIDR makes sense only in conjunction with the
|
constraint, as REVIDR makes sense only in conjunction with the
|
||||||
MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
|
MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
|
||||||
at:
|
at::
|
||||||
|
|
||||||
/sys/devices/system/cpu/cpu$ID/regs/identification/
|
/sys/devices/system/cpu/cpu$ID/regs/identification/
|
||||||
\- midr
|
\- midr
|
||||||
|
@ -85,7 +90,8 @@ exception and ends up in SIGILL being delivered to the process.
|
||||||
The infrastructure hooks into the exception handler and emulates the
|
The infrastructure hooks into the exception handler and emulates the
|
||||||
operation if the source belongs to the supported system register space.
|
operation if the source belongs to the supported system register space.
|
||||||
|
|
||||||
The infrastructure emulates only the following system register space:
|
The infrastructure emulates only the following system register space::
|
||||||
|
|
||||||
Op0=3, Op1=0, CRn=0, CRm=0,4,5,6,7
|
Op0=3, Op1=0, CRn=0, CRm=0,4,5,6,7
|
||||||
|
|
||||||
(See Table C5-6 'System instruction encodings for non-Debug System
|
(See Table C5-6 'System instruction encodings for non-Debug System
|
||||||
|
@ -107,73 +113,76 @@ infrastructure:
|
||||||
-------------------------------------------
|
-------------------------------------------
|
||||||
|
|
||||||
1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0
|
1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0
|
||||||
x--------------------------------------------------x
|
|
||||||
|
+------------------------------+---------+---------+
|
||||||
| Name | bits | visible |
|
| Name | bits | visible |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| TS | [55-52] | y |
|
| TS | [55-52] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| FHM | [51-48] | y |
|
| FHM | [51-48] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| DP | [47-44] | y |
|
| DP | [47-44] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SM4 | [43-40] | y |
|
| SM4 | [43-40] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SM3 | [39-36] | y |
|
| SM3 | [39-36] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SHA3 | [35-32] | y |
|
| SHA3 | [35-32] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| RDM | [31-28] | y |
|
| RDM | [31-28] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| ATOMICS | [23-20] | y |
|
| ATOMICS | [23-20] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| CRC32 | [19-16] | y |
|
| CRC32 | [19-16] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SHA2 | [15-12] | y |
|
| SHA2 | [15-12] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SHA1 | [11-8] | y |
|
| SHA1 | [11-8] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| AES | [7-4] | y |
|
| AES | [7-4] | y |
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
|
|
||||||
|
|
||||||
2) ID_AA64PFR0_EL1 - Processor Feature Register 0
|
2) ID_AA64PFR0_EL1 - Processor Feature Register 0
|
||||||
x--------------------------------------------------x
|
|
||||||
|
+------------------------------+---------+---------+
|
||||||
| Name | bits | visible |
|
| Name | bits | visible |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| DIT | [51-48] | y |
|
| DIT | [51-48] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SVE | [35-32] | y |
|
| SVE | [35-32] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| GIC | [27-24] | n |
|
| GIC | [27-24] | n |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| AdvSIMD | [23-20] | y |
|
| AdvSIMD | [23-20] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| FP | [19-16] | y |
|
| FP | [19-16] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| EL3 | [15-12] | n |
|
| EL3 | [15-12] | n |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| EL2 | [11-8] | n |
|
| EL2 | [11-8] | n |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| EL1 | [7-4] | n |
|
| EL1 | [7-4] | n |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| EL0 | [3-0] | n |
|
| EL0 | [3-0] | n |
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
|
|
||||||
|
|
||||||
3) MIDR_EL1 - Main ID Register
|
3) MIDR_EL1 - Main ID Register
|
||||||
x--------------------------------------------------x
|
|
||||||
|
+------------------------------+---------+---------+
|
||||||
| Name | bits | visible |
|
| Name | bits | visible |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| Implementer | [31-24] | y |
|
| Implementer | [31-24] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| Variant | [23-20] | y |
|
| Variant | [23-20] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| Architecture | [19-16] | y |
|
| Architecture | [19-16] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| PartNum | [15-4] | y |
|
| PartNum | [15-4] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| Revision | [3-0] | y |
|
| Revision | [3-0] | y |
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
|
|
||||||
NOTE: The 'visible' fields of MIDR_EL1 will contain the value
|
NOTE: The 'visible' fields of MIDR_EL1 will contain the value
|
||||||
as available on the CPU where it is fetched and is not a system
|
as available on the CPU where it is fetched and is not a system
|
||||||
|
@ -181,90 +190,92 @@ infrastructure:
|
||||||
|
|
||||||
4) ID_AA64ISAR1_EL1 - Instruction set attribute register 1
|
4) ID_AA64ISAR1_EL1 - Instruction set attribute register 1
|
||||||
|
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
| Name | bits | visible |
|
| Name | bits | visible |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| GPI | [31-28] | y |
|
| GPI | [31-28] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| GPA | [27-24] | y |
|
| GPA | [27-24] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| LRCPC | [23-20] | y |
|
| LRCPC | [23-20] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| FCMA | [19-16] | y |
|
| FCMA | [19-16] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| JSCVT | [15-12] | y |
|
| JSCVT | [15-12] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| API | [11-8] | y |
|
| API | [11-8] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| APA | [7-4] | y |
|
| APA | [7-4] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| DPB | [3-0] | y |
|
| DPB | [3-0] | y |
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
|
|
||||||
5) ID_AA64MMFR2_EL1 - Memory model feature register 2
|
5) ID_AA64MMFR2_EL1 - Memory model feature register 2
|
||||||
|
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
| Name | bits | visible |
|
| Name | bits | visible |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| AT | [35-32] | y |
|
| AT | [35-32] | y |
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
|
|
||||||
6) ID_AA64ZFR0_EL1 - SVE feature ID register 0
|
6) ID_AA64ZFR0_EL1 - SVE feature ID register 0
|
||||||
|
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
| Name | bits | visible |
|
| Name | bits | visible |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SM4 | [43-40] | y |
|
| SM4 | [43-40] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SHA3 | [35-32] | y |
|
| SHA3 | [35-32] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| BitPerm | [19-16] | y |
|
| BitPerm | [19-16] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| AES | [7-4] | y |
|
| AES | [7-4] | y |
|
||||||
|--------------------------------------------------|
|
+------------------------------+---------+---------+
|
||||||
| SVEVer | [3-0] | y |
|
| SVEVer | [3-0] | y |
|
||||||
x--------------------------------------------------x
|
+------------------------------+---------+---------+
|
||||||
|
|
||||||
Appendix I: Example
|
Appendix I: Example
|
||||||
---------------------------
|
-------------------
|
||||||
|
|
||||||
/*
|
::
|
||||||
* Sample program to demonstrate the MRS emulation ABI.
|
|
||||||
*
|
|
||||||
* Copyright (C) 2015-2016, ARM Ltd
|
|
||||||
*
|
|
||||||
* Author: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
||||||
*
|
|
||||||
* This program is free software; you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License version 2 as
|
|
||||||
* published by the Free Software Foundation.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*/
|
|
||||||
|
|
||||||
#include <asm/hwcap.h>
|
/*
|
||||||
#include <stdio.h>
|
* Sample program to demonstrate the MRS emulation ABI.
|
||||||
#include <sys/auxv.h>
|
*
|
||||||
|
* Copyright (C) 2015-2016, ARM Ltd
|
||||||
|
*
|
||||||
|
* Author: Suzuki K Poulose <suzuki.poulose@arm.com>
|
||||||
|
*
|
||||||
|
* This program is free software; you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License version 2 as
|
||||||
|
* published by the Free Software Foundation.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*/
|
||||||
|
|
||||||
#define get_cpu_ftr(id) ({ \
|
#include <asm/hwcap.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <sys/auxv.h>
|
||||||
|
|
||||||
|
#define get_cpu_ftr(id) ({ \
|
||||||
unsigned long __val; \
|
unsigned long __val; \
|
||||||
asm("mrs %0, "#id : "=r" (__val)); \
|
asm("mrs %0, "#id : "=r" (__val)); \
|
||||||
printf("%-20s: 0x%016lx\n", #id, __val); \
|
printf("%-20s: 0x%016lx\n", #id, __val); \
|
||||||
})
|
})
|
||||||
|
|
||||||
int main(void)
|
int main(void)
|
||||||
{
|
{
|
||||||
|
|
||||||
if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
|
if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
|
||||||
fputs("CPUID registers unavailable\n", stderr);
|
fputs("CPUID registers unavailable\n", stderr);
|
||||||
|
@ -284,13 +295,10 @@ int main(void)
|
||||||
get_cpu_ftr(MPIDR_EL1);
|
get_cpu_ftr(MPIDR_EL1);
|
||||||
get_cpu_ftr(REVIDR_EL1);
|
get_cpu_ftr(REVIDR_EL1);
|
||||||
|
|
||||||
#if 0
|
#if 0
|
||||||
/* Unexposed register access causes SIGILL */
|
/* Unexposed register access causes SIGILL */
|
||||||
get_cpu_ftr(ID_MMFR0_EL1);
|
get_cpu_ftr(ID_MMFR0_EL1);
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
|
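The "bits" column in the register tables above gives the position of each 4-bit field. As a rough sketch, a value obtained through the emulated MRS access could be decoded as shown below. The helper names and the `ISAR0_*_LSB` macros are illustrative assumptions, not kernel or libc definitions, and treating any non-zero CRC32 field as "implemented" is an assumption based on the usual ID register convention.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract the 4-bit ID register field whose least
 * significant bit sits at position lsb.
 */
static inline unsigned int id_reg_field(uint64_t reg, unsigned int lsb)
{
	return (unsigned int)((reg >> lsb) & 0xf);
}

/* Field positions copied from the ID_AA64ISAR0_EL1 table above. */
#define ISAR0_AES_LSB     4	/* AES     [7-4]   */
#define ISAR0_CRC32_LSB   16	/* CRC32   [19-16] */
#define ISAR0_ATOMICS_LSB 20	/* ATOMICS [23-20] */

/* Assumption: a non-zero CRC32 field means the instructions exist. */
static inline int isar0_has_crc32(uint64_t isar0)
{
	return id_reg_field(isar0, ISAR0_CRC32_LSB) != 0;
}
```

On a heterogeneous system the value passed in is the system-wide safe value for the visible fields, so a check like this should hold on whichever CPU the task later runs on.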
@ -1,3 +1,4 @@
|
||||||
|
================
|
||||||
ARM64 ELF hwcaps
|
ARM64 ELF hwcaps
|
||||||
================
|
================
|
||||||
|
|
||||||
|
@ -15,16 +16,16 @@ of flags called hwcaps, exposed in the auxiliary vector.
|
||||||
|
|
||||||
Userspace software can test for features by acquiring the AT_HWCAP or
|
Userspace software can test for features by acquiring the AT_HWCAP or
|
||||||
AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant
|
AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant
|
||||||
flags are set, e.g.
|
flags are set, e.g.::
|
||||||
|
|
||||||
bool floating_point_is_present(void)
|
bool floating_point_is_present(void)
|
||||||
{
|
{
|
||||||
unsigned long hwcaps = getauxval(AT_HWCAP);
|
unsigned long hwcaps = getauxval(AT_HWCAP);
|
||||||
if (hwcaps & HWCAP_FP)
|
if (hwcaps & HWCAP_FP)
|
||||||
return true;
|
return true;
|
||||||
|
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
Where software relies on a feature described by a hwcap, it should check
|
Where software relies on a feature described by a hwcap, it should check
|
||||||
the relevant hwcap flag to verify that the feature is present before
|
the relevant hwcap flag to verify that the feature is present before
|
||||||
|
@ -45,7 +46,7 @@ userspace code at EL0. These hwcaps are defined in terms of ID register
|
||||||
fields, and should be interpreted with reference to the definition of
|
fields, and should be interpreted with reference to the definition of
|
||||||
these fields in the ARM Architecture Reference Manual (ARM ARM).
|
these fields in the ARM Architecture Reference Manual (ARM ARM).
|
||||||
|
|
||||||
Such hwcaps are described below in the form:
|
Such hwcaps are described below in the form::
|
||||||
|
|
||||||
Functionality implied by idreg.field == val.
|
Functionality implied by idreg.field == val.
|
||||||
|
|
||||||
|
@ -64,75 +65,58 @@ reference to ID registers, and may refer to other documentation.
|
||||||
---------------------------------
|
---------------------------------
|
||||||
|
|
||||||
HWCAP_FP
|
HWCAP_FP
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000.
|
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000.
|
||||||
|
|
||||||
HWCAP_ASIMD
|
HWCAP_ASIMD
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000.
|
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000.
|
||||||
|
|
||||||
HWCAP_EVTSTRM
|
HWCAP_EVTSTRM
|
||||||
|
|
||||||
The generic timer is configured to generate events at a frequency of
|
The generic timer is configured to generate events at a frequency of
|
||||||
approximately 100KHz.
|
approximately 100KHz.
|
||||||
|
|
||||||
HWCAP_AES
|
HWCAP_AES
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001.
|
||||||
|
|
||||||
HWCAP_PMULL
|
HWCAP_PMULL
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010.
|
Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010.
|
||||||
|
|
||||||
HWCAP_SHA1
|
HWCAP_SHA1
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
|
||||||
|
|
||||||
HWCAP_SHA2
|
HWCAP_SHA2
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
|
||||||
|
|
||||||
HWCAP_CRC32
|
HWCAP_CRC32
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001.
|
||||||
|
|
||||||
HWCAP_ATOMICS
|
HWCAP_ATOMICS
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010.
|
Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010.
|
||||||
|
|
||||||
HWCAP_FPHP
|
HWCAP_FPHP
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001.
|
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001.
|
||||||
|
|
||||||
HWCAP_ASIMDHP
|
HWCAP_ASIMDHP
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001.
|
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001.
|
||||||
|
|
||||||
HWCAP_CPUID
|
HWCAP_CPUID
|
||||||
|
|
||||||
EL0 access to certain ID registers is available, to the extent
|
EL0 access to certain ID registers is available, to the extent
|
||||||
described by Documentation/arm64/cpu-feature-registers.txt.
|
described by Documentation/arm64/cpu-feature-registers.rst.
|
||||||
|
|
||||||
These ID registers may imply the availability of features.
|
These ID registers may imply the availability of features.
|
||||||
|
|
||||||
HWCAP_ASIMDRDM
|
HWCAP_ASIMDRDM
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001.
|
||||||
|
|
||||||
HWCAP_JSCVT
|
HWCAP_JSCVT
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001.
|
Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001.
|
||||||
|
|
||||||
HWCAP_FCMA
|
HWCAP_FCMA
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001.
|
Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001.
|
||||||
|
|
||||||
HWCAP_LRCPC
|
HWCAP_LRCPC
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001.
|
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001.
|
||||||
|
|
||||||
HWCAP_DCPOP
|
HWCAP_DCPOP
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001.
|
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001.
|
||||||
|
|
||||||
HWCAP2_DCPODP
|
HWCAP2_DCPODP
|
||||||
|
@ -140,27 +124,21 @@ HWCAP2_DCPODP
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
|
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
|
||||||
|
|
||||||
HWCAP_SHA3
|
HWCAP_SHA3
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001.
|
||||||
|
|
||||||
HWCAP_SM3
|
HWCAP_SM3
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001.
|
||||||
|
|
||||||
HWCAP_SM4
|
HWCAP_SM4
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001.
|
||||||
|
|
||||||
HWCAP_ASIMDDP
|
HWCAP_ASIMDDP
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001.
|
||||||
|
|
||||||
HWCAP_SHA512
|
HWCAP_SHA512
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010.
|
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010.
|
||||||
|
|
||||||
HWCAP_SVE
|
HWCAP_SVE
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001.
|
Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001.
|
||||||
|
|
||||||
HWCAP2_SVE2
|
HWCAP2_SVE2
|
||||||
|
@ -188,23 +166,18 @@ HWCAP2_SVESM4
|
||||||
Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001.
|
Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001.
|
||||||
|
|
||||||
HWCAP_ASIMDFHM
|
HWCAP_ASIMDFHM
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001.
|
||||||
|
|
||||||
HWCAP_DIT
|
HWCAP_DIT
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001.
|
Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001.
|
||||||
|
|
||||||
HWCAP_USCAT
|
HWCAP_USCAT
|
||||||
|
|
||||||
Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001.
|
Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001.
|
||||||
|
|
||||||
HWCAP_ILRCPC
|
HWCAP_ILRCPC
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010.
|
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010.
|
||||||
|
|
||||||
HWCAP_FLAGM
|
HWCAP_FLAGM
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
|
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
|
||||||
|
|
||||||
HWCAP2_FLAGM2
|
HWCAP2_FLAGM2
|
||||||
|
@ -212,20 +185,17 @@ HWCAP2_FLAGM2
|
||||||
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010.
|
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010.
|
||||||
|
|
||||||
HWCAP_SSBS
|
HWCAP_SSBS
|
||||||
|
|
||||||
Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
|
Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
|
||||||
|
|
||||||
HWCAP_PACA
|
HWCAP_PACA
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or
|
Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or
|
||||||
ID_AA64ISAR1_EL1.API == 0b0001, as described by
|
ID_AA64ISAR1_EL1.API == 0b0001, as described by
|
||||||
Documentation/arm64/pointer-authentication.txt.
|
Documentation/arm64/pointer-authentication.rst.
|
||||||
|
|
||||||
HWCAP_PACG
|
HWCAP_PACG
|
||||||
|
|
||||||
Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or
|
Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or
|
||||||
ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
|
ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
|
||||||
Documentation/arm64/pointer-authentication.txt.
|
Documentation/arm64/pointer-authentication.rst.
|
||||||
|
|
||||||
HWCAP2_FRINT
|
HWCAP2_FRINT
|
||||||
|
|
|
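Code that depends on several of the hwcaps listed above can test them in one step before selecting an optimised path. The sketch below assumes a Linux C library that provides `getauxval()`; the helper names are hypothetical, and the mask is passed in by the caller so the example does not rely on any particular `HWCAP_*` value.

```c
#include <sys/auxv.h>

/*
 * Hypothetical helper: returns non-zero only if every flag in
 * required is also set in hwcaps.
 */
static inline int has_all_hwcaps(unsigned long hwcaps, unsigned long required)
{
	return (hwcaps & required) == required;
}

/* Convenience wrapper that reads AT_HWCAP from the auxiliary vector. */
static inline int cpu_has_all(unsigned long required)
{
	return has_all_hwcaps(getauxval(AT_HWCAP), required);
}
```

On arm64 a caller might write `cpu_has_all(HWCAP_AES | HWCAP_PMULL)` after including `<asm/hwcap.h>`. Flags named `HWCAP2_*` live in AT_HWCAP2 and would need a separate `getauxval(AT_HWCAP2)` lookup.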
@ -1,3 +1,4 @@
|
||||||
|
====================
|
||||||
HugeTLBpage on ARM64
|
HugeTLBpage on ARM64
|
||||||
====================
|
====================
|
||||||
|
|
||||||
|
@ -31,8 +32,10 @@ and level of the page table.
|
||||||
|
|
||||||
The following hugepage sizes are supported -
|
The following hugepage sizes are supported -
|
||||||
|
|
||||||
CONT PTE PMD CONT PMD PUD
|
====== ======== ==== ======== ===
|
||||||
-------- --- -------- ---
|
- CONT PTE PMD CONT PMD PUD
|
||||||
|
====== ======== ==== ======== ===
|
||||||
4K: 64K 2M 32M 1G
|
4K: 64K 2M 32M 1G
|
||||||
16K: 2M 32M 1G
|
16K: 2M 32M 1G
|
||||||
64K: 2M 512M 16G
|
64K: 2M 512M 16G
|
||||||
|
====== ======== ==== ======== ===
|
|
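The PMD and PUD columns of the table above follow from the translation granule, assuming 8-byte page table descriptors so that one table page holds `page_size / 8` entries. The function below is a small sketch of that arithmetic; its name is hypothetical, and it only computes block sizes. It does not say which sizes the kernel actually supports as hugepages, which is what the table records.

```c
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes mapped by one block entry that is
 * levels steps above the last-level page table.
 * levels = 1 gives the PMD block size, levels = 2 the PUD block size.
 * Assumes 8-byte descriptors, i.e. page_size / 8 entries per table.
 */
static inline uint64_t block_size(uint64_t page_size, unsigned int levels)
{
	uint64_t entries = page_size / 8;
	uint64_t size = page_size;

	while (levels--)
		size *= entries;
	return size;
}
```

For a 4K granule this gives 2M at the PMD level and 1G at the PUD level; for 16K and 64K granules the PMD level gives 32M and 512M respectively, matching the table.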
@ -0,0 +1,28 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
==================
|
||||||
|
ARM64 Architecture
|
||||||
|
==================
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
acpi_object_usage
|
||||||
|
arm-acpi
|
||||||
|
booting
|
||||||
|
cpu-feature-registers
|
||||||
|
elf_hwcaps
|
||||||
|
hugetlbpage
|
||||||
|
legacy_instructions
|
||||||
|
memory
|
||||||
|
pointer-authentication
|
||||||
|
silicon-errata
|
||||||
|
sve
|
||||||
|
tagged-pointers
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,3 +1,7 @@
|
||||||
|
===================
|
||||||
|
Legacy instructions
|
||||||
|
===================
|
||||||
|
|
||||||
The arm64 port of the Linux kernel provides infrastructure to support
|
The arm64 port of the Linux kernel provides infrastructure to support
|
||||||
emulation of instructions which have been deprecated, or obsoleted in
|
emulation of instructions which have been deprecated, or obsoleted in
|
||||||
the architecture. The infrastructure code uses undefined instruction
|
the architecture. The infrastructure code uses undefined instruction
|
||||||
|
@ -9,19 +13,22 @@ The emulation mode can be controlled by writing to sysctl nodes
|
||||||
behaviours and the corresponding values of the sysctl nodes -
|
behaviours and the corresponding values of the sysctl nodes -
|
||||||
|
|
||||||
* Undef
|
* Undef
|
||||||
Value: 0
|
Value: 0
|
||||||
|
|
||||||
Generates undefined instruction abort. Default for instructions that
|
Generates undefined instruction abort. Default for instructions that
|
||||||
have been obsoleted in the architecture, e.g., SWP
|
have been obsoleted in the architecture, e.g., SWP
|
||||||
|
|
||||||
* Emulate
|
* Emulate
|
||||||
Value: 1
|
Value: 1
|
||||||
|
|
||||||
Uses software emulation. To aid migration of software, in this mode
|
Uses software emulation. To aid migration of software, in this mode
|
||||||
usage of emulated instruction is traced as well as rate limited
|
usage of emulated instruction is traced as well as rate limited
|
||||||
warnings are issued. This is the default for deprecated
|
warnings are issued. This is the default for deprecated
|
||||||
instructions, e.g., CP15 barriers
|
instructions, e.g., CP15 barriers
|
||||||
|
|
||||||
* Hardware Execution
|
* Hardware Execution
|
||||||
Value: 2
|
Value: 2
|
||||||
|
|
||||||
Although marked as deprecated, some implementations may support the
|
Although marked as deprecated, some implementations may support the
|
||||||
enabling/disabling of hardware support for the execution of these
|
enabling/disabling of hardware support for the execution of these
|
||||||
instructions. Using hardware execution generally provides better
|
instructions. Using hardware execution generally provides better
|
||||||
|
@@ -38,20 +45,24 @@ individual instruction notes for further information.
|
||||||
Supported legacy instructions
|
Supported legacy instructions
|
||||||
-----------------------------
|
-----------------------------
|
||||||
* SWP{B}
|
* SWP{B}
|
||||||
Node: /proc/sys/abi/swp
|
|
||||||
Status: Obsolete
|
:Node: /proc/sys/abi/swp
|
||||||
Default: Undef (0)
|
:Status: Obsolete
|
||||||
|
:Default: Undef (0)
|
||||||
|
|
||||||
* CP15 Barriers
|
* CP15 Barriers
|
||||||
Node: /proc/sys/abi/cp15_barrier
|
|
||||||
Status: Deprecated
|
:Node: /proc/sys/abi/cp15_barrier
|
||||||
Default: Emulate (1)
|
:Status: Deprecated
|
||||||
|
:Default: Emulate (1)
|
||||||
|
|
||||||
* SETEND
|
* SETEND
|
||||||
Node: /proc/sys/abi/setend
|
|
||||||
Status: Deprecated
|
:Node: /proc/sys/abi/setend
|
||||||
Default: Emulate (1)*
|
:Status: Deprecated
|
||||||
Note: All the cpus on the system must have mixed endian support at EL0
|
:Default: Emulate (1)*
|
||||||
for this feature to be enabled. If a new CPU - which doesn't support mixed
|
|
||||||
endian - is hotplugged in after this feature has been enabled, there could
|
Note: All the cpus on the system must have mixed endian support at EL0
|
||||||
be unexpected results in the application.
|
for this feature to be enabled. If a new CPU - which doesn't support mixed
|
||||||
|
endian - is hotplugged in after this feature has been enabled, there could
|
||||||
|
be unexpected results in the application.
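The mode values and sysctl nodes listed above can be inspected from userspace. The following is a minimal Python sketch, assuming each node holds a single decimal integer as text; the helper names `describe_mode` and `read_mode` are made up for illustration, and the nodes may be absent on kernels built without the corresponding emulation support.

```python
from pathlib import Path

# Mode values and names as listed in the document above.
MODES = {0: "Undef", 1: "Emulate", 2: "Hardware Execution"}


def describe_mode(raw: str) -> str:
    """Translate the text read from a sysctl node into a mode name."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown ({value})")


def read_mode(node: str) -> str:
    """Read a node such as /proc/sys/abi/swp and describe its mode."""
    return describe_mode(Path(node).read_text())


if __name__ == "__main__":
    for name in ("swp", "cp15_barrier", "setend"):
        path = f"/proc/sys/abi/{name}"
        try:
            print(name, "->", read_mode(path))
        except (OSError, ValueError):
            # The node is missing, unreadable, or not a plain integer.
            print(name, "-> node not available on this system")
```

Writing a new value to a node (for example `1` to select emulation) normally requires root privileges, so the sketch only reads.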
|
|
@@ -0,0 +1,98 @@
|
||||||
|
==============================
|
||||||
|
Memory Layout on AArch64 Linux
|
||||||
|
==============================
|
||||||
|
|
||||||
|
Author: Catalin Marinas <catalin.marinas@arm.com>
|
||||||
|
|
||||||
|
This document describes the virtual memory layout used by the AArch64
|
||||||
|
Linux kernel. The architecture allows up to 4 levels of translation
|
||||||
|
tables with a 4KB page size and up to 3 levels with a 64KB page size.
|
||||||
|
|
||||||
|
AArch64 Linux uses either 3 levels or 4 levels of translation tables
|
||||||
|
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
|
||||||
|
(256TB) virtual addresses, respectively, for both user and kernel. With
|
||||||
|
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
|
||||||
|
virtual address, are used but the memory layout is the same.
|
||||||
|
|
||||||
|
User addresses have bits 63:48 set to 0 while the kernel addresses have
|
||||||
|
the same bits set to 1. TTBRx selection is given by bit 63 of the
|
||||||
|
virtual address. The swapper_pg_dir contains only kernel (global)
|
||||||
|
mappings while the user pgd contains only user (non-global) mappings.
|
||||||
|
The swapper_pg_dir address is written to TTBR1 and never written to
|
||||||
|
TTBR0.
|
||||||
|
|
||||||
|
|
||||||
|
AArch64 Linux memory layout with 4KB pages + 3 levels::
|
||||||
|
|
||||||
|
Start End Size Use
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
0000000000000000 0000007fffffffff 512GB user
|
||||||
|
ffffff8000000000 ffffffffffffffff 512GB kernel
|
||||||
|
|
||||||
|
|
||||||
|
AArch64 Linux memory layout with 4KB pages + 4 levels::
|
||||||
|
|
||||||
|
Start End Size Use
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
0000000000000000 0000ffffffffffff 256TB user
|
||||||
|
ffff000000000000 ffffffffffffffff 256TB kernel
|
||||||
|
|
||||||
|
|
||||||
|
AArch64 Linux memory layout with 64KB pages + 2 levels::
|
||||||
|
|
||||||
|
Start End Size Use
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
0000000000000000 000003ffffffffff 4TB user
|
||||||
|
fffffc0000000000 ffffffffffffffff 4TB kernel
|
||||||
|
|
||||||
|
|
||||||
|
AArch64 Linux memory layout with 64KB pages + 3 levels::
|
||||||
|
|
||||||
|
Start End Size Use
|
||||||
|
-----------------------------------------------------------------------
|
||||||
|
0000000000000000 0000ffffffffffff 256TB user
|
||||||
|
ffff000000000000 ffffffffffffffff 256TB kernel
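The start and end addresses in the tables above follow directly from the virtual address width. As a rough sketch (the function name `va_ranges` is hypothetical, and the 48-bit width for 64KB pages + 3 levels is inferred from the 256TB size in the table), the ranges can be recomputed like this:

```python
def va_ranges(va_bits: int):
    """Return ((user_start, user_end), (kernel_start, kernel_end)).

    User addresses occupy the bottom 2**va_bits bytes of the 64-bit
    space, kernel addresses the top 2**va_bits bytes.
    """
    size = 1 << va_bits
    top = 1 << 64
    user = (0, size - 1)
    kernel = (top - size, top - 1)
    return user, kernel


if __name__ == "__main__":
    configs = (
        ("4KB pages + 3 levels", 39),
        ("4KB pages + 4 levels", 48),
        ("64KB pages + 2 levels", 42),
        ("64KB pages + 3 levels", 48),
    )
    for label, bits in configs:
        (u0, u1), (k0, k1) = va_ranges(bits)
        print(f"{label}: user {u0:016x}-{u1:016x}, kernel {k0:016x}-{k1:016x}")
```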
|
||||||
|
|
||||||
|
|
||||||
|
For details of the virtual kernel memory layout please see the kernel
|
||||||
|
booting log.
|
||||||
|
|
||||||
|
|
||||||
|
Translation table lookup with 4KB pages::
|
||||||
|
|
||||||
|
  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1
|
||||||
|
|
||||||
|
|
||||||
|
Translation table lookup with 64KB pages::
|
||||||
|
|
||||||
|
  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index
   +-------------------------------------------------> [63] TTBR0/1
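The two lookup diagrams above amount to slicing fixed bit ranges out of a 64-bit virtual address. A small Python sketch of that slicing, using only the bit ranges shown in the diagrams (the helper names `bits`, `split_4k` and `split_64k` are made up for illustration and are not kernel interfaces):

```python
def bits(va: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit range [hi:lo] from a virtual address."""
    return (va >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_4k(va: int) -> dict:
    """Decompose an address using the 4KB-page layout from the diagram."""
    return {
        "ttbr": bits(va, 63, 63),   # 0 selects TTBR0, 1 selects TTBR1
        "l0": bits(va, 47, 39),
        "l1": bits(va, 38, 30),
        "l2": bits(va, 29, 21),
        "l3": bits(va, 20, 12),
        "offset": bits(va, 11, 0),
    }


def split_64k(va: int) -> dict:
    """Decompose an address using the 64KB-page layout from the diagram."""
    return {
        "ttbr": bits(va, 63, 63),
        "l1": bits(va, 47, 42),
        "l2": bits(va, 41, 29),
        "l3": bits(va, 28, 16),
        "offset": bits(va, 15, 0),
    }


if __name__ == "__main__":
    print(split_4k(0xFFFF000000001234))
    print(split_64k(0x000003FFFFFFFFFF))
```

With 4KB pages each index is 9 bits wide (512 entries per table); with 64KB pages the L2 and L3 indexes are 13 bits wide and the L1 index covers the remaining 6 bits.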
|
||||||
|
|
||||||
|
|
||||||
|
When using KVM without the Virtualization Host Extensions, the
|
||||||
|
hypervisor maps kernel pages in EL2 at a fixed (and potentially
|
||||||
|
random) offset from the linear mapping. See the kern_hyp_va macro and
|
||||||
|
kvm_update_va_mask function for more details. MMIO devices such as
|
||||||
|
GICv2 get mapped next to the HYP idmap page, as do vectors when
|
||||||
|
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
|
||||||
|
|
||||||
|
When using KVM with the Virtualization Host Extensions, no additional
|
||||||
|
mappings are created, since the host kernel runs directly in EL2.
|
|
@@ -1,97 +0,0 @@
|
||||||
Memory Layout on AArch64 Linux
|
|
||||||
==============================
|
|
||||||
|
|
||||||
Author: Catalin Marinas <catalin.marinas@arm.com>
|
|
||||||
|
|
||||||
This document describes the virtual memory layout used by the AArch64
|
|
||||||
Linux kernel. The architecture allows up to 4 levels of translation
|
|
||||||
tables with a 4KB page size and up to 3 levels with a 64KB page size.
|
|
||||||
|
|
||||||
AArch64 Linux uses either 3 levels or 4 levels of translation tables
|
|
||||||
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
|
|
||||||
(256TB) virtual addresses, respectively, for both user and kernel. With
|
|
||||||
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
|
|
||||||
virtual address, are used but the memory layout is the same.
|
|
||||||
|
|
||||||
User addresses have bits 63:48 set to 0 while the kernel addresses have
|
|
||||||
the same bits set to 1. TTBRx selection is given by bit 63 of the
|
|
||||||
virtual address. The swapper_pg_dir contains only kernel (global)
|
|
||||||
mappings while the user pgd contains only user (non-global) mappings.
|
|
||||||
The swapper_pg_dir address is written to TTBR1 and never written to
|
|
||||||
TTBR0.
|
|
||||||
|
|
||||||
|
|
||||||
AArch64 Linux memory layout with 4KB pages + 3 levels:
|
|
||||||
|
|
||||||
Start End Size Use
|
|
||||||
-----------------------------------------------------------------------
|
|
||||||
0000000000000000 0000007fffffffff 512GB user
|
|
||||||
ffffff8000000000 ffffffffffffffff 512GB kernel
|
|
||||||
|
|
||||||
|
|
||||||
AArch64 Linux memory layout with 4KB pages + 4 levels:
|
|
||||||
|
|
||||||
Start End Size Use
|
|
||||||
-----------------------------------------------------------------------
|
|
||||||
0000000000000000 0000ffffffffffff 256TB user
|
|
||||||
ffff000000000000 ffffffffffffffff 256TB kernel
|
|
||||||
|
|
||||||
|
|
||||||
AArch64 Linux memory layout with 64KB pages + 2 levels:
|
|
||||||
|
|
||||||
Start End Size Use
|
|
||||||
-----------------------------------------------------------------------
|
|
||||||
0000000000000000 000003ffffffffff 4TB user
|
|
||||||
fffffc0000000000 ffffffffffffffff 4TB kernel
|
|
||||||
|
|
||||||
|
|
||||||
AArch64 Linux memory layout with 64KB pages + 3 levels:
|
|
||||||
|
|
||||||
Start End Size Use
|
|
||||||
-----------------------------------------------------------------------
|
|
||||||
0000000000000000 0000ffffffffffff 256TB user
|
|
||||||
ffff000000000000 ffffffffffffffff 256TB kernel
|
|
||||||
|
|
||||||
|
|
||||||
For details of the virtual kernel memory layout please see the kernel
|
|
||||||
booting log.
|
|
||||||
|
|
||||||
|
|
||||||
Translation table lookup with 4KB pages:
|
|
||||||
|
|
||||||
  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1
|
|
||||||
|
|
||||||
|
|
||||||
Translation table lookup with 64KB pages:
|
|
||||||
|
|
||||||
  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index
   +-------------------------------------------------> [63] TTBR0/1
|
|
||||||
|
|
||||||
|
|
||||||
When using KVM without the Virtualization Host Extensions, the
|
|
||||||
hypervisor maps kernel pages in EL2 at a fixed (and potentially
|
|
||||||
random) offset from the linear mapping. See the kern_hyp_va macro and
|
|
||||||
kvm_update_va_mask function for more details. MMIO devices such as
|
|
||||||
GICv2 get mapped next to the HYP idmap page, as do vectors when
|
|
||||||
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
|
|
||||||
|
|
||||||
When using KVM with the Virtualization Host Extensions, no additional
|
|
||||||
mappings are created, since the host kernel runs directly in EL2.
|
|
|
@@ -1,7 +1,9 @@
|
||||||
|
=======================================
|
||||||
Pointer authentication in AArch64 Linux
|
Pointer authentication in AArch64 Linux
|
||||||
=======================================
|
=======================================
|
||||||
|
|
||||||
Author: Mark Rutland <mark.rutland@arm.com>
|
Author: Mark Rutland <mark.rutland@arm.com>
|
||||||
|
|
||||||
Date: 2017-07-19
|
Date: 2017-07-19
|
||||||
|
|
||||||
This document briefly describes the provision of pointer authentication
|
This document briefly describes the provision of pointer authentication
|
|
@@ -1,7 +1,9 @@
|
||||||
Silicon Errata and Software Workarounds
|
=======================================
|
||||||
=======================================
|
Silicon Errata and Software Workarounds
|
||||||
|
=======================================
|
||||||
|
|
||||||
Author: Will Deacon <will.deacon@arm.com>
|
Author: Will Deacon <will.deacon@arm.com>
|
||||||
|
|
||||||
Date : 27 November 2015
|
Date : 27 November 2015
|
||||||
|
|
||||||
It is an unfortunate fact of life that hardware is often produced with
|
It is an unfortunate fact of life that hardware is often produced with
|
||||||
|
@@ -9,11 +11,13 @@ so-called "errata", which can cause it to deviate from the architecture
|
||||||
under specific circumstances. For hardware produced by ARM, these
|
under specific circumstances. For hardware produced by ARM, these
|
||||||
errata are broadly classified into the following categories:
|
errata are broadly classified into the following categories:
|
||||||
|
|
||||||
Category A: A critical error without a viable workaround.
|
========== ========================================================
|
||||||
Category B: A significant or critical error with an acceptable
|
Category A A critical error without a viable workaround.
|
||||||
|
Category B A significant or critical error with an acceptable
|
||||||
workaround.
|
workaround.
|
||||||
Category C: A minor error that is not expected to occur under normal
|
Category C A minor error that is not expected to occur under normal
|
||||||
operation.
|
operation.
|
||||||
|
========== ========================================================
|
||||||
|
|
||||||
For more information, consult one of the "Software Developers Errata
|
For more information, consult one of the "Software Developers Errata
|
||||||
Notice" documents available on infocenter.arm.com (registration
|
Notice" documents available on infocenter.arm.com (registration
|
||||||
|
@@ -42,47 +46,86 @@ file acts as a registry of software workarounds in the Linux Kernel and
|
||||||
will be updated when new workarounds are committed and backported to
|
will be updated when new workarounds are committed and backported to
|
||||||
stable kernels.
|
stable kernels.
|
||||||
|
|
||||||
| Implementor | Component | Erratum ID | Kconfig |
|
|
||||||
+----------------+-----------------+-----------------+-----------------------------+
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
| Implementor | Component | Erratum ID | Kconfig |
|
||||||
|
+================+=================+=================+=============================+
|
||||||
| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 |
|
| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 |
|
||||||
| | | | |
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
|
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
|
| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 |
|
| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 |
|
| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 |
|
| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 |
|
| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 |
|
| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A57 | #852523 | N/A |
|
| ARM | Cortex-A57 | #852523 | N/A |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 |
|
| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A72 | #853709 | N/A |
|
| ARM | Cortex-A72 | #853709 | N/A |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
|
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
|
| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 |
|
| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 |
|
| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 |
|
| ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
|
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
|
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| ARM | MMU-500 | #841119,826419 | N/A |
|
| ARM | MMU-500 | #841119,826419 | N/A |
|
||||||
| | | | |
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 |
|
| Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 |
|
| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
|
| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
|
| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
|
| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX SMMUv2 | #27704 | N/A |
|
| Cavium | ThunderX SMMUv2 | #27704 | N/A |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX2 SMMUv3| #74 | N/A |
|
| Cavium | ThunderX2 SMMUv3| #74 | N/A |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Cavium | ThunderX2 SMMUv3| #126 | N/A |
|
| Cavium | ThunderX2 SMMUv3| #126 | N/A |
|
||||||
| | | | |
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
|
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
|
||||||
| | | | |
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 |
|
| Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Hisilicon | Hip0{6,7} | #161010701 | N/A |
|
| Hisilicon | Hip0{6,7} | #161010701 | N/A |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Hisilicon | Hip07 | #161600802 | HISILICON_ERRATUM_161600802 |
|
| Hisilicon | Hip07 | #161600802 | HISILICON_ERRATUM_161600802 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A |
|
| Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A |
|
||||||
| | | | |
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 |
|
| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Qualcomm Tech. | Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 |
|
| Qualcomm Tech. | Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 |
|
| Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 |
|
| Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
||||||
| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 |
|
| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 |
|
||||||
|
+----------------+-----------------+-----------------+-----------------------------+
|
|
@@ -1,7 +1,9 @@
|
||||||
Scalable Vector Extension support for AArch64 Linux
|
===================================================
|
||||||
===================================================
|
Scalable Vector Extension support for AArch64 Linux
|
||||||
|
===================================================
|
||||||
|
|
||||||
Author: Dave Martin <Dave.Martin@arm.com>
|
Author: Dave Martin <Dave.Martin@arm.com>
|
||||||
|
|
||||||
Date: 4 August 2017
|
Date: 4 August 2017
|
||||||
|
|
||||||
This document outlines briefly the interface provided to userspace by Linux in
|
This document outlines briefly the interface provided to userspace by Linux in
|
||||||
|
@@ -442,7 +444,7 @@ In A64 state, SVE adds the following:
|
||||||
|
|
||||||
* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
|
* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
|
||||||
operations in a similar way to the way in which they interact with ARMv8
|
operations in a similar way to the way in which they interact with ARMv8
|
||||||
floating-point operations.
|
floating-point operations::
|
||||||
|
|
||||||
8VL-1 128 0 bit index
|
8VL-1 128 0 bit index
|
||||||
+---- //// -----------------+
|
+---- //// -----------------+
|
||||||
|
@@ -499,6 +501,8 @@ ARMv8-A defines the following floating-point / SIMD register state:
|
||||||
* 32 128-bit vector registers V0..V31
|
* 32 128-bit vector registers V0..V31
|
||||||
* 2 32-bit status/control registers FPSR, FPCR
|
* 2 32-bit status/control registers FPSR, FPCR
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
127 0 bit index
|
127 0 bit index
|
||||||
+---------------+
|
+---------------+
|
||||||
V0 | |
|
V0 | |
|
||||||
|
@@ -533,7 +537,7 @@ References
|
||||||
[2] arch/arm64/include/uapi/asm/ptrace.h
|
[2] arch/arm64/include/uapi/asm/ptrace.h
|
||||||
AArch64 Linux ptrace ABI definitions
|
AArch64 Linux ptrace ABI definitions
|
||||||
|
|
||||||
[3] Documentation/arm64/cpu-feature-registers.txt
|
[3] Documentation/arm64/cpu-feature-registers.rst
|
||||||
|
|
||||||
[4] ARM IHI0055C
|
[4] ARM IHI0055C
|
||||||
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
|
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
|
|
@@ -1,7 +1,9 @@
|
||||||
Tagged virtual addresses in AArch64 Linux
|
=========================================
|
||||||
=========================================
|
Tagged virtual addresses in AArch64 Linux
|
||||||
|
=========================================
|
||||||
|
|
||||||
Author: Will Deacon <will.deacon@arm.com>
|
Author: Will Deacon <will.deacon@arm.com>
|
||||||
|
|
||||||
Date : 12 June 2013
|
Date : 12 June 2013
|
||||||
|
|
||||||
This document briefly describes the provision of tagged virtual
|
This document briefly describes the provision of tagged virtual
|
|
@@ -151,6 +151,7 @@ for the type. The maximum value of ``BTF_INT_BITS()`` is 128.
|
||||||
|
|
||||||
The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
|
The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
|
||||||
for this int. For example, a bitfield struct member has:
|
for this int. For example, a bitfield struct member has:
|
||||||
|
|
||||||
* btf member bit offset 100 from the start of the structure,
|
* btf member bit offset 100 from the start of the structure,
|
||||||
* btf member pointing to an int type,
|
* btf member pointing to an int type,
|
||||||
* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
|
* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
|
||||||
|
@@ -160,6 +161,7 @@ from bits ``100 + 2 = 102``.
|
||||||
|
|
||||||
Alternatively, the bitfield struct member can be the following to access the
|
Alternatively, the bitfield struct member can be the following to access the
|
||||||
same bits as the above:
|
same bits as the above:
|
||||||
|
|
||||||
* btf member bit offset 102,
|
* btf member bit offset 102,
|
||||||
* btf member pointing to an int type,
|
* btf member pointing to an int type,
|
||||||
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
|
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
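The two bitfield descriptions above select the same bits because the member bit offset and ``BTF_INT_OFFSET()`` are simply added. A short Python sketch of that arithmetic (the function name `bitfield_range` is hypothetical and not part of any BTF library):

```python
def bitfield_range(member_bit_offset: int, int_offset: int, int_bits: int):
    """Return the (first, last) bit positions covered by a bitfield member.

    member_bit_offset: bit offset of the member from the start of the struct
    int_offset:        value of BTF_INT_OFFSET() for the int type
    int_bits:          value of BTF_INT_BITS() for the int type
    """
    first = member_bit_offset + int_offset
    last = first + int_bits - 1
    return first, last


if __name__ == "__main__":
    # Member at bit 100 with BTF_INT_OFFSET() = 2, BTF_INT_BITS() = 4
    print(bitfield_range(100, 2, 4))
    # Member at bit 102 with BTF_INT_OFFSET() = 0, BTF_INT_BITS() = 4
    print(bitfield_range(102, 0, 4))
```

Both calls describe four bits starting at bit 102 of the structure.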
|
||||||
|
|
|
@@ -1,21 +0,0 @@
|
||||||
LATEXFILE = cdrom-standard
|
|
||||||
|
|
||||||
all:
|
|
||||||
make clean
|
|
||||||
latex $(LATEXFILE)
|
|
||||||
latex $(LATEXFILE)
|
|
||||||
@if [ -x `which gv` ]; then \
|
|
||||||
`dvips -q -t letter -o $(LATEXFILE).ps $(LATEXFILE).dvi` ;\
|
|
||||||
`gv -antialias -media letter -nocenter $(LATEXFILE).ps` ;\
|
|
||||||
else \
|
|
||||||
`xdvi $(LATEXFILE).dvi &` ;\
|
|
||||||
fi
|
|
||||||
make sortofclean
|
|
||||||
|
|
||||||
clean:
|
|
||||||
rm -f $(LATEXFILE).ps $(LATEXFILE).dvi $(LATEXFILE).aux $(LATEXFILE).log
|
|
||||||
|
|
||||||
sortofclean:
|
|
||||||
rm -f $(LATEXFILE).aux $(LATEXFILE).log
|
|
||||||
|
|
||||||
|
|
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@@ -1,18 +1,20 @@
|
||||||
IDE-CD driver documentation
|
IDE-CD driver documentation
|
||||||
Originally by scott snyder <snyder@fnald0.fnal.gov> (19 May 1996)
|
===========================
|
||||||
Carrying on the torch is: Erik Andersen <andersee@debian.org>
|
|
||||||
New maintainers (19 Oct 1998): Jens Axboe <axboe@image.dk>
|
:Originally by: scott snyder <snyder@fnald0.fnal.gov> (19 May 1996)
|
||||||
|
:Carrying on the torch is: Erik Andersen <andersee@debian.org>
|
||||||
|
:New maintainers (19 Oct 1998): Jens Axboe <axboe@image.dk>
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
The ide-cd driver should work with all ATAPI ver 1.2 to ATAPI 2.6 compliant
|
The ide-cd driver should work with all ATAPI ver 1.2 to ATAPI 2.6 compliant
|
||||||
CDROM drives which attach to an IDE interface. Note that some CDROM vendors
|
CDROM drives which attach to an IDE interface. Note that some CDROM vendors
|
||||||
(including Mitsumi, Sony, Creative, Aztech, and Goldstar) have made
|
(including Mitsumi, Sony, Creative, Aztech, and Goldstar) have made
|
||||||
both ATAPI-compliant drives and drives which use a proprietary
|
both ATAPI-compliant drives and drives which use a proprietary
|
||||||
interface. If your drive uses one of those proprietary interfaces,
|
interface. If your drive uses one of those proprietary interfaces,
|
||||||
this driver will not work with it (but one of the other CDROM drivers
|
this driver will not work with it (but one of the other CDROM drivers
|
||||||
probably will). This driver will not work with `ATAPI' drives which
|
probably will). This driver will not work with `ATAPI` drives which
|
||||||
attach to the parallel port. In addition, there is at least one drive
|
attach to the parallel port. In addition, there is at least one drive
|
||||||
(CyCDROM CR520ie) which attaches to the IDE port but is not ATAPI;
|
(CyCDROM CR520ie) which attaches to the IDE port but is not ATAPI;
|
||||||
this driver will not work with drives like that either (but see the
|
this driver will not work with drives like that either (but see the
|
||||||
|
@@ -31,7 +33,7 @@ This driver provides the following features:
|
||||||
from audio tracks. The program cdda2wav can be used for this.
|
from audio tracks. The program cdda2wav can be used for this.
|
||||||
Note, however, that only some drives actually support this.
|
Note, however, that only some drives actually support this.
|
||||||
|
|
||||||
- There is now support for CDROM changers which comply with the
|
- There is now support for CDROM changers which comply with the
|
||||||
ATAPI 2.6 draft standard (such as the NEC CDR-251). This additional
|
ATAPI 2.6 draft standard (such as the NEC CDR-251). This additional
|
||||||
functionality includes a function call to query which slot is the
|
functionality includes a function call to query which slot is the
|
||||||
currently selected slot, a function call to query which slots contain
|
currently selected slot, a function call to query which slots contain
|
||||||
|
@@ -45,22 +47,22 @@ This driver provides the following features:
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
0. The ide-cd relies on the ide disk driver. See
|
0. The ide-cd relies on the ide disk driver. See
|
||||||
Documentation/ide/ide.txt for up-to-date information on the ide
|
Documentation/ide/ide.rst for up-to-date information on the ide
|
||||||
driver.
|
driver.
|
||||||
|
|
||||||
1. Make sure that the ide and ide-cd drivers are compiled into the
|
1. Make sure that the ide and ide-cd drivers are compiled into the
|
||||||
kernel you're using. When configuring the kernel, in the section
|
kernel you're using. When configuring the kernel, in the section
|
||||||
entitled "Floppy, IDE, and other block devices", say either `Y'
|
entitled "Floppy, IDE, and other block devices", say either `Y`
|
||||||
(which will compile the support directly into the kernel) or `M'
|
(which will compile the support directly into the kernel) or `M`
|
||||||
(to compile support as a module which can be loaded and unloaded)
|
(to compile support as a module which can be loaded and unloaded)
|
||||||
to the options:
|
to the options::
|
||||||
|
|
||||||
ATA/ATAPI/MFM/RLL support
|
ATA/ATAPI/MFM/RLL support
|
||||||
Include IDE/ATAPI CDROM support
|
Include IDE/ATAPI CDROM support
|
||||||
|
|
||||||
Depending on what type of IDE interface you have, you may need to
|
Depending on what type of IDE interface you have, you may need to
|
||||||
specify additional configuration options. See
|
specify additional configuration options. See
|
||||||
Documentation/ide/ide.txt.
|
Documentation/ide/ide.rst.
|
||||||
|
|
||||||
2. You should also ensure that the iso9660 filesystem is either
|
2. You should also ensure that the iso9660 filesystem is either
|
||||||
compiled into the kernel or available as a loadable module. You
|
compiled into the kernel or available as a loadable module. You
|
||||||
|
@ -72,35 +74,35 @@ This driver provides the following features:
|
||||||
address and an IRQ number, the standard assignments being
|
address and an IRQ number, the standard assignments being
|
||||||
0x1f0 and 14 for the primary interface and 0x170 and 15 for the
|
0x1f0 and 14 for the primary interface and 0x170 and 15 for the
|
||||||
secondary interface. Each interface can control up to two devices,
|
secondary interface. Each interface can control up to two devices,
|
||||||
where each device can be a hard drive, a CDROM drive, a floppy drive,
|
where each device can be a hard drive, a CDROM drive, a floppy drive,
|
||||||
or a tape drive. The two devices on an interface are called `master'
|
or a tape drive. The two devices on an interface are called `master`
|
||||||
and `slave'; this is usually selectable via a jumper on the drive.
|
and `slave`; this is usually selectable via a jumper on the drive.
|
||||||
|
|
||||||
Linux names these devices as follows. The master and slave devices
|
Linux names these devices as follows. The master and slave devices
|
||||||
on the primary IDE interface are called `hda' and `hdb',
|
on the primary IDE interface are called `hda` and `hdb`,
|
||||||
respectively. The drives on the secondary interface are called
|
respectively. The drives on the secondary interface are called
|
||||||
`hdc' and `hdd'. (Interfaces at other locations get other letters
|
`hdc` and `hdd`. (Interfaces at other locations get other letters
|
||||||
in the third position; see Documentation/ide/ide.txt.)
|
in the third position; see Documentation/ide/ide.rst.)
|
||||||
|
|
||||||
If you want your CDROM drive to be found automatically by the
|
If you want your CDROM drive to be found automatically by the
|
||||||
driver, you should make sure your IDE interface uses either the
|
driver, you should make sure your IDE interface uses either the
|
||||||
primary or secondary addresses mentioned above. In addition, if
|
primary or secondary addresses mentioned above. In addition, if
|
||||||
the CDROM drive is the only device on the IDE interface, it should
|
the CDROM drive is the only device on the IDE interface, it should
|
||||||
be jumpered as `master'. (If for some reason you cannot configure
|
be jumpered as `master`. (If for some reason you cannot configure
|
||||||
your system in this manner, you can probably still use the driver.
|
your system in this manner, you can probably still use the driver.
|
||||||
You may have to pass extra configuration information to the kernel
|
You may have to pass extra configuration information to the kernel
|
||||||
when you boot, however. See Documentation/ide/ide.txt for more
|
when you boot, however. See Documentation/ide/ide.rst for more
|
||||||
information.)
|
information.)
|
||||||
|
|
||||||
4. Boot the system. If the drive is recognized, you should see a
|
4. Boot the system. If the drive is recognized, you should see a
|
||||||
message which looks like
|
message which looks like::
|
||||||
|
|
||||||
hdb: NEC CD-ROM DRIVE:260, ATAPI CDROM drive
|
hdb: NEC CD-ROM DRIVE:260, ATAPI CDROM drive
|
||||||
|
|
||||||
If you do not see this, see section 5 below.
|
If you do not see this, see section 5 below.
|
||||||
|
|
||||||
5. You may want to create a symbolic link /dev/cdrom pointing to the
|
5. You may want to create a symbolic link /dev/cdrom pointing to the
|
||||||
actual device. You can do this with the command
|
actual device. You can do this with the command::
|
||||||
|
|
||||||
ln -s /dev/hdX /dev/cdrom
|
ln -s /dev/hdX /dev/cdrom
|
||||||
|
|
||||||
|
@ -108,14 +110,14 @@ This driver provides the following features:
|
||||||
drive is installed.
|
drive is installed.
|
||||||
|
|
||||||
6. You should be able to see any error messages from the driver with
|
6. You should be able to see any error messages from the driver with
|
||||||
the `dmesg' command.
|
the `dmesg` command.
|
||||||
|
|
||||||
|
|
||||||
3. Basic usage
|
3. Basic usage
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
An ISO 9660 CDROM can be mounted by putting the disc in the drive and
|
An ISO 9660 CDROM can be mounted by putting the disc in the drive and
|
||||||
typing (as root)
|
typing (as root)::
|
||||||
|
|
||||||
mount -t iso9660 /dev/cdrom /mnt/cdrom
|
mount -t iso9660 /dev/cdrom /mnt/cdrom
|
||||||
|
|
||||||
|
@ -123,7 +125,7 @@ where it is assumed that /dev/cdrom is a link pointing to the actual
|
||||||
device (as described in step 5 of the last section) and /mnt/cdrom is
|
device (as described in step 5 of the last section) and /mnt/cdrom is
|
||||||
an empty directory. You should now be able to see the contents of the
|
an empty directory. You should now be able to see the contents of the
|
||||||
CDROM under the /mnt/cdrom directory. If you want to eject the CDROM,
|
CDROM under the /mnt/cdrom directory. If you want to eject the CDROM,
|
||||||
you must first dismount it with a command like
|
you must first dismount it with a command like::
|
||||||
|
|
||||||
umount /mnt/cdrom
|
umount /mnt/cdrom
|
||||||
|
|
||||||
|
@ -148,7 +150,7 @@ such as cdda2wav. The only types of drive which I've heard support
|
||||||
this are Sony and Toshiba drives. You will get errors if you try to
|
this are Sony and Toshiba drives. You will get errors if you try to
|
||||||
use this function on a drive which does not support it.
|
use this function on a drive which does not support it.
|
||||||
|
|
||||||
For supported changers, you can use the `cdchange' program (appended to
|
For supported changers, you can use the `cdchange` program (appended to
|
||||||
the end of this file) to switch between changer slots. Note that the
|
the end of this file) to switch between changer slots. Note that the
|
||||||
drive should be unmounted before attempting this. The program takes
|
drive should be unmounted before attempting this. The program takes
|
||||||
two arguments: the CDROM device, and the slot number to which you wish
|
two arguments: the CDROM device, and the slot number to which you wish
|
||||||
|
@ -161,17 +163,17 @@ to change. If the slot number is -1, the drive is unloaded.
|
||||||
This section discusses some common problems encountered when trying to
|
This section discusses some common problems encountered when trying to
|
||||||
use the driver, and some possible solutions. Note that if you are
|
use the driver, and some possible solutions. Note that if you are
|
||||||
experiencing problems, you should probably also review
|
experiencing problems, you should probably also review
|
||||||
Documentation/ide/ide.txt for current information about the underlying
|
Documentation/ide/ide.rst for current information about the underlying
|
||||||
IDE support code. Some of these items apply only to earlier versions
|
IDE support code. Some of these items apply only to earlier versions
|
||||||
of the driver, but are mentioned here for completeness.
|
of the driver, but are mentioned here for completeness.
|
||||||
|
|
||||||
In most cases, you should probably check with `dmesg' for any errors
|
In most cases, you should probably check with `dmesg` for any errors
|
||||||
from the driver.
|
from the driver.
|
||||||
|
|
||||||
a. Drive is not detected during booting.
|
a. Drive is not detected during booting.
|
||||||
|
|
||||||
- Review the configuration instructions above and in
|
- Review the configuration instructions above and in
|
||||||
Documentation/ide/ide.txt, and check how your hardware is
|
Documentation/ide/ide.rst, and check how your hardware is
|
||||||
configured.
|
configured.
|
||||||
|
|
||||||
- If your drive is the only device on an IDE interface, it should
|
- If your drive is the only device on an IDE interface, it should
|
||||||
|
@ -179,14 +181,14 @@ a. Drive is not detected during booting.
|
||||||
|
|
||||||
- If your IDE interface is not at the standard addresses of 0x170
|
- If your IDE interface is not at the standard addresses of 0x170
|
||||||
or 0x1f0, you'll need to explicitly inform the driver using a
|
or 0x1f0, you'll need to explicitly inform the driver using a
|
||||||
lilo option. See Documentation/ide/ide.txt. (This feature was
|
lilo option. See Documentation/ide/ide.rst. (This feature was
|
||||||
added around kernel version 1.3.30.)
|
added around kernel version 1.3.30.)
|
||||||
|
|
||||||
- If the autoprobing is not finding your drive, you can tell the
|
- If the autoprobing is not finding your drive, you can tell the
|
||||||
driver to assume that one exists by using a lilo option of the
|
driver to assume that one exists by using a lilo option of the
|
||||||
form `hdX=cdrom', where X is the drive letter corresponding to
|
form `hdX=cdrom`, where X is the drive letter corresponding to
|
||||||
where your drive is installed. Note that if you do this and you
|
where your drive is installed. Note that if you do this and you
|
||||||
see a boot message like
|
see a boot message like::
|
||||||
|
|
||||||
hdX: ATAPI cdrom (?)
|
hdX: ATAPI cdrom (?)
|
||||||
|
|
||||||
|
@ -205,7 +207,7 @@ a. Drive is not detected during booting.
|
||||||
Support for some interfaces needing extra initialization is
|
Support for some interfaces needing extra initialization is
|
||||||
provided in later 1.3.x kernels. You may need to turn on
|
provided in later 1.3.x kernels. You may need to turn on
|
||||||
additional kernel configuration options to get them to work;
|
additional kernel configuration options to get them to work;
|
||||||
see Documentation/ide/ide.txt.
|
see Documentation/ide/ide.rst.
|
||||||
|
|
||||||
Even if support is not available for your interface, you may be
|
Even if support is not available for your interface, you may be
|
||||||
able to get it to work with the following procedure. First boot
|
able to get it to work with the following procedure. First boot
|
||||||
|
@ -220,7 +222,7 @@ b. Timeout/IRQ errors.
|
||||||
probably not making it to the host.
|
probably not making it to the host.
|
||||||
|
|
||||||
- IRQ problems may also be indicated by the message
|
- IRQ problems may also be indicated by the message
|
||||||
`IRQ probe failed (<n>)' while booting. If <n> is zero, that
|
`IRQ probe failed (<n>)` while booting. If <n> is zero, that
|
||||||
means that the system did not see an interrupt from the drive when
|
means that the system did not see an interrupt from the drive when
|
||||||
it was expecting one (on any feasible IRQ). If <n> is negative,
|
it was expecting one (on any feasible IRQ). If <n> is negative,
|
||||||
that means the system saw interrupts on multiple IRQ lines, when
|
that means the system saw interrupts on multiple IRQ lines, when
|
||||||
|
@ -240,27 +242,27 @@ b. Timeout/IRQ errors.
|
||||||
there are hardware problems with the interrupt setup; they
|
there are hardware problems with the interrupt setup; they
|
||||||
apparently don't use interrupts.
|
apparently don't use interrupts.
|
||||||
|
|
||||||
- If you own a Pioneer DR-A24X, you _will_ get nasty error messages
|
- If you own a Pioneer DR-A24X, you _will_ get nasty error messages
|
||||||
on boot such as "irq timeout: status=0x50 { DriveReady SeekComplete }"
|
on boot such as "irq timeout: status=0x50 { DriveReady SeekComplete }"
|
||||||
The Pioneer DR-A24X CDROM drives are fairly popular these days.
|
The Pioneer DR-A24X CDROM drives are fairly popular these days.
|
||||||
Unfortunately, these drives seem to become very confused when we perform
|
Unfortunately, these drives seem to become very confused when we perform
|
||||||
the standard Linux ATA disk drive probe. If you own one of these drives,
|
the standard Linux ATA disk drive probe. If you own one of these drives,
|
||||||
you can bypass the ATA probing which confuses these CDROM drives, by
|
you can bypass the ATA probing which confuses these CDROM drives, by
|
||||||
adding `append="hdX=noprobe hdX=cdrom"' to your lilo.conf file and running
|
adding `append="hdX=noprobe hdX=cdrom"` to your lilo.conf file and running
|
||||||
lilo (again where X is the drive letter corresponding to where your drive
|
lilo (again where X is the drive letter corresponding to where your drive
|
||||||
is installed.)
|
is installed.)
|
||||||
|
|
||||||
c. System hangups.
|
c. System hangups.
|
||||||
|
|
||||||
- If the system locks up when you try to access the CDROM, the most
|
- If the system locks up when you try to access the CDROM, the most
|
||||||
likely cause is that you have a buggy IDE adapter which doesn't
|
likely cause is that you have a buggy IDE adapter which doesn't
|
||||||
properly handle simultaneous transactions on multiple interfaces.
|
properly handle simultaneous transactions on multiple interfaces.
|
||||||
The most notorious of these is the CMD640B chip. This problem can
|
The most notorious of these is the CMD640B chip. This problem can
|
||||||
be worked around by specifying the `serialize' option when
|
be worked around by specifying the `serialize` option when
|
||||||
booting. Recent kernels should be able to detect the need for
|
booting. Recent kernels should be able to detect the need for
|
||||||
this automatically in most cases, but the detection is not
|
this automatically in most cases, but the detection is not
|
||||||
foolproof. See Documentation/ide/ide.txt for more information
|
foolproof. See Documentation/ide/ide.rst for more information
|
||||||
about the `serialize' option and the CMD640B.
|
about the `serialize` option and the CMD640B.
|
||||||
|
|
||||||
- Note that many MS-DOS CDROM drivers will work with such buggy
|
- Note that many MS-DOS CDROM drivers will work with such buggy
|
||||||
hardware, apparently because they never attempt to overlap CDROM
|
hardware, apparently because they never attempt to overlap CDROM
|
||||||
|
@ -269,14 +271,14 @@ c. System hangups.
|
||||||
|
|
||||||
d. Can't mount a CDROM.
|
d. Can't mount a CDROM.
|
||||||
|
|
||||||
- If you get errors from mount, it may help to check `dmesg' to see
|
- If you get errors from mount, it may help to check `dmesg` to see
|
||||||
if there are any more specific errors from the driver or from the
|
if there are any more specific errors from the driver or from the
|
||||||
filesystem.
|
filesystem.
|
||||||
|
|
||||||
- Make sure there's a CDROM loaded in the drive, and that's it's an
|
- Make sure there's a CDROM loaded in the drive, and that's it's an
|
||||||
ISO 9660 disc. You can't mount an audio CD.
|
ISO 9660 disc. You can't mount an audio CD.
|
||||||
|
|
||||||
- With the CDROM in the drive and unmounted, try something like
|
- With the CDROM in the drive and unmounted, try something like::
|
||||||
|
|
||||||
cat /dev/cdrom | od | more
|
cat /dev/cdrom | od | more
|
||||||
|
|
||||||
|
@ -284,9 +286,9 @@ d. Can't mount a CDROM.
|
||||||
OK, and the problem is at the filesystem level (i.e., the CDROM is
|
OK, and the problem is at the filesystem level (i.e., the CDROM is
|
||||||
not ISO 9660 or has errors in the filesystem structure).
|
not ISO 9660 or has errors in the filesystem structure).
|
||||||
|
|
||||||
- If you see `not a block device' errors, check that the definitions
|
- If you see `not a block device` errors, check that the definitions
|
||||||
of the device special files are correct. They should be as
|
of the device special files are correct. They should be as
|
||||||
follows:
|
follows::
|
||||||
|
|
||||||
brw-rw---- 1 root disk 3, 0 Nov 11 18:48 /dev/hda
|
brw-rw---- 1 root disk 3, 0 Nov 11 18:48 /dev/hda
|
||||||
brw-rw---- 1 root disk 3, 64 Nov 11 18:48 /dev/hdb
|
brw-rw---- 1 root disk 3, 64 Nov 11 18:48 /dev/hdb
|
||||||
|
@ -301,7 +303,7 @@ d. Can't mount a CDROM.
|
||||||
If you have a /dev/cdrom symbolic link, check that it is pointing
|
If you have a /dev/cdrom symbolic link, check that it is pointing
|
||||||
to the correct device file.
|
to the correct device file.
|
||||||
|
|
||||||
If you hear people talking of the devices `hd1a' and `hd1b', these
|
If you hear people talking of the devices `hd1a` and `hd1b`, these
|
||||||
were old names for what are now called hdc and hdd. Those names
|
were old names for what are now called hdc and hdd. Those names
|
||||||
should be considered obsolete.
|
should be considered obsolete.
|
||||||
|
|
||||||
|
@ -311,8 +313,8 @@ d. Can't mount a CDROM.
|
||||||
always give meaningful error messages.
|
always give meaningful error messages.
|
||||||
|
|
||||||
|
|
||||||
e. Directory listings are unpredictably truncated, and `dmesg' shows
|
e. Directory listings are unpredictably truncated, and `dmesg` shows
|
||||||
`buffer botch' error messages from the driver.
|
`buffer botch` error messages from the driver.
|
||||||
|
|
||||||
- There was a bug in the version of the driver in 1.2.x kernels
|
- There was a bug in the version of the driver in 1.2.x kernels
|
||||||
which could cause this. It was fixed in 1.3.0. If you can't
|
which could cause this. It was fixed in 1.3.0. If you can't
|
||||||
|
@ -335,34 +337,36 @@ f. Data corruption.
|
||||||
5. cdchange.c
|
5. cdchange.c
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
/*
|
::
|
||||||
* cdchange.c [-v] <device> [<slot>]
|
|
||||||
*
|
|
||||||
* This loads a CDROM from a specified slot in a changer, and displays
|
|
||||||
* information about the changer status. The drive should be unmounted before
|
|
||||||
* using this program.
|
|
||||||
*
|
|
||||||
* Changer information is displayed if either the -v flag is specified
|
|
||||||
* or no slot was specified.
|
|
||||||
*
|
|
||||||
* Based on code originally from Gerhard Zuber <zuber@berlin.snafu.de>.
|
|
||||||
* Changer status information, and rewrite for the new Uniform CDROM driver
|
|
||||||
* interface by Erik Andersen <andersee@debian.org>.
|
|
||||||
*/
|
|
||||||
|
|
||||||
#include <stdio.h>
|
/*
|
||||||
#include <stdlib.h>
|
* cdchange.c [-v] <device> [<slot>]
|
||||||
#include <errno.h>
|
*
|
||||||
#include <string.h>
|
* This loads a CDROM from a specified slot in a changer, and displays
|
||||||
#include <unistd.h>
|
* information about the changer status. The drive should be unmounted before
|
||||||
#include <fcntl.h>
|
* using this program.
|
||||||
#include <sys/ioctl.h>
|
*
|
||||||
#include <linux/cdrom.h>
|
* Changer information is displayed if either the -v flag is specified
|
||||||
|
* or no slot was specified.
|
||||||
|
*
|
||||||
|
* Based on code originally from Gerhard Zuber <zuber@berlin.snafu.de>.
|
||||||
|
* Changer status information, and rewrite for the new Uniform CDROM driver
|
||||||
|
* interface by Erik Andersen <andersee@debian.org>.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <errno.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <sys/ioctl.h>
|
||||||
|
#include <linux/cdrom.h>
|
||||||
|
|
||||||
|
|
||||||
int
|
int
|
||||||
main (int argc, char **argv)
|
main (int argc, char **argv)
|
||||||
{
|
{
|
||||||
char *program;
|
char *program;
|
||||||
char *device;
|
char *device;
|
||||||
int fd; /* file descriptor for CD-ROM device */
|
int fd; /* file descriptor for CD-ROM device */
|
||||||
|
@ -382,30 +386,30 @@ main (int argc, char **argv)
|
||||||
fprintf (stderr, " Slots are numbered 1 -- n.\n");
|
fprintf (stderr, " Slots are numbered 1 -- n.\n");
|
||||||
exit (1);
|
exit (1);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (strcmp (argv[0], "-v") == 0) {
|
if (strcmp (argv[0], "-v") == 0) {
|
||||||
verbose = 1;
|
verbose = 1;
|
||||||
++argv;
|
++argv;
|
||||||
--argc;
|
--argc;
|
||||||
}
|
}
|
||||||
|
|
||||||
device = argv[0];
|
device = argv[0];
|
||||||
|
|
||||||
if (argc == 2)
|
if (argc == 2)
|
||||||
slot = atoi (argv[1]) - 1;
|
slot = atoi (argv[1]) - 1;
|
||||||
|
|
||||||
/* open device */
|
/* open device */
|
||||||
fd = open(device, O_RDONLY | O_NONBLOCK);
|
fd = open(device, O_RDONLY | O_NONBLOCK);
|
||||||
if (fd < 0) {
|
if (fd < 0) {
|
||||||
fprintf (stderr, "%s: open failed for `%s': %s\n",
|
fprintf (stderr, "%s: open failed for `%s`: %s\n",
|
||||||
program, device, strerror (errno));
|
program, device, strerror (errno));
|
||||||
exit (1);
|
exit (1);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Check CD player status */
|
/* Check CD player status */
|
||||||
total_slots_available = ioctl (fd, CDROM_CHANGER_NSLOTS);
|
total_slots_available = ioctl (fd, CDROM_CHANGER_NSLOTS);
|
||||||
if (total_slots_available <= 1 ) {
|
if (total_slots_available <= 1 ) {
|
||||||
fprintf (stderr, "%s: Device `%s' is not an ATAPI "
|
fprintf (stderr, "%s: Device `%s` is not an ATAPI "
|
||||||
"compliant CD changer.\n", program, device);
|
"compliant CD changer.\n", program, device);
|
||||||
exit (1);
|
exit (1);
|
||||||
}
|
}
|
||||||
|
@ -418,7 +422,7 @@ main (int argc, char **argv)
|
||||||
exit (1);
|
exit (1);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* load */
|
/* load */
|
||||||
slot=ioctl (fd, CDROM_SELECT_DISC, slot);
|
slot=ioctl (fd, CDROM_SELECT_DISC, slot);
|
||||||
if (slot<0) {
|
if (slot<0) {
|
||||||
fflush(stdout);
|
fflush(stdout);
|
||||||
|
@ -462,14 +466,14 @@ main (int argc, char **argv)
|
||||||
|
|
||||||
for (x_slot=0; x_slot<total_slots_available; x_slot++) {
|
for (x_slot=0; x_slot<total_slots_available; x_slot++) {
|
||||||
printf ("Slot %2d: ", x_slot+1);
|
printf ("Slot %2d: ", x_slot+1);
|
||||||
status = ioctl (fd, CDROM_DRIVE_STATUS, x_slot);
|
status = ioctl (fd, CDROM_DRIVE_STATUS, x_slot);
|
||||||
if (status<0) {
|
if (status<0) {
|
||||||
perror(" CDROM_DRIVE_STATUS");
|
perror(" CDROM_DRIVE_STATUS");
|
||||||
} else switch(status) {
|
} else switch(status) {
|
||||||
case CDS_DISC_OK:
|
case CDS_DISC_OK:
|
||||||
printf ("Disc present.");
|
printf ("Disc present.");
|
||||||
break;
|
break;
|
||||||
case CDS_NO_DISC:
|
case CDS_NO_DISC:
|
||||||
printf ("Empty slot.");
|
printf ("Empty slot.");
|
||||||
break;
|
break;
|
||||||
case CDS_TRAY_OPEN:
|
case CDS_TRAY_OPEN:
|
||||||
|
@ -507,11 +511,11 @@ main (int argc, char **argv)
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
status = ioctl (fd, CDROM_MEDIA_CHANGED, x_slot);
|
status = ioctl (fd, CDROM_MEDIA_CHANGED, x_slot);
|
||||||
if (status<0) {
|
if (status<0) {
|
||||||
perror(" CDROM_MEDIA_CHANGED");
|
perror(" CDROM_MEDIA_CHANGED");
|
||||||
}
|
}
|
||||||
switch (status) {
|
switch (status) {
|
||||||
case 1:
|
case 1:
|
||||||
printf ("Changed.\n");
|
printf ("Changed.\n");
|
||||||
break;
|
break;
|
||||||
|
@ -525,10 +529,10 @@ main (int argc, char **argv)
|
||||||
/* close device */
|
/* close device */
|
||||||
status = close (fd);
|
status = close (fd);
|
||||||
if (status != 0) {
|
if (status != 0) {
|
||||||
fprintf (stderr, "%s: close failed for `%s': %s\n",
|
fprintf (stderr, "%s: close failed for `%s`: %s\n",
|
||||||
program, device, strerror (errno));
|
program, device, strerror (errno));
|
||||||
exit (1);
|
exit (1);
|
||||||
}
|
}
|
||||||
|
|
||||||
exit (0);
|
exit (0);
|
||||||
}
|
}
|
|
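Reviewer's note on the ide-cd.rst text above: the naming rule it describes (master and slave on the primary interface are hda and hdb, on the secondary interface hdc and hdd) can be sketched in a few lines of Python. This is only an illustration. The helper name `ide_device` is hypothetical. The device numbers for hda (3, 0) and hdb (3, 64) come from the listing quoted in the diff. Major number 22 for the secondary interface is not shown in this excerpt and is an assumption here.

```python
import string

# Block-device major numbers for the first two IDE interfaces.
# 3 matches the /dev/hda and /dev/hdb listing in the document;
# 22 for the secondary interface is an assumption (not shown in this excerpt).
IDE_MAJORS = {0: 3, 1: 22}


def ide_device(interface, slave):
    """Return (name, major, minor) for a drive on interface 0 (primary) or 1 (secondary)."""
    if interface not in IDE_MAJORS:
        raise ValueError("only the primary (0) and secondary (1) interfaces are covered")
    # Two drives per interface: the master takes the first letter, the slave the second.
    letter = string.ascii_lowercase[2 * interface + (1 if slave else 0)]
    # In the quoted listing the master is minor 0 and the slave is minor 64.
    minor = 64 if slave else 0
    return "hd" + letter, IDE_MAJORS[interface], minor


if __name__ == "__main__":
    for interface in (0, 1):
        for slave in (False, True):
            print(ide_device(interface, slave))
```

Interfaces at other addresses get other letters, as the document says; the sketch deliberately rejects them rather than guess.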
@ -0,0 +1,19 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
=====
|
||||||
|
cdrom
|
||||||
|
=====
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
cdrom-standard
|
||||||
|
ide-cd
|
||||||
|
packet-writing
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,3 +1,7 @@
|
||||||
|
==============
|
||||||
|
Packet writing
|
||||||
|
==============
|
||||||
|
|
||||||
Getting started quick
|
Getting started quick
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
|
@ -10,13 +14,16 @@ Getting started quick
|
||||||
Download from http://sourceforge.net/projects/linux-udf/
|
Download from http://sourceforge.net/projects/linux-udf/
|
||||||
|
|
||||||
- Grab a new CD-RW disc and format it (assuming CD-RW is hdc, substitute
|
- Grab a new CD-RW disc and format it (assuming CD-RW is hdc, substitute
|
||||||
as appropriate):
|
as appropriate)::
|
||||||
|
|
||||||
# cdrwtool -d /dev/hdc -q
|
# cdrwtool -d /dev/hdc -q
|
||||||
|
|
||||||
- Setup your writer
|
- Setup your writer::
|
||||||
|
|
||||||
# pktsetup dev_name /dev/hdc
|
# pktsetup dev_name /dev/hdc
|
||||||
|
|
||||||
- Now you can mount /dev/pktcdvd/dev_name and copy files to it. Enjoy!
|
- Now you can mount /dev/pktcdvd/dev_name and copy files to it. Enjoy::
|
||||||
|
|
||||||
# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
|
# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
|
||||||
|
|
||||||
|
|
||||||
|
@ -25,11 +32,11 @@ Packet writing for DVD-RW media
|
||||||
|
|
||||||
DVD-RW discs can be written to much like CD-RW discs if they are in
|
DVD-RW discs can be written to much like CD-RW discs if they are in
|
||||||
the so called "restricted overwrite" mode. To put a disc in restricted
|
the so called "restricted overwrite" mode. To put a disc in restricted
|
||||||
overwrite mode, run:
|
overwrite mode, run::
|
||||||
|
|
||||||
# dvd+rw-format /dev/hdc
|
# dvd+rw-format /dev/hdc
|
||||||
|
|
||||||
You can then use the disc the same way you would use a CD-RW disc:
|
You can then use the disc the same way you would use a CD-RW disc::
|
||||||
|
|
||||||
# pktsetup dev_name /dev/hdc
|
# pktsetup dev_name /dev/hdc
|
||||||
# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
|
# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
|
||||||
|
@ -41,7 +48,7 @@ Packet writing for DVD+RW media
|
||||||
According to the DVD+RW specification, a drive supporting DVD+RW discs
|
According to the DVD+RW specification, a drive supporting DVD+RW discs
|
||||||
shall implement "true random writes with 2KB granularity", which means
|
shall implement "true random writes with 2KB granularity", which means
|
||||||
that it should be possible to put any filesystem with a block size >=
|
that it should be possible to put any filesystem with a block size >=
|
||||||
2KB on such a disc. For example, it should be possible to do:
|
2KB on such a disc. For example, it should be possible to do::
|
||||||
|
|
||||||
# dvd+rw-format /dev/hdc (only needed if the disc has never
|
# dvd+rw-format /dev/hdc (only needed if the disc has never
|
||||||
been formatted)
|
been formatted)
|
||||||
|
@ -54,7 +61,7 @@ follow the specification, but suffer bad performance problems if the
|
||||||
writes are not 32KB aligned.
|
writes are not 32KB aligned.
|
||||||
|
|
||||||
Both problems can be solved by using the pktcdvd driver, which always
|
Both problems can be solved by using the pktcdvd driver, which always
|
||||||
generates aligned writes.
|
generates aligned writes::
|
||||||
|
|
||||||
# dvd+rw-format /dev/hdc
|
# dvd+rw-format /dev/hdc
|
||||||
# pktsetup dev_name /dev/hdc
|
# pktsetup dev_name /dev/hdc
|
||||||
|
@ -83,7 +90,7 @@ Notes
|
||||||
|
|
||||||
- Since the pktcdvd driver makes the disc appear as a regular block
|
- Since the pktcdvd driver makes the disc appear as a regular block
|
||||||
device with a 2KB block size, you can put any filesystem you like on
|
device with a 2KB block size, you can put any filesystem you like on
|
||||||
the disc. For example, run:
|
the disc. For example, run::
|
||||||
|
|
||||||
# /sbin/mke2fs /dev/pktcdvd/dev_name
|
# /sbin/mke2fs /dev/pktcdvd/dev_name
|
||||||
|
|
||||||
|
@ -97,7 +104,7 @@ Since Linux 2.6.20, the pktcdvd module has a sysfs interface
|
||||||
and can be controlled by it. For example the "pktcdvd" tool uses
|
and can be controlled by it. For example the "pktcdvd" tool uses
|
||||||
this interface. (see http://tom.ist-im-web.de/download/pktcdvd )
|
this interface. (see http://tom.ist-im-web.de/download/pktcdvd )
|
||||||
|
|
||||||
"pktcdvd" works similar to "pktsetup", e.g.:
|
"pktcdvd" works similar to "pktsetup", e.g.::
|
||||||
|
|
||||||
# pktcdvd -a dev_name /dev/hdc
|
# pktcdvd -a dev_name /dev/hdc
|
||||||
# mkudffs /dev/pktcdvd/dev_name
|
# mkudffs /dev/pktcdvd/dev_name
|
||||||
|
@ -115,7 +122,7 @@ For a description of the sysfs interface look into the file:
|
||||||
Using the pktcdvd debugfs interface
|
Using the pktcdvd debugfs interface
|
||||||
-----------------------------------
|
-----------------------------------
|
||||||
|
|
||||||
To read pktcdvd device infos in human readable form, do:
|
To read pktcdvd device infos in human readable form, do::
|
||||||
|
|
||||||
# cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
|
# cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
|
||||||
|
|
|
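Reviewer's note on the packet-writing.rst text above: the document says some drives perform badly when writes are not 32KB aligned, and that the pktcdvd driver always generates aligned writes. The Python sketch below only illustrates what "32KB aligned" means for a byte range. It is not the driver's implementation, and the function name `aligned_span` is hypothetical.

```python
ALIGN = 32 * 1024  # 32KB, the alignment mentioned in the document


def aligned_span(offset, length, align=ALIGN):
    """Expand the byte range [offset, offset + length) outward to `align` boundaries.

    Returns (start, span_length), where start is a multiple of `align`
    and span_length is a whole number of `align`-sized blocks.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = (offset // align) * align               # round the start down
    end = -(-(offset + length) // align) * align    # round the end up
    return start, end - start


if __name__ == "__main__":
    # A 4KB write at offset 2KB touches one 32KB block.
    print(aligned_span(2048, 4096))
    # A 4KB write straddling the first 32KB boundary touches two blocks.
    print(aligned_span(30720, 4096))
```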
@ -34,7 +34,8 @@ needs_sphinx = '1.3'
|
||||||
# Add any Sphinx extension module names here, as strings. They can be
|
# Add any Sphinx extension module names here, as strings. They can be
|
||||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||||
# ones.
|
# ones.
|
||||||
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
|
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
|
||||||
|
'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
|
||||||
|
|
||||||
# The name of the math extension changed on Sphinx 1.4
|
# The name of the math extension changed on Sphinx 1.4
|
||||||
if (major == 1 and minor > 3) or (major > 1):
|
if (major == 1 and minor > 3) or (major > 1):
|
||||||
|
@ -200,7 +201,7 @@ html_context = {
|
||||||
|
|
||||||
# If true, SmartyPants will be used to convert quotes and dashes to
|
# If true, SmartyPants will be used to convert quotes and dashes to
|
||||||
# typographically correct entities.
|
# typographically correct entities.
|
||||||
#html_use_smartypants = True
|
html_use_smartypants = False
|
||||||
|
|
||||||
# Custom sidebar templates, maps document names to template names.
|
# Custom sidebar templates, maps document names to template names.
|
||||||
#html_sidebars = {}
|
#html_sidebars = {}
|
||||||
|
|
|
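Reviewer's note on the conf.py context above: the unchanged line `if (major == 1 and minor > 3) or (major > 1):` tests for Sphinx 1.4 or newer. A minimal Python sketch, assuming `major` and `minor` are non-negative integers, shows that the test is equivalent to a tuple comparison. Both function names are hypothetical and are not part of conf.py or Sphinx.

```python
def is_sphinx_1_4_or_newer(major, minor):
    """The condition exactly as written in the quoted conf.py line."""
    return (major == 1 and minor > 3) or (major > 1)


def is_sphinx_1_4_or_newer_tuple(major, minor):
    """The same test expressed as a lexicographic tuple comparison."""
    return (major, minor) >= (1, 4)


if __name__ == "__main__":
    for version in [(1, 3), (1, 4), (1, 7), (2, 0)]:
        print(version, is_sphinx_1_4_or_newer(*version))
```

The two forms agree for every non-negative integer pair; the tuple form may be easier to extend if a patch level ever needs to be compared as well.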
@ -34,6 +34,8 @@ Core utilities
|
||||||
timekeeping
|
timekeeping
|
||||||
boot-time-mm
|
boot-time-mm
|
||||||
memory-hotplug
|
memory-hotplug
|
||||||
|
protection-keys
|
||||||
|
../RCU/index
|
||||||
|
|
||||||
|
|
||||||
Interfaces for kernel debugging
|
Interfaces for kernel debugging
|
||||||
|
|
|
@ -33,6 +33,9 @@ String Conversions
|
||||||
.. kernel-doc:: lib/kstrtox.c
|
.. kernel-doc:: lib/kstrtox.c
|
||||||
:export:
|
:export:
|
||||||
|
|
||||||
|
.. kernel-doc:: lib/string_helpers.c
|
||||||
|
:export:
|
||||||
|
|
||||||
String Manipulation
|
String Manipulation
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
|
@ -138,6 +141,15 @@ Base 2 log and power Functions
|
||||||
.. kernel-doc:: include/linux/log2.h
|
.. kernel-doc:: include/linux/log2.h
|
||||||
:internal:
|
:internal:
|
||||||
|
|
||||||
|
Integer power Functions
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
.. kernel-doc:: lib/math/int_pow.c
|
||||||
|
:export:
|
||||||
|
|
||||||
|
.. kernel-doc:: lib/math/int_sqrt.c
|
||||||
|
:export:
|
||||||
|
|
||||||
Division Functions
|
Division Functions
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
|
@@ -358,8 +370,6 @@ Read-Copy Update (RCU)
|
||||||
|
|
||||||
.. kernel-doc:: kernel/rcu/tree.c
|
.. kernel-doc:: kernel/rcu/tree.c
|
||||||
|
|
||||||
.. kernel-doc:: kernel/rcu/tree_plugin.h
|
|
||||||
|
|
||||||
.. kernel-doc:: kernel/rcu/tree_exp.h
|
.. kernel-doc:: kernel/rcu/tree_exp.h
|
||||||
|
|
||||||
.. kernel-doc:: kernel/rcu/update.c
|
.. kernel-doc:: kernel/rcu/update.c
|
||||||
|
|
|
@@ -115,7 +115,7 @@ Some additional variants exist for more specialized cases:
|
||||||
void ktime_get_coarse_clocktai_ts64( struct timespec64 * )
|
void ktime_get_coarse_clocktai_ts64( struct timespec64 * )
|
||||||
|
|
||||||
These are quicker than the non-coarse versions, but less accurate,
|
These are quicker than the non-coarse versions, but less accurate,
|
||||||
corresponding to CLOCK_MONONOTNIC_COARSE and CLOCK_REALTIME_COARSE
|
corresponding to CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE
|
||||||
in user space, along with the equivalent boottime/tai/raw
|
in user space, along with the equivalent boottime/tai/raw
|
||||||
timebase not available in user space.
|
timebase not available in user space.
|
||||||
|
|
||||||
|
|
|
@@ -30,27 +30,27 @@ it called marks. Each mark may be set or cleared independently of
|
||||||
the others. You can iterate over entries which are marked.
|
the others. You can iterate over entries which are marked.
|
||||||
|
|
||||||
Normal pointers may be stored in the XArray directly. They must be 4-byte
|
Normal pointers may be stored in the XArray directly. They must be 4-byte
|
||||||
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
|
aligned, which is true for any pointer returned from kmalloc() and
|
||||||
:c:func:`alloc_page`. It isn't true for arbitrary user-space pointers,
|
alloc_page(). It isn't true for arbitrary user-space pointers,
|
||||||
nor for function pointers. You can store pointers to statically allocated
|
nor for function pointers. You can store pointers to statically allocated
|
||||||
objects, as long as those objects have an alignment of at least 4.
|
objects, as long as those objects have an alignment of at least 4.
|
||||||
|
|
||||||
You can also store integers between 0 and ``LONG_MAX`` in the XArray.
|
You can also store integers between 0 and ``LONG_MAX`` in the XArray.
|
||||||
You must first convert it into an entry using :c:func:`xa_mk_value`.
|
You must first convert it into an entry using xa_mk_value().
|
||||||
When you retrieve an entry from the XArray, you can check whether it is
|
When you retrieve an entry from the XArray, you can check whether it is
|
||||||
a value entry by calling :c:func:`xa_is_value`, and convert it back to
|
a value entry by calling xa_is_value(), and convert it back to
|
||||||
an integer by calling :c:func:`xa_to_value`.
|
an integer by calling xa_to_value().
|
||||||
|
|
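The value-entry round trip described in this hunk can be sketched in user space. This is only an illustrative model: the `model_*` names are hypothetical stand-ins for xa_mk_value(), xa_is_value() and xa_to_value(), and the low-bit encoding is an assumption that is merely consistent with the alignment rule quoted above, not a copy of the kernel header.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-ins for xa_mk_value(), xa_is_value()
 * and xa_to_value(). The encoding (shift left by one, set bit 0) is an
 * assumption made for this sketch.
 */

/* Encode an integer in [0, LONG_MAX] as an entry with the low bit set. */
static inline void *model_mk_value(unsigned long v)
{
	assert(v <= (unsigned long)LONG_MAX);
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* An aligned pointer has a clear low bit, so bit 0 can mark a value. */
static inline bool model_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Undo model_mk_value(). */
static inline unsigned long model_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

Because one bit is spent on the marker, the model can only hold integers up to ``LONG_MAX``, which matches the range stated in the text.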
||||||
Some users want to store tagged pointers instead of using the marks
|
Some users want to store tagged pointers instead of using the marks
|
||||||
described above. They can call :c:func:`xa_tag_pointer` to create an
|
described above. They can call xa_tag_pointer() to create an
|
||||||
entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
|
entry with a tag, xa_untag_pointer() to turn a tagged entry
|
||||||
back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
|
back into an untagged pointer and xa_pointer_tag() to retrieve
|
||||||
the tag of an entry. Tagged pointers use the same bits that are used
|
the tag of an entry. Tagged pointers use the same bits that are used
|
||||||
to distinguish value entries from normal pointers, so each user must
|
to distinguish value entries from normal pointers, so each user must
|
||||||
decide whether they want to store value entries or tagged pointers in
|
decide whether they want to store value entries or tagged pointers in
|
||||||
any particular XArray.
|
any particular XArray.
|
||||||
|
|
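Tagged pointers can be modelled the same way. The sketch below is a user-space illustration under stated assumptions: the `model_*` functions are hypothetical stand-ins for xa_tag_pointer(), xa_untag_pointer() and xa_pointer_tag(), and the two-bit tag width is inferred from the 4-byte alignment requirement rather than taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-ins for xa_tag_pointer(),
 * xa_untag_pointer() and xa_pointer_tag(). The sketch does not
 * reproduce any kernel restriction on which tag values are valid.
 */
#define MODEL_TAG_MASK ((uintptr_t)3)

/* Fold a small tag into the unused low bits of an aligned pointer. */
static inline void *model_tag_pointer(void *p, unsigned long tag)
{
	assert(((uintptr_t)p & MODEL_TAG_MASK) == 0);	/* 4-byte aligned */
	assert(tag <= MODEL_TAG_MASK);
	return (void *)((uintptr_t)p | tag);
}

/* Strip the tag bits to recover the original pointer. */
static inline void *model_untag_pointer(void *entry)
{
	return (void *)((uintptr_t)entry & ~MODEL_TAG_MASK);
}

/* Read back only the tag bits. */
static inline unsigned long model_pointer_tag(void *entry)
{
	return (unsigned long)((uintptr_t)entry & MODEL_TAG_MASK);
}
```

In this model a pointer tagged with 1 has its low bit set, so it would be indistinguishable from a value entry. That is the conflict the text refers to when it says each user must pick one scheme per XArray.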
||||||
The XArray does not support storing :c:func:`IS_ERR` pointers as some
|
The XArray does not support storing IS_ERR() pointers as some
|
||||||
conflict with value entries or internal entries.
|
conflict with value entries or internal entries.
|
||||||
|
|
||||||
An unusual feature of the XArray is the ability to create entries which
|
An unusual feature of the XArray is the ability to create entries which
|
||||||
|
@@ -64,89 +64,89 @@ entry will cause the XArray to forget about the range.
|
||||||
Normal API
|
Normal API
|
||||||
==========
|
==========
|
||||||
|
|
||||||
Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
|
Start by initialising an XArray, either with DEFINE_XARRAY()
|
||||||
for statically allocated XArrays or :c:func:`xa_init` for dynamically
|
for statically allocated XArrays or xa_init() for dynamically
|
||||||
allocated ones. A freshly-initialised XArray contains a ``NULL``
|
allocated ones. A freshly-initialised XArray contains a ``NULL``
|
||||||
pointer at every index.
|
pointer at every index.
|
||||||
|
|
||||||
You can then set entries using :c:func:`xa_store` and get entries
|
You can then set entries using xa_store() and get entries
|
||||||
using :c:func:`xa_load`. xa_store will overwrite any entry with the
|
using xa_load(). xa_store will overwrite any entry with the
|
||||||
new entry and return the previous entry stored at that index. You can
|
new entry and return the previous entry stored at that index. You can
|
||||||
use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
|
use xa_erase() instead of calling xa_store() with a
|
||||||
``NULL`` entry. There is no difference between an entry that has never
|
``NULL`` entry. There is no difference between an entry that has never
|
||||||
been stored to, one that has been erased and one that has most recently
|
been stored to, one that has been erased and one that has most recently
|
||||||
had ``NULL`` stored to it.
|
had ``NULL`` stored to it.
|
||||||
|
|
||||||
You can conditionally replace an entry at an index by using
|
You can conditionally replace an entry at an index by using
|
||||||
:c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if
|
xa_cmpxchg(). Like cmpxchg(), it will only succeed if
|
||||||
the entry at that index has the 'old' value. It also returns the entry
|
the entry at that index has the 'old' value. It also returns the entry
|
||||||
which was at that index; if it returns the same entry which was passed as
|
which was at that index; if it returns the same entry which was passed as
|
||||||
'old', then :c:func:`xa_cmpxchg` succeeded.
|
'old', then xa_cmpxchg() succeeded.
|
||||||
|
|
||||||
If you want to only store a new entry to an index if the current entry
|
If you want to only store a new entry to an index if the current entry
|
||||||
at that index is ``NULL``, you can use :c:func:`xa_insert` which
|
at that index is ``NULL``, you can use xa_insert() which
|
||||||
returns ``-EBUSY`` if the entry is not empty.
|
returns ``-EBUSY`` if the entry is not empty.
|
||||||
|
|
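The return-value conventions of xa_store(), xa_erase(), xa_cmpxchg() and xa_insert() described above can be illustrated with a toy fixed-size array. Everything named `toy_*` is hypothetical. The model leaves out the gfp_t argument, locking, reserved entries and on-demand growth, so it shows only the semantics, not the implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size model; NULL means "no entry at this index". */
#define TOY_SLOTS 16

struct toy_array {
	void *slot[TOY_SLOTS];
};

/* Like xa_store(): overwrite and return the previous entry. */
static inline void *toy_store(struct toy_array *a, size_t index, void *entry)
{
	void *prev;

	assert(index < TOY_SLOTS);
	prev = a->slot[index];
	a->slot[index] = entry;
	return prev;
}

/* Like xa_erase(): equivalent to storing NULL. */
static inline void *toy_erase(struct toy_array *a, size_t index)
{
	return toy_store(a, index, NULL);
}

/*
 * Like xa_cmpxchg(): store only if the current entry equals old.
 * Returns the entry that was found; the call succeeded if and only
 * if that return value equals old.
 */
static inline void *toy_cmpxchg(struct toy_array *a, size_t index,
				void *old, void *entry)
{
	void *cur;

	assert(index < TOY_SLOTS);
	cur = a->slot[index];
	if (cur == old)
		a->slot[index] = entry;
	return cur;
}

/* Like xa_insert(): store only into an empty index, else -EBUSY. */
static inline int toy_insert(struct toy_array *a, size_t index, void *entry)
{
	return toy_cmpxchg(a, index, NULL, entry) == NULL ? 0 : -EBUSY;
}
```

Writing toy_insert() in terms of toy_cmpxchg() with an 'old' value of ``NULL`` is a choice made for this sketch; it shows why an index that has never been stored to and one that has been erased behave identically.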
||||||
You can enquire whether a mark is set on an entry by using
|
You can enquire whether a mark is set on an entry by using
|
||||||
:c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark
|
xa_get_mark(). If the entry is not ``NULL``, you can set a mark
|
||||||
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
|
on it by using xa_set_mark() and remove the mark from an entry by
|
||||||
calling :c:func:`xa_clear_mark`. You can ask whether any entry in the
|
calling xa_clear_mark(). You can ask whether any entry in the
|
||||||
XArray has a particular mark set by calling :c:func:`xa_marked`.
|
XArray has a particular mark set by calling xa_marked().
|
||||||
|
|
||||||
You can copy entries out of the XArray into a plain array by calling
|
You can copy entries out of the XArray into a plain array by calling
|
||||||
:c:func:`xa_extract`. Or you can iterate over the present entries in
|
xa_extract(). Or you can iterate over the present entries in
|
||||||
the XArray by calling :c:func:`xa_for_each`. You may prefer to use
|
the XArray by calling xa_for_each(). You may prefer to use
|
||||||
:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
|
xa_find() or xa_find_after() to move to the next present
|
||||||
entry in the XArray.
|
entry in the XArray.
|
||||||
|
|
||||||
Calling :c:func:`xa_store_range` stores the same entry in a range
|
Calling xa_store_range() stores the same entry in a range
|
||||||
of indices. If you do this, some of the other operations will behave
|
of indices. If you do this, some of the other operations will behave
|
||||||
in a slightly odd way. For example, marking the entry at one index
|
in a slightly odd way. For example, marking the entry at one index
|
||||||
may result in the entry being marked at some, but not all of the other
|
may result in the entry being marked at some, but not all of the other
|
||||||
indices. Storing into one index may result in the entry retrieved by
|
indices. Storing into one index may result in the entry retrieved by
|
||||||
some, but not all of the other indices changing.
|
some, but not all of the other indices changing.
|
||||||
|
|
||||||
Sometimes you need to ensure that a subsequent call to :c:func:`xa_store`
|
Sometimes you need to ensure that a subsequent call to xa_store()
|
||||||
will not need to allocate memory. The :c:func:`xa_reserve` function
|
will not need to allocate memory. The xa_reserve() function
|
||||||
will store a reserved entry at the indicated index. Users of the
|
will store a reserved entry at the indicated index. Users of the
|
||||||
normal API will see this entry as containing ``NULL``. If you do
|
normal API will see this entry as containing ``NULL``. If you do
|
||||||
not need to use the reserved entry, you can call :c:func:`xa_release`
|
not need to use the reserved entry, you can call xa_release()
|
||||||
to remove the unused entry. If another user has stored to the entry
|
to remove the unused entry. If another user has stored to the entry
|
||||||
in the meantime, :c:func:`xa_release` will do nothing; if instead you
|
in the meantime, xa_release() will do nothing; if instead you
|
||||||
want the entry to become ``NULL``, you should use :c:func:`xa_erase`.
|
want the entry to become ``NULL``, you should use xa_erase().
|
||||||
Using :c:func:`xa_insert` on a reserved entry will fail.
|
Using xa_insert() on a reserved entry will fail.
|
||||||
|
|
||||||
If all entries in the array are ``NULL``, the :c:func:`xa_empty` function
|
If all entries in the array are ``NULL``, the xa_empty() function
|
||||||
will return ``true``.
|
will return ``true``.
|
||||||
|
|
||||||
Finally, you can remove all entries from an XArray by calling
|
Finally, you can remove all entries from an XArray by calling
|
||||||
:c:func:`xa_destroy`. If the XArray entries are pointers, you may wish
|
xa_destroy(). If the XArray entries are pointers, you may wish
|
||||||
to free the entries first. You can do this by iterating over all present
|
to free the entries first. You can do this by iterating over all present
|
||||||
entries in the XArray using the :c:func:`xa_for_each` iterator.
|
entries in the XArray using the xa_for_each() iterator.
|
||||||
|
|
||||||
Allocating XArrays
|
Allocating XArrays
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
If you use :c:func:`DEFINE_XARRAY_ALLOC` to define the XArray, or
|
If you use DEFINE_XARRAY_ALLOC() to define the XArray, or
|
||||||
initialise it by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`,
|
initialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
|
||||||
the XArray changes to track whether entries are in use or not.
|
the XArray changes to track whether entries are in use or not.
|
||||||
|
|
||||||
You can call :c:func:`xa_alloc` to store the entry at an unused index
|
You can call xa_alloc() to store the entry at an unused index
|
||||||
in the XArray. If you need to modify the array from interrupt context,
|
in the XArray. If you need to modify the array from interrupt context,
|
||||||
you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable
|
you can use xa_alloc_bh() or xa_alloc_irq() to disable
|
||||||
interrupts while allocating the ID.
|
interrupts while allocating the ID.
|
||||||
|
|
||||||
Using :c:func:`xa_store`, :c:func:`xa_cmpxchg` or :c:func:`xa_insert` will
|
Using xa_store(), xa_cmpxchg() or xa_insert() will
|
||||||
also mark the entry as being allocated. Unlike a normal XArray, storing
|
also mark the entry as being allocated. Unlike a normal XArray, storing
|
||||||
``NULL`` will mark the entry as being in use, like :c:func:`xa_reserve`.
|
``NULL`` will mark the entry as being in use, like xa_reserve().
|
||||||
To free an entry, use :c:func:`xa_erase` (or :c:func:`xa_release` if
|
To free an entry, use xa_erase() (or xa_release() if
|
||||||
you only want to free the entry if it's ``NULL``).
|
you only want to free the entry if it's ``NULL``).
|
||||||
|
|
||||||
By default, the lowest free entry is allocated starting from 0. If you
|
By default, the lowest free entry is allocated starting from 0. If you
|
||||||
want to allocate entries starting at 1, it is more efficient to use
|
want to allocate entries starting at 1, it is more efficient to use
|
||||||
:c:func:`DEFINE_XARRAY_ALLOC1` or ``XA_FLAGS_ALLOC1``. If you want to
|
DEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``. If you want to
|
||||||
allocate IDs up to a maximum, then wrap back around to the lowest free
|
allocate IDs up to a maximum, then wrap back around to the lowest free
|
||||||
ID, you can use :c:func:`xa_alloc_cyclic`.
|
ID, you can use xa_alloc_cyclic().
|
||||||
|
|
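The allocation behaviour can be sketched with a small ID table. This is a hypothetical user-space model, not the kernel code: the `toy_*` names, the fixed capacity and the ``-EBUSY`` return for a full range are assumptions made for illustration, and the model always returns 0 on success even when the cyclic search wraps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IDS 8	/* hypothetical capacity */

struct toy_ids {
	void *slot[TOY_IDS];
	bool used[TOY_IDS];	/* tracked separately: a NULL entry can be in use */
	size_t next;		/* where the cyclic search starts */
};

/* Like xa_alloc(): claim the lowest free ID in [min, max]. */
static inline int toy_alloc(struct toy_ids *t, size_t *id, void *entry,
			    size_t min, size_t max)
{
	size_t i;

	for (i = min; i <= max && i < TOY_IDS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			t->slot[i] = entry;
			*id = i;
			return 0;
		}
	}
	return -EBUSY;
}

/* Like xa_alloc_cyclic(): search from t->next up to max, then wrap to min. */
static inline int toy_alloc_cyclic(struct toy_ids *t, size_t *id, void *entry,
				   size_t min, size_t max)
{
	size_t start = t->next < min ? min : t->next;
	int err = -EBUSY;

	if (start <= max)
		err = toy_alloc(t, id, entry, start, max);
	if (err && start > min)
		err = toy_alloc(t, id, entry, min, start - 1);
	if (!err)
		t->next = *id + 1;
	return err;
}

/* Like xa_erase(): make the ID available again. */
static inline void toy_free(struct toy_ids *t, size_t id)
{
	assert(id < TOY_IDS);
	t->used[id] = false;
	t->slot[id] = NULL;
}
```

Keeping the `used` flags apart from the stored pointers mirrors the point made in the text: in an allocating XArray, an index holding ``NULL`` can still count as allocated until it is explicitly freed.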
||||||
You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
|
You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
|
||||||
is used to track whether an entry is free or not. The other marks are
|
is used to track whether an entry is free or not. The other marks are
|
||||||
|
@@ -155,17 +155,17 @@ available for your use.
|
||||||
Memory allocation
|
Memory allocation
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
|
The xa_store(), xa_cmpxchg(), xa_alloc(),
|
||||||
:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t
|
xa_reserve() and xa_insert() functions take a gfp_t
|
||||||
parameter in case the XArray needs to allocate memory to store this entry.
|
parameter in case the XArray needs to allocate memory to store this entry.
|
||||||
If the entry is being deleted, no memory allocation needs to be performed,
|
If the entry is being deleted, no memory allocation needs to be performed,
|
||||||
and the GFP flags specified will be ignored.
|
and the GFP flags specified will be ignored.
|
||||||
|
|
||||||
It is possible for no memory to be allocatable, particularly if you pass
|
It is possible for no memory to be allocatable, particularly if you pass
|
||||||
a restrictive set of GFP flags. In that case, the functions return a
|
a restrictive set of GFP flags. In that case, the functions return a
|
||||||
special value which can be turned into an errno using :c:func:`xa_err`.
|
special value which can be turned into an errno using xa_err().
|
||||||
If you don't need to know exactly which error occurred, using
|
If you don't need to know exactly which error occurred, using
|
||||||
:c:func:`xa_is_err` is slightly more efficient.
|
xa_is_err() is slightly more efficient.
|
||||||
|
|
||||||
Locking
|
Locking
|
||||||
-------
|
-------
|
||||||
|
@@ -174,54 +174,54 @@ When using the Normal API, you do not have to worry about locking.
|
||||||
The XArray uses RCU and an internal spinlock to synchronise access:
|
The XArray uses RCU and an internal spinlock to synchronise access:
|
||||||
|
|
||||||
No lock needed:
|
No lock needed:
|
||||||
* :c:func:`xa_empty`
|
* xa_empty()
|
||||||
* :c:func:`xa_marked`
|
* xa_marked()
|
||||||
|
|
||||||
Takes RCU read lock:
|
Takes RCU read lock:
|
||||||
* :c:func:`xa_load`
|
* xa_load()
|
||||||
* :c:func:`xa_for_each`
|
* xa_for_each()
|
||||||
* :c:func:`xa_find`
|
* xa_find()
|
||||||
* :c:func:`xa_find_after`
|
* xa_find_after()
|
||||||
* :c:func:`xa_extract`
|
* xa_extract()
|
||||||
* :c:func:`xa_get_mark`
|
* xa_get_mark()
|
||||||
|
|
||||||
Takes xa_lock internally:
|
Takes xa_lock internally:
|
||||||
* :c:func:`xa_store`
|
* xa_store()
|
||||||
* :c:func:`xa_store_bh`
|
* xa_store_bh()
|
||||||
* :c:func:`xa_store_irq`
|
* xa_store_irq()
|
||||||
* :c:func:`xa_insert`
|
* xa_insert()
|
||||||
* :c:func:`xa_insert_bh`
|
* xa_insert_bh()
|
||||||
* :c:func:`xa_insert_irq`
|
* xa_insert_irq()
|
||||||
* :c:func:`xa_erase`
|
* xa_erase()
|
||||||
* :c:func:`xa_erase_bh`
|
* xa_erase_bh()
|
||||||
* :c:func:`xa_erase_irq`
|
* xa_erase_irq()
|
||||||
* :c:func:`xa_cmpxchg`
|
* xa_cmpxchg()
|
||||||
* :c:func:`xa_cmpxchg_bh`
|
* xa_cmpxchg_bh()
|
||||||
* :c:func:`xa_cmpxchg_irq`
|
* xa_cmpxchg_irq()
|
||||||
* :c:func:`xa_store_range`
|
* xa_store_range()
|
||||||
* :c:func:`xa_alloc`
|
* xa_alloc()
|
||||||
* :c:func:`xa_alloc_bh`
|
* xa_alloc_bh()
|
||||||
* :c:func:`xa_alloc_irq`
|
* xa_alloc_irq()
|
||||||
* :c:func:`xa_reserve`
|
* xa_reserve()
|
||||||
* :c:func:`xa_reserve_bh`
|
* xa_reserve_bh()
|
||||||
* :c:func:`xa_reserve_irq`
|
* xa_reserve_irq()
|
||||||
* :c:func:`xa_destroy`
|
* xa_destroy()
|
||||||
* :c:func:`xa_set_mark`
|
* xa_set_mark()
|
||||||
* :c:func:`xa_clear_mark`
|
* xa_clear_mark()
|
||||||
|
|
||||||
Assumes xa_lock held on entry:
|
Assumes xa_lock held on entry:
|
||||||
* :c:func:`__xa_store`
|
* __xa_store()
|
||||||
* :c:func:`__xa_insert`
|
* __xa_insert()
|
||||||
* :c:func:`__xa_erase`
|
* __xa_erase()
|
||||||
* :c:func:`__xa_cmpxchg`
|
* __xa_cmpxchg()
|
||||||
* :c:func:`__xa_alloc`
|
* __xa_alloc()
|
||||||
* :c:func:`__xa_set_mark`
|
* __xa_set_mark()
|
||||||
* :c:func:`__xa_clear_mark`
|
* __xa_clear_mark()
|
||||||
|
|
||||||
If you want to take advantage of the lock to protect the data structures
|
If you want to take advantage of the lock to protect the data structures
|
||||||
that you are storing in the XArray, you can call :c:func:`xa_lock`
|
that you are storing in the XArray, you can call xa_lock()
|
||||||
before calling :c:func:`xa_load`, then take a reference count on the
|
before calling xa_load(), then take a reference count on the
|
||||||
object you have found before calling :c:func:`xa_unlock`. This will
|
object you have found before calling xa_unlock(). This will
|
||||||
prevent stores from removing the object from the array between looking
|
prevent stores from removing the object from the array between looking
|
||||||
up the object and incrementing the refcount. You can also use RCU to
|
up the object and incrementing the refcount. You can also use RCU to
|
||||||
avoid dereferencing freed memory, but an explanation of that is beyond
|
avoid dereferencing freed memory, but an explanation of that is beyond
|
||||||
|
@@ -261,7 +261,7 @@ context and then erase them in softirq context, you can do that this way::
|
||||||
}
|
}
|
||||||
|
|
||||||
If you are going to modify the XArray from interrupt or softirq context,
|
If you are going to modify the XArray from interrupt or softirq context,
|
||||||
you need to initialise the array using :c:func:`xa_init_flags`, passing
|
you need to initialise the array using xa_init_flags(), passing
|
||||||
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
|
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
|
||||||
|
|
||||||
The above example also shows a common pattern of wanting to extend the
|
The above example also shows a common pattern of wanting to extend the
|
||||||
|
@@ -269,20 +269,20 @@ coverage of the xa_lock on the store side to protect some statistics
|
||||||
associated with the array.
|
associated with the array.
|
||||||
|
|
||||||
Sharing the XArray with interrupt context is also possible, either
|
Sharing the XArray with interrupt context is also possible, either
|
||||||
using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
|
using xa_lock_irqsave() in both the interrupt handler and process
|
||||||
context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
|
context, or xa_lock_irq() in process context and xa_lock()
|
||||||
in the interrupt handler. Some of the more common patterns have helper
|
in the interrupt handler. Some of the more common patterns have helper
|
||||||
functions such as :c:func:`xa_store_bh`, :c:func:`xa_store_irq`,
|
functions such as xa_store_bh(), xa_store_irq(),
|
||||||
:c:func:`xa_erase_bh`, :c:func:`xa_erase_irq`, :c:func:`xa_cmpxchg_bh`
|
xa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
|
||||||
and :c:func:`xa_cmpxchg_irq`.
|
and xa_cmpxchg_irq().
|
||||||
|
|
||||||
Sometimes you need to protect access to the XArray with a mutex because
|
Sometimes you need to protect access to the XArray with a mutex because
|
||||||
that lock sits above another mutex in the locking hierarchy. That does
|
that lock sits above another mutex in the locking hierarchy. That does
|
||||||
not entitle you to use functions like :c:func:`__xa_erase` without taking
|
not entitle you to use functions like __xa_erase() without taking
|
||||||
the xa_lock; the xa_lock is used for lockdep validation and will be used
|
the xa_lock; the xa_lock is used for lockdep validation and will be used
|
||||||
for other purposes in the future.
|
for other purposes in the future.
|
||||||
|
|
||||||
The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
|
The __xa_set_mark() and __xa_clear_mark() functions are also
|
||||||
available for situations where you look up an entry and want to atomically
|
available for situations where you look up an entry and want to atomically
|
||||||
set or clear a mark. It may be more efficient to use the advanced API
|
set or clear a mark. It may be more efficient to use the advanced API
|
||||||
in this case, as it will save you from walking the tree twice.
|
in this case, as it will save you from walking the tree twice.
|
||||||
|
@@ -300,27 +300,27 @@ indeed the normal API is implemented in terms of the advanced API. The
|
||||||
advanced API is only available to modules with a GPL-compatible license.
|
advanced API is only available to modules with a GPL-compatible license.
|
||||||
|
|
||||||
The advanced API is based around the xa_state. This is an opaque data
|
The advanced API is based around the xa_state. This is an opaque data
|
||||||
structure which you declare on the stack using the :c:func:`XA_STATE`
|
structure which you declare on the stack using the XA_STATE()
|
||||||
macro. This macro initialises the xa_state ready to start walking
|
macro. This macro initialises the xa_state ready to start walking
|
||||||
around the XArray. It is used as a cursor to maintain the position
|
around the XArray. It is used as a cursor to maintain the position
|
||||||
in the XArray and let you compose various operations together without
|
in the XArray and let you compose various operations together without
|
||||||
having to restart from the top every time.
|
having to restart from the top every time.
|
||||||
|
|
||||||
The xa_state is also used to store errors. You can call
|
The xa_state is also used to store errors. You can call
|
||||||
:c:func:`xas_error` to retrieve the error. All operations check whether
|
xas_error() to retrieve the error. All operations check whether
|
||||||
the xa_state is in an error state before proceeding, so there's no need
|
the xa_state is in an error state before proceeding, so there's no need
|
||||||
for you to check for an error after each call; you can make multiple
|
for you to check for an error after each call; you can make multiple
|
||||||
calls in succession and only check at a convenient point. The only
|
calls in succession and only check at a convenient point. The only
|
||||||
errors currently generated by the XArray code itself are ``ENOMEM`` and
|
errors currently generated by the XArray code itself are ``ENOMEM`` and
|
||||||
``EINVAL``, but it supports arbitrary errors in case you want to call
|
``EINVAL``, but it supports arbitrary errors in case you want to call
|
||||||
:c:func:`xas_set_err` yourself.
|
xas_set_err() yourself.
|
||||||
|
|
||||||
If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
|
If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
|
||||||
will attempt to allocate more memory using the specified gfp flags and
|
will attempt to allocate more memory using the specified gfp flags and
|
||||||
cache it in the xa_state for the next attempt. The idea is that you take
|
cache it in the xa_state for the next attempt. The idea is that you take
|
||||||
the xa_lock, attempt the operation and drop the lock. The operation
|
the xa_lock, attempt the operation and drop the lock. The operation
|
||||||
attempts to allocate memory while holding the lock, but it is more
|
attempts to allocate memory while holding the lock, but it is more
|
||||||
likely to fail. Once you have dropped the lock, :c:func:`xas_nomem`
|
likely to fail. Once you have dropped the lock, xas_nomem()
|
||||||
can try harder to allocate more memory. It will return ``true`` if it
|
can try harder to allocate more memory. It will return ``true`` if it
|
||||||
is worth retrying the operation (i.e. that there was a memory error *and*
|
is worth retrying the operation (i.e. that there was a memory error *and*
|
||||||
more memory was allocated). If it has previously allocated memory, and
|
more memory was allocated). If it has previously allocated memory, and
|
||||||
|
@@ -333,7 +333,7 @@ Internal Entries
|
||||||
The XArray reserves some entries for its own purposes. These are never
|
The XArray reserves some entries for its own purposes. These are never
|
||||||
exposed through the normal API, but when using the advanced API, it's
|
exposed through the normal API, but when using the advanced API, it's
|
||||||
possible to see them. Usually the best way to handle them is to pass them
|
possible to see them. Usually the best way to handle them is to pass them
|
||||||
to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
|
to xas_retry(), and retry the operation if it returns ``true``.
|
||||||
|
|
||||||
.. flat-table::
|
.. flat-table::
|
||||||
:widths: 1 1 6
|
:widths: 1 1 6
|
||||||
|
@@ -343,89 +343,89 @@ to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
|
||||||
- Usage
|
- Usage
|
||||||
|
|
||||||
* - Node
|
* - Node
|
||||||
- :c:func:`xa_is_node`
|
- xa_is_node()
|
||||||
- An XArray node. May be visible when using a multi-index xa_state.
|
- An XArray node. May be visible when using a multi-index xa_state.
|
||||||
|
|
||||||
* - Sibling
|
* - Sibling
|
||||||
- :c:func:`xa_is_sibling`
|
- xa_is_sibling()
|
||||||
- A non-canonical entry for a multi-index entry. The value indicates
|
- A non-canonical entry for a multi-index entry. The value indicates
|
||||||
which slot in this node has the canonical entry.
|
which slot in this node has the canonical entry.
|
||||||
|
|
||||||
* - Retry
|
* - Retry
|
||||||
- :c:func:`xa_is_retry`
|
- xa_is_retry()
|
||||||
- This entry is currently being modified by a thread which has the
|
- This entry is currently being modified by a thread which has the
|
||||||
xa_lock. The node containing this entry may be freed at the end
|
xa_lock. The node containing this entry may be freed at the end
|
||||||
of this RCU period. You should restart the lookup from the head
|
of this RCU period. You should restart the lookup from the head
|
||||||
of the array.
|
of the array.
|
||||||
|
|
||||||
* - Zero
|
* - Zero
|
||||||
- :c:func:`xa_is_zero`
|
- xa_is_zero()
|
||||||
- Zero entries appear as ``NULL`` through the Normal API, but occupy
|
- Zero entries appear as ``NULL`` through the Normal API, but occupy
|
||||||
an entry in the XArray which can be used to reserve the index for
|
an entry in the XArray which can be used to reserve the index for
|
||||||
future use. This is used by allocating XArrays for allocated entries
|
future use. This is used by allocating XArrays for allocated entries
|
||||||
which are ``NULL``.
|
which are ``NULL``.
|
||||||
|
|
||||||
Other internal entries may be added in the future. As far as possible, they
|
Other internal entries may be added in the future. As far as possible, they
|
||||||
will be handled by :c:func:`xas_retry`.
|
will be handled by xas_retry().
|
||||||
|
|
||||||
Additional functionality
|
Additional functionality
|
||||||
------------------------
|
------------------------
|
||||||
|
|
||||||
The :c:func:`xas_create_range` function allocates all the necessary memory
|
The xas_create_range() function allocates all the necessary memory
|
||||||
to store every entry in a range. It will set ENOMEM in the xa_state if
|
to store every entry in a range. It will set ENOMEM in the xa_state if
|
||||||
it cannot allocate memory.
|
it cannot allocate memory.
|
||||||
|
|
||||||
You can use :c:func:`xas_init_marks` to reset the marks on an entry
|
You can use xas_init_marks() to reset the marks on an entry
|
||||||
to their default state. This is usually all marks clear, unless the
|
to their default state. This is usually all marks clear, unless the
|
||||||
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
|
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
|
||||||
and all other marks are clear. Replacing one entry with another using
|
and all other marks are clear. Replacing one entry with another using
|
||||||
:c:func:`xas_store` will not reset the marks on that entry; if you want
|
xas_store() will not reset the marks on that entry; if you want
|
||||||
the marks reset, you should do that explicitly.
|
the marks reset, you should do that explicitly.
|
||||||
|
|
||||||
The :c:func:`xas_load` will walk the xa_state as close to the entry
|
The xas_load() will walk the xa_state as close to the entry
|
||||||
as it can. If you know the xa_state has already been walked to the
|
as it can. If you know the xa_state has already been walked to the
|
||||||
entry and need to check that the entry hasn't changed, you can use
|
entry and need to check that the entry hasn't changed, you can use
|
||||||
:c:func:`xas_reload` to save a function call.
|
xas_reload() to save a function call.
|
||||||
|
|
||||||
If you need to move to a different index in the XArray, call
|
If you need to move to a different index in the XArray, call
|
||||||
:c:func:`xas_set`. This resets the cursor to the top of the tree, which
|
xas_set(). This resets the cursor to the top of the tree, which
|
||||||
will generally make the next operation walk the cursor to the desired
|
will generally make the next operation walk the cursor to the desired
|
||||||
spot in the tree. If you want to move to the next or previous index,
|
spot in the tree. If you want to move to the next or previous index,
|
||||||
call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does
|
call xas_next() or xas_prev(). Setting the index does
|
||||||
not walk the cursor around the array so does not require a lock to be
|
not walk the cursor around the array so does not require a lock to be
|
||||||
held, while moving to the next or previous index does.
|
held, while moving to the next or previous index does.
|
||||||
|
|
||||||
You can search for the next present entry using :c:func:`xas_find`. This
|
You can search for the next present entry using xas_find(). This
|
||||||
is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
|
is the equivalent of both xa_find() and xa_find_after();
|
||||||
if the cursor has been walked to an entry, then it will find the next
|
if the cursor has been walked to an entry, then it will find the next
|
||||||
entry after the one currently referenced. If not, it will return the
|
entry after the one currently referenced. If not, it will return the
|
||||||
entry at the index of the xa_state. Using :c:func:`xas_next_entry` to
|
entry at the index of the xa_state. Using xas_next_entry() to
|
||||||
move to the next present entry instead of :c:func:`xas_find` will save
|
move to the next present entry instead of xas_find() will save
|
||||||
a function call in the majority of cases at the expense of emitting more
|
a function call in the majority of cases at the expense of emitting more
|
||||||
inline code.
|
inline code.
|
||||||
|
|
||||||
The :c:func:`xas_find_marked` function is similar. If the xa_state has
|
The xas_find_marked() function is similar. If the xa_state has
|
||||||
not been walked, it will return the entry at the index of the xa_state,
|
not been walked, it will return the entry at the index of the xa_state,
|
||||||
if it is marked. Otherwise, it will return the first marked entry after
|
if it is marked. Otherwise, it will return the first marked entry after
|
||||||
the entry referenced by the xa_state. The :c:func:`xas_next_marked`
|
the entry referenced by the xa_state. The xas_next_marked()
|
||||||
function is the equivalent of :c:func:`xas_next_entry`.
|
function is the equivalent of xas_next_entry().
|
||||||
|
|
||||||
When iterating over a range of the XArray using :c:func:`xas_for_each`
|
When iterating over a range of the XArray using xas_for_each()
|
||||||
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
|
or xas_for_each_marked(), it may be necessary to temporarily stop
|
||||||
the iteration. The :c:func:`xas_pause` function exists for this purpose.
|
the iteration. The xas_pause() function exists for this purpose.
|
||||||
After you have done the necessary work and wish to resume, the xa_state
|
After you have done the necessary work and wish to resume, the xa_state
|
||||||
is in an appropriate state to continue the iteration after the entry
|
is in an appropriate state to continue the iteration after the entry
|
||||||
you last processed. If you have interrupts disabled while iterating,
|
you last processed. If you have interrupts disabled while iterating,
|
||||||
then it is good manners to pause the iteration and reenable interrupts
|
then it is good manners to pause the iteration and reenable interrupts
|
||||||
every ``XA_CHECK_SCHED`` entries.
|
every ``XA_CHECK_SCHED`` entries.
|
||||||
|
|
||||||
The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
|
The xas_get_mark(), xas_set_mark() and
|
||||||
:c:func:`xas_clear_mark` functions require the xa_state cursor to have
|
xas_clear_mark() functions require the xa_state cursor to have
|
||||||
been moved to the appropriate location in the xarray; they will do
|
been moved to the appropriate location in the xarray; they will do
|
||||||
nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
|
nothing if you have called xas_pause() or xas_set()
|
||||||
immediately before.
|
immediately before.
|
||||||
|
|
||||||
You can call :c:func:`xas_set_update` to have a callback function
|
You can call xas_set_update() to have a callback function
|
||||||
called each time the XArray updates a node. This is used by the page
|
called each time the XArray updates a node. This is used by the page
|
||||||
cache workingset code to maintain its list of nodes which contain only
|
cache workingset code to maintain its list of nodes which contain only
|
||||||
shadow entries.
|
shadow entries.
|
||||||
|
@ -443,25 +443,25 @@ eg indices 64-127 may be tied together, but 2-6 may not be. This may
|
||||||
save substantial quantities of memory; for example tying 512 entries
|
save substantial quantities of memory; for example tying 512 entries
|
||||||
together will save over 4kB.
|
together will save over 4kB.
|
||||||
|
|
||||||
You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
|
You can create a multi-index entry by using XA_STATE_ORDER()
|
||||||
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
|
or xas_set_order() followed by a call to xas_store().
|
||||||
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
|
Calling xas_load() with a multi-index xa_state will walk the
|
||||||
xa_state to the right location in the tree, but the return value is not
|
xa_state to the right location in the tree, but the return value is not
|
||||||
meaningful, potentially being an internal entry or ``NULL`` even when there
|
meaningful, potentially being an internal entry or ``NULL`` even when there
|
||||||
is an entry stored within the range. Calling :c:func:`xas_find_conflict`
|
is an entry stored within the range. Calling xas_find_conflict()
|
||||||
will return the first entry within the range or ``NULL`` if there are no
|
will return the first entry within the range or ``NULL`` if there are no
|
||||||
entries in the range. The :c:func:`xas_for_each_conflict` iterator will
|
entries in the range. The xas_for_each_conflict() iterator will
|
||||||
iterate over every entry which overlaps the specified range.
|
iterate over every entry which overlaps the specified range.
|
||||||
|
|
||||||
If :c:func:`xas_load` encounters a multi-index entry, the xa_index
|
If xas_load() encounters a multi-index entry, the xa_index
|
||||||
in the xa_state will not be changed. When iterating over an XArray
|
in the xa_state will not be changed. When iterating over an XArray
|
||||||
or calling :c:func:`xas_find`, if the initial index is in the middle
|
or calling xas_find(), if the initial index is in the middle
|
||||||
of a multi-index entry, it will not be altered. Subsequent calls
|
of a multi-index entry, it will not be altered. Subsequent calls
|
||||||
or iterations will move the index to the first index in the range.
|
or iterations will move the index to the first index in the range.
|
||||||
Each entry will only be returned once, no matter how many indices it
|
Each entry will only be returned once, no matter how many indices it
|
||||||
occupies.
|
occupies.
|
||||||
|
|
||||||
Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
|
Using xas_next() or xas_prev() with a multi-index xa_state
|
||||||
is not supported. Using either of these functions on a multi-index entry
|
is not supported. Using either of these functions on a multi-index entry
|
||||||
will reveal sibling entries; these should be skipped over by the caller.
|
will reveal sibling entries; these should be skipped over by the caller.
|
||||||
|
|
||||||
|
|
|
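The remark in the hunk above that indices 64-127 may be tied together while 2-6 may not can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the rule is "the range must be a naturally aligned power of two"; the helper names `can_tie` and `order_of` are hypothetical and not part of the XArray API.

```python
def can_tie(first: int, last: int) -> bool:
    """Return True if indices first..last (inclusive) form an aligned
    power-of-two range (the assumed rule for multi-index entries)."""
    size = last - first + 1
    if first < 0 or size <= 0:
        return False
    is_power_of_two = size & (size - 1) == 0
    # "Naturally aligned" means the first index is a multiple of the size.
    return is_power_of_two and first % size == 0


def order_of(first: int, last: int) -> int:
    """Return log2 of the range size; only meaningful when can_tie() is True."""
    return (last - first + 1).bit_length() - 1


print(can_tie(64, 127), order_of(64, 127))  # a 64-entry range, order 6
print(can_tie(2, 6))                        # 5 entries: not a power of two
print(can_tie(2, 5))                        # 4 entries, but not aligned to 4
```

Under the same assumption, 512 tied entries correspond to order 9, which is the kind of value one would expect to pass to XA_STATE_ORDER() or xas_set_order().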
@ -1,3 +1,4 @@
|
||||||
|
=============================
|
||||||
Guidance for writing policies
|
Guidance for writing policies
|
||||||
=============================
|
=============================
|
||||||
|
|
||||||
|
@ -30,7 +31,7 @@ multiqueue (mq)
|
||||||
|
|
||||||
This policy is now an alias for smq (see below).
|
This policy is now an alias for smq (see below).
|
||||||
|
|
||||||
The following tunables are accepted, but have no effect:
|
The following tunables are accepted, but have no effect::
|
||||||
|
|
||||||
'sequential_threshold <#nr_sequential_ios>'
|
'sequential_threshold <#nr_sequential_ios>'
|
||||||
'random_threshold <#nr_random_ios>'
|
'random_threshold <#nr_random_ios>'
|
||||||
|
@ -56,7 +57,9 @@ mq policy's hints to be dropped. Also, performance of the cache may
|
||||||
degrade slightly until smq recalculates the origin device's hotspots
|
degrade slightly until smq recalculates the origin device's hotspots
|
||||||
that should be cached.
|
that should be cached.
|
||||||
|
|
||||||
Memory usage:
|
Memory usage
|
||||||
|
^^^^^^^^^^^^
|
||||||
|
|
||||||
The mq policy used a lot of memory; 88 bytes per cache block on a 64
|
The mq policy used a lot of memory; 88 bytes per cache block on a 64
|
||||||
bit machine.
|
bit machine.
|
||||||
|
|
||||||
|
@ -69,7 +72,9 @@ cache block).
|
||||||
All this means smq uses ~25 bytes per cache block. Still a lot of
|
All this means smq uses ~25 bytes per cache block. Still a lot of
|
||||||
memory, but a substantial improvement nonetheless.
|
memory, but a substantial improvement nonetheless.
|
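To put the two per-block figures side by side, the sketch below multiplies them out for a cache of a given size. The 88-byte and roughly 25-byte figures come from the text; the block count of one million is an arbitrary illustration, and `policy_memory` is a hypothetical helper.

```python
MQ_BYTES_PER_BLOCK = 88   # figure quoted for mq on a 64-bit machine
SMQ_BYTES_PER_BLOCK = 25  # approximate figure quoted for smq


def policy_memory(nr_cache_blocks: int, bytes_per_block: int) -> int:
    """Return the policy's memory use in bytes for the given number of cache blocks."""
    return nr_cache_blocks * bytes_per_block


blocks = 1_000_000  # hypothetical cache size, in cache blocks
mq = policy_memory(blocks, MQ_BYTES_PER_BLOCK)
smq = policy_memory(blocks, SMQ_BYTES_PER_BLOCK)
print(f"mq:  {mq / 1e6:.0f} MB")
print(f"smq: {smq / 1e6:.0f} MB (saves {(mq - smq) / 1e6:.0f} MB)")
```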
||||||
|
|
||||||
Level balancing:
|
Level balancing
|
||||||
|
^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
mq placed entries in different levels of the multiqueue structures
|
mq placed entries in different levels of the multiqueue structures
|
||||||
based on their hit count (~ln(hit count)). This meant the bottom
|
based on their hit count (~ln(hit count)). This meant the bottom
|
||||||
levels generally had the most entries, and the top ones had very
|
levels generally had the most entries, and the top ones had very
|
||||||
|
@ -94,7 +99,9 @@ is used to decide which blocks to promote. If the hotspot queue is
|
||||||
performing badly then it starts moving entries more quickly between
|
performing badly then it starts moving entries more quickly between
|
||||||
levels. This lets it adapt to new IO patterns very quickly.
|
levels. This lets it adapt to new IO patterns very quickly.
|
||||||
|
|
||||||
Performance:
|
Performance
|
||||||
|
^^^^^^^^^^^
|
||||||
|
|
||||||
Testing smq shows substantially better performance than mq.
|
Testing smq shows substantially better performance than mq.
|
||||||
|
|
||||||
cleaner
|
cleaner
|
||||||
|
@ -105,16 +112,19 @@ The cleaner writes back all dirty blocks in a cache to decommission it.
|
||||||
Examples
|
Examples
|
||||||
========
|
========
|
||||||
|
|
||||||
The syntax for a table is:
|
The syntax for a table is::
|
||||||
|
|
||||||
cache <metadata dev> <cache dev> <origin dev> <block size>
|
cache <metadata dev> <cache dev> <origin dev> <block size>
|
||||||
<#feature_args> [<feature arg>]*
|
<#feature_args> [<feature arg>]*
|
||||||
<policy> <#policy_args> [<policy arg>]*
|
<policy> <#policy_args> [<policy arg>]*
|
||||||
|
|
||||||
The syntax to send a message using the dmsetup command is:
|
The syntax to send a message using the dmsetup command is::
|
||||||
|
|
||||||
dmsetup message <mapped device> 0 sequential_threshold 1024
|
dmsetup message <mapped device> 0 sequential_threshold 1024
|
||||||
dmsetup message <mapped device> 0 random_threshold 8
|
dmsetup message <mapped device> 0 random_threshold 8
|
||||||
|
|
||||||
Using dmsetup:
|
Using dmsetup::
|
||||||
|
|
||||||
dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
|
dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
|
||||||
/dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
|
/dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
|
||||||
creates a 128GB mapped device named 'blah' with the
|
creates a 128GB mapped device named 'blah' with the
|
|
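The 128GB figure follows from the table length in the dmsetup example above: device-mapper table lengths are counted in 512-byte sectors. A short sketch of the conversion (the helper name `sectors_to_gib` is hypothetical):

```python
SECTOR_SIZE = 512  # device-mapper tables count 512-byte sectors


def sectors_to_gib(sectors: int) -> float:
    """Convert a device-mapper table length in sectors to GiB."""
    return sectors * SECTOR_SIZE / 2**30


print(sectors_to_gib(268435456))  # the length used in the 'blah' example
```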
@ -1,3 +1,7 @@
|
||||||
|
=====
|
||||||
|
Cache
|
||||||
|
=====
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
||||||
|
@ -24,10 +28,13 @@ scenarios (eg. a vm image server).
|
||||||
Glossary
|
Glossary
|
||||||
========
|
========
|
||||||
|
|
||||||
Migration - Movement of the primary copy of a logical block from one
|
Migration
|
||||||
|
Movement of the primary copy of a logical block from one
|
||||||
device to the other.
|
device to the other.
|
||||||
Promotion - Migration from slow device to fast device.
|
Promotion
|
||||||
Demotion - Migration from fast device to slow device.
|
Migration from slow device to fast device.
|
||||||
|
Demotion
|
||||||
|
Migration from fast device to slow device.
|
||||||
|
|
||||||
The origin device always contains a copy of the logical block, which
|
The origin device always contains a copy of the logical block, which
|
||||||
may be out of date or kept in sync with the copy on the cache device
|
may be out of date or kept in sync with the copy on the cache device
|
||||||
|
@ -169,45 +176,53 @@ Target interface
|
||||||
Constructor
|
Constructor
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
cache <metadata dev> <cache dev> <origin dev> <block size>
|
::
|
||||||
<#feature args> [<feature arg>]*
|
|
||||||
<policy> <#policy args> [policy args]*
|
|
||||||
|
|
||||||
metadata dev : fast device holding the persistent metadata
|
cache <metadata dev> <cache dev> <origin dev> <block size>
|
||||||
cache dev : fast device holding cached data blocks
|
<#feature args> [<feature arg>]*
|
||||||
origin dev : slow device holding original data blocks
|
<policy> <#policy args> [policy args]*
|
||||||
block size : cache unit size in sectors
|
|
||||||
|
|
||||||
#feature args : number of feature arguments passed
|
================ =======================================================
|
||||||
feature args : writethrough or passthrough (The default is writeback.)
|
metadata dev fast device holding the persistent metadata
|
||||||
|
cache dev fast device holding cached data blocks
|
||||||
|
origin dev slow device holding original data blocks
|
||||||
|
block size cache unit size in sectors
|
||||||
|
|
||||||
policy : the replacement policy to use
|
#feature args number of feature arguments passed
|
||||||
#policy args : an even number of arguments corresponding to
|
feature args writethrough or passthrough (The default is writeback.)
|
||||||
key/value pairs passed to the policy
|
|
||||||
policy args : key/value pairs passed to the policy
|
policy the replacement policy to use
|
||||||
E.g. 'sequential_threshold 1024'
|
#policy args an even number of arguments corresponding to
|
||||||
See cache-policies.txt for details.
|
key/value pairs passed to the policy
|
||||||
|
policy args key/value pairs passed to the policy
|
||||||
|
E.g. 'sequential_threshold 1024'
|
||||||
|
See cache-policies.txt for details.
|
||||||
|
================ =======================================================
|
||||||
|
|
||||||
Optional feature arguments are:
|
Optional feature arguments are:
|
||||||
writethrough : write through caching that prohibits cache block
|
|
||||||
content from being different from origin block content.
|
|
||||||
Without this argument, the default behaviour is to write
|
|
||||||
back cache block contents later for performance reasons,
|
|
||||||
so they may differ from the corresponding origin blocks.
|
|
||||||
|
|
||||||
passthrough : a degraded mode useful for various cache coherency
|
|
||||||
situations (e.g., rolling back snapshots of
|
|
||||||
underlying storage). Reads and writes always go to
|
|
||||||
the origin. If a write goes to a cached origin
|
|
||||||
block, then the cache block is invalidated.
|
|
||||||
To enable passthrough mode the cache must be clean.
|
|
||||||
|
|
||||||
metadata2 : use version 2 of the metadata. This stores the dirty bits
|
==================== ========================================================
|
||||||
in a separate btree, which improves speed of shutting
|
writethrough write through caching that prohibits cache block
|
||||||
down the cache.
|
content from being different from origin block content.
|
||||||
|
Without this argument, the default behaviour is to write
|
||||||
|
back cache block contents later for performance reasons,
|
||||||
|
so they may differ from the corresponding origin blocks.
|
||||||
|
|
||||||
no_discard_passdown : disable passing down discards from the cache
|
passthrough a degraded mode useful for various cache coherency
|
||||||
to the origin's data device.
|
situations (e.g., rolling back snapshots of
|
||||||
|
underlying storage). Reads and writes always go to
|
||||||
|
the origin. If a write goes to a cached origin
|
||||||
|
block, then the cache block is invalidated.
|
||||||
|
To enable passthrough mode the cache must be clean.
|
||||||
|
|
||||||
|
metadata2 use version 2 of the metadata. This stores the dirty
|
||||||
|
bits in a separate btree, which improves speed of
|
||||||
|
shutting down the cache.
|
||||||
|
|
||||||
|
no_discard_passdown disable passing down discards from the cache
|
||||||
|
to the origin's data device.
|
||||||
|
==================== ========================================================
|
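The constructor syntax and argument tables above can be exercised by assembling a table line programmatically. The following is only a sketch: `cache_table` is a hypothetical helper, and taking the policy arguments as a dict is a design choice that guarantees the `#policy args` count is even, as the table requires.

```python
def cache_table(start, length, metadata_dev, cache_dev, origin_dev,
                block_size, features=(), policy="default", policy_args=None):
    """Assemble a dm-cache table line from its constructor arguments.

    policy_args is a dict of key/value pairs, so the emitted count is
    always an even number of arguments.
    """
    flat_policy_args = []
    for key, value in (policy_args or {}).items():
        flat_policy_args += [key, value]
    fields = [
        start, length, "cache",
        metadata_dev, cache_dev, origin_dev, block_size,
        len(features), *features,
        policy, len(flat_policy_args), *flat_policy_args,
    ]
    return " ".join(str(field) for field in fields)


table = cache_table(
    0, 41943040,
    "/dev/mapper/metadata", "/dev/mapper/ssd", "/dev/mapper/origin",
    1024, features=("writeback",), policy="mq",
    policy_args={"sequential_threshold": 1024, "random_threshold": 8},
)
print(table)
```

The resulting string is what would be passed to `dmsetup create <name> --table`; the sketch does not call dmsetup itself.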
||||||
|
|
||||||
A policy called 'default' is always registered. This is an alias for
|
A policy called 'default' is always registered. This is an alias for
|
||||||
the policy we currently think is giving best all round performance.
|
the policy we currently think is giving best all round performance.
|
||||||
|
@ -218,54 +233,61 @@ the characteristics of a specific policy, always request it by name.
|
||||||
Status
|
Status
|
||||||
------
|
------
|
||||||
|
|
||||||
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
|
::
|
||||||
<cache block size> <#used cache blocks>/<#total cache blocks>
|
|
||||||
<#read hits> <#read misses> <#write hits> <#write misses>
|
|
||||||
<#demotions> <#promotions> <#dirty> <#features> <features>*
|
|
||||||
<#core args> <core args>* <policy name> <#policy args> <policy args>*
|
|
||||||
<cache metadata mode>
|
|
||||||
|
|
||||||
metadata block size : Fixed block size for each metadata block in
|
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
|
||||||
sectors
|
<cache block size> <#used cache blocks>/<#total cache blocks>
|
||||||
#used metadata blocks : Number of metadata blocks used
|
<#read hits> <#read misses> <#write hits> <#write misses>
|
||||||
#total metadata blocks : Total number of metadata blocks
|
<#demotions> <#promotions> <#dirty> <#features> <features>*
|
||||||
cache block size : Configurable block size for the cache device
|
<#core args> <core args>* <policy name> <#policy args> <policy args>*
|
||||||
in sectors
|
<cache metadata mode>
|
||||||
#used cache blocks : Number of blocks resident in the cache
|
|
||||||
#total cache blocks : Total number of cache blocks
|
|
||||||
#read hits : Number of times a READ bio has been mapped
|
========================= =====================================================
|
||||||
to the cache
|
metadata block size Fixed block size for each metadata block in
|
||||||
#read misses : Number of times a READ bio has been mapped
|
sectors
|
||||||
to the origin
|
#used metadata blocks Number of metadata blocks used
|
||||||
#write hits : Number of times a WRITE bio has been mapped
|
#total metadata blocks Total number of metadata blocks
|
||||||
to the cache
|
cache block size Configurable block size for the cache device
|
||||||
#write misses : Number of times a WRITE bio has been
|
in sectors
|
||||||
mapped to the origin
|
#used cache blocks Number of blocks resident in the cache
|
||||||
#demotions : Number of times a block has been removed
|
#total cache blocks Total number of cache blocks
|
||||||
from the cache
|
#read hits Number of times a READ bio has been mapped
|
||||||
#promotions : Number of times a block has been moved to
|
to the cache
|
||||||
the cache
|
#read misses Number of times a READ bio has been mapped
|
||||||
#dirty : Number of blocks in the cache that differ
|
to the origin
|
||||||
from the origin
|
#write hits Number of times a WRITE bio has been mapped
|
||||||
#feature args : Number of feature args to follow
|
to the cache
|
||||||
feature args : 'writethrough' (optional)
|
#write misses Number of times a WRITE bio has been
|
||||||
#core args : Number of core arguments (must be even)
|
mapped to the origin
|
||||||
core args : Key/value pairs for tuning the core
|
#demotions Number of times a block has been removed
|
||||||
e.g. migration_threshold
|
from the cache
|
||||||
policy name : Name of the policy
|
#promotions Number of times a block has been moved to
|
||||||
#policy args : Number of policy arguments to follow (must be even)
|
the cache
|
||||||
policy args : Key/value pairs e.g. sequential_threshold
|
#dirty Number of blocks in the cache that differ
|
||||||
cache metadata mode : ro if read-only, rw if read-write
|
from the origin
|
||||||
In serious cases where even a read-only mode is deemed unsafe
|
#feature args Number of feature args to follow
|
||||||
no further I/O will be permitted and the status will just
|
feature args 'writethrough' (optional)
|
||||||
contain the string 'Fail'. The userspace recovery tools
|
#core args Number of core arguments (must be even)
|
||||||
should then be used.
|
core args Key/value pairs for tuning the core
|
||||||
needs_check : 'needs_check' if set, '-' if not set
|
e.g. migration_threshold
|
||||||
A metadata operation has failed, resulting in the needs_check
|
policy name Name of the policy
|
||||||
flag being set in the metadata's superblock. The metadata
|
#policy args Number of policy arguments to follow (must be even)
|
||||||
device must be deactivated and checked/repaired before the
|
policy args Key/value pairs e.g. sequential_threshold
|
||||||
cache can be made fully operational again. '-' indicates
|
cache metadata mode ro if read-only, rw if read-write
|
||||||
needs_check is not set.
|
|
||||||
|
In serious cases where even a read-only mode is
|
||||||
|
deemed unsafe no further I/O will be permitted and
|
||||||
|
the status will just contain the string 'Fail'.
|
||||||
|
The userspace recovery tools should then be used.
|
||||||
|
needs_check 'needs_check' if set, '-' if not set
|
||||||
|
A metadata operation has failed, resulting in the
|
||||||
|
needs_check flag being set in the metadata's
|
||||||
|
superblock. The metadata device must be
|
||||||
|
deactivated and checked/repaired before the
|
||||||
|
cache can be made fully operational again.
|
||||||
|
'-' indicates needs_check is not set.
|
||||||
|
========================= =====================================================
|
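The status fields listed above arrive as one whitespace-separated line, with counted groups for features, core args and policy args. Below is a minimal parsing sketch under stated assumptions: the field order is taken from the format shown above, a trailing needs_check token is treated as optional, and the sample line is invented for illustration rather than captured from a real device. The function and key names are hypothetical.

```python
def parse_cache_status(line):
    """Parse a dm-cache status line into a dict (sketch, not exhaustive)."""
    tokens = line.split()
    if tokens == ["Fail"]:
        # The text says the status is just 'Fail' when no I/O is permitted.
        return {"failed": True}
    pos = 0

    def take(count=1):
        nonlocal pos
        chunk = tokens[pos:pos + count]
        if len(chunk) != count:
            raise ValueError("status line is shorter than expected")
        pos += count
        return chunk

    def take_counted():
        # A counted group is "<#items> <item>*".
        return take(int(take()[0]))

    def take_ratio():
        used, total = take()[0].split("/")
        return int(used), int(total)

    def to_pairs(items):
        if len(items) % 2:
            raise ValueError("expected an even number of key/value arguments")
        return dict(zip(items[0::2], items[1::2]))

    status = {"failed": False}
    status["metadata_block_size"] = int(take()[0])
    status["metadata_used"], status["metadata_total"] = take_ratio()
    status["cache_block_size"] = int(take()[0])
    status["cache_used"], status["cache_total"] = take_ratio()
    for name in ("read_hits", "read_misses", "write_hits", "write_misses",
                 "demotions", "promotions", "dirty"):
        status[name] = int(take()[0])
    status["features"] = take_counted()
    status["core_args"] = to_pairs(take_counted())
    status["policy"] = take()[0]
    status["policy_args"] = to_pairs(take_counted())
    status["metadata_mode"] = take()[0]
    # Assumption: needs_check, when reported, is the last token ('-' if unset).
    status["needs_check"] = pos < len(tokens) and take()[0] == "needs_check"
    return status


# Invented sample line, for illustration only.
sample = ("8 72/1024 128 1024/8192 100 20 50 10 0 5 3 "
          "1 writethrough 2 migration_threshold 2048 smq 0 rw -")
print(parse_cache_status(sample))
```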
||||||
|
|
||||||
Messages
|
Messages
|
||||||
--------
|
--------
|
||||||
|
@ -274,11 +296,12 @@ Policies will have different tunables, specific to each one, so we
|
||||||
need a generic way of getting and setting these. Device-mapper
|
need a generic way of getting and setting these. Device-mapper
|
||||||
messages are used. (A sysfs interface would also be possible.)
|
messages are used. (A sysfs interface would also be possible.)
|
||||||
|
|
||||||
The message format is:
|
The message format is::
|
||||||
|
|
||||||
<key> <value>
|
<key> <value>
|
||||||
|
|
||||||
E.g.
|
E.g.::
|
||||||
|
|
||||||
dmsetup message my_cache 0 sequential_threshold 1024
|
dmsetup message my_cache 0 sequential_threshold 1024
|
||||||
|
|
||||||
|
|
||||||
|
@ -290,11 +313,12 @@ of values from 5 to 9. Each cblock must be expressed as a decimal
|
||||||
value; in the future a variant message that takes cblock ranges
|
value; in the future a variant message that takes cblock ranges
|
||||||
expressed in hexadecimal may be needed to better support efficient
|
expressed in hexadecimal may be needed to better support efficient
|
||||||
invalidation of larger caches. The cache must be in passthrough mode
|
invalidation of larger caches. The cache must be in passthrough mode
|
||||||
when invalidate_cblocks is used.
|
when invalidate_cblocks is used::
|
||||||
|
|
||||||
invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
|
invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
|
||||||
|
|
||||||
E.g.
|
E.g.::
|
||||||
|
|
||||||
dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
|
dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
|
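To see which cblocks an invalidate_cblocks message covers, the arguments can be expanded into individual values. This sketch assumes, based on the "values from 5 to 9" wording in the surrounding text, that the end of a range is one past the last cblock; `expand_cblocks` is a hypothetical helper.

```python
def expand_cblocks(args: str) -> list:
    """Expand 'N' and 'BEGIN-END' arguments into a list of cblock numbers.

    Assumption: END is exclusive (one past the last cblock), and all
    values are decimal, as the text requires.
    """
    cblocks = []
    for arg in args.split():
        begin, dash, end = arg.partition("-")
        if dash:
            cblocks.extend(range(int(begin), int(end)))
        else:
            cblocks.append(int(begin))
    return cblocks


print(expand_cblocks("5-10"))
print(len(expand_cblocks("2345 3456-4567 5678-6789")))
```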
||||||
|
|
||||||
Examples
|
Examples
|
||||||
|
@ -304,8 +328,10 @@ The test suite can be found here:
|
||||||
|
|
||||||
https://github.com/jthornber/device-mapper-test-suite
|
https://github.com/jthornber/device-mapper-test-suite
|
||||||
|
|
||||||
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
|
::
|
||||||
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
|
|
||||||
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
|
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
|
||||||
/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
|
/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
|
||||||
mq 4 sequential_threshold 1024 random_threshold 8'
|
dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
|
||||||
|
/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
|
||||||
|
mq 4 sequential_threshold 1024 random_threshold 8'
|
|
@ -1,10 +1,12 @@
|
||||||
|
========
|
||||||
dm-delay
|
dm-delay
|
||||||
========
|
========
|
||||||
|
|
||||||
Device-Mapper's "delay" target delays reads and/or writes
|
Device-Mapper's "delay" target delays reads and/or writes
|
||||||
and maps them to different devices.
|
and maps them to different devices.
|
||||||
|
|
||||||
Parameters:
|
Parameters::
|
||||||
|
|
||||||
<device> <offset> <delay> [<write_device> <write_offset> <write_delay>
|
<device> <offset> <delay> [<write_device> <write_offset> <write_delay>
|
||||||
[<flush_device> <flush_offset> <flush_delay>]]
|
[<flush_device> <flush_offset> <flush_delay>]]
|
||||||
|
|
||||||
|
@ -14,15 +16,16 @@ Delays are specified in milliseconds.
|
||||||
|
|
||||||
Example scripts
|
Example scripts
|
||||||
===============
|
===============
|
||||||
[[
|
|
||||||
#!/bin/sh
|
|
||||||
# Create device delaying rw operation for 500ms
|
|
||||||
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
|
|
||||||
]]
|
|
||||||
|
|
||||||
[[
|
::
|
||||||
#!/bin/sh
|
|
||||||
# Create device delaying only write operation for 500ms and
|
#!/bin/sh
|
||||||
# splitting reads and writes to different devices $1 $2
|
# Create device delaying rw operation for 500ms
|
||||||
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
|
echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
|
||||||
]]
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#!/bin/sh
|
||||||
|
# Create device delaying only write operation for 500ms and
|
||||||
|
# splitting reads and writes to different devices $1 $2
|
||||||
|
echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
|
|
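The parameter syntax of the delay target nests its optional groups: a flush triple is only valid when a write triple is present. The sketch below builds the table lines used by the two example scripts; `delay_table` and the device paths are hypothetical, and the sector count would normally come from `blockdev --getsz`.

```python
def delay_table(sectors, read, write=None, flush=None):
    """Build a dm-delay table line.

    Each of read, write and flush is a (device, offset, delay_ms) triple.
    """
    if flush is not None and write is None:
        raise ValueError("a flush triple requires a write triple")
    fields = [0, sectors, "delay", *read]
    for triple in (write, flush):
        if triple is not None:
            fields.extend(triple)
    return " ".join(str(field) for field in fields)


# Delay all I/O by 500ms on one (hypothetical) device.
print(delay_table(2048, ("/dev/sdb", 0, 500)))
# Undelayed reads from one device, writes delayed by 500ms on another.
print(delay_table(2048, ("/dev/sdb", 0, 0), write=("/dev/sdc", 0, 500)))
```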
@ -1,5 +1,6 @@
|
||||||
|
========
|
||||||
dm-crypt
|
dm-crypt
|
||||||
=========
|
========
|
||||||
|
|
||||||
Device-Mapper's "crypt" target provides transparent encryption of block devices
|
Device-Mapper's "crypt" target provides transparent encryption of block devices
|
||||||
using the kernel crypto API.
|
using the kernel crypto API.
|
||||||
|
@ -7,15 +8,20 @@ using the kernel crypto API.
|
||||||
For a more detailed description of supported parameters see:
|
For a more detailed description of supported parameters see:
|
||||||
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
|
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
|
||||||
|
|
||||||
Parameters: <cipher> <key> <iv_offset> <device path> \
|
Parameters::
|
||||||
|
|
||||||
|
<cipher> <key> <iv_offset> <device path> \
|
||||||
<offset> [<#opt_params> <opt_params>]
|
<offset> [<#opt_params> <opt_params>]
|
||||||
|
|
||||||
<cipher>
|
<cipher>
|
||||||
Encryption cipher, encryption mode and Initial Vector (IV) generator.
|
Encryption cipher, encryption mode and Initial Vector (IV) generator.
|
||||||
|
|
||||||
The cipher specifications format is:
|
The cipher specifications format is::
|
||||||
|
|
||||||
cipher[:keycount]-chainmode-ivmode[:ivopts]
|
cipher[:keycount]-chainmode-ivmode[:ivopts]
|
||||||
Examples:
|
|
||||||
|
Examples::
|
||||||
|
|
||||||
aes-cbc-essiv:sha256
|
aes-cbc-essiv:sha256
|
||||||
aes-xts-plain64
|
aes-xts-plain64
|
||||||
serpent-xts-plain64
|
serpent-xts-plain64
|
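The first cipher format, cipher[:keycount]-chainmode-ivmode[:ivopts], can be split into its parts mechanically. This is a sketch only: `parse_cipher_spec` is a hypothetical helper, it does not validate that the named algorithms exist, and it deliberately rejects the separate capi: format.

```python
def parse_cipher_spec(spec: str) -> dict:
    """Split 'cipher[:keycount]-chainmode-ivmode[:ivopts]' into its fields."""
    if spec.startswith("capi:"):
        raise ValueError("crypto API specifications use a different format")
    parts = spec.split("-", 2)
    if len(parts) != 3:
        raise ValueError("expected cipher-chainmode-ivmode")
    cipher_part, chainmode, iv_part = parts
    cipher, _, keycount = cipher_part.partition(":")
    ivmode, _, ivopts = iv_part.partition(":")
    return {
        "cipher": cipher,
        "keycount": int(keycount) if keycount else None,  # None: not given
        "chainmode": chainmode,
        "ivmode": ivmode,
        "ivopts": ivopts or None,
    }


for example in ("aes-cbc-essiv:sha256", "aes-xts-plain64", "serpent-xts-plain64"):
    print(parse_cipher_spec(example))
```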
||||||
|
@ -25,12 +31,17 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
|
||||||
as for the first format type.
|
as for the first format type.
|
||||||
This format is mainly used for specification of authenticated modes.
|
This format is mainly used for specification of authenticated modes.
|
||||||
|
|
||||||
The crypto API cipher specifications format is:
|
The crypto API cipher specifications format is::
|
||||||
|
|
||||||
capi:cipher_api_spec-ivmode[:ivopts]
|
capi:cipher_api_spec-ivmode[:ivopts]
|
||||||
Examples:
|
|
||||||
|
Examples::
|
||||||
|
|
||||||
capi:cbc(aes)-essiv:sha256
|
capi:cbc(aes)-essiv:sha256
|
||||||
capi:xts(aes)-plain64
|
capi:xts(aes)-plain64
|
||||||
Examples of authenticated modes:
|
|
||||||
|
Examples of authenticated modes::
|
||||||
|
|
||||||
capi:gcm(aes)-random
|
capi:gcm(aes)-random
|
||||||
capi:authenc(hmac(sha256),xts(aes))-random
|
capi:authenc(hmac(sha256),xts(aes))-random
|
||||||
capi:rfc7539(chacha20,poly1305)-random
|
capi:rfc7539(chacha20,poly1305)-random
|
||||||
|
@ -142,21 +153,21 @@ LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
|
||||||
encryption with dm-crypt using the 'cryptsetup' utility, see
|
encryption with dm-crypt using the 'cryptsetup' utility, see
|
||||||
https://gitlab.com/cryptsetup/cryptsetup
|
https://gitlab.com/cryptsetup/cryptsetup
|
||||||
|
|
||||||
[[
|
::
|
||||||
#!/bin/sh
|
|
||||||
# Create a crypt device using dmsetup
|
|
||||||
dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
|
|
||||||
]]
|
|
||||||
|
|
||||||
[[
|
#!/bin/sh
|
||||||
#!/bin/sh
|
# Create a crypt device using dmsetup
|
||||||
# Create a crypt device using dmsetup when encryption key is stored in keyring service
|
dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
|
||||||
dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
|
|
||||||
]]
|
|
||||||
|
|
||||||
[[
|
::
|
||||||
#!/bin/sh
|
|
||||||
# Create a crypt device using cryptsetup and LUKS header with default cipher
|
#!/bin/sh
|
||||||
cryptsetup luksFormat $1
|
# Create a crypt device using dmsetup when encryption key is stored in keyring service
|
||||||
cryptsetup luksOpen $1 crypt1
|
dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
|
||||||
]]
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#!/bin/sh
|
||||||
|
# Create a crypt device using cryptsetup and LUKS header with default cipher
|
||||||
|
cryptsetup luksFormat $1
|
||||||
|
cryptsetup luksOpen $1 crypt1
|
|
@ -1,3 +1,4 @@
|
||||||
|
=========
|
||||||
dm-flakey
|
dm-flakey
|
||||||
=========
|
=========
|
||||||
|
|
||||||
|
@ -15,17 +16,26 @@ underlying devices.
|
||||||
|
|
||||||
Table parameters
|
Table parameters
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
<dev path> <offset> <up interval> <down interval> \
|
<dev path> <offset> <up interval> <down interval> \
|
||||||
[<num_features> [<feature arguments>]]
|
[<num_features> [<feature arguments>]]
|
||||||
|
|
||||||
Mandatory parameters:
|
Mandatory parameters:
|
||||||
<dev path>: Full pathname to the underlying block-device, or a
|
|
||||||
"major:minor" device-number.
|
<dev path>:
|
||||||
<offset>: Starting sector within the device.
|
Full pathname to the underlying block-device, or a
|
||||||
<up interval>: Number of seconds device is available.
|
"major:minor" device-number.
|
||||||
<down interval>: Number of seconds device returns errors.
|
<offset>:
|
||||||
|
Starting sector within the device.
|
||||||
|
<up interval>:
|
||||||
|
Number of seconds device is available.
|
||||||
|
<down interval>:
|
||||||
|
Number of seconds device returns errors.
|
||||||
|
|
||||||
Optional feature parameters:
|
Optional feature parameters:
|
||||||
|
|
||||||
If no feature parameters are present, during the periods of
|
If no feature parameters are present, during the periods of
|
||||||
unreliability, all I/O returns errors.
|
unreliability, all I/O returns errors.
|
||||||
|
|
||||||
|
@ -41,17 +51,24 @@ Optional feature parameters:
|
||||||
During <down interval>, replace <Nth_byte> of the data of
|
During <down interval>, replace <Nth_byte> of the data of
|
||||||
each matching bio with <value>.
|
each matching bio with <value>.
|
||||||
|
|
||||||
<Nth_byte>: The offset of the byte to replace.
|
<Nth_byte>:
|
||||||
Counting starts at 1, to replace the first byte.
|
The offset of the byte to replace.
|
||||||
<direction>: Either 'r' to corrupt reads or 'w' to corrupt writes.
|
Counting starts at 1, to replace the first byte.
|
||||||
'w' is incompatible with drop_writes.
|
<direction>:
|
||||||
<value>: The value (from 0-255) to write.
|
Either 'r' to corrupt reads or 'w' to corrupt writes.
|
||||||
<flags>: Perform the replacement only if bio->bi_opf has all the
|
'w' is incompatible with drop_writes.
|
||||||
selected flags set.
|
<value>:
|
||||||
|
The value (from 0-255) to write.
|
||||||
|
<flags>:
|
||||||
|
Perform the replacement only if bio->bi_opf has all the
|
||||||
|
selected flags set.
|
||||||
|
|
||||||
Examples:
|
Examples:
|
||||||
|
|
||||||
|
Replaces the 32nd byte of READ bios with the value 1::
|
||||||
|
|
||||||
corrupt_bio_byte 32 r 1 0
|
corrupt_bio_byte 32 r 1 0
|
||||||
- replaces the 32nd byte of READ bios with the value 1
|
|
||||||
|
Replaces the 224th byte of REQ_META (=32) bios with the value 0::
|
||||||
|
|
||||||
corrupt_bio_byte 224 w 0 32
|
corrupt_bio_byte 224 w 0 32
|
||||||
- replaces the 224th byte of REQ_META (=32) bios with the value 0
|
|
|
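The corrupt_bio_byte rule documented above can be modelled in a few lines of Python. This is only an illustrative sketch of the documented semantics, not the kernel implementation: the function name, the `bio_direction` and `bio_opf` arguments, and the choice to leave too-short buffers untouched are assumptions made for the example.

```python
def corrupt_bio_byte(data, nth_byte, direction, value, flags,
                     bio_direction, bio_opf):
    """Model of the documented rule; hypothetical helper, not kernel code."""
    if direction not in ("r", "w"):
        raise ValueError("direction must be 'r' or 'w'")
    if not 0 <= value <= 255:
        raise ValueError("value must be in the range 0-255")
    if nth_byte < 1:
        raise ValueError("Nth_byte counting starts at 1")
    # Only bios travelling in the selected direction are candidates.
    if bio_direction != direction:
        return data
    # Every selected flag must be set in the bio's flags.
    if (bio_opf & flags) != flags:
        return data
    # Assumption: a buffer shorter than Nth_byte is left untouched.
    if nth_byte > len(data):
        return data
    corrupted = bytearray(data)
    corrupted[nth_byte - 1] = value  # Nth_byte is 1-based
    return bytes(corrupted)


# "corrupt_bio_byte 32 r 1 0": the 32nd byte of a read becomes 1.
read_result = corrupt_bio_byte(bytes(64), 32, "r", 1, 0, "r", 0)
```

With `flags` set to 0 every bio in the chosen direction matches, which mirrors the first example; the second example would pass `flags=32` so that only bios carrying that flag are modified.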
@ -1,5 +1,6 @@
|
||||||
|
================================
|
||||||
Early creation of mapped devices
|
Early creation of mapped devices
|
||||||
====================================
|
================================
|
||||||
|
|
||||||
It is possible to configure a device-mapper device to act as the root device for
|
It is possible to configure a device-mapper device to act as the root device for
|
||||||
your system in two ways.
|
your system in two ways.
|
||||||
|
@ -12,15 +13,17 @@ The second is to create one or more device-mappers using the module parameter
|
||||||
|
|
||||||
The format is specified as a string of data separated by commas and optionally
|
The format is specified as a string of data separated by commas and optionally
|
||||||
semi-colons, where:
|
semi-colons, where:
|
||||||
|
|
||||||
- a comma is used to separate fields like name, uuid, flags and table
|
- a comma is used to separate fields like name, uuid, flags and table
|
||||||
(specifies one device)
|
(specifies one device)
|
||||||
- a semi-colon is used to separate devices.
|
- a semi-colon is used to separate devices.
|
||||||
|
|
||||||
So the format will look like this:
|
So the format will look like this::
|
||||||
|
|
||||||
dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
|
dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
|
||||||
|
|
||||||
Where,
|
Where::
|
||||||
|
|
||||||
<name> ::= The device name.
|
<name> ::= The device name.
|
||||||
<uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
|
<uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
|
||||||
<minor> ::= The device minor number | ""
|
<minor> ::= The device minor number | ""
|
||||||
|
@ -29,7 +32,7 @@ Where,
|
||||||
<target_type> ::= "verity" | "linear" | ... (see list below)
|
<target_type> ::= "verity" | "linear" | ... (see list below)
|
||||||
|
|
||||||
The dm line should be equivalent to the one used by the dmsetup tool with the
|
The dm line should be equivalent to the one used by the dmsetup tool with the
|
||||||
--concise argument.
|
`--concise` argument.
|
||||||
|
|
||||||
Target types
|
Target types
|
||||||
============
|
============
|
||||||
|
@ -38,32 +41,34 @@ Not all target types are available as there are serious risks in allowing
|
||||||
activation of certain DM targets without first using userspace tools to check
|
activation of certain DM targets without first using userspace tools to check
|
||||||
the validity of associated metadata.
|
the validity of associated metadata.
|
||||||
|
|
||||||
"cache": constrained, userspace should verify cache device
|
======================= =======================================================
|
||||||
"crypt": allowed
|
`cache` constrained, userspace should verify cache device
|
||||||
"delay": allowed
|
`crypt` allowed
|
||||||
"era": constrained, userspace should verify metadata device
|
`delay` allowed
|
||||||
"flakey": constrained, meant for test
|
`era` constrained, userspace should verify metadata device
|
||||||
"linear": allowed
|
`flakey` constrained, meant for test
|
||||||
"log-writes": constrained, userspace should verify metadata device
|
`linear` allowed
|
||||||
"mirror": constrained, userspace should verify main/mirror device
|
`log-writes` constrained, userspace should verify metadata device
|
||||||
"raid": constrained, userspace should verify metadata device
|
`mirror` constrained, userspace should verify main/mirror device
|
||||||
"snapshot": constrained, userspace should verify src/dst device
|
`raid` constrained, userspace should verify metadata device
|
||||||
"snapshot-origin": allowed
|
`snapshot` constrained, userspace should verify src/dst device
|
||||||
"snapshot-merge": constrained, userspace should verify src/dst device
|
`snapshot-origin` allowed
|
||||||
"striped": allowed
|
`snapshot-merge` constrained, userspace should verify src/dst device
|
||||||
"switch": constrained, userspace should verify dev path
|
`striped` allowed
|
||||||
"thin": constrained, requires dm target message from userspace
|
`switch` constrained, userspace should verify dev path
|
||||||
"thin-pool": constrained, requires dm target message from userspace
|
`thin` constrained, requires dm target message from userspace
|
||||||
"verity": allowed
|
`thin-pool` constrained, requires dm target message from userspace
|
||||||
"writecache": constrained, userspace should verify cache device
|
`verity` allowed
|
||||||
"zero": constrained, not meant for rootfs
|
`writecache` constrained, userspace should verify cache device
|
||||||
|
`zero` constrained, not meant for rootfs
|
||||||
|
======================= =======================================================
|
||||||
|
|
||||||
If the target is not listed above, it is constrained by default (not tested).
|
If the target is not listed above, it is constrained by default (not tested).
|
||||||
|
|
||||||
Examples
|
Examples
|
||||||
========
|
========
|
||||||
An example of booting to a linear array made up of user-mode linux block
|
An example of booting to a linear array made up of user-mode linux block
|
||||||
devices:
|
devices::
|
||||||
|
|
||||||
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
|
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
|
||||||
|
|
||||||
|
@ -71,43 +76,49 @@ This will boot to a rw dm-linear target of 8192 sectors split across two block
|
||||||
devices identified by their major:minor numbers. After boot, udev will rename
|
devices identified by their major:minor numbers. After boot, udev will rename
|
||||||
this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
|
this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
|
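The comma and semi-colon rules above can be sketched as a small Python helper that assembles a `dm-mod.create=` value and checks the sector count quoted for the `lroot` example. The helper names are hypothetical, and the sketch assumes each table line has the usual `<start> <length> <target> <args>` shape and contains no commas or semi-colons of its own.

```python
def concise_device(name, uuid="", minor="", flags="rw", tables=()):
    """Join one device's fields with commas: name, uuid, minor, flags, tables."""
    fields = [name, uuid, str(minor), flags]
    fields.extend(tables)
    return ",".join(fields)


def dm_mod_create(devices):
    """Join several device descriptions with semi-colons."""
    return "dm-mod.create=" + ";".join(devices)


def mapped_sectors(tables):
    """Sum the length field (second token) of every table line."""
    return sum(int(table.split()[1]) for table in tables)


lroot_tables = ["0 4096 linear 98:16 0", "4096 4096 linear 98:32 0"]
lroot = concise_device("lroot", flags="rw", tables=lroot_tables)
cmdline = dm_mod_create([lroot])
total = mapped_sectors(lroot_tables)
```

Here `total` comes out as 8192, matching the size stated for the example, and the empty `uuid` and `minor` fields show up as the consecutive commas in `lroot,,,rw,...`.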
||||||
|
|
||||||
An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here
|
An example of multiple device-mappers, with the dm-mod.create="..." contents
|
||||||
split on multiple lines for readability:
|
is shown here split on multiple lines for readability::
|
||||||
|
|
||||||
vroot,,,ro,
|
dm-linear,,1,rw,
|
||||||
0 1740800 verity 254:0 254:0 1740800 sha1
|
0 32768 linear 8:1 0,
|
||||||
76e9be054b15884a9fa85973e9cb274c93afadb6
|
32768 1024000 linear 8:2 0;
|
||||||
5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe;
|
dm-verity,,3,ro,
|
||||||
vram,,,rw,
|
0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256
|
||||||
0 32768 linear 1:0 0,
|
ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42
|
||||||
32768 32768 linear 1:1 0
|
5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51
|
||||||
|
|
||||||
Other examples (per target):
|
Other examples (per target):
|
||||||
|
|
||||||
"crypt":
|
"crypt"::
|
||||||
|
|
||||||
dm-crypt,,8,ro,
|
dm-crypt,,8,ro,
|
||||||
0 1048576 crypt aes-xts-plain64
|
0 1048576 crypt aes-xts-plain64
|
||||||
babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
|
babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
|
||||||
/dev/sda 0 1 allow_discards
|
/dev/sda 0 1 allow_discards
|
||||||
|
|
||||||
"delay":
|
"delay"::
|
||||||
|
|
||||||
dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
|
dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
|
||||||
|
|
||||||
"linear":
|
"linear"::
|
||||||
|
|
||||||
dm-linear,,,rw,
|
dm-linear,,,rw,
|
||||||
0 32768 linear /dev/sda1 0,
|
0 32768 linear /dev/sda1 0,
|
||||||
32768 1024000 linear /dev/sda2 0,
|
32768 1024000 linear /dev/sda2 0,
|
||||||
1056768 204800 linear /dev/sda3 0,
|
1056768 204800 linear /dev/sda3 0,
|
||||||
1261568 512000 linear /dev/sda4 0
|
1261568 512000 linear /dev/sda4 0
|
||||||
|
|
||||||
"snapshot-origin":
|
"snapshot-origin"::
|
||||||
|
|
||||||
dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
|
dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
|
||||||
|
|
||||||
"striped":
|
"striped"::
|
||||||
|
|
||||||
dm-striped,,4,ro,0 1638400 striped 4 4096
|
dm-striped,,4,ro,0 1638400 striped 4 4096
|
||||||
/dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
|
/dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
|
||||||
|
|
||||||
"verity":
|
"verity"::
|
||||||
|
|
||||||
dm-verity,,4,ro,
|
dm-verity,,4,ro,
|
||||||
0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
|
0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
|
||||||
fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
|
fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
|
|
@ -1,3 +1,7 @@
|
||||||
|
============
|
||||||
|
dm-integrity
|
||||||
|
============
|
||||||
|
|
||||||
The dm-integrity target emulates a block device that has additional
|
The dm-integrity target emulates a block device that has additional
|
||||||
per-sector tags that can be used for storing integrity information.
|
per-sector tags that can be used for storing integrity information.
|
||||||
|
|
||||||
|
@ -35,15 +39,16 @@ zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
|
||||||
target can't be loaded.
|
target can't be loaded.
|
||||||
|
|
||||||
To use the target for the first time:
|
To use the target for the first time:
|
||||||
|
|
||||||
1. overwrite the superblock with zeroes
|
1. overwrite the superblock with zeroes
|
||||||
2. load the dm-integrity target with one-sector size, the kernel driver
|
2. load the dm-integrity target with one-sector size, the kernel driver
|
||||||
will format the device
|
will format the device
|
||||||
3. unload the dm-integrity target
|
3. unload the dm-integrity target
|
||||||
4. read the "provided_data_sectors" value from the superblock
|
4. read the "provided_data_sectors" value from the superblock
|
||||||
5. load the dm-integrity target with the target size
|
5. load the dm-integrity target with the target size
|
||||||
"provided_data_sectors"
|
"provided_data_sectors"
|
||||||
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
|
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
|
||||||
with the size "provided_data_sectors"
|
with the size "provided_data_sectors"
|
||||||
|
|
||||||
|
|
||||||
Target arguments:
|
Target arguments:
|
||||||
|
@ -51,17 +56,20 @@ Target arguments:
|
||||||
1. the underlying block device
|
1. the underlying block device
|
||||||
|
|
||||||
2. the number of reserved sectors at the beginning of the device - the
|
2. the number of reserved sectors at the beginning of the device - the
|
||||||
dm-integrity won't read or write these sectors
|
dm-integrity won't read or write these sectors
|
||||||
|
|
||||||
3. the size of the integrity tag (if "-" is used, the size is taken from
|
3. the size of the integrity tag (if "-" is used, the size is taken from
|
||||||
the internal-hash algorithm)
|
the internal-hash algorithm)
|
||||||
|
|
||||||
4. mode:
|
4. mode:
|
||||||
D - direct writes (without journal) - in this mode, journaling is
|
|
||||||
|
D - direct writes (without journal)
|
||||||
|
in this mode, journaling is
|
||||||
not used and data sectors and integrity tags are written
|
not used and data sectors and integrity tags are written
|
||||||
separately. In case of crash, it is possible that the data
|
separately. In case of crash, it is possible that the data
|
||||||
and integrity tag don't match.
|
and integrity tag don't match.
|
||||||
J - journaled writes - data and integrity tags are written to the
|
J - journaled writes
|
||||||
|
data and integrity tags are written to the
|
||||||
journal and atomicity is guaranteed. In case of crash,
|
journal and atomicity is guaranteed. In case of crash,
|
||||||
either both data and tag or none of them are written. The
|
either both data and tag or none of them are written. The
|
||||||
journaled mode degrades write throughput twice because the
|
journaled mode degrades write throughput twice because the
|
||||||
|
@ -178,9 +186,12 @@ and the reloaded target would be non-functional.
|
||||||
|
|
||||||
|
|
||||||
The layout of the formatted block device:
|
The layout of the formatted block device:
|
||||||
* reserved sectors (they are not used by this target, they can be used for
|
|
||||||
storing LUKS metadata or for other purpose), the size of the reserved
|
* reserved sectors
|
||||||
area is specified in the target arguments
|
(they are not used by this target, they can be used for
|
||||||
|
storing LUKS metadata or for other purpose), the size of the reserved
|
||||||
|
area is specified in the target arguments
|
||||||
|
|
||||||
* superblock (4kiB)
|
* superblock (4kiB)
|
||||||
* magic string - identifies that the device was formatted
|
* magic string - identifies that the device was formatted
|
||||||
* version
|
* version
|
||||||
|
@ -192,40 +203,55 @@ The layout of the formatted block device:
|
||||||
metadata and padding). The user of this target should not send
|
metadata and padding). The user of this target should not send
|
||||||
bios that access data beyond the "provided data sectors" limit.
|
bios that access data beyond the "provided data sectors" limit.
|
||||||
* flags
|
* flags
|
||||||
SB_FLAG_HAVE_JOURNAL_MAC - a flag is set if journal_mac is used
|
SB_FLAG_HAVE_JOURNAL_MAC
|
||||||
SB_FLAG_RECALCULATING - recalculating is in progress
|
- a flag is set if journal_mac is used
|
||||||
SB_FLAG_DIRTY_BITMAP - journal area contains the bitmap of dirty
|
SB_FLAG_RECALCULATING
|
||||||
blocks
|
- recalculating is in progress
|
||||||
|
SB_FLAG_DIRTY_BITMAP
|
||||||
|
- journal area contains the bitmap of dirty
|
||||||
|
blocks
|
||||||
* log2(sectors per block)
|
* log2(sectors per block)
|
||||||
* a position where recalculating finished
|
* a position where recalculating finished
|
||||||
* journal
|
* journal
|
||||||
The journal is divided into sections, each section contains:
|
The journal is divided into sections, each section contains:
|
||||||
|
|
||||||
* metadata area (4kiB), it contains journal entries
|
* metadata area (4kiB), it contains journal entries
|
||||||
every journal entry contains:
|
|
||||||
|
- every journal entry contains:
|
||||||
|
|
||||||
* logical sector (specifies where the data and tag should
|
* logical sector (specifies where the data and tag should
|
||||||
be written)
|
be written)
|
||||||
* last 8 bytes of data
|
* last 8 bytes of data
|
||||||
* integrity tag (the size is specified in the superblock)
|
* integrity tag (the size is specified in the superblock)
|
||||||
every metadata sector ends with
|
|
||||||
|
- every metadata sector ends with
|
||||||
|
|
||||||
* mac (8-bytes), all the macs in 8 metadata sectors form a
|
* mac (8-bytes), all the macs in 8 metadata sectors form a
|
||||||
64-byte value. It is used to store hmac of sector
|
64-byte value. It is used to store hmac of sector
|
||||||
numbers in the journal section, to protect against a
|
numbers in the journal section, to protect against a
|
||||||
possibility that the attacker tampers with sector
|
possibility that the attacker tampers with sector
|
||||||
numbers in the journal.
|
numbers in the journal.
|
||||||
* commit id
|
* commit id
|
||||||
|
|
||||||
* data area (the size is variable; it depends on how many journal
|
* data area (the size is variable; it depends on how many journal
|
||||||
entries fit into the metadata area)
|
entries fit into the metadata area)
|
||||||
every sector in the data area contains:
|
|
||||||
|
- every sector in the data area contains:
|
||||||
|
|
||||||
* data (504 bytes of data, the last 8 bytes are stored in
|
* data (504 bytes of data, the last 8 bytes are stored in
|
||||||
the journal entry)
|
the journal entry)
|
||||||
* commit id
|
* commit id
|
||||||
|
|
||||||
To test if the whole journal section was written correctly, every
|
To test if the whole journal section was written correctly, every
|
||||||
512-byte sector of the journal ends with 8-byte commit id. If the
|
512-byte sector of the journal ends with 8-byte commit id. If the
|
||||||
commit id matches on all sectors in a journal section, then it is
|
commit id matches on all sectors in a journal section, then it is
|
||||||
assumed that the section was written correctly. If the commit id
|
assumed that the section was written correctly. If the commit id
|
||||||
doesn't match, the section was written partially and it should not
|
doesn't match, the section was written partially and it should not
|
||||||
be replayed.
|
be replayed.
|
||||||
* one or more runs of interleaved tags and data. Each run contains:
|
|
||||||
|
* one or more runs of interleaved tags and data.
|
||||||
|
Each run contains:
|
||||||
|
|
||||||
* tag area - it contains integrity tags. There is one tag for each
|
* tag area - it contains integrity tags. There is one tag for each
|
||||||
sector in the data area
|
sector in the data area
|
||||||
* data area - it contains data sectors. The number of data sectors
|
* data area - it contains data sectors. The number of data sectors
|
|
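The commit id check described for the journal can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: a section is treated as a flat byte string of 512-byte sectors, "matches" is taken to mean that every sector carries the same trailing 8 bytes, and the function name is hypothetical.

```python
SECTOR_SIZE = 512      # journal sector size given in the text
COMMIT_ID_SIZE = 8     # every sector ends with an 8-byte commit id


def section_is_complete(section):
    """Return True if all sectors in the section end with the same commit id."""
    if len(section) == 0 or len(section) % SECTOR_SIZE != 0:
        raise ValueError("section must be a whole number of 512-byte sectors")
    commit_ids = set()
    for offset in range(0, len(section), SECTOR_SIZE):
        sector = section[offset:offset + SECTOR_SIZE]
        commit_ids.add(sector[-COMMIT_ID_SIZE:])
    # A partially written section leaves at least one stale commit id behind.
    return len(commit_ids) == 1


def make_sector(commit_id):
    """Build a sector of 504 payload bytes followed by the commit id."""
    return bytes(SECTOR_SIZE - COMMIT_ID_SIZE) + commit_id


good = make_sector(b"\x01" * 8) * 4
torn = make_sector(b"\x01" * 8) * 3 + make_sector(b"\x00" * 8)
```

In this model `good` would be replayed and `torn` would be skipped, which is the behaviour the text describes for a partially written section.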
@ -1,3 +1,4 @@
|
||||||
|
=====
|
||||||
dm-io
|
dm-io
|
||||||
=====
|
=====
|
||||||
|
|
||||||
|
@ -7,7 +8,7 @@ version.
|
||||||
|
|
||||||
The user must set up an io_region structure to describe the desired location
|
The user must set up an io_region structure to describe the desired location
|
||||||
of the I/O. Each io_region indicates a block-device along with the starting
|
of the I/O. Each io_region indicates a block-device along with the starting
|
||||||
sector and size of the region.
|
sector and size of the region::
|
||||||
|
|
||||||
struct io_region {
|
struct io_region {
|
||||||
struct block_device *bdev;
|
struct block_device *bdev;
|
||||||
|
@ -19,7 +20,7 @@ Dm-io can read from one io_region or write to one or more io_regions. Writes
|
||||||
to multiple regions are specified by an array of io_region structures.
|
to multiple regions are specified by an array of io_region structures.
|
||||||
|
|
||||||
The first I/O service type takes a list of memory pages as the data buffer for
|
The first I/O service type takes a list of memory pages as the data buffer for
|
||||||
the I/O, along with an offset into the first page.
|
the I/O, along with an offset into the first page::
|
||||||
|
|
||||||
struct page_list {
|
struct page_list {
|
||||||
struct page_list *next;
|
struct page_list *next;
|
||||||
|
@ -35,7 +36,7 @@ the I/O, along with an offset into the first page.
|
||||||
|
|
||||||
The second I/O service type takes an array of bio vectors as the data buffer
|
The second I/O service type takes an array of bio vectors as the data buffer
|
||||||
for the I/O. This service can be handy if the caller has a pre-assembled bio,
|
for the I/O. This service can be handy if the caller has a pre-assembled bio,
|
||||||
but wants to direct different portions of the bio to different devices.
|
but wants to direct different portions of the bio to different devices::
|
||||||
|
|
||||||
int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
|
int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
|
||||||
int rw, struct bio_vec *bvec,
|
int rw, struct bio_vec *bvec,
|
||||||
|
@ -47,7 +48,7 @@ but wants to direct different portions of the bio to different devices.
|
||||||
The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
|
The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
|
||||||
data buffer for the I/O. This service can be handy if the caller needs to do
|
data buffer for the I/O. This service can be handy if the caller needs to do
|
||||||
I/O to a large region but doesn't want to allocate a large number of individual
|
I/O to a large region but doesn't want to allocate a large number of individual
|
||||||
memory pages.
|
memory pages::
|
||||||
|
|
||||||
int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
|
int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
|
||||||
void *data, unsigned long *error_bits);
|
void *data, unsigned long *error_bits);
|
||||||
|
@ -55,11 +56,11 @@ memory pages.
|
||||||
void *data, io_notify_fn fn, void *context);
|
void *data, io_notify_fn fn, void *context);
|
||||||
|
|
||||||
Callers of the asynchronous I/O services must include the name of a completion
|
Callers of the asynchronous I/O services must include the name of a completion
|
||||||
callback routine and a pointer to some context data for the I/O.
|
callback routine and a pointer to some context data for the I/O::
|
||||||
|
|
||||||
typedef void (*io_notify_fn)(unsigned long error, void *context);
|
typedef void (*io_notify_fn)(unsigned long error, void *context);
|
||||||
|
|
||||||
The "error" parameter in this callback, as well as the "*error" parameter in
|
The "error" parameter in this callback, as well as the `*error` parameter in
|
||||||
all of the synchronous versions, is a bitset (instead of a simple error value).
|
all of the synchronous versions, is a bitset (instead of a simple error value).
|
||||||
In the case of a write-I/O to multiple regions, this bitset allows dm-io to
|
In the case of a write-I/O to multiple regions, this bitset allows dm-io to
|
||||||
indicate success or failure on each individual region.
|
indicate success or failure on each individual region.
|
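As a rough illustration of how a caller might read such a bitset, the Python sketch below lists the regions whose bit is set. It assumes that bit i corresponds to the i-th entry of the io_region array, which the text implies but does not state, and the function name is hypothetical.

```python
def failed_regions(error_bits, num_regions):
    """Return the indexes of regions whose error bit is set.

    Assumption: bit i of error_bits refers to region i.
    """
    return [i for i in range(num_regions) if error_bits & (1 << i)]


# A write to three regions where the first and third failed.
failures = failed_regions(0b101, 3)
```

An `error_bits` value of zero yields an empty list, meaning the write succeeded on every region.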
||||||
|
@ -72,4 +73,3 @@ always available in order to avoid unnecessary waiting while performing I/O.
|
||||||
When the user is finished using the dm-io services, they should call
|
When the user is finished using the dm-io services, they should call
|
||||||
dm_io_put() and specify the same number of pages that were given on the
|
dm_io_put() and specify the same number of pages that were given on the
|
||||||
dm_io_get() call.
|
dm_io_get() call.
|
||||||
|
|
|
@ -1,3 +1,4 @@
|
||||||
|
=====================
|
||||||
Device-Mapper Logging
|
Device-Mapper Logging
|
||||||
=====================
|
=====================
|
||||||
The device-mapper logging code is used by some of the device-mapper
|
The device-mapper logging code is used by some of the device-mapper
|
||||||
|
@ -16,11 +17,13 @@ dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different
|
||||||
logging implementations are available and provide different
|
logging implementations are available and provide different
|
||||||
capabilities. The list includes:
|
capabilities. The list includes:
|
||||||
|
|
||||||
|
============== ==============================================================
|
||||||
Type Files
|
Type Files
|
||||||
==== =====
|
============== ==============================================================
|
||||||
disk drivers/md/dm-log.c
|
disk drivers/md/dm-log.c
|
||||||
core drivers/md/dm-log.c
|
core drivers/md/dm-log.c
|
||||||
userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
|
userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
|
||||||
|
============== ==============================================================
|
||||||
|
|
||||||
The "disk" log type
|
The "disk" log type
|
||||||
-------------------
|
-------------------
|
|
@ -1,3 +1,4 @@
|
||||||
|
===============
|
||||||
dm-queue-length
|
dm-queue-length
|
||||||
===============
|
===============
|
||||||
|
|
||||||
|
@ -6,12 +7,18 @@ which selects a path with the least number of in-flight I/Os.
|
||||||
The path selector name is 'queue-length'.
|
The path selector name is 'queue-length'.
|
||||||
|
|
||||||
Table parameters for each path: [<repeat_count>]
|
Table parameters for each path: [<repeat_count>]
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
<repeat_count>: The number of I/Os to dispatch using the selected
|
<repeat_count>: The number of I/Os to dispatch using the selected
|
||||||
path before switching to the next path.
|
path before switching to the next path.
|
||||||
If not given, internal default is used. To check
|
If not given, internal default is used. To check
|
||||||
the default value, see the activated table.
|
the default value, see the activated table.
|
||||||
|
|
||||||
Status for each path: <status> <fail-count> <in-flight>
|
Status for each path: <status> <fail-count> <in-flight>
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
<status>: 'A' if the path is active, 'F' if the path is failed.
|
<status>: 'A' if the path is active, 'F' if the path is failed.
|
||||||
<fail-count>: The number of path failures.
|
<fail-count>: The number of path failures.
|
||||||
<in-flight>: The number of in-flight I/Os on the path.
|
<in-flight>: The number of in-flight I/Os on the path.
|
||||||
|
@ -29,11 +36,13 @@ Examples
|
||||||
========
|
========
|
||||||
In case that 2 paths (sda and sdb) are used with repeat_count == 128.
|
In case that 2 paths (sda and sdb) are used with repeat_count == 128.
|
||||||
|
|
||||||
# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
|
::
|
||||||
dmsetup create test
|
|
||||||
#
|
# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
|
||||||
# dmsetup table
|
dmsetup create test
|
||||||
test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
|
#
|
||||||
#
|
# dmsetup table
|
||||||
# dmsetup status
|
test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
|
||||||
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
|
#
|
||||||
|
# dmsetup status
|
||||||
|
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
|
|
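The selection rule of the queue-length path selector can be sketched in Python. This is a toy model only: it assumes the per-path part of the status output is a flat sequence of `<device> <status> <fail-count> <in-flight>` groups, as in the sample above, it ignores repeat_count, and the names `PathStatus`, `parse_paths` and `select_path` are hypothetical.

```python
from typing import NamedTuple


class PathStatus(NamedTuple):
    device: str
    status: str       # 'A' if the path is active, 'F' if it has failed
    fail_count: int
    in_flight: int


def parse_paths(text):
    """Split '<device> <status> <fail-count> <in-flight>' groups into records."""
    tokens = text.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of four fields per path")
    return [
        PathStatus(tokens[i], tokens[i + 1], int(tokens[i + 2]), int(tokens[i + 3]))
        for i in range(0, len(tokens), 4)
    ]


def select_path(paths):
    """Pick the active path with the fewest in-flight I/Os, or None."""
    active = [p for p in paths if p.status == "A"]
    if not active:
        return None
    return min(active, key=lambda p: p.in_flight)


chosen = select_path(parse_paths("8:0 A 0 3 8:16 A 0 1"))
```

When several active paths have the same in-flight count, this sketch simply returns the first one listed; the real selector's tie-breaking is not described in the text.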
@ -1,3 +1,4 @@
|
||||||
|
=======
|
||||||
dm-raid
|
dm-raid
|
||||||
=======
|
=======
|
||||||
|
|
||||||
|
@ -8,49 +9,66 @@ interface.
|
||||||
|
|
||||||
Mapping Table Interface
|
Mapping Table Interface
|
||||||
-----------------------
|
-----------------------
|
||||||
The target is named "raid" and it accepts the following parameters:
|
The target is named "raid" and it accepts the following parameters::
|
||||||
|
|
||||||
<raid_type> <#raid_params> <raid_params> \
|
<raid_type> <#raid_params> <raid_params> \
|
||||||
<#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
|
<#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
|
||||||
|
|
||||||
<raid_type>:
|
<raid_type>:
|
||||||
|
|
||||||
|
============= ===============================================================
|
||||||
raid0 RAID0 striping (no resilience)
|
raid0 RAID0 striping (no resilience)
|
||||||
raid1 RAID1 mirroring
|
raid1 RAID1 mirroring
|
||||||
raid4 RAID4 with dedicated last parity disk
|
raid4 RAID4 with dedicated last parity disk
|
||||||
raid5_n RAID5 with dedicated last parity disk supporting takeover
|
raid5_n RAID5 with dedicated last parity disk supporting takeover
|
||||||
Same as raid4
|
Same as raid4
|
||||||
-Transitory layout
|
|
||||||
|
- Transitory layout
|
||||||
raid5_la RAID5 left asymmetric
|
raid5_la RAID5 left asymmetric
|
||||||
|
|
||||||
- rotating parity 0 with data continuation
|
- rotating parity 0 with data continuation
|
||||||
raid5_ra RAID5 right asymmetric
|
raid5_ra RAID5 right asymmetric
|
||||||
|
|
||||||
- rotating parity N with data continuation
|
- rotating parity N with data continuation
|
||||||
raid5_ls RAID5 left symmetric
|
raid5_ls RAID5 left symmetric
|
||||||
|
|
||||||
- rotating parity 0 with data restart
|
- rotating parity 0 with data restart
|
||||||
raid5_rs RAID5 right symmetric
|
raid5_rs RAID5 right symmetric
|
||||||
|
|
||||||
- rotating parity N with data restart
|
- rotating parity N with data restart
|
||||||
raid6_zr RAID6 zero restart
|
raid6_zr RAID6 zero restart
|
||||||
|
|
||||||
- rotating parity zero (left-to-right) with data restart
|
- rotating parity zero (left-to-right) with data restart
|
||||||
raid6_nr RAID6 N restart
|
raid6_nr RAID6 N restart
|
||||||
|
|
||||||
- rotating parity N (right-to-left) with data restart
|
- rotating parity N (right-to-left) with data restart
|
||||||
raid6_nc RAID6 N continue
|
raid6_nc RAID6 N continue
|
||||||
|
|
||||||
- rotating parity N (right-to-left) with data continuation
|
- rotating parity N (right-to-left) with data continuation
|
||||||
raid6_n_6 RAID6 with dedicated parity disks
|
raid6_n_6 RAID6 with dedicated parity disks
|
||||||
|
|
||||||
- parity and Q-syndrome on the last 2 disks;
|
- parity and Q-syndrome on the last 2 disks;
|
||||||
layout for takeover from/to raid4/raid5_n
|
layout for takeover from/to raid4/raid5_n
|
||||||
raid6_la_6 Same as "raid5_la" plus dedicated last Q-syndrome disk
|
raid6_la_6 Same as "raid5_la" plus dedicated last Q-syndrome disk
|
||||||
|
|
||||||
- layout for takeover from raid5_la from/to raid6
|
- layout for takeover from raid5_la from/to raid6
|
||||||
raid6_ra_6 Same as "raid5_ra" plus dedicated last Q-syndrome disk
|
raid6_ra_6 Same as "raid5_ra" plus dedicated last Q-syndrome disk
|
||||||
|
|
||||||
- layout for takeover from raid5_ra from/to raid6
|
- layout for takeover from raid5_ra from/to raid6
|
||||||
raid6_ls_6 Same as "raid5_ls" plus dedicated last Q-syndrome disk
|
raid6_ls_6 Same as "raid5_ls" plus dedicated last Q-syndrome disk
|
||||||
|
|
||||||
- layout for takeover from raid5_ls from/to raid6
|
- layout for takeover from raid5_ls from/to raid6
|
||||||
raid6_rs_6 Same as "raid5_rs" plus dedicated last Q-syndrome disk
|
raid6_rs_6 Same as "raid5_rs" plus dedicated last Q-syndrome disk
|
||||||
|
|
||||||
- layout for takeover from raid5_rs from/to raid6
|
- layout for takeover from raid5_rs from/to raid6
|
||||||
raid10 Various RAID10 inspired algorithms chosen by additional params
|
raid10 Various RAID10 inspired algorithms chosen by additional params
|
||||||
(see raid10_format and raid10_copies below)
|
(see raid10_format and raid10_copies below)
|
||||||
|
|
||||||
- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
|
- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
|
||||||
- RAID1E: Integrated Adjacent Stripe Mirroring
|
- RAID1E: Integrated Adjacent Stripe Mirroring
|
||||||
- RAID1E: Integrated Offset Stripe Mirroring
|
- RAID1E: Integrated Offset Stripe Mirroring
|
||||||
- and other similar RAID10 variants
|
- and other similar RAID10 variants
|
||||||
|
============= ===============================================================
|
||||||
|
|
||||||
Reference: Chapter 4 of
|
Reference: Chapter 4 of
|
||||||
http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
|
http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
|
||||||
|
@ -58,33 +76,41 @@ The target is named "raid" and it accepts the following parameters:
|
||||||
<#raid_params>: The number of parameters that follow.
|
<#raid_params>: The number of parameters that follow.
|
||||||
|
|
||||||
<raid_params> consists of
|
<raid_params> consists of
|
||||||
|
|
||||||
Mandatory parameters:
|
Mandatory parameters:
|
||||||
<chunk_size>: Chunk size in sectors. This parameter is often known as
|
<chunk_size>:
|
||||||
|
Chunk size in sectors. This parameter is often known as
|
||||||
"stripe size". It is the only mandatory parameter and
|
"stripe size". It is the only mandatory parameter and
|
||||||
is placed first.
|
is placed first.
|
||||||
|
|
||||||
followed by optional parameters (in any order):
|
followed by optional parameters (in any order):
|
||||||
[sync|nosync] Force or prevent RAID initialization.
|
[sync|nosync]
|
||||||
|
Force or prevent RAID initialization.
|
||||||
|
|
||||||
[rebuild <idx>] Rebuild drive number 'idx' (first drive is 0).
|
[rebuild <idx>]
|
||||||
|
Rebuild drive number 'idx' (first drive is 0).
|
||||||
|
|
||||||
[daemon_sleep <ms>]
|
[daemon_sleep <ms>]
|
||||||
Interval between runs of the bitmap daemon that
|
Interval between runs of the bitmap daemon that
|
||||||
clear bits. A longer interval means less bitmap I/O but
|
clear bits. A longer interval means less bitmap I/O but
|
||||||
resyncing after a failure is likely to take longer.
|
resyncing after a failure is likely to take longer.
|
||||||
|
|
||||||
[min_recovery_rate <kB/sec/disk>] Throttle RAID initialization
|
[min_recovery_rate <kB/sec/disk>]
|
||||||
[max_recovery_rate <kB/sec/disk>] Throttle RAID initialization
|
Throttle RAID initialization
|
||||||
[write_mostly <idx>] Mark drive index 'idx' write-mostly.
|
[max_recovery_rate <kB/sec/disk>]
|
||||||
[max_write_behind <sectors>] See '--write-behind=' (man mdadm)
|
Throttle RAID initialization
|
||||||
[stripe_cache <sectors>] Stripe cache size (RAID 4/5/6 only)
|
[write_mostly <idx>]
|
||||||
|
Mark drive index 'idx' write-mostly.
|
||||||
|
[max_write_behind <sectors>]
|
||||||
|
See '--write-behind=' (man mdadm)
|
||||||
|
[stripe_cache <sectors>]
|
||||||
|
Stripe cache size (RAID 4/5/6 only)
|
||||||
[region_size <sectors>]
|
[region_size <sectors>]
|
||||||
The region_size multiplied by the number of regions is the
|
The region_size multiplied by the number of regions is the
|
||||||
logical size of the array. The bitmap records the device
|
logical size of the array. The bitmap records the device
|
||||||
synchronisation state for each region.
|
synchronisation state for each region.
|
||||||
|
|
||||||
[raid10_copies <# copies>]
|
[raid10_copies <# copies>], [raid10_format <near|far|offset>]
|
||||||
[raid10_format <near|far|offset>]
|
|
||||||
These two options are used to alter the default layout of
|
These two options are used to alter the default layout of
|
||||||
a RAID10 configuration. The number of copies can be
|
a RAID10 configuration. The number of copies can be
|
||||||
specified, but the default is 2. There are also three
|
specified, but the default is 2. There are also three
|
||||||
|
@ -93,13 +119,17 @@ The target is named "raid" and it accepts the following parameters:
|
||||||
respect to mirroring. If these options are left unspecified,
|
respect to mirroring. If these options are left unspecified,
|
||||||
or 'raid10_copies 2' and/or 'raid10_format near' are given,
|
or 'raid10_copies 2' and/or 'raid10_format near' are given,
|
||||||
then the layouts for 2, 3 and 4 devices are:
|
then the layouts for 2, 3 and 4 devices are:
|
||||||
|
|
||||||
|
======== ========== ==============
|
||||||
2 drives 3 drives 4 drives
|
2 drives 3 drives 4 drives
|
||||||
-------- ---------- --------------
|
======== ========== ==============
|
||||||
A1 A1 A1 A1 A2 A1 A1 A2 A2
|
A1 A1 A1 A1 A2 A1 A1 A2 A2
|
||||||
A2 A2 A2 A3 A3 A3 A3 A4 A4
|
A2 A2 A2 A3 A3 A3 A3 A4 A4
|
||||||
A3 A3 A4 A4 A5 A5 A5 A6 A6
|
A3 A3 A4 A4 A5 A5 A5 A6 A6
|
||||||
A4 A4 A5 A6 A6 A7 A7 A8 A8
|
A4 A4 A5 A6 A6 A7 A7 A8 A8
|
||||||
.. .. .. .. .. .. .. .. ..
|
.. .. .. .. .. .. .. .. ..
|
||||||
|
======== ========== ==============
|
||||||
|
|
||||||
The 2-device layout is equivalent to 2-way RAID1. The 4-device
|
The 2-device layout is equivalent to 2-way RAID1. The 4-device
|
||||||
layout is what a traditional RAID10 would look like. The
|
layout is what a traditional RAID10 would look like. The
|
||||||
3-device layout is what might be called a 'RAID1E - Integrated
|
3-device layout is what might be called a 'RAID1E - Integrated
|
||||||
|
@ -107,8 +137,10 @@ The target is named "raid" and it accepts the following parameters:
|
||||||
|
|
||||||
If 'raid10_copies 2' and 'raid10_format far', then the layouts
|
If 'raid10_copies 2' and 'raid10_format far', then the layouts
|
||||||
for 2, 3 and 4 devices are:
|
for 2, 3 and 4 devices are:
|
||||||
|
|
||||||
|
======== ============ ===================
|
||||||
2 drives 3 drives 4 drives
|
2 drives 3 drives 4 drives
|
||||||
-------- -------------- --------------------
|
======== ============ ===================
|
||||||
A1 A2 A1 A2 A3 A1 A2 A3 A4
|
A1 A2 A1 A2 A3 A1 A2 A3 A4
|
||||||
A3 A4 A4 A5 A6 A5 A6 A7 A8
|
A3 A4 A4 A5 A6 A5 A6 A7 A8
|
||||||
A5 A6 A7 A8 A9 A9 A10 A11 A12
|
A5 A6 A7 A8 A9 A9 A10 A11 A12
|
||||||
|
@ -117,11 +149,14 @@ The target is named "raid" and it accepts the following parameters:
|
||||||
A4 A3 A6 A4 A5 A6 A5 A8 A7
|
A4 A3 A6 A4 A5 A6 A5 A8 A7
|
||||||
A6 A5 A9 A7 A8 A10 A9 A12 A11
|
A6 A5 A9 A7 A8 A10 A9 A12 A11
|
||||||
.. .. .. .. .. .. .. .. ..
|
.. .. .. .. .. .. .. .. ..
|
||||||
|
======== ============ ===================
|
||||||
|
|
||||||
If 'raid10_copies 2' and 'raid10_format offset', then the
|
If 'raid10_copies 2' and 'raid10_format offset', then the
|
||||||
layouts for 2, 3 and 4 devices are:
|
layouts for 2, 3 and 4 devices are:
|
||||||
|
|
||||||
|
======== ========== ================
|
||||||
2 drives 3 drives 4 drives
|
2 drives 3 drives 4 drives
|
||||||
-------- ------------ -----------------
|
======== ========== ================
|
||||||
A1 A2 A1 A2 A3 A1 A2 A3 A4
|
A1 A2 A1 A2 A3 A1 A2 A3 A4
|
||||||
A2 A1 A3 A1 A2 A2 A1 A4 A3
|
A2 A1 A3 A1 A2 A2 A1 A4 A3
|
||||||
A3 A4 A4 A5 A6 A5 A6 A7 A8
|
A3 A4 A4 A5 A6 A5 A6 A7 A8
|
||||||
|
@ -129,6 +164,8 @@ The target is named "raid" and it accepts the following parameters:
|
||||||
A5 A6 A7 A8 A9 A9 A10 A11 A12
|
A5 A6 A7 A8 A9 A9 A10 A11 A12
|
||||||
A6 A5 A9 A7 A8 A10 A9 A12 A11
|
A6 A5 A9 A7 A8 A10 A9 A12 A11
|
||||||
.. .. .. .. .. .. .. .. ..
|
.. .. .. .. .. .. .. .. ..
|
||||||
|
======== ========== ================
|
||||||
|
|
||||||
Here we see layouts closely akin to 'RAID1E - Integrated
|
Here we see layouts closely akin to 'RAID1E - Integrated
|
||||||
Offset Stripe Mirroring'.
|
Offset Stripe Mirroring'.
|
||||||
|
|
||||||
|
@ -190,22 +227,25 @@ The target is named "raid" and it accepts the following parameters:
|
||||||
|
|
||||||
Example Tables
|
Example Tables
|
||||||
--------------
|
--------------
|
||||||
# RAID4 - 4 data drives, 1 parity (no metadata devices)
|
|
||||||
# No metadata devices specified to hold superblock/bitmap info
|
|
||||||
# Chunk size of 1MiB
|
|
||||||
# (Lines separated for easy reading)
|
|
||||||
|
|
||||||
0 1960893648 raid \
|
::
|
||||||
raid4 1 2048 \
|
|
||||||
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
|
|
||||||
|
|
||||||
# RAID4 - 4 data drives, 1 parity (with metadata devices)
|
# RAID4 - 4 data drives, 1 parity (no metadata devices)
|
||||||
# Chunk size of 1MiB, force RAID initialization,
|
# No metadata devices specified to hold superblock/bitmap info
|
||||||
# min recovery rate at 20 kiB/sec/disk
|
# Chunk size of 1MiB
|
||||||
|
# (Lines separated for easy reading)
|
||||||
|
|
||||||
0 1960893648 raid \
|
0 1960893648 raid \
|
||||||
raid4 4 2048 sync min_recovery_rate 20 \
|
raid4 1 2048 \
|
||||||
5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
|
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
|
||||||
|
|
||||||
|
# RAID4 - 4 data drives, 1 parity (with metadata devices)
|
||||||
|
# Chunk size of 1MiB, force RAID initialization,
|
||||||
|
# min recovery rate at 20 kiB/sec/disk
|
||||||
|
|
||||||
|
0 1960893648 raid \
|
||||||
|
raid4 4 2048 sync min_recovery_rate 20 \
|
||||||
|
5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
|
||||||
|
|
||||||
|
|
||||||
Status Output
|
Status Output
|
||||||
|
@ -219,41 +259,58 @@ Arguments that can be repeated are ordered by value.
|
||||||
|
|
||||||
'dmsetup status' yields information on the state and health of the array.
|
'dmsetup status' yields information on the state and health of the array.
|
||||||
The output is as follows (normally a single line, but expanded here for
|
The output is as follows (normally a single line, but expanded here for
|
||||||
clarity):
|
clarity)::
|
||||||
1: <s> <l> raid \
|
|
||||||
2: <raid_type> <#devices> <health_chars> \
|
1: <s> <l> raid \
|
||||||
3: <sync_ratio> <sync_action> <mismatch_cnt>
|
2: <raid_type> <#devices> <health_chars> \
|
||||||
|
3: <sync_ratio> <sync_action> <mismatch_cnt>
|
||||||
|
|
||||||
Line 1 is the standard output produced by device-mapper.
|
Line 1 is the standard output produced by device-mapper.
|
||||||
Line 2 & 3 are produced by the raid target and are best explained by example:
|
|
||||||
|
Line 2 & 3 are produced by the raid target and are best explained by example::
|
||||||
|
|
||||||
0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
|
0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
|
||||||
|
|
||||||
Here we can see the RAID type is raid4, there are 5 devices - all of
|
Here we can see the RAID type is raid4, there are 5 devices - all of
|
||||||
which are 'A'live, and the array is 2/490221568 complete with its initial
|
which are 'A'live, and the array is 2/490221568 complete with its initial
|
||||||
recovery. Here is a fuller description of the individual fields:
|
recovery. Here is a fuller description of the individual fields:
|
||||||
|
|
||||||
|
=============== =========================================================
|
||||||
<raid_type> Same as the <raid_type> used to create the array.
|
<raid_type> Same as the <raid_type> used to create the array.
|
||||||
<health_chars> One char for each device, indicating: 'A' = alive and
|
<health_chars> One char for each device, indicating:
|
||||||
in-sync, 'a' = alive but not in-sync, 'D' = dead/failed.
|
|
||||||
|
- 'A' = alive and in-sync
|
||||||
|
- 'a' = alive but not in-sync
|
||||||
|
- 'D' = dead/failed.
|
||||||
<sync_ratio> The ratio indicating how much of the array has undergone
|
<sync_ratio> The ratio indicating how much of the array has undergone
|
||||||
the process described by 'sync_action'. If the
|
the process described by 'sync_action'. If the
|
||||||
'sync_action' is "check" or "repair", then the process
|
'sync_action' is "check" or "repair", then the process
|
||||||
of "resync" or "recover" can be considered complete.
|
of "resync" or "recover" can be considered complete.
|
||||||
<sync_action> One of the following possible states:
|
<sync_action> One of the following possible states:
|
||||||
idle - No synchronization action is being performed.
|
|
||||||
frozen - The current action has been halted.
|
idle
|
||||||
resync - Array is undergoing its initial synchronization
|
- No synchronization action is being performed.
|
||||||
|
frozen
|
||||||
|
- The current action has been halted.
|
||||||
|
resync
|
||||||
|
- Array is undergoing its initial synchronization
|
||||||
or is resynchronizing after an unclean shutdown
|
or is resynchronizing after an unclean shutdown
|
||||||
(possibly aided by a bitmap).
|
(possibly aided by a bitmap).
|
||||||
recover - A device in the array is being rebuilt or
|
recover
|
||||||
|
- A device in the array is being rebuilt or
|
||||||
replaced.
|
replaced.
|
||||||
check - A user-initiated full check of the array is
|
check
|
||||||
|
- A user-initiated full check of the array is
|
||||||
being performed. All blocks are read and
|
being performed. All blocks are read and
|
||||||
checked for consistency. The number of
|
checked for consistency. The number of
|
||||||
discrepancies found are recorded in
|
discrepancies found are recorded in
|
||||||
<mismatch_cnt>. No changes are made to the
|
<mismatch_cnt>. No changes are made to the
|
||||||
array by this action.
|
array by this action.
|
||||||
repair - The same as "check", but discrepancies are
|
repair
|
||||||
|
- The same as "check", but discrepancies are
|
||||||
corrected.
|
corrected.
|
||||||
reshape - The array is undergoing a reshape.
|
reshape
|
||||||
|
- The array is undergoing a reshape.
|
||||||
<mismatch_cnt> The number of discrepancies found between mirror copies
|
<mismatch_cnt> The number of discrepancies found between mirror copies
|
||||||
in RAID1/10 or wrong parity values found in RAID4/5/6.
|
in RAID1/10 or wrong parity values found in RAID4/5/6.
|
||||||
This value is valid only after a "check" of the array
|
This value is valid only after a "check" of the array
|
||||||
|
@ -261,10 +318,11 @@ recovery. Here is a fuller description of the individual fields:
|
||||||
<data_offset> The current data offset to the start of the user data on
|
<data_offset> The current data offset to the start of the user data on
|
||||||
each component device of a raid set (see the respective
|
each component device of a raid set (see the respective
|
||||||
raid parameter to support out-of-place reshaping).
|
raid parameter to support out-of-place reshaping).
|
||||||
<journal_char> 'A' - active write-through journal device.
|
<journal_char> - 'A' - active write-through journal device.
|
||||||
'a' - active write-back journal device.
|
- 'a' - active write-back journal device.
|
||||||
'D' - dead journal device.
|
- 'D' - dead journal device.
|
||||||
'-' - no journal device.
|
- '-' - no journal device.
|
||||||
|
=============== =========================================================
|
||||||
|
|
||||||
|
|
||||||
Message Interface
|
Message Interface
|
||||||
|
@ -272,12 +330,15 @@ Message Interface
|
||||||
The dm-raid target will accept certain actions through the 'message' interface.
|
The dm-raid target will accept certain actions through the 'message' interface.
|
||||||
('man dmsetup' for more information on the message interface.) These actions
|
('man dmsetup' for more information on the message interface.) These actions
|
||||||
include:
|
include:
|
||||||
"idle" - Halt the current sync action.
|
|
||||||
"frozen" - Freeze the current sync action.
|
========= ================================================
|
||||||
"resync" - Initiate/continue a resync.
|
"idle" Halt the current sync action.
|
||||||
"recover"- Initiate/continue a recover process.
|
"frozen" Freeze the current sync action.
|
||||||
"check" - Initiate a check (i.e. a "scrub") of the array.
|
"resync" Initiate/continue a resync.
|
||||||
"repair" - Initiate a repair of the array.
|
"recover" Initiate/continue a recover process.
|
||||||
|
"check" Initiate a check (i.e. a "scrub") of the array.
|
||||||
|
"repair" Initiate a repair of the array.
|
||||||
|
========= ================================================
|
||||||
|
|
||||||
|
|
||||||
Discard Support
|
Discard Support
|
||||||
|
@ -307,48 +368,52 @@ increasingly whitelisted in the kernel and can thus be trusted.
|
||||||
|
|
||||||
For trusted devices, the following dm-raid module parameter can be set
|
For trusted devices, the following dm-raid module parameter can be set
|
||||||
to safely enable discard support for RAID 4/5/6:
|
to safely enable discard support for RAID 4/5/6:
|
||||||
|
|
||||||
'devices_handle_discard_safely'
|
'devices_handle_discard_safely'
|
||||||
|
|
||||||
|
|
||||||
Version History
|
Version History
|
||||||
---------------
|
---------------
|
||||||
1.0.0 Initial version. Support for RAID 4/5/6
|
|
||||||
1.1.0 Added support for RAID 1
|
::
|
||||||
1.2.0 Handle creation of arrays that contain failed devices.
|
|
||||||
1.3.0 Added support for RAID 10
|
1.0.0 Initial version. Support for RAID 4/5/6
|
||||||
1.3.1 Allow device replacement/rebuild for RAID 10
|
1.1.0 Added support for RAID 1
|
||||||
1.3.2 Fix/improve redundancy checking for RAID10
|
1.2.0 Handle creation of arrays that contain failed devices.
|
||||||
1.4.0 Non-functional change. Removes arg from mapping function.
|
1.3.0 Added support for RAID 10
|
||||||
1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5).
|
1.3.1 Allow device replacement/rebuild for RAID 10
|
||||||
1.4.2 Add RAID10 "far" and "offset" algorithm support.
|
1.3.2 Fix/improve redundancy checking for RAID10
|
||||||
1.5.0 Add message interface to allow manipulation of the sync_action.
|
1.4.0 Non-functional change. Removes arg from mapping function.
|
||||||
|
1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5).
|
||||||
|
1.4.2 Add RAID10 "far" and "offset" algorithm support.
|
||||||
|
1.5.0 Add message interface to allow manipulation of the sync_action.
|
||||||
New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
|
New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
|
||||||
1.5.1 Add ability to restore transiently failed devices on resume.
|
1.5.1 Add ability to restore transiently failed devices on resume.
|
||||||
1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check".
|
1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check".
|
||||||
1.6.0 Add discard support (and devices_handle_discard_safely module param).
|
1.6.0 Add discard support (and devices_handle_discard_safely module param).
|
||||||
1.7.0 Add support for MD RAID0 mappings.
|
1.7.0 Add support for MD RAID0 mappings.
|
||||||
1.8.0 Explicitly check for compatible flags in the superblock metadata
|
1.8.0 Explicitly check for compatible flags in the superblock metadata
|
||||||
and reject to start the raid set if any are set by a newer
|
and reject to start the raid set if any are set by a newer
|
||||||
target version, thus avoiding data corruption on a raid set
|
target version, thus avoiding data corruption on a raid set
|
||||||
with a reshape in progress.
|
with a reshape in progress.
|
||||||
1.9.0 Add support for RAID level takeover/reshape/region size
|
1.9.0 Add support for RAID level takeover/reshape/region size
|
||||||
and set size reduction.
|
and set size reduction.
|
||||||
1.9.1 Fix activation of existing RAID 4/10 mapped devices
|
1.9.1 Fix activation of existing RAID 4/10 mapped devices
|
||||||
1.9.2 Don't emit '- -' on the status table line in case the constructor
|
1.9.2 Don't emit '- -' on the status table line in case the constructor
|
||||||
fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
|
fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
|
||||||
'D' on the status line. If '- -' is passed into the constructor, emit
|
'D' on the status line. If '- -' is passed into the constructor, emit
|
||||||
'- -' on the table line and '-' as the status line health character.
|
'- -' on the table line and '-' as the status line health character.
|
||||||
1.10.0 Add support for raid4/5/6 journal device
|
1.10.0 Add support for raid4/5/6 journal device
|
||||||
1.10.1 Fix data corruption on reshape request
|
1.10.1 Fix data corruption on reshape request
|
||||||
1.11.0 Fix table line argument order
|
1.11.0 Fix table line argument order
|
||||||
(wrong raid10_copies/raid10_format sequence)
|
(wrong raid10_copies/raid10_format sequence)
|
||||||
1.11.1 Add raid4/5/6 journal write-back support via journal_mode option
|
1.11.1 Add raid4/5/6 journal write-back support via journal_mode option
|
||||||
1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available
|
1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available
|
||||||
1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A')
|
1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A')
|
||||||
1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size and
|
1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size and
|
||||||
state races.
|
state races.
|
||||||
1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen
|
1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen
|
||||||
1.14.0 Fix reshape race on small devices. Fix stripe adding reshape
|
1.14.0 Fix reshape race on small devices. Fix stripe adding reshape
|
||||||
deadlock/potential data corruption. Update superblock when
|
deadlock/potential data corruption. Update superblock when
|
||||||
specific devices are requested via rebuild. Fix RAID leg
|
specific devices are requested via rebuild. Fix RAID leg
|
||||||
rebuild errors.
|
rebuild errors.
|
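The 'dmsetup status' fields described in the dm-raid document above can be picked apart mechanically. The following is a minimal sketch, not part of the patch: the names `RaidStatus`, `parse_raid_status` and `describe_health` are hypothetical, and it assumes the caller has already stripped any leading "name:" prefix that dmsetup prints when listing several devices. Fields after <mismatch_cnt> (such as <data_offset> and <journal_char>) are ignored here.

```python
from dataclasses import dataclass

# Meaning of each health character, as listed in the status field table.
# '-' is the health character emitted when '- -' was passed to the constructor.
HEALTH = {
    "A": "alive and in-sync",
    "a": "alive but not in-sync",
    "D": "dead/failed",
    "-": "no device",
}


@dataclass
class RaidStatus:
    start: int          # <s>: start sector of the mapping
    length: int         # <l>: length of the mapping in sectors
    raid_type: str
    devices: int
    health_chars: str
    synced: int         # numerator of <sync_ratio>
    total: int          # denominator of <sync_ratio>
    sync_action: str
    mismatch_cnt: int


def parse_raid_status(line):
    """Parse one dm-raid status line into a RaidStatus record."""
    fields = line.split()
    if len(fields) < 9 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line: %r" % line)
    synced, total = fields[6].split("/")
    return RaidStatus(
        start=int(fields[0]),
        length=int(fields[1]),
        raid_type=fields[3],
        devices=int(fields[4]),
        health_chars=fields[5],
        synced=int(synced),
        total=int(total),
        sync_action=fields[7],
        mismatch_cnt=int(fields[8]),
    )


def describe_health(health_chars):
    """Translate each health character into its documented meaning."""
    return [HEALTH.get(c, "unknown") for c in health_chars]


status = parse_raid_status("0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0")
print(status.raid_type, status.devices, describe_health(status.health_chars))
```

The sample line is the one used in the document itself; the sketch only splits on whitespace, so it relies on the status output staying a single space-separated line.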
|
@ -1,3 +1,4 @@
|
||||||
|
===============
|
||||||
dm-service-time
|
dm-service-time
|
||||||
===============
|
===============
|
||||||
|
|
||||||
|
@ -12,25 +13,34 @@ in a path-group, and it can be specified as a table argument.
|
||||||
|
|
||||||
The path selector name is 'service-time'.
|
The path selector name is 'service-time'.
|
||||||
|
|
||||||
Table parameters for each path: [<repeat_count> [<relative_throughput>]]
|
Table parameters for each path:
|
||||||
<repeat_count>: The number of I/Os to dispatch using the selected
|
|
||||||
|
[<repeat_count> [<relative_throughput>]]
|
||||||
|
<repeat_count>:
|
||||||
|
The number of I/Os to dispatch using the selected
|
||||||
path before switching to the next path.
|
path before switching to the next path.
|
||||||
If not given, internal default is used. To check
|
If not given, internal default is used. To check
|
||||||
the default value, see the activated table.
|
the default value, see the activated table.
|
||||||
<relative_throughput>: The relative throughput value of the path
|
<relative_throughput>:
|
||||||
|
The relative throughput value of the path
|
||||||
among all paths in the path-group.
|
among all paths in the path-group.
|
||||||
The valid range is 0-100.
|
The valid range is 0-100.
|
||||||
If not given, minimum value '1' is used.
|
If not given, minimum value '1' is used.
|
||||||
If '0' is given, the path isn't selected while
|
If '0' is given, the path isn't selected while
|
||||||
other paths having a positive value are available.
|
other paths having a positive value are available.
|
||||||
|
|
||||||
Status for each path: <status> <fail-count> <in-flight-size> \
|
Status for each path:
|
||||||
<relative_throughput>
|
|
||||||
<status>: 'A' if the path is active, 'F' if the path is failed.
|
<status> <fail-count> <in-flight-size> <relative_throughput>
|
||||||
<fail-count>: The number of path failures.
|
<status>:
|
||||||
<in-flight-size>: The size of in-flight I/Os on the path.
|
'A' if the path is active, 'F' if the path is failed.
|
||||||
<relative_throughput>: The relative throughput value of the path
|
<fail-count>:
|
||||||
among all paths in the path-group.
|
The number of path failures.
|
||||||
|
<in-flight-size>:
|
||||||
|
The size of in-flight I/Os on the path.
|
||||||
|
<relative_throughput>:
|
||||||
|
The relative throughput value of the path
|
||||||
|
among all paths in the path-group.
|
||||||
|
|
||||||
|
|
||||||
Algorithm
|
Algorithm
|
||||||
|
@ -39,7 +49,7 @@ Algorithm
|
||||||
dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
|
dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
|
||||||
dispatched and subtracts when completed.
|
dispatched and subtracts when completed.
|
||||||
Basically, dm-service-time selects a path having minimum service time
|
Basically, dm-service-time selects a path having minimum service time
|
||||||
which is calculated by:
|
which is calculated by::
|
||||||
|
|
||||||
('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
|
('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
|
||||||
|
|
||||||
|
@ -67,25 +77,25 @@ Examples
|
||||||
========
|
========
|
||||||
In case that 2 paths (sda and sdb) are used with repeat_count == 128
|
In case that 2 paths (sda and sdb) are used with repeat_count == 128
|
||||||
and sda has an average throughput 1GB/s and sdb has 4GB/s,
|
and sda has an average throughput 1GB/s and sdb has 4GB/s,
|
||||||
'relative_throughput' value may be '1' for sda and '4' for sdb.
|
'relative_throughput' value may be '1' for sda and '4' for sdb::
|
||||||
|
|
||||||
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
|
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
|
||||||
dmsetup create test
|
dmsetup create test
|
||||||
#
|
#
|
||||||
# dmsetup table
|
# dmsetup table
|
||||||
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
|
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
|
||||||
#
|
#
|
||||||
# dmsetup status
|
# dmsetup status
|
||||||
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
|
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
|
||||||
|
|
||||||
|
|
||||||
Or '2' for sda and '8' for sdb would be also true.
|
Or '2' for sda and '8' for sdb would be also true::
|
||||||
|
|
||||||
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
|
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
|
||||||
dmsetup create test
|
dmsetup create test
|
||||||
#
|
#
|
||||||
# dmsetup table
|
# dmsetup table
|
||||||
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
|
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
|
||||||
#
|
#
|
||||||
# dmsetup status
|
# dmsetup status
|
||||||
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
|
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
|
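The selection rule from the Algorithm section of dm-service-time can be illustrated in a few lines. This is a sketch of the documented formula only, not the kernel implementation: the `Path` class and the `service_time` and `select_path` helpers are hypothetical names, and the fallback used when every path has a relative throughput of 0 (pick the path with the least in-flight I/O) is an assumption made to keep the example total.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    relative_throughput: int   # valid range 0-100 per the document
    in_flight_size: int = 0    # size of I/O dispatched but not yet completed


def service_time(path, incoming_size):
    """('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'."""
    return (path.in_flight_size + incoming_size) / path.relative_throughput


def select_path(paths, incoming_size):
    """Return the path with the minimum estimated service time.

    Paths with relative_throughput == 0 are skipped while any path with a
    positive value is available, as the document describes.
    """
    if not paths:
        raise ValueError("no paths in the path-group")
    positive = [p for p in paths if p.relative_throughput > 0]
    if positive:
        return min(positive, key=lambda p: service_time(p, incoming_size))
    # Assumption for this sketch: with only zero-throughput paths left,
    # fall back to the one with the least in-flight I/O.
    return min(paths, key=lambda p: p.in_flight_size)


# sda at 1GB/s and sdb at 4GB/s, as in the Examples section.
sda = Path("8:0", relative_throughput=1)
sdb = Path("8:16", relative_throughput=4)
print(select_path([sda, sdb], incoming_size=4096).name)
```

With both paths idle, a 4096-byte I/O costs 4096 on sda and 1024 on sdb, so sdb is chosen. Once sdb carries 16384 bytes in flight its estimate rises to 5120, and the next I/O of the same size would go to sda instead.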
|
@ -0,0 +1,110 @@
|
||||||
|
====================
|
||||||
|
device-mapper uevent
|
||||||
|
====================
|
||||||
|
|
||||||
|
The device-mapper uevent code adds the capability to device-mapper to create
|
||||||
|
and send kobject uevents (uevents). Previously device-mapper events were only
|
||||||
|
available through the ioctl interface. The advantage of the uevents interface
|
||||||
|
is the event contains environment attributes providing increased context for
|
||||||
|
the event avoiding the need to query the state of the device-mapper device after
|
||||||
|
the event is received.
|
||||||
|
|
||||||
|
There are two functions currently for device-mapper events. The first function
|
||||||
|
listed creates the event and the second function sends the event(s)::
|
||||||
|
|
||||||
|
void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
|
||||||
|
const char *path, unsigned nr_valid_paths)
|
||||||
|
|
||||||
|
void dm_send_uevents(struct list_head *events, struct kobject *kobj)
|
||||||
|
|
||||||
|
|
||||||
|
The variables added to the uevent environment are:
|
||||||
|
|
||||||
|
Variable Name: DM_TARGET
|
||||||
|
------------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: string
|
||||||
|
:Description:
|
||||||
|
:Value: Name of device-mapper target that generated the event.
|
||||||
|
|
||||||
|
Variable Name: DM_ACTION
|
||||||
|
------------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: string
|
||||||
|
:Description:
|
||||||
|
:Value: Device-mapper specific action that caused the uevent action.
|
||||||
|
PATH_FAILED - A path has failed;
|
||||||
|
PATH_REINSTATED - A path has been reinstated.
|
||||||
|
|
||||||
|
Variable Name: DM_SEQNUM
|
||||||
|
------------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: unsigned integer
|
||||||
|
:Description: A sequence number for this specific device-mapper device.
|
||||||
|
:Value: Valid unsigned integer range.
|
||||||
|
|
||||||
|
Variable Name: DM_PATH
|
||||||
|
----------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: string
|
||||||
|
:Description: Major and minor number of the path device pertaining to this
|
||||||
|
event.
|
||||||
|
:Value: Path name in the form of "Major:Minor"
|
||||||
|
|
||||||
|
Variable Name: DM_NR_VALID_PATHS
|
||||||
|
--------------------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: unsigned integer
|
||||||
|
:Description:
|
||||||
|
:Value: Valid unsigned integer range.
|
||||||
|
|
||||||
|
Variable Name: DM_NAME
|
||||||
|
----------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: string
|
||||||
|
:Description: Name of the device-mapper device.
|
||||||
|
:Value: Name
|
||||||
|
|
||||||
|
Variable Name: DM_UUID
|
||||||
|
----------------------
|
||||||
|
:Uevent Action(s): KOBJ_CHANGE
|
||||||
|
:Type: string
|
||||||
|
:Description: UUID of the device-mapper device.
|
||||||
|
:Value: UUID. (Empty string if there isn't one.)
|
||||||
|
|
||||||
|
An example of the uevents generated as captured by udevmonitor is shown
|
||||||
|
below
|
||||||
|
|
||||||
|
1.) Path failure::
|
||||||
|
|
||||||
|
UEVENT[1192521009.711215] change@/block/dm-3
|
||||||
|
ACTION=change
|
||||||
|
DEVPATH=/block/dm-3
|
||||||
|
SUBSYSTEM=block
|
||||||
|
DM_TARGET=multipath
|
||||||
|
DM_ACTION=PATH_FAILED
|
||||||
|
DM_SEQNUM=1
|
||||||
|
DM_PATH=8:32
|
||||||
|
DM_NR_VALID_PATHS=0
|
||||||
|
DM_NAME=mpath2
|
||||||
|
DM_UUID=mpath-35333333000002328
|
||||||
|
MINOR=3
|
||||||
|
MAJOR=253
|
||||||
|
SEQNUM=1130
|
||||||
|
|
||||||
|
2.) Path reinstate::
|
||||||
|
|
||||||
|
UEVENT[1192521132.989927] change@/block/dm-3
|
||||||
|
ACTION=change
|
||||||
|
DEVPATH=/block/dm-3
|
||||||
|
SUBSYSTEM=block
|
||||||
|
DM_TARGET=multipath
|
||||||
|
DM_ACTION=PATH_REINSTATED
|
||||||
|
DM_SEQNUM=2
|
||||||
|
DM_PATH=8:32
|
||||||
|
DM_NR_VALID_PATHS=1
|
||||||
|
DM_NAME=mpath2
|
||||||
|
DM_UUID=mpath-35333333000002328
|
||||||
|
MINOR=3
|
||||||
|
MAJOR=253
|
||||||
|
SEQNUM=1131
|
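The captured uevents shown in the new dm-uevent document are plain KEY=VALUE lines after a header, so a consumer can read the DM_* variables without querying the device afterwards. The sketch below is illustrative only: `parse_uevent` and `SAMPLE` are hypothetical names, and the input format is assumed to match the udevmonitor capture reproduced in the document.

```python
# Path failure event, copied from the example in the document.
SAMPLE = """\
UEVENT[1192521009.711215] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=1
DM_PATH=8:32
DM_NR_VALID_PATHS=0
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1130
"""


def parse_uevent(text):
    """Split a captured uevent into its header line and environment dict.

    The header is None if the first non-empty line does not start with
    'UEVENT['. Lines without '=' are ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = None
    if lines and lines[0].startswith("UEVENT["):
        header = lines.pop(0)
    env = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return header, env


header, env = parse_uevent(SAMPLE)
if env.get("DM_ACTION") == "PATH_FAILED":
    # DM_NR_VALID_PATHS is documented as an unsigned integer.
    remaining = int(env["DM_NR_VALID_PATHS"])
    print("path %s of %s failed, %d valid path(s) left"
          % (env["DM_PATH"], env["DM_NAME"], remaining))
```

Values are kept as strings in the dictionary; only the fields documented as unsigned integers (DM_SEQNUM, DM_NR_VALID_PATHS) are worth converting with `int()` at the point of use.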
|
@ -1,97 +0,0 @@
|
||||||
The device-mapper uevent code adds the capability to device-mapper to create
|
|
||||||
and send kobject uevents (uevents). Previously device-mapper events were only
|
|
||||||
available through the ioctl interface. The advantage of the uevents interface
|
|
||||||
is the event contains environment attributes providing increased context for
|
|
||||||
the event avoiding the need to query the state of the device-mapper device after
the event is received.

There are two functions currently for device-mapper events. The first function
listed creates the event and the second function sends the event(s).

void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
	const char *path, unsigned nr_valid_paths)

void dm_send_uevents(struct list_head *events, struct kobject *kobj)


The variables added to the uevent environment are:

Variable Name: DM_TARGET
Uevent Action(s): KOBJ_CHANGE
Type: string
Description:
Value: Name of device-mapper target that generated the event.

Variable Name: DM_ACTION
Uevent Action(s): KOBJ_CHANGE
Type: string
Description:
Value: Device-mapper specific action that caused the uevent action.
	PATH_FAILED - A path has failed.
	PATH_REINSTATED - A path has been reinstated.

Variable Name: DM_SEQNUM
Uevent Action(s): KOBJ_CHANGE
Type: unsigned integer
Description: A sequence number for this specific device-mapper device.
Value: Valid unsigned integer range.

Variable Name: DM_PATH
Uevent Action(s): KOBJ_CHANGE
Type: string
Description: Major and minor number of the path device pertaining to this
event.
Value: Path name in the form of "Major:Minor"

Variable Name: DM_NR_VALID_PATHS
Uevent Action(s): KOBJ_CHANGE
Type: unsigned integer
Description:
Value: Valid unsigned integer range.

Variable Name: DM_NAME
Uevent Action(s): KOBJ_CHANGE
Type: string
Description: Name of the device-mapper device.
Value: Name

Variable Name: DM_UUID
Uevent Action(s): KOBJ_CHANGE
Type: string
Description: UUID of the device-mapper device.
Value: UUID. (Empty string if there isn't one.)

An example of the uevents generated as captured by udevmonitor is shown
below.

1.) Path failure.
UEVENT[1192521009.711215] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=1
DM_PATH=8:32
DM_NR_VALID_PATHS=0
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1130

2.) Path reinstate.
UEVENT[1192521132.989927] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_REINSTATED
DM_SEQNUM=2
DM_PATH=8:32
DM_NR_VALID_PATHS=1
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1131
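
A userspace consumer of these events would typically split the captured
KEY=VALUE lines and act on DM_ACTION and DM_NR_VALID_PATHS. The following is a
minimal sketch in Python, assuming the event text has already been captured as
a string. The helper name parse_dm_uevent and the SAMPLE constant are
illustrative only and are not part of device-mapper or udev.

```python
def parse_dm_uevent(text):
    """Parse the KEY=VALUE lines of one captured uevent into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the "UEVENT[...]" header line and anything without '='.
        if not line or line.startswith("UEVENT[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key] = value
    # Convert the variables documented as unsigned integers.
    for key in ("DM_SEQNUM", "DM_NR_VALID_PATHS"):
        if key in env:
            env[key] = int(env[key])
    return env


# Sample text copied from the path-failure example (illustrative input).
SAMPLE = """\
UEVENT[1192521009.711215] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=1
DM_PATH=8:32
DM_NR_VALID_PATHS=0
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
"""

event = parse_dm_uevent(SAMPLE)
# True when the last valid path of the device has just failed.
all_paths_down = (event["DM_ACTION"] == "PATH_FAILED"
                  and event["DM_NR_VALID_PATHS"] == 0)
```

Because DM_NR_VALID_PATHS arrives with the event, the check needs no further
query of the device-mapper device.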
|
|
|
@ -1,3 +1,4 @@
|
||||||
|
========
|
||||||
dm-zoned
|
dm-zoned
|
||||||
========
|
========
|
||||||
|
|
||||||
|
@ -133,12 +134,13 @@ A zoned block device must first be formatted using the dmzadm tool. This
|
||||||
will analyze the device zone configuration, determine where to place the
|
will analyze the device zone configuration, determine where to place the
|
||||||
metadata sets on the device and initialize the metadata sets.
|
metadata sets on the device and initialize the metadata sets.
|
||||||
|
|
||||||
Ex:
|
Ex::
|
||||||
|
|
||||||
dmzadm --format /dev/sdxx
|
dmzadm --format /dev/sdxx
|
||||||
|
|
||||||
For a formatted device, the target can be created normally with the
|
For a formatted device, the target can be created normally with the
|
||||||
dmsetup utility. The only parameter that dm-zoned requires is the
|
dmsetup utility. The only parameter that dm-zoned requires is the
|
||||||
underlying zoned block device name. Ex:
|
underlying zoned block device name. Ex::
|
||||||
|
|
||||||
echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | dmsetup create dmz-`basename ${dev}`
|
echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
|
||||||
|
dmsetup create dmz-`basename ${dev}`
|
|
@ -1,3 +1,7 @@
|
||||||
|
======
|
||||||
|
dm-era
|
||||||
|
======
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
||||||
|
@ -14,12 +18,14 @@ coherency after rolling back a vendor snapshot.
|
||||||
Constructor
|
Constructor
|
||||||
===========
|
===========
|
||||||
|
|
||||||
era <metadata dev> <origin dev> <block size>
|
era <metadata dev> <origin dev> <block size>
|
||||||
|
|
||||||
metadata dev : fast device holding the persistent metadata
|
================ ======================================================
|
||||||
origin dev : device holding data blocks that may change
|
metadata dev fast device holding the persistent metadata
|
||||||
block size : block size of origin data device, granularity that is
|
origin dev device holding data blocks that may change
|
||||||
tracked by the target
|
block size block size of origin data device, granularity that is
|
||||||
|
tracked by the target
|
||||||
|
================ ======================================================
|
||||||
|
|
||||||
Messages
|
Messages
|
||||||
========
|
========
|
||||||
|
@ -49,14 +55,16 @@ Status
|
||||||
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
|
<metadata block size> <#used metadata blocks>/<#total metadata blocks>
|
||||||
<current era> <held metadata root | '-'>
|
<current era> <held metadata root | '-'>
|
||||||
|
|
||||||
metadata block size : Fixed block size for each metadata block in
|
========================= ==============================================
|
||||||
sectors
|
metadata block size Fixed block size for each metadata block in
|
||||||
#used metadata blocks : Number of metadata blocks used
|
sectors
|
||||||
#total metadata blocks : Total number of metadata blocks
|
#used metadata blocks Number of metadata blocks used
|
||||||
current era : The current era
|
#total metadata blocks Total number of metadata blocks
|
||||||
held metadata root : The location, in blocks, of the metadata root
|
current era The current era
|
||||||
that has been 'held' for userspace read
|
held metadata root The location, in blocks, of the metadata root
|
||||||
access. '-' indicates there is no held root
|
that has been 'held' for userspace read
|
||||||
|
access. '-' indicates there is no held root
|
||||||
|
========================= ==============================================
|
||||||
|
|
||||||
Detailed use case
|
Detailed use case
|
||||||
=================
|
=================
|
||||||
|
@ -88,7 +96,7 @@ Memory usage
|
||||||
|
|
||||||
The target uses a bitset to record writes in the current era. It also
|
The target uses a bitset to record writes in the current era. It also
|
||||||
has a spare bitset ready for switching over to a new era. Other than
|
has a spare bitset ready for switching over to a new era. Other than
|
||||||
that it uses a few 4k blocks for updating metadata.
|
that it uses a few 4k blocks for updating metadata::
|
||||||
|
|
||||||
(4 * nr_blocks) bytes + buffers
|
(4 * nr_blocks) bytes + buffers
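
As a worked instance of the formula, the sketch below computes the
(4 * nr_blocks) term in Python. It assumes nr_blocks is the origin size
divided by the block size, both in 512-byte sectors, with a partial trailing
block rounded up; that rounding and the function name era_tracking_bytes are
assumptions made for illustration. The "buffers" term is not modelled.

```python
def era_tracking_bytes(origin_sectors, block_size_sectors):
    """Return 4 * nr_blocks for an origin device of the given size."""
    if origin_sectors < 0 or block_size_sectors <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partial trailing block still counts (assumption).
    nr_blocks = -(-origin_sectors // block_size_sectors)
    return 4 * nr_blocks


# 1 GiB origin (2097152 sectors) tracked in 4 KiB blocks (8 sectors each).
example_bytes = era_tracking_bytes(2097152, 8)
```

For this example the tracked range has 262144 blocks, so the term comes to
1 MiB before buffers.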
|
||||||
|
|
|
@ -0,0 +1,44 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
=============
|
||||||
|
Device Mapper
|
||||||
|
=============
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
cache-policies
|
||||||
|
cache
|
||||||
|
delay
|
||||||
|
dm-crypt
|
||||||
|
dm-flakey
|
||||||
|
dm-init
|
||||||
|
dm-integrity
|
||||||
|
dm-io
|
||||||
|
dm-log
|
||||||
|
dm-queue-length
|
||||||
|
dm-raid
|
||||||
|
dm-service-time
|
||||||
|
dm-uevent
|
||||||
|
dm-zoned
|
||||||
|
era
|
||||||
|
kcopyd
|
||||||
|
linear
|
||||||
|
log-writes
|
||||||
|
persistent-data
|
||||||
|
snapshot
|
||||||
|
statistics
|
||||||
|
striped
|
||||||
|
switch
|
||||||
|
thin-provisioning
|
||||||
|
unstriped
|
||||||
|
verity
|
||||||
|
writecache
|
||||||
|
zero
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,3 +1,4 @@
|
||||||
|
======
|
||||||
kcopyd
|
kcopyd
|
||||||
======
|
======
|
||||||
|
|
||||||
|
@ -7,7 +8,7 @@ notification. It is used by dm-snapshot and dm-mirror.
|
||||||
|
|
||||||
Users of kcopyd must first create a client and indicate how many memory pages
|
Users of kcopyd must first create a client and indicate how many memory pages
|
||||||
to set aside for their copy jobs. This is done with a call to
|
to set aside for their copy jobs. This is done with a call to
|
||||||
kcopyd_client_create().
|
kcopyd_client_create()::
|
||||||
|
|
||||||
int kcopyd_client_create(unsigned int num_pages,
|
int kcopyd_client_create(unsigned int num_pages,
|
||||||
struct kcopyd_client **result);
|
struct kcopyd_client **result);
|
||||||
|
@ -16,7 +17,7 @@ To start a copy job, the user must set up io_region structures to describe
|
||||||
the source and destinations of the copy. Each io_region indicates a
|
the source and destinations of the copy. Each io_region indicates a
|
||||||
block-device along with the starting sector and size of the region. The source
|
block-device along with the starting sector and size of the region. The source
|
||||||
of the copy is given as one io_region structure, and the destinations of the
|
of the copy is given as one io_region structure, and the destinations of the
|
||||||
copy are given as an array of io_region structures.
|
copy are given as an array of io_region structures::
|
||||||
|
|
||||||
struct io_region {
|
struct io_region {
|
||||||
struct block_device *bdev;
|
struct block_device *bdev;
|
||||||
|
@ -26,7 +27,7 @@ copy are given as an array of io_region structures.
|
||||||
|
|
||||||
To start the copy, the user calls kcopyd_copy(), passing in the client
|
To start the copy, the user calls kcopyd_copy(), passing in the client
|
||||||
pointer, pointers to the source and destination io_regions, the name of a
|
pointer, pointers to the source and destination io_regions, the name of a
|
||||||
completion callback routine, and a pointer to some context data for the copy.
|
completion callback routine, and a pointer to some context data for the copy::
|
||||||
|
|
||||||
int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
|
int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
|
||||||
unsigned int num_dests, struct io_region *dests,
|
unsigned int num_dests, struct io_region *dests,
|
||||||
|
@ -41,7 +42,6 @@ write error occurred during the copy.
|
||||||
|
|
||||||
When a user is done with all their copy jobs, they should call
|
When a user is done with all their copy jobs, they should call
|
||||||
kcopyd_client_destroy() to delete the kcopyd client, which will release the
|
kcopyd_client_destroy() to delete the kcopyd client, which will release the
|
||||||
associated memory pages.
|
associated memory pages::
|
||||||
|
|
||||||
void kcopyd_client_destroy(struct kcopyd_client *kc);
|
void kcopyd_client_destroy(struct kcopyd_client *kc);
|
||||||
|
|
|
@ -0,0 +1,63 @@
|
||||||
|
=========
|
||||||
|
dm-linear
|
||||||
|
=========
|
||||||
|
|
||||||
|
Device-Mapper's "linear" target maps a linear range of the Device-Mapper
|
||||||
|
device onto a linear range of another device. This is the basic building
|
||||||
|
block of logical volume managers.
|
||||||
|
|
||||||
|
Parameters: <dev path> <offset>
|
||||||
|
<dev path>:
|
||||||
|
Full pathname to the underlying block-device, or a
|
||||||
|
"major:minor" device-number.
|
||||||
|
<offset>:
|
||||||
|
Starting sector within the device.
|
||||||
|
|
||||||
|
|
||||||
|
Example scripts
|
||||||
|
===============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#!/bin/sh
|
||||||
|
# Create an identity mapping for a device
|
||||||
|
echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#!/bin/sh
|
||||||
|
# Join 2 devices together
|
||||||
|
size1=`blockdev --getsz $1`
|
||||||
|
size2=`blockdev --getsz $2`
|
||||||
|
echo "0 $size1 linear $1 0
|
||||||
|
$size1 $size2 linear $2 0" | dmsetup create joined
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#!/usr/bin/perl -w
|
||||||
|
# Split a device into 4M chunks and then join them together in reverse order.
|
||||||
|
|
||||||
|
my $name = "reverse";
|
||||||
|
my $extent_size = 4 * 1024 * 2;
|
||||||
|
my $dev = $ARGV[0];
|
||||||
|
my $table = "";
|
||||||
|
my $count = 0;
|
||||||
|
|
||||||
|
if (!defined($dev)) {
|
||||||
|
die("Please specify a device.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
my $dev_size = `blockdev --getsz $dev`;
|
||||||
|
my $extents = int($dev_size / $extent_size) -
|
||||||
|
(($dev_size % $extent_size) ? 1 : 0);
|
||||||
|
|
||||||
|
while ($extents > 0) {
|
||||||
|
my $this_start = $count * $extent_size;
|
||||||
|
$extents--;
|
||||||
|
$count++;
|
||||||
|
my $this_offset = $extents * $extent_size;
|
||||||
|
|
||||||
|
$table .= "$this_start $extent_size linear $dev $this_offset\n";
|
||||||
|
}
|
||||||
|
|
||||||
|
`echo \"$table\" | dmsetup create $name`;
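
The table arithmetic of the Perl script can be checked without touching a
device. The following Python sketch builds the same table text from a device
size passed in as a number. The function name reverse_table and the device
path /dev/sdX are placeholders; the sketch does not call blockdev or dmsetup.

```python
EXTENT_SIZE = 4 * 1024 * 2  # 4 MiB expressed in 512-byte sectors


def reverse_table(dev, dev_size, extent_size=EXTENT_SIZE):
    """Build a dmsetup table mapping the extents of dev in reverse order."""
    # Same arithmetic as the Perl script: when the size is not a multiple
    # of the extent size, one further extent is left out.
    extents = dev_size // extent_size - (1 if dev_size % extent_size else 0)
    lines = []
    count = 0
    while extents > 0:
        this_start = count * extent_size
        extents -= 1
        count += 1
        this_offset = extents * extent_size
        lines.append(f"{this_start} {extent_size} linear {dev} {this_offset}")
    return "\n".join(lines)


# A device of exactly four extents yields four linear segments.
table = reverse_table("/dev/sdX", 4 * EXTENT_SIZE)
```

The first segment of the new device maps to the last extent of the underlying
device, and the last segment maps to offset 0.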
|
|
@ -1,61 +0,0 @@
|
||||||
dm-linear
|
|
||||||
=========
|
|
||||||
|
|
||||||
Device-Mapper's "linear" target maps a linear range of the Device-Mapper
|
|
||||||
device onto a linear range of another device. This is the basic building
|
|
||||||
block of logical volume managers.
|
|
||||||
|
|
||||||
Parameters: <dev path> <offset>
|
|
||||||
<dev path>: Full pathname to the underlying block-device, or a
|
|
||||||
"major:minor" device-number.
|
|
||||||
<offset>: Starting sector within the device.
|
|
||||||
|
|
||||||
|
|
||||||
Example scripts
|
|
||||||
===============
|
|
||||||
[[
|
|
||||||
#!/bin/sh
|
|
||||||
# Create an identity mapping for a device
|
|
||||||
echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
|
|
||||||
]]
|
|
||||||
|
|
||||||
|
|
||||||
[[
|
|
||||||
#!/bin/sh
|
|
||||||
# Join 2 devices together
|
|
||||||
size1=`blockdev --getsz $1`
|
|
||||||
size2=`blockdev --getsz $2`
|
|
||||||
echo "0 $size1 linear $1 0
|
|
||||||
$size1 $size2 linear $2 0" | dmsetup create joined
|
|
||||||
]]
|
|
||||||
|
|
||||||
|
|
||||||
[[
|
|
||||||
#!/usr/bin/perl -w
|
|
||||||
# Split a device into 4M chunks and then join them together in reverse order.
|
|
||||||
|
|
||||||
my $name = "reverse";
|
|
||||||
my $extent_size = 4 * 1024 * 2;
|
|
||||||
my $dev = $ARGV[0];
|
|
||||||
my $table = "";
|
|
||||||
my $count = 0;
|
|
||||||
|
|
||||||
if (!defined($dev)) {
|
|
||||||
die("Please specify a device.\n");
|
|
||||||
}
|
|
||||||
|
|
||||||
my $dev_size = `blockdev --getsz $dev`;
|
|
||||||
my $extents = int($dev_size / $extent_size) -
|
|
||||||
(($dev_size % $extent_size) ? 1 : 0);
|
|
||||||
|
|
||||||
while ($extents > 0) {
|
|
||||||
my $this_start = $count * $extent_size;
|
|
||||||
$extents--;
|
|
||||||
$count++;
|
|
||||||
my $this_offset = $extents * $extent_size;
|
|
||||||
|
|
||||||
$table .= "$this_start $extent_size linear $dev $this_offset\n";
|
|
||||||
}
|
|
||||||
|
|
||||||
`echo \"$table\" | dmsetup create $name`;
|
|
||||||
]]
|
|
|
@ -1,3 +1,4 @@
|
||||||
|
=============
|
||||||
dm-log-writes
|
dm-log-writes
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
@ -25,11 +26,11 @@ completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
|
||||||
simulate the worst case scenario with regard to power failures. Consider the
|
simulate the worst case scenario with regard to power failures. Consider the
|
||||||
following example (W means write, C means complete):
|
following example (W means write, C means complete):
|
||||||
|
|
||||||
W1,W2,W3,C3,C2,Wflush,C1,Cflush
|
W1,W2,W3,C3,C2,Wflush,C1,Cflush
|
||||||
|
|
||||||
The log would show the following
|
The log would show the following:
|
||||||
|
|
||||||
W3,W2,flush,W1....
|
W3,W2,flush,W1....
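
The ordering in this example can be reproduced with a small model. The Python
sketch below is only a model of the behaviour described in this text, not the
kernel implementation: writes join a list when they complete, a flush takes
over that list when it is issued, and the list is logged ahead of the flush
when the flush completes. The function name simulate_log and the event
encoding ("W"/"C" prefix plus a name) are assumptions made for illustration.

```python
def simulate_log(events):
    """Return (log, still_unlogged) for a sequence of W*/C* events."""
    completed = []   # writes that have completed but are not yet logged
    attached = {}    # flush name -> writes taken over when it was issued
    log = []
    for ev in events:
        kind, name = ev[0], ev[1:]
        is_flush = name.startswith("flush")
        if kind == "W" and is_flush:
            # Flush issued: take over the writes completed so far.
            attached[name] = completed
            completed = []
        elif kind == "C" and is_flush:
            # Flush completed: log its writes first, then the flush itself.
            log.extend(attached.pop(name))
            log.append(name)
        elif kind == "C":
            # Normal write completed: remember it in completion order.
            completed.append("W" + name)
        # A normal write being issued ("W1") logs nothing.
    return log, completed


log, pending = simulate_log(
    ["W1", "W2", "W3", "C3", "C2", "Wflush", "C1", "Cflush"])
```

In this model W1 completes after the flush was issued, so it stays pending and
would only be logged by a later flush.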
|
||||||
|
|
||||||
Again this is to simulate what is actually on disk; this allows us to detect
|
Again this is to simulate what is actually on disk; this allows us to detect
|
||||||
cases where a power failure at a particular point in time would create an
|
cases where a power failure at a particular point in time would create an
|
||||||
|
@ -42,11 +43,11 @@ Any REQ_OP_DISCARD requests are treated like WRITE requests. Otherwise we would
|
||||||
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
|
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
|
||||||
request. Consider the following example:
|
request. Consider the following example:
|
||||||
|
|
||||||
WRITE block 1, DISCARD block 1, FLUSH
|
WRITE block 1, DISCARD block 1, FLUSH
|
||||||
|
|
||||||
If we logged DISCARD when it completed, the replay would look like this
|
If we logged DISCARD when it completed, the replay would look like this:
|
||||||
|
|
||||||
DISCARD 1, WRITE 1, FLUSH
|
DISCARD 1, WRITE 1, FLUSH
|
||||||
|
|
||||||
which isn't quite what happened and wouldn't be caught during the log replay.
|
which isn't quite what happened and wouldn't be caught during the log replay.
|
||||||
|
|
||||||
|
@ -57,15 +58,19 @@ i) Constructor
|
||||||
|
|
||||||
log-writes <dev_path> <log_dev_path>
|
log-writes <dev_path> <log_dev_path>
|
||||||
|
|
||||||
dev_path : Device that all of the IO will go to normally.
|
============= ==============================================
|
||||||
log_dev_path : Device where the log entries are written to.
|
dev_path Device that all of the IO will go to normally.
|
||||||
|
log_dev_path Device where the log entries are written to.
|
||||||
|
============= ==============================================
|
||||||
|
|
||||||
ii) Status
|
ii) Status
|
||||||
|
|
||||||
<#logged entries> <highest allocated sector>
|
<#logged entries> <highest allocated sector>
|
||||||
|
|
||||||
#logged entries : Number of logged entries
|
=========================== ========================
|
||||||
highest allocated sector : Highest allocated sector
|
#logged entries Number of logged entries
|
||||||
|
highest allocated sector Highest allocated sector
|
||||||
|
=========================== ========================
|
||||||
|
|
||||||
iii) Messages
|
iii) Messages
|
||||||
|
|
||||||
|
@ -75,15 +80,15 @@ iii) Messages
|
||||||
For example, say you want to fsck a file system after every
|
For example, say you want to fsck a file system after every
|
||||||
write, but first you need to replay up to the mkfs to make sure
|
write, but first you need to replay up to the mkfs to make sure
|
||||||
we're fsck'ing something reasonable, you would do something like
|
we're fsck'ing something reasonable, you would do something like
|
||||||
this:
|
this::
|
||||||
|
|
||||||
mkfs.btrfs -f /dev/mapper/log
|
mkfs.btrfs -f /dev/mapper/log
|
||||||
dmsetup message log 0 mark mkfs
|
dmsetup message log 0 mark mkfs
|
||||||
<run test>
|
<run test>
|
||||||
|
|
||||||
This would allow you to replay the log up to the mkfs mark and
|
This would allow you to replay the log up to the mkfs mark and
|
||||||
then replay from that point on doing the fsck check in the
|
then replay from that point on doing the fsck check in the
|
||||||
interval that you want.
|
interval that you want.
|
||||||
|
|
||||||
Every log has a mark at the end labeled "dm-log-writes-end".
|
Every log has a mark at the end labeled "dm-log-writes-end".
|
||||||
|
|
||||||
|
@ -97,42 +102,42 @@ Example usage
|
||||||
=============
|
=============
|
||||||
|
|
||||||
Say you want to test fsync on your file system. You would do something like
|
Say you want to test fsync on your file system. You would do something like
|
||||||
this:
|
this::
|
||||||
|
|
||||||
TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
|
TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
|
||||||
dmsetup create log --table "$TABLE"
|
dmsetup create log --table "$TABLE"
|
||||||
mkfs.btrfs -f /dev/mapper/log
|
mkfs.btrfs -f /dev/mapper/log
|
||||||
dmsetup message log 0 mark mkfs
|
dmsetup message log 0 mark mkfs
|
||||||
|
|
||||||
mount /dev/mapper/log /mnt/btrfs-test
|
mount /dev/mapper/log /mnt/btrfs-test
|
||||||
<some test that does fsync at the end>
|
<some test that does fsync at the end>
|
||||||
dmsetup message log 0 mark fsync
|
dmsetup message log 0 mark fsync
|
||||||
md5sum /mnt/btrfs-test/foo
|
md5sum /mnt/btrfs-test/foo
|
||||||
umount /mnt/btrfs-test
|
umount /mnt/btrfs-test
|
||||||
|
|
||||||
dmsetup remove log
|
dmsetup remove log
|
||||||
replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
|
replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
|
||||||
mount /dev/sdb /mnt/btrfs-test
|
mount /dev/sdb /mnt/btrfs-test
|
||||||
md5sum /mnt/btrfs-test/foo
|
md5sum /mnt/btrfs-test/foo
|
||||||
<verify md5sum's are correct>
|
<verify md5sum's are correct>
|
||||||
|
|
||||||
Another option is to do a complicated file system operation and verify the file
|
Another option is to do a complicated file system operation and verify the file
|
||||||
system is consistent during the entire operation. You could do this with:
|
system is consistent during the entire operation. You could do this with:
|
||||||
|
|
||||||
TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
|
TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
|
||||||
dmsetup create log --table "$TABLE"
|
dmsetup create log --table "$TABLE"
|
||||||
mkfs.btrfs -f /dev/mapper/log
|
mkfs.btrfs -f /dev/mapper/log
|
||||||
dmsetup message log 0 mark mkfs
|
dmsetup message log 0 mark mkfs
|
||||||
|
|
||||||
mount /dev/mapper/log /mnt/btrfs-test
|
mount /dev/mapper/log /mnt/btrfs-test
|
||||||
<fsstress to dirty the fs>
|
<fsstress to dirty the fs>
|
||||||
btrfs filesystem balance /mnt/btrfs-test
|
btrfs filesystem balance /mnt/btrfs-test
|
||||||
umount /mnt/btrfs-test
|
umount /mnt/btrfs-test
|
||||||
dmsetup remove log
|
dmsetup remove log
|
||||||
|
|
||||||
replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
|
replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
|
||||||
btrfsck /dev/sdb
|
btrfsck /dev/sdb
|
||||||
replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
|
replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
|
||||||
--fsck "btrfsck /dev/sdb" --check fua
|
--fsck "btrfsck /dev/sdb" --check fua
|
||||||
|
|
||||||
And that will replay the log until it sees a FUA request, run the fsck command
|
And that will replay the log until it sees a FUA request, run the fsck command
|
|
@ -1,3 +1,7 @@
|
||||||
|
===============
|
||||||
|
Persistent data
|
||||||
|
===============
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
|
@ -1,15 +1,16 @@
|
||||||
|
==============================
|
||||||
Device-mapper snapshot support
|
Device-mapper snapshot support
|
||||||
==============================
|
==============================
|
||||||
|
|
||||||
Device-mapper allows you, without massive data copying:
|
Device-mapper allows you, without massive data copying:
|
||||||
|
|
||||||
*) To create snapshots of any block device i.e. mountable, saved states of
|
- To create snapshots of any block device i.e. mountable, saved states of
|
||||||
the block device which are also writable without interfering with the
|
the block device which are also writable without interfering with the
|
||||||
original content;
|
original content;
|
||||||
*) To create device "forks", i.e. multiple different versions of the
|
- To create device "forks", i.e. multiple different versions of the
|
||||||
same data stream.
|
same data stream.
|
||||||
*) To merge a snapshot of a block device back into the snapshot's origin
|
- To merge a snapshot of a block device back into the snapshot's origin
|
||||||
device.
|
device.
|
||||||
|
|
||||||
In the first two cases, dm copies only the chunks of data that get
|
In the first two cases, dm copies only the chunks of data that get
|
||||||
changed and uses a separate copy-on-write (COW) block device for
|
changed and uses a separate copy-on-write (COW) block device for
|
||||||
|
@ -22,7 +23,7 @@ the origin device.
|
||||||
There are three dm targets available:
|
There are three dm targets available:
|
||||||
snapshot, snapshot-origin, and snapshot-merge.
|
snapshot, snapshot-origin, and snapshot-merge.
|
||||||
|
|
||||||
*) snapshot-origin <origin>
|
- snapshot-origin <origin>
|
||||||
|
|
||||||
which will normally have one or more snapshots based on it.
|
which will normally have one or more snapshots based on it.
|
||||||
Reads will be mapped directly to the backing device. For each write, the
|
Reads will be mapped directly to the backing device. For each write, the
|
||||||
|
@ -30,7 +31,7 @@ original data will be saved in the <COW device> of each snapshot to keep
|
||||||
its visible content unchanged, at least until the <COW device> fills up.
|
its visible content unchanged, at least until the <COW device> fills up.
|
||||||
|
|
||||||
|
|
||||||
*) snapshot <origin> <COW device> <persistent?> <chunksize>
|
- snapshot <origin> <COW device> <persistent?> <chunksize>
|
||||||
|
|
||||||
A snapshot of the <origin> block device is created. Changed chunks of
|
A snapshot of the <origin> block device is created. Changed chunks of
|
||||||
<chunksize> sectors will be stored on the <COW device>. Writes will
|
<chunksize> sectors will be stored on the <COW device>. Writes will
|
||||||
|
@ -83,25 +84,25 @@ When you create the first LVM2 snapshot of a volume, four dm devices are used:
|
||||||
source volume), whose table is replaced by a "snapshot-origin" mapping
|
source volume), whose table is replaced by a "snapshot-origin" mapping
|
||||||
from device #1.
|
from device #1.
|
||||||
|
|
||||||
A fixed naming scheme is used, so with the following commands:
|
A fixed naming scheme is used, so with the following commands::
|
||||||
|
|
||||||
lvcreate -L 1G -n base volumeGroup
|
lvcreate -L 1G -n base volumeGroup
|
||||||
lvcreate -L 100M --snapshot -n snap volumeGroup/base
|
lvcreate -L 100M --snapshot -n snap volumeGroup/base
|
||||||
|
|
||||||
we'll have this situation (with volumes in above order):
|
we'll have this situation (with volumes in above order)::
|
||||||
|
|
||||||
# dmsetup table|grep volumeGroup
|
# dmsetup table|grep volumeGroup
|
||||||
|
|
||||||
volumeGroup-base-real: 0 2097152 linear 8:19 384
|
volumeGroup-base-real: 0 2097152 linear 8:19 384
|
||||||
volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
|
volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
|
||||||
volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
|
volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
|
||||||
volumeGroup-base: 0 2097152 snapshot-origin 254:11
|
volumeGroup-base: 0 2097152 snapshot-origin 254:11
|
||||||
|
|
||||||
# ls -lL /dev/mapper/volumeGroup-*
|
# ls -lL /dev/mapper/volumeGroup-*
|
||||||
brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
|
brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
|
||||||
brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
|
brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
|
||||||
brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
|
brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
|
||||||
brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
|
brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
|
||||||
|
|
||||||
|
|
||||||
How snapshot-merge is used by LVM2
|
How snapshot-merge is used by LVM2
|
||||||
|
@ -114,27 +115,28 @@ merging snapshot after it completes. The "snapshot" that hands over its
|
||||||
COW device to the "snapshot-merge" is deactivated (unless using lvchange
|
COW device to the "snapshot-merge" is deactivated (unless using lvchange
|
||||||
--refresh); but if it is left active it will simply return I/O errors.
|
--refresh); but if it is left active it will simply return I/O errors.
|
||||||
|
|
||||||
A snapshot will merge into its origin with the following command:
|
A snapshot will merge into its origin with the following command::
|
||||||
|
|
||||||
lvconvert --merge volumeGroup/snap
|
lvconvert --merge volumeGroup/snap
|
||||||
|
|
||||||
we'll now have this situation:
|
we'll now have this situation::
|
||||||
|
|
||||||
# dmsetup table|grep volumeGroup
|
# dmsetup table|grep volumeGroup
|
||||||
|
|
||||||
volumeGroup-base-real: 0 2097152 linear 8:19 384
|
volumeGroup-base-real: 0 2097152 linear 8:19 384
|
||||||
volumeGroup-base-cow: 0 204800 linear 8:19 2097536
|
volumeGroup-base-cow: 0 204800 linear 8:19 2097536
|
||||||
volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
|
volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
|
||||||
|
|
||||||
# ls -lL /dev/mapper/volumeGroup-*
|
# ls -lL /dev/mapper/volumeGroup-*
|
||||||
brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
|
brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
|
||||||
brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
|
brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
|
||||||
brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
|
brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
|
||||||
|
|
||||||
|
|
||||||
How to determine when a merging is complete
|
How to determine when a merging is complete
|
||||||
===========================================
|
===========================================
|
||||||
The snapshot-merge and snapshot status lines end with:
|
The snapshot-merge and snapshot status lines end with:
|
||||||
|
|
||||||
<sectors_allocated>/<total_sectors> <metadata_sectors>
|
<sectors_allocated>/<total_sectors> <metadata_sectors>
|
||||||
|
|
||||||
Both <sectors_allocated> and <total_sectors> include both data and metadata.
|
Both <sectors_allocated> and <total_sectors> include both data and metadata.
|
||||||
|
@ -142,35 +144,37 @@ During merging, the number of sectors allocated gets smaller and
|
||||||
smaller. Merging has finished when the number of sectors holding data
|
smaller. Merging has finished when the number of sectors holding data
|
||||||
is zero, in other words <sectors_allocated> == <metadata_sectors>.
|
is zero, in other words <sectors_allocated> == <metadata_sectors>.
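
The completion test can be scripted against a status line. The Python sketch
below assumes the line ends with the two fields described here,
<sectors_allocated>/<total_sectors> followed by <metadata_sectors>; the
function name merge_finished is illustrative, and other status forms are not
handled.

```python
def merge_finished(status_line):
    """Return True when no data sectors remain to be merged."""
    fields = status_line.split()
    usage, metadata_sectors = fields[-2], int(fields[-1])
    sectors_allocated, _total_sectors = (int(x) for x in usage.split("/"))
    # Only metadata is left once allocated sectors equal metadata sectors.
    return sectors_allocated == metadata_sectors


in_progress = merge_finished("0 8388608 snapshot-merge 281688/2097152 1104")
done = merge_finished("0 8388608 snapshot-merge 16/2097152 16")
```

A monitoring loop could poll the status line and stop once the function
returns True.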
|
||||||
|
|
||||||
Here is a practical example (using a hybrid of lvm and dmsetup commands):
|
Here is a practical example (using a hybrid of lvm and dmsetup commands)::
|
||||||
|
|
||||||
# lvs
|
# lvs
|
||||||
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
|
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
|
||||||
base volumeGroup owi-a- 4.00g
|
base volumeGroup owi-a- 4.00g
|
||||||
snap volumeGroup swi-a- 1.00g base 18.97
|
snap volumeGroup swi-a- 1.00g base 18.97
|
||||||
|
|
||||||
# dmsetup status volumeGroup-snap
|
# dmsetup status volumeGroup-snap
|
||||||
0 8388608 snapshot 397896/2097152 1560
|
0 8388608 snapshot 397896/2097152 1560
|
||||||
^^^^ metadata sectors
|
^^^^ metadata sectors
|
||||||
|
|
||||||
# lvconvert --merge -b volumeGroup/snap
|
# lvconvert --merge -b volumeGroup/snap
|
||||||
Merging of volume snap started.
|
Merging of volume snap started.
|
||||||
|
|
||||||
# lvs volumeGroup/snap
|
# lvs volumeGroup/snap
|
||||||
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
|
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
|
||||||
base volumeGroup Owi-a- 4.00g 17.23
|
base volumeGroup Owi-a- 4.00g 17.23
|
||||||
|
|
||||||
# dmsetup status volumeGroup-base
|
# dmsetup status volumeGroup-base
|
||||||
0 8388608 snapshot-merge 281688/2097152 1104
|
0 8388608 snapshot-merge 281688/2097152 1104
|
||||||
|
|
||||||
# dmsetup status volumeGroup-base
|
# dmsetup status volumeGroup-base
|
||||||
0 8388608 snapshot-merge 180480/2097152 712
|
0 8388608 snapshot-merge 180480/2097152 712
|
||||||
|
|
||||||
# dmsetup status volumeGroup-base
|
# dmsetup status volumeGroup-base
|
||||||
0 8388608 snapshot-merge 16/2097152 16
|
0 8388608 snapshot-merge 16/2097152 16
|
||||||
|
|
||||||
Merging has finished.
|
Merging has finished.
|
||||||
|
|
||||||
# lvs
|
::
|
||||||
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
|
|
||||||
base volumeGroup owi-a- 4.00g
|
# lvs
|
||||||
|
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
|
||||||
|
base volumeGroup owi-a- 4.00g
|
|
@ -1,3 +1,4 @@
|
||||||
|
=============
|
||||||
DM statistics
|
DM statistics
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
@ -11,7 +12,7 @@ Individual statistics will be collected for each step-sized area within
|
||||||
the range specified.
|
the range specified.
|
||||||
|
|
||||||
The I/O statistics counters for each step-sized area of a region are
|
The I/O statistics counters for each step-sized area of a region are
|
||||||
in the same format as /sys/block/*/stat or /proc/diskstats (see:
|
in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
|
||||||
Documentation/iostats.txt). But two extra counters (12 and 13) are
|
Documentation/iostats.txt). But two extra counters (12 and 13) are
|
||||||
provided: total time spent reading and writing. When the histogram
|
provided: total time spent reading and writing. When the histogram
|
||||||
argument is used, the 14th parameter is reported that represents the
|
argument is used, the 14th parameter is reported that represents the
|
||||||
|
@ -32,40 +33,45 @@ on each other's data.
|
||||||
The creation of DM statistics will allocate memory via kmalloc or
|
The creation of DM statistics will allocate memory via kmalloc or
|
||||||
fall back to using vmalloc space. At most, 1/4 of the overall system
|
fall back to using vmalloc space. At most, 1/4 of the overall system
|
||||||
memory may be allocated by DM statistics. The admin can see how much
|
memory may be allocated by DM statistics. The admin can see how much
|
||||||
memory is used by reading
|
memory is used by reading:
|
||||||
/sys/module/dm_mod/parameters/stats_current_allocated_bytes
|
|
||||||
|
/sys/module/dm_mod/parameters/stats_current_allocated_bytes
|
||||||
|
|
||||||
Messages
|
Messages
|
||||||
========
|
========
|
||||||
|
|
||||||
@stats_create <range> <step>
|
@stats_create <range> <step> [<number_of_optional_arguments> <optional_arguments>...] [<program_id> [<aux_data>]]
|
||||||
[<number_of_optional_arguments> <optional_arguments>...]
|
|
||||||
[<program_id> [<aux_data>]]
|
|
||||||
|
|
||||||
Create a new region and return the region_id.
|
Create a new region and return the region_id.
|
||||||
|
|
||||||
<range>
|
<range>
|
||||||
"-" - whole device
|
"-"
|
||||||
"<start_sector>+<length>" - a range of <length> 512-byte sectors
|
whole device
|
||||||
starting with <start_sector>.
|
"<start_sector>+<length>"
|
||||||
|
a range of <length> 512-byte sectors
|
||||||
|
starting with <start_sector>.
|
||||||
|
|
||||||
<step>
|
<step>
|
||||||
"<area_size>" - the range is subdivided into areas each containing
|
"<area_size>"
|
||||||
<area_size> sectors.
|
the range is subdivided into areas each containing
|
||||||
"/<number_of_areas>" - the range is subdivided into the specified
|
<area_size> sectors.
|
||||||
number of areas.
|
"/<number_of_areas>"
|
||||||
|
the range is subdivided into the specified
|
||||||
|
number of areas.
|
||||||
|
|
||||||
<number_of_optional_arguments>
|
<number_of_optional_arguments>
|
||||||
The number of optional arguments
|
The number of optional arguments
|
||||||
|
|
||||||
<optional_arguments>
|
<optional_arguments>
|
||||||
The following optional arguments are supported
|
The following optional arguments are supported:
|
||||||
precise_timestamps - use precise timer with nanosecond resolution
|
|
||||||
|
precise_timestamps
|
||||||
|
use precise timer with nanosecond resolution
|
||||||
instead of the "jiffies" variable. When this argument is
|
instead of the "jiffies" variable. When this argument is
|
||||||
used, the resulting times are in nanoseconds instead of
|
used, the resulting times are in nanoseconds instead of
|
||||||
milliseconds. Precise timestamps are a little bit slower
|
milliseconds. Precise timestamps are a little bit slower
|
||||||
to obtain than jiffies-based timestamps.
|
to obtain than jiffies-based timestamps.
|
||||||
histogram:n1,n2,n3,n4,... - collect histogram of latencies. The
|
histogram:n1,n2,n3,n4,...
|
||||||
|
collect histogram of latencies. The
|
||||||
numbers n1, n2, etc are times that represent the boundaries
|
numbers n1, n2, etc are times that represent the boundaries
|
||||||
of the histogram. If precise_timestamps is not used, the
|
of the histogram. If precise_timestamps is not used, the
|
||||||
times are in milliseconds, otherwise they are in
|
times are in milliseconds, otherwise they are in
|
||||||
|
@ -96,21 +102,18 @@ Messages
|
||||||
@stats_list message, but it doesn't use this value for anything.
|
@stats_list message, but it doesn't use this value for anything.
|
||||||
|
|
||||||
@stats_delete <region_id>
|
@stats_delete <region_id>
|
||||||
|
|
||||||
Delete the region with the specified id.
|
Delete the region with the specified id.
|
||||||
|
|
||||||
<region_id>
|
<region_id>
|
||||||
region_id returned from @stats_create
|
region_id returned from @stats_create
|
||||||
|
|
||||||
@stats_clear <region_id>
|
@stats_clear <region_id>
|
||||||
|
|
||||||
Clear all the counters except the in-flight i/o counters.
|
Clear all the counters except the in-flight i/o counters.
|
||||||
|
|
||||||
<region_id>
|
<region_id>
|
||||||
region_id returned from @stats_create
|
region_id returned from @stats_create
|
||||||
|
|
||||||
@stats_list [<program_id>]
|
@stats_list [<program_id>]
|
||||||
|
|
||||||
List all regions registered with @stats_create.
|
List all regions registered with @stats_create.
|
||||||
|
|
||||||
<program_id>
|
<program_id>
|
||||||
|
@ -127,7 +130,6 @@ Messages
|
||||||
if they were specified when creating the region.
|
if they were specified when creating the region.
|
||||||
|
|
||||||
@stats_print <region_id> [<starting_line> <number_of_lines>]
|
@stats_print <region_id> [<starting_line> <number_of_lines>]
|
||||||
|
|
||||||
Print counters for each step-sized area of a region.
|
Print counters for each step-sized area of a region.
|
||||||
|
|
||||||
<region_id>
|
<region_id>
|
||||||
|
@ -143,10 +145,11 @@ Messages
|
||||||
|
|
||||||
Output format for each step-sized area of a region:
|
Output format for each step-sized area of a region:
|
||||||
|
|
||||||
<start_sector>+<length> counters
|
<start_sector>+<length>
|
||||||
|
counters
|
||||||
|
|
||||||
The first 11 counters have the same meaning as
|
The first 11 counters have the same meaning as
|
||||||
/sys/block/*/stat or /proc/diskstats.
|
`/sys/block/*/stat or /proc/diskstats`.
|
||||||
|
|
||||||
Please refer to Documentation/iostats.txt for details.
|
Please refer to Documentation/iostats.txt for details.
|
||||||
|
|
||||||
|
@ -163,11 +166,11 @@ Messages
|
||||||
11. the weighted number of milliseconds spent doing I/Os
|
11. the weighted number of milliseconds spent doing I/Os
|
||||||
|
|
||||||
Additional counters:
|
Additional counters:
|
||||||
|
|
||||||
12. the total time spent reading in milliseconds
|
12. the total time spent reading in milliseconds
|
||||||
13. the total time spent writing in milliseconds
|
13. the total time spent writing in milliseconds
|
||||||
|
|
||||||
@stats_print_clear <region_id> [<starting_line> <number_of_lines>]
|
@stats_print_clear <region_id> [<starting_line> <number_of_lines>]
|
||||||
|
|
||||||
Atomically print and then clear all the counters except the
|
Atomically print and then clear all the counters except the
|
||||||
in-flight i/o counters. Useful when the client consuming the
|
in-flight i/o counters. Useful when the client consuming the
|
||||||
statistics does not want to lose any statistics (those updated
|
statistics does not want to lose any statistics (those updated
|
||||||
|
@ -185,7 +188,6 @@ Messages
|
||||||
If omitted, all lines are printed and then cleared.
|
If omitted, all lines are printed and then cleared.
|
||||||
|
|
||||||
@stats_set_aux <region_id> <aux_data>
|
@stats_set_aux <region_id> <aux_data>
|
||||||
|
|
||||||
Store auxiliary data aux_data for the specified region.
|
Store auxiliary data aux_data for the specified region.
|
||||||
|
|
||||||
<region_id>
|
<region_id>
|
||||||
|
@ -201,23 +203,23 @@ Examples
|
||||||
========
|
========
|
||||||
|
|
||||||
Subdivide the DM device 'vol' into 100 pieces and start collecting
|
Subdivide the DM device 'vol' into 100 pieces and start collecting
|
||||||
statistics on them:
|
statistics on them::
|
||||||
|
|
||||||
dmsetup message vol 0 @stats_create - /100
|
dmsetup message vol 0 @stats_create - /100
|
||||||
|
|
||||||
Set the auxiliary data string to "foo bar baz" (the escape for each
|
Set the auxiliary data string to "foo bar baz" (the escape for each
|
||||||
space must also be escaped, otherwise the shell will consume them):
|
space must also be escaped, otherwise the shell will consume them)::
|
||||||
|
|
||||||
dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
|
dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
|
||||||
|
|
||||||
List the statistics:
|
List the statistics::
|
||||||
|
|
||||||
dmsetup message vol 0 @stats_list
|
dmsetup message vol 0 @stats_list
|
||||||
|
|
||||||
Print the statistics:
|
Print the statistics::
|
||||||
|
|
||||||
dmsetup message vol 0 @stats_print 0
|
dmsetup message vol 0 @stats_print 0
|
||||||
|
|
||||||
Delete the statistics:
|
Delete the statistics::
|
||||||
|
|
||||||
dmsetup message vol 0 @stats_delete 0
|
dmsetup message vol 0 @stats_delete 0
|
|
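The `@stats_print` output format described above (`<start_sector>+<length>` followed by the counters) can be split mechanically. The following is a minimal Python sketch, not part of the kernel documentation: the helper name `parse_stats_line` and the sample line are invented for illustration, and it assumes a region created without the histogram argument, so each line carries exactly 13 counters.

```python
from typing import List, NamedTuple


class AreaStats(NamedTuple):
    start_sector: int
    length: int
    counters: List[int]


def parse_stats_line(line: str) -> AreaStats:
    """Split one '<start_sector>+<length> counters' line into its parts."""
    area, *fields = line.split()
    start, length = area.split("+")
    return AreaStats(int(start), int(length), [int(f) for f in fields])


# Hypothetical sample line: one 1024-sector area starting at sector 0.
sample = "0+1024 5 0 40 12 3 0 24 7 0 19 19 12 7"
stats = parse_stats_line(sample)

# Counters 12 and 13 (list indices 11 and 12) are the total read and write
# times; they are in milliseconds unless precise_timestamps was requested.
read_time, write_time = stats.counters[11], stats.counters[12]
print(stats.start_sector, stats.length, read_time, write_time)
```

The line would normally come from `dmsetup message vol 0 @stats_print 0`, one line per step-sized area.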
@ -0,0 +1,61 @@
|
||||||
|
=========
|
||||||
|
dm-stripe
|
||||||
|
=========
|
||||||
|
|
||||||
|
Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0)
|
||||||
|
device across one or more underlying devices. Data is written in "chunks",
|
||||||
|
with consecutive chunks rotating among the underlying devices. This can
|
||||||
|
potentially provide improved I/O throughput by utilizing several physical
|
||||||
|
devices in parallel.
|
||||||
|
|
||||||
|
Parameters: <num devs> <chunk size> [<dev path> <offset>]+
|
||||||
|
<num devs>:
|
||||||
|
Number of underlying devices.
|
||||||
|
<chunk size>:
|
||||||
|
Size of each chunk of data. Must be at least as
|
||||||
|
large as the system's PAGE_SIZE.
|
||||||
|
<dev path>:
|
||||||
|
Full pathname to the underlying block-device, or a
|
||||||
|
"major:minor" device-number.
|
||||||
|
<offset>:
|
||||||
|
Starting sector within the device.
|
||||||
|
|
||||||
|
One or more underlying devices can be specified. The striped device size must
|
||||||
|
be a multiple of the chunk size multiplied by the number of underlying devices.
|
||||||
|
|
||||||
|
|
||||||
|
Example scripts
|
||||||
|
===============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#!/usr/bin/perl -w
|
||||||
|
# Create a striped device across any number of underlying devices. The device
|
||||||
|
# will be called "stripe_dev" and have a chunk-size of 128k.
|
||||||
|
|
||||||
|
my $chunk_size = 128 * 2;
|
||||||
|
my $dev_name = "stripe_dev";
|
||||||
|
my $num_devs = @ARGV;
|
||||||
|
my @devs = @ARGV;
|
||||||
|
my ($min_dev_size, $stripe_dev_size, $i);
|
||||||
|
|
||||||
|
if (!$num_devs) {
|
||||||
|
die("Specify at least one device\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
$min_dev_size = `blockdev --getsz $devs[0]`;
|
||||||
|
for ($i = 1; $i < $num_devs; $i++) {
|
||||||
|
my $this_size = `blockdev --getsz $devs[$i]`;
|
||||||
|
$min_dev_size = ($min_dev_size < $this_size) ?
|
||||||
|
$min_dev_size : $this_size;
|
||||||
|
}
|
||||||
|
|
||||||
|
$stripe_dev_size = $min_dev_size * $num_devs;
|
||||||
|
$stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);
|
||||||
|
|
||||||
|
$table = "0 $stripe_dev_size striped $num_devs $chunk_size";
|
||||||
|
for ($i = 0; $i < $num_devs; $i++) {
|
||||||
|
$table .= " $devs[$i] 0";
|
||||||
|
}
|
||||||
|
|
||||||
|
`echo $table | dmsetup create $dev_name`;
|
|
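The size arithmetic in the Perl script above can be checked in isolation. This is a small Python sketch of the same calculation, assuming the device sizes are already known in 512-byte sectors (as `blockdev --getsz` reports them). The function name `striped_table` and the device names are placeholders, and nothing here calls `dmsetup`.

```python
def striped_table(dev_sizes, devs, chunk_size=128 * 2):
    """Build a 'striped' table line.

    dev_sizes and chunk_size are in 512-byte sectors, so the default
    chunk_size of 256 sectors corresponds to 128k.
    """
    num_devs = len(devs)
    if num_devs == 0:
        raise ValueError("Specify at least one device")
    # Each member contributes only as much as the smallest device holds.
    size = min(dev_sizes) * num_devs
    # Round down to a multiple of chunk_size * num_devs, as required.
    size -= size % (chunk_size * num_devs)
    members = " ".join(f"{dev} 0" for dev in devs)
    return f"0 {size} striped {num_devs} {chunk_size} {members}"


# Hypothetical sizes and device names, for illustration only.
print(striped_table([1000000, 1000300], ["/dev/sda1", "/dev/sdb1"]))
# prints: 0 1999872 striped 2 256 /dev/sda1 0 /dev/sdb1 0
```

The resulting line is what the script pipes into `dmsetup create`.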
@ -1,57 +0,0 @@
|
||||||
dm-stripe
|
|
||||||
=========
|
|
||||||
|
|
||||||
Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0)
|
|
||||||
device across one or more underlying devices. Data is written in "chunks",
|
|
||||||
with consecutive chunks rotating among the underlying devices. This can
|
|
||||||
potentially provide improved I/O throughput by utilizing several physical
|
|
||||||
devices in parallel.
|
|
||||||
|
|
||||||
Parameters: <num devs> <chunk size> [<dev path> <offset>]+
|
|
||||||
<num devs>: Number of underlying devices.
|
|
||||||
<chunk size>: Size of each chunk of data. Must be at least as
|
|
||||||
large as the system's PAGE_SIZE.
|
|
||||||
<dev path>: Full pathname to the underlying block-device, or a
|
|
||||||
"major:minor" device-number.
|
|
||||||
<offset>: Starting sector within the device.
|
|
||||||
|
|
||||||
One or more underlying devices can be specified. The striped device size must
|
|
||||||
be a multiple of the chunk size multiplied by the number of underlying devices.
|
|
||||||
|
|
||||||
|
|
||||||
Example scripts
|
|
||||||
===============
|
|
||||||
|
|
||||||
[[
|
|
||||||
#!/usr/bin/perl -w
|
|
||||||
# Create a striped device across any number of underlying devices. The device
|
|
||||||
# will be called "stripe_dev" and have a chunk-size of 128k.
|
|
||||||
|
|
||||||
my $chunk_size = 128 * 2;
|
|
||||||
my $dev_name = "stripe_dev";
|
|
||||||
my $num_devs = @ARGV;
|
|
||||||
my @devs = @ARGV;
|
|
||||||
my ($min_dev_size, $stripe_dev_size, $i);
|
|
||||||
|
|
||||||
if (!$num_devs) {
|
|
||||||
die("Specify at least one device\n");
|
|
||||||
}
|
|
||||||
|
|
||||||
$min_dev_size = `blockdev --getsz $devs[0]`;
|
|
||||||
for ($i = 1; $i < $num_devs; $i++) {
|
|
||||||
my $this_size = `blockdev --getsz $devs[$i]`;
|
|
||||||
$min_dev_size = ($min_dev_size < $this_size) ?
|
|
||||||
$min_dev_size : $this_size;
|
|
||||||
}
|
|
||||||
|
|
||||||
$stripe_dev_size = $min_dev_size * $num_devs;
|
|
||||||
$stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);
|
|
||||||
|
|
||||||
$table = "0 $stripe_dev_size striped $num_devs $chunk_size";
|
|
||||||
for ($i = 0; $i < $num_devs; $i++) {
|
|
||||||
$table .= " $devs[$i] 0";
|
|
||||||
}
|
|
||||||
|
|
||||||
`echo $table | dmsetup create $dev_name`;
|
|
||||||
]]
|
|
||||||
|
|
|
@ -1,3 +1,4 @@
|
||||||
|
=========
|
||||||
dm-switch
|
dm-switch
|
||||||
=========
|
=========
|
||||||
|
|
||||||
|
@ -67,27 +68,25 @@ b-tree can achieve.
|
||||||
Construction Parameters
|
Construction Parameters
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
<num_paths> <region_size> <num_optional_args> [<optional_args>...]
|
<num_paths> <region_size> <num_optional_args> [<optional_args>...] [<dev_path> <offset>]+
|
||||||
[<dev_path> <offset>]+
|
<num_paths>
|
||||||
|
The number of paths across which to distribute the I/O.
|
||||||
|
|
||||||
<num_paths>
|
<region_size>
|
||||||
The number of paths across which to distribute the I/O.
|
The number of 512-byte sectors in a region. Each region can be redirected
|
||||||
|
to any of the available paths.
|
||||||
|
|
||||||
<region_size>
|
<num_optional_args>
|
||||||
The number of 512-byte sectors in a region. Each region can be redirected
|
The number of optional arguments. Currently, no optional arguments
|
||||||
to any of the available paths.
|
are supported and so this must be zero.
|
||||||
|
|
||||||
<num_optional_args>
|
<dev_path>
|
||||||
The number of optional arguments. Currently, no optional arguments
|
The block device that represents a specific path to the device.
|
||||||
are supported and so this must be zero.
|
|
||||||
|
|
||||||
<dev_path>
|
<offset>
|
||||||
The block device that represents a specific path to the device.
|
The offset of the start of data on the specific <dev_path> (in units
|
||||||
|
of 512-byte sectors). This number is added to the sector number when
|
||||||
<offset>
|
forwarding the request to the specific path. Typically it is zero.
|
||||||
The offset of the start of data on the specific <dev_path> (in units
|
|
||||||
of 512-byte sectors). This number is added to the sector number when
|
|
||||||
forwarding the request to the specific path. Typically it is zero.
|
|
||||||
|
|
||||||
Messages
|
Messages
|
||||||
========
|
========
|
||||||
|
@ -122,17 +121,21 @@ Example
|
||||||
Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
|
Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
|
||||||
the same size.
|
the same size.
|
||||||
|
|
||||||
Create a switch device with 64kB region size:
|
Create a switch device with 64kB region size::
|
||||||
|
|
||||||
dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
|
dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
|
||||||
switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
|
switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
|
||||||
|
|
||||||
Set mappings for the first 7 entries to point to devices switch0, switch1,
|
Set mappings for the first 7 entries to point to devices switch0, switch1,
|
||||||
switch2, switch0, switch1, switch2, switch1:
|
switch2, switch0, switch1, switch2, switch1::
|
||||||
|
|
||||||
dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
|
dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
|
||||||
|
|
||||||
Set repetitive mapping. This command:
|
Set repetitive mapping. This command::
|
||||||
|
|
||||||
dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
|
dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
|
||||||
is equivalent to:
|
|
||||||
|
is equivalent to::
|
||||||
|
|
||||||
dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
|
dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
|
||||||
:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
|
:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
|
||||||
|
|
|
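The repetitive-mapping example can be reproduced with a few lines of Python. This sketch expands an `R<n>,<m>` argument into explicit entries. It assumes `<n>` and `<m>` are hexadecimal and that the last `<n>` mappings are repeated into the next `<m>` slots, as the message reference for this target describes (that part of the file is not shown in this diff). Under that reading `R2,10` fills 0x10 = 16 slots, which matches the 18 entries in the expanded command above. The function name `expand_mappings` is invented for illustration.

```python
def expand_mappings(args):
    """Expand R<n>,<m> arguments of set_region_mappings into explicit entries."""
    out = []
    for arg in args:
        if not arg.startswith("R"):
            out.append(arg)
            continue
        # Both numbers are read as hexadecimal without a prefix (assumption
        # taken from the target's message reference).
        n, m = (int(x, 16) for x in arg[1:].split(","))
        if n == 0 or n > len(out):
            raise ValueError(f"cannot repeat the last {n} mappings: {arg}")
        pattern = out[-n:]
        # Fill the next m slots by cycling through the last n mappings.
        # Repeated entries omit the index, meaning "the next region".
        for i in range(m):
            path_nr = pattern[i % n].split(":")[1]
            out.append(f":{path_nr}")
    return out


print(" ".join(expand_mappings("1000:1 :2 R2,10".split())))
```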
@ -1,3 +1,7 @@
|
||||||
|
=================
|
||||||
|
Thin provisioning
|
||||||
|
=================
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
||||||
|
@ -95,6 +99,8 @@ previously.)
|
||||||
Using an existing pool device
|
Using an existing pool device
|
||||||
-----------------------------
|
-----------------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dmsetup create pool \
|
dmsetup create pool \
|
||||||
--table "0 20971520 thin-pool $metadata_dev $data_dev \
|
--table "0 20971520 thin-pool $metadata_dev $data_dev \
|
||||||
$data_block_size $low_water_mark"
|
$data_block_size $low_water_mark"
|
||||||
|
@ -154,7 +160,7 @@ Thin provisioning
|
||||||
i) Creating a new thinly-provisioned volume.
|
i) Creating a new thinly-provisioned volume.
|
||||||
|
|
||||||
To create a new thinly-provisioned volume you must send a message to an
|
To create a new thinly-provisioned volume you must send a message to an
|
||||||
active pool device, /dev/mapper/pool in this example.
|
active pool device, /dev/mapper/pool in this example::
|
||||||
|
|
||||||
dmsetup message /dev/mapper/pool 0 "create_thin 0"
|
dmsetup message /dev/mapper/pool 0 "create_thin 0"
|
||||||
|
|
||||||
|
@ -164,7 +170,7 @@ i) Creating a new thinly-provisioned volume.
|
||||||
|
|
||||||
ii) Using a thinly-provisioned volume.
|
ii) Using a thinly-provisioned volume.
|
||||||
|
|
||||||
Thinly-provisioned volumes are activated using the 'thin' target:
|
Thinly-provisioned volumes are activated using the 'thin' target::
|
||||||
|
|
||||||
dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"
|
dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"
|
||||||
|
|
||||||
|
@ -181,6 +187,8 @@ i) Creating an internal snapshot.
|
||||||
must suspend it before creating the snapshot to avoid corruption.
|
must suspend it before creating the snapshot to avoid corruption.
|
||||||
This is NOT enforced at the moment, so please be careful!
|
This is NOT enforced at the moment, so please be careful!
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dmsetup suspend /dev/mapper/thin
|
dmsetup suspend /dev/mapper/thin
|
||||||
dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
|
dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
|
||||||
dmsetup resume /dev/mapper/thin
|
dmsetup resume /dev/mapper/thin
|
||||||
|
@ -198,14 +206,14 @@ ii) Using an internal snapshot.
|
||||||
activating or removing them both. (This differs from conventional
|
activating or removing them both. (This differs from conventional
|
||||||
device-mapper snapshots.)
|
device-mapper snapshots.)
|
||||||
|
|
||||||
Activate it exactly the same way as any other thinly-provisioned volume:
|
Activate it exactly the same way as any other thinly-provisioned volume::
|
||||||
|
|
||||||
dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
|
dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
|
||||||
|
|
||||||
External snapshots
|
External snapshots
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
You can use an external _read only_ device as an origin for a
|
You can use an external **read only** device as an origin for a
|
||||||
thinly-provisioned volume. Any read to an unprovisioned area of the
|
thinly-provisioned volume. Any read to an unprovisioned area of the
|
||||||
thin device will be passed through to the origin. Writes trigger
|
thin device will be passed through to the origin. Writes trigger
|
||||||
the allocation of new blocks as usual.
|
the allocation of new blocks as usual.
|
||||||
|
@ -223,11 +231,13 @@ i) Creating a snapshot of an external device
|
||||||
This is the same as creating a thin device.
|
This is the same as creating a thin device.
|
||||||
You don't mention the origin at this stage.
|
You don't mention the origin at this stage.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dmsetup message /dev/mapper/pool 0 "create_thin 0"
|
dmsetup message /dev/mapper/pool 0 "create_thin 0"
|
||||||
|
|
||||||
ii) Using a snapshot of an external device.
|
ii) Using a snapshot of an external device.
|
||||||
|
|
||||||
Append an extra parameter to the thin target specifying the origin:
|
Append an extra parameter to the thin target specifying the origin::
|
||||||
|
|
||||||
dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
|
dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
|
||||||
|
|
||||||
|
@ -240,6 +250,8 @@ Deactivation
|
||||||
All devices using a pool must be deactivated before the pool itself
|
All devices using a pool must be deactivated before the pool itself
|
||||||
can be.
|
can be.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dmsetup remove thin
|
dmsetup remove thin
|
||||||
dmsetup remove snap
|
dmsetup remove snap
|
||||||
dmsetup remove pool
|
dmsetup remove pool
|
||||||
|
@ -252,25 +264,32 @@ Reference
|
||||||
|
|
||||||
i) Constructor
|
i) Constructor
|
||||||
|
|
||||||
thin-pool <metadata dev> <data dev> <data block size (sectors)> \
|
::
|
||||||
<low water mark (blocks)> [<number of feature args> [<arg>]*]
|
|
||||||
|
thin-pool <metadata dev> <data dev> <data block size (sectors)> \
|
||||||
|
<low water mark (blocks)> [<number of feature args> [<arg>]*]
|
||||||
|
|
||||||
Optional feature arguments:
|
Optional feature arguments:
|
||||||
|
|
||||||
skip_block_zeroing: Skip the zeroing of newly-provisioned blocks.
|
skip_block_zeroing:
|
||||||
|
Skip the zeroing of newly-provisioned blocks.
|
||||||
|
|
||||||
ignore_discard: Disable discard support.
|
ignore_discard:
|
||||||
|
Disable discard support.
|
||||||
|
|
||||||
no_discard_passdown: Don't pass discards down to the underlying
|
no_discard_passdown:
|
||||||
data device, but just remove the mapping.
|
Don't pass discards down to the underlying
|
||||||
|
data device, but just remove the mapping.
|
||||||
|
|
||||||
read_only: Don't allow any changes to be made to the pool
|
read_only:
|
||||||
|
Don't allow any changes to be made to the pool
|
||||||
metadata. This mode is only available after the
|
metadata. This mode is only available after the
|
||||||
thin-pool has been created and first used in full
|
thin-pool has been created and first used in full
|
||||||
read/write mode. It cannot be specified on initial
|
read/write mode. It cannot be specified on initial
|
||||||
thin-pool creation.
|
thin-pool creation.
|
||||||
|
|
||||||
error_if_no_space: Error IOs, instead of queueing, if no space.
|
error_if_no_space:
|
||||||
|
Error IOs, instead of queueing, if no space.
|
||||||
|
|
||||||
Data block size must be between 64KB (128 sectors) and 1GB
|
Data block size must be between 64KB (128 sectors) and 1GB
|
||||||
(2097152 sectors) inclusive.
|
(2097152 sectors) inclusive.
|
||||||
|
@ -278,10 +297,12 @@ i) Constructor
|
||||||
|
|
||||||
ii) Status
|
ii) Status
|
||||||
|
|
||||||
<transaction id> <used metadata blocks>/<total metadata blocks>
|
::
|
||||||
<used data blocks>/<total data blocks> <held metadata root>
|
|
||||||
ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
|
<transaction id> <used metadata blocks>/<total metadata blocks>
|
||||||
needs_check|- metadata_low_watermark
|
<used data blocks>/<total data blocks> <held metadata root>
|
||||||
|
ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
|
||||||
|
needs_check|- metadata_low_watermark
|
||||||
|
|
||||||
transaction id:
|
transaction id:
|
||||||
A 64-bit number used by userspace to help synchronise with metadata
|
A 64-bit number used by userspace to help synchronise with metadata
|
||||||
|
@ -336,13 +357,11 @@ ii) Status
|
||||||
iii) Messages
|
iii) Messages
|
||||||
|
|
||||||
create_thin <dev id>
|
create_thin <dev id>
|
||||||
|
|
||||||
Create a new thinly-provisioned device.
|
Create a new thinly-provisioned device.
|
||||||
<dev id> is an arbitrary unique 24-bit identifier chosen by
|
<dev id> is an arbitrary unique 24-bit identifier chosen by
|
||||||
the caller.
|
the caller.
|
||||||
|
|
||||||
create_snap <dev id> <origin id>
|
create_snap <dev id> <origin id>
|
||||||
|
|
||||||
Create a new snapshot of another thinly-provisioned device.
|
Create a new snapshot of another thinly-provisioned device.
|
||||||
<dev id> is an arbitrary unique 24-bit identifier chosen by
|
<dev id> is an arbitrary unique 24-bit identifier chosen by
|
||||||
the caller.
|
the caller.
|
||||||
|
@ -350,11 +369,9 @@ iii) Messages
|
||||||
of which the new device will be a snapshot.
|
of which the new device will be a snapshot.
|
||||||
|
|
||||||
delete <dev id>
|
delete <dev id>
|
||||||
|
|
||||||
Deletes a thin device. Irreversible.
|
Deletes a thin device. Irreversible.
|
||||||
|
|
||||||
set_transaction_id <current id> <new id>
|
set_transaction_id <current id> <new id>
|
||||||
|
|
||||||
Userland volume managers, such as LVM, need a way to
|
Userland volume managers, such as LVM, need a way to
|
||||||
synchronise their external metadata with the internal metadata of the
|
synchronise their external metadata with the internal metadata of the
|
||||||
pool target. The thin-pool target offers to store an
|
pool target. The thin-pool target offers to store an
|
||||||
|
@ -364,14 +381,12 @@ iii) Messages
|
||||||
compare-and-swap message.
|
compare-and-swap message.
|
||||||
|
|
||||||
reserve_metadata_snap
|
reserve_metadata_snap
|
||||||
|
|
||||||
Reserve a copy of the data mapping btree for use by userland.
|
Reserve a copy of the data mapping btree for use by userland.
|
||||||
This allows userland to inspect the mappings as they were when
|
This allows userland to inspect the mappings as they were when
|
||||||
this message was executed. Use the pool's status command to
|
this message was executed. Use the pool's status command to
|
||||||
get the root block associated with the metadata snapshot.
|
get the root block associated with the metadata snapshot.
|
||||||
|
|
||||||
release_metadata_snap
|
release_metadata_snap
|
||||||
|
|
||||||
Release a previously reserved copy of the data mapping btree.
|
Release a previously reserved copy of the data mapping btree.
|
||||||
|
|
||||||
'thin' target
|
'thin' target
|
||||||
|
@ -379,7 +394,9 @@ iii) Messages
|
||||||
|
|
||||||
i) Constructor
|
i) Constructor
|
||||||
|
|
||||||
thin <pool dev> <dev id> [<external origin dev>]
|
::
|
||||||
|
|
||||||
|
thin <pool dev> <dev id> [<external origin dev>]
|
||||||
|
|
||||||
pool dev:
|
pool dev:
|
||||||
the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
|
the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
|
||||||
|
@ -401,8 +418,7 @@ provisioned as and when needed.
|
||||||
|
|
||||||
ii) Status
|
ii) Status
|
||||||
|
|
||||||
<nr mapped sectors> <highest mapped sector>
|
<nr mapped sectors> <highest mapped sector>
|
||||||
|
|
||||||
If the pool has encountered device errors and failed, the status
|
If the pool has encountered device errors and failed, the status
|
||||||
will just contain the string 'Fail'. The userspace recovery
|
will just contain the string 'Fail'. The userspace recovery
|
||||||
tools should then be used.
|
tools should then be used.
|
|
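The thin-pool status fields listed under "ii) Status" lend themselves to a simple parser. The sketch below is in Python and only illustrates the field order. The sample line is invented, the function name `parse_thin_pool_status` is hypothetical, and it assumes a held metadata root of `-` means that no root is held. It takes only the target-specific part of the `dmsetup status` output, that is, everything after the `<start> <length> thin-pool` prefix.

```python
def parse_thin_pool_status(fields: str) -> dict:
    """Parse the leading fields of a thin-pool status line."""
    parts = fields.split()
    used_meta, total_meta = (int(x) for x in parts[1].split("/"))
    used_data, total_data = (int(x) for x in parts[2].split("/"))
    return {
        "transaction_id": int(parts[0]),
        "metadata_used": used_meta / total_meta,  # fraction of metadata blocks
        "data_used": used_data / total_data,      # fraction of data blocks
        # Assumption: "-" stands for "no held metadata root".
        "held_metadata_root": None if parts[3] == "-" else int(parts[3]),
        "mode": parts[4],                         # ro, rw or out_of_data_space
    }


# Hypothetical status line, for illustration only.
sample = "1 150/4096 2048/163840 - rw discard_passdown queue_if_no_space - 1024"
status = parse_thin_pool_status(sample)
print(status["mode"], f"{status['data_used']:.2%}")
```

A monitoring script could compare `data_used` against its own threshold in addition to the pool's low water mark.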
@ -1,3 +1,7 @@
|
||||||
|
================================
|
||||||
|
Device-mapper "unstriped" target
|
||||||
|
================================
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
||||||
|
@ -34,46 +38,46 @@ striped target to combine the 4 devices into one. It then will use
|
||||||
the unstriped target on top of the striped device to access the
|
the unstriped target on top of the striped device to access the
|
||||||
individual backing loop devices. We write data to the newly exposed
|
individual backing loop devices. We write data to the newly exposed
|
||||||
unstriped devices and verify the data written matches the correct
|
unstriped devices and verify the data written matches the correct
|
||||||
underlying device on the striped array.
|
underlying device on the striped array::
|
||||||
|
|
||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
MEMBER_SIZE=$((128 * 1024 * 1024))
|
MEMBER_SIZE=$((128 * 1024 * 1024))
|
||||||
NUM=4
|
NUM=4
|
||||||
SEQ_END=$((${NUM}-1))
|
SEQ_END=$((${NUM}-1))
|
||||||
CHUNK=256
|
CHUNK=256
|
||||||
BS=4096
|
BS=4096
|
||||||
|
|
||||||
RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
|
RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
|
||||||
DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
|
DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
|
||||||
COUNT=$((${MEMBER_SIZE} / ${BS}))
|
COUNT=$((${MEMBER_SIZE} / ${BS}))
|
||||||
|
|
||||||
for i in $(seq 0 ${SEQ_END}); do
|
for i in $(seq 0 ${SEQ_END}); do
|
||||||
dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
|
dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
|
||||||
losetup /dev/loop${i} member-${i}
|
losetup /dev/loop${i} member-${i}
|
||||||
DM_PARMS+=" /dev/loop${i} 0"
|
DM_PARMS+=" /dev/loop${i} 0"
|
||||||
done
|
done
|
||||||
|
|
||||||
echo $DM_PARMS | dmsetup create raid0
|
echo $DM_PARMS | dmsetup create raid0
|
||||||
for i in $(seq 0 ${SEQ_END}); do
|
for i in $(seq 0 ${SEQ_END}); do
|
||||||
echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
|
echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
|
||||||
done;
|
done;
|
||||||
|
|
||||||
for i in $(seq 0 ${SEQ_END}); do
|
for i in $(seq 0 ${SEQ_END}); do
|
||||||
dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
|
dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
|
||||||
diff /dev/mapper/set-${i} member-${i}
|
diff /dev/mapper/set-${i} member-${i}
|
||||||
done;
|
done;
|
||||||
|
|
||||||
for i in $(seq 0 ${SEQ_END}); do
|
for i in $(seq 0 ${SEQ_END}); do
|
||||||
dmsetup remove set-${i}
|
dmsetup remove set-${i}
|
||||||
done
|
done
|
||||||
|
|
||||||
dmsetup remove raid0
|
dmsetup remove raid0
|
||||||
|
|
||||||
for i in $(seq 0 ${SEQ_END}); do
|
for i in $(seq 0 ${SEQ_END}); do
|
||||||
losetup -d /dev/loop${i}
|
losetup -d /dev/loop${i}
|
||||||
rm -f member-${i}
|
rm -f member-${i}
|
||||||
done
|
done
|
||||||
|
|
||||||
Another example
|
Another example
|
||||||
---------------
|
---------------
|
||||||
|
@ -81,7 +85,7 @@ Another example
|
||||||
Intel NVMe drives contain two cores on the physical device.
|
Intel NVMe drives contain two cores on the physical device.
|
||||||
Each core of the drive has segregated access to its LBA range.
|
Each core of the drive has segregated access to its LBA range.
|
||||||
The current LBA model has a RAID 0 128k chunk on each core, resulting
|
The current LBA model has a RAID 0 128k chunk on each core, resulting
|
||||||
in a 256k stripe across the two cores:
|
in a 256k stripe across the two cores::
|
||||||
|
|
||||||
Core 0: Core 1:
|
Core 0: Core 1:
|
||||||
__________ __________
|
__________ __________
|
||||||
|
@ -108,17 +112,24 @@ Example dmsetup usage
|
||||||
|
|
||||||
unstriped on top of Intel NVMe device that has 2 cores
|
unstriped on top of Intel NVMe device that has 2 cores
|
||||||
------------------------------------------------------
|
------------------------------------------------------
|
||||||
dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
|
|
||||||
dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
|
::
|
||||||
|
|
||||||
|
dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
|
||||||
|
dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
|
||||||
|
|
||||||
There will now be two devices that expose Intel NVMe core 0 and 1
|
There will now be two devices that expose Intel NVMe core 0 and 1
|
||||||
respectively:
|
respectively::
|
||||||
/dev/mapper/nvmset0
|
|
||||||
/dev/mapper/nvmset1
|
/dev/mapper/nvmset0
|
||||||
|
/dev/mapper/nvmset1
|
||||||
|
|
||||||
unstriped ontop of striped with 4 drives using 128K chunk size
|
unstriped ontop of striped with 4 drives using 128K chunk size
|
||||||
--------------------------------------------------------------
|
--------------------------------------------------------------
|
||||||
dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
|
|
||||||
dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
|
::
|
||||||
dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
|
|
||||||
dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
|
dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
|
||||||
|
dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
|
||||||
|
dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
|
||||||
|
dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
|
|
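Note on the unstriped tables above: the four `dmsetup create` lines differ only in the stripe index, and the 256 in each table is the 128K chunk size expressed in 512-byte sectors. The following is a minimal sketch, not part of the patch, that prints those commands with a loop instead of running them. The helper name `unstriped_cmds` and the variable names are hypothetical; the length of 512 sectors and the `/dev/mapper/striped` path are simply copied from the example.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): print, but do not run, the
# dmsetup commands that split a striped device into per-stripe devices.
STRIPES=4                                  # stripes in the underlying device
CHUNK_KB=128                               # chunk size in KiB
CHUNK_SECTORS=$((CHUNK_KB * 1024 / 512))   # 128 KiB -> 256 sectors of 512 bytes
LENGTH=512                                 # mapped length in sectors, as in the example
SRC=/dev/mapper/striped                    # device path taken from the example

unstriped_cmds() {
    i=0
    while [ "$i" -lt "$STRIPES" ]; do
        # Table format used above:
        # <start> <length> unstriped <#stripes> <chunk sectors> <stripe index> <dev> <offset>
        echo "dmsetup create raid_disk${i} --table '0 ${LENGTH} unstriped ${STRIPES} ${CHUNK_SECTORS} ${i} ${SRC} 0'"
        i=$((i + 1))
    done
}

unstriped_cmds
```

Because the commands are only echoed, the output can be reviewed before it is piped to a shell on a machine where the striped device actually exists.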
@ -1,5 +1,6 @@
|
||||||
|
=========
|
||||||
dm-verity
|
dm-verity
|
||||||
==========
|
=========
|
||||||
|
|
||||||
Device-Mapper's "verity" target provides transparent integrity checking of
|
Device-Mapper's "verity" target provides transparent integrity checking of
|
||||||
block devices using a cryptographic digest provided by the kernel crypto API.
|
block devices using a cryptographic digest provided by the kernel crypto API.
|
||||||
|
@ -7,6 +8,9 @@ This target is read-only.
|
||||||
|
|
||||||
Construction Parameters
|
Construction Parameters
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
<version> <dev> <hash_dev>
|
<version> <dev> <hash_dev>
|
||||||
<data_block_size> <hash_block_size>
|
<data_block_size> <hash_block_size>
|
||||||
<num_data_blocks> <hash_start_block>
|
<num_data_blocks> <hash_start_block>
|
||||||
|
@ -160,7 +164,9 @@ calculating the parent node.
|
||||||
|
|
||||||
The tree looks something like:
|
The tree looks something like:
|
||||||
|
|
||||||
alg = sha256, num_blocks = 32768, block_size = 4096
|
alg = sha256, num_blocks = 32768, block_size = 4096
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
[ root ]
|
[ root ]
|
||||||
/ . . . \
|
/ . . . \
|
||||||
|
@ -189,6 +195,7 @@ block boundary) are the hash blocks which are stored a depth at a time
|
||||||
|
|
||||||
The full specification of kernel parameters and on-disk metadata format
|
The full specification of kernel parameters and on-disk metadata format
|
||||||
is available at the cryptsetup project's wiki page
|
is available at the cryptsetup project's wiki page
|
||||||
|
|
||||||
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity
|
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity
|
||||||
|
|
||||||
Status
|
Status
|
||||||
|
@ -198,7 +205,8 @@ If any check failed, C (for Corruption) is returned.
|
||||||
|
|
||||||
Example
|
Example
|
||||||
=======
|
=======
|
||||||
Set up a device:
|
Set up a device::
|
||||||
|
|
||||||
# dmsetup create vroot --readonly --table \
|
# dmsetup create vroot --readonly --table \
|
||||||
"0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
|
"0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
|
||||||
"4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
|
"4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
|
||||||
|
@ -209,11 +217,13 @@ the hash tree or activate the kernel device. This is available from
|
||||||
the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
|
the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
|
||||||
(as a libcryptsetup extension).
|
(as a libcryptsetup extension).
|
||||||
|
|
||||||
Create hash on the device:
|
Create hash on the device::
|
||||||
|
|
||||||
# veritysetup format /dev/sda1 /dev/sda2
|
# veritysetup format /dev/sda1 /dev/sda2
|
||||||
...
|
...
|
||||||
Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
|
Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
|
||||||
|
|
||||||
Activate the device:
|
Activate the device::
|
||||||
|
|
||||||
# veritysetup create vroot /dev/sda1 /dev/sda2 \
|
# veritysetup create vroot /dev/sda1 /dev/sda2 \
|
||||||
4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
|
4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
|
|
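Note on the hash tree parameters above (alg = sha256, num_blocks = 32768, block_size = 4096): the shape of the tree follows from simple arithmetic. This is a rough sketch, not part of the patch, assuming a 32-byte sha256 digest and hash blocks that are packed full, so each 4096-byte hash block holds 128 digests. The variable names are hypothetical, and the levels are counted from the data blocks upward.

```shell
#!/bin/sh
# Sketch only: estimate the hash tree shape for the example parameters.
DATA_BLOCKS=32768      # num_blocks from the example
BLOCK_SIZE=4096        # block_size from the example
DIGEST_SIZE=32         # assumption: sha256 digest length in bytes
PER_BLOCK=$((BLOCK_SIZE / DIGEST_SIZE))   # digests that fit in one hash block

LEVELS=0
HASH_BLOCKS=0
n=$DATA_BLOCKS
# Each pass hashes the n blocks of the level below into ceil(n / PER_BLOCK)
# hash blocks, until a single block remains for the root hash to cover.
while [ "$n" -gt 1 ]; do
    n=$(( (n + PER_BLOCK - 1) / PER_BLOCK ))
    LEVELS=$((LEVELS + 1))
    HASH_BLOCKS=$((HASH_BLOCKS + n))
    echo "level ${LEVELS} (counting up from the data): ${n} hash block(s)"
done
echo "levels=${LEVELS} hash_blocks=${HASH_BLOCKS}"
```

Under these assumptions the loop reports 256, 2 and 1 hash blocks, which lines up with the two `entry_N` nodes below the root in the diagram.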
@ -1,3 +1,7 @@
|
||||||
|
=================
|
||||||
|
Writecache target
|
||||||
|
=================
|
||||||
|
|
||||||
The writecache target caches writes on persistent memory or on SSD. It
|
The writecache target caches writes on persistent memory or on SSD. It
|
||||||
doesn't cache reads because reads are supposed to be cached in page cache
|
doesn't cache reads because reads are supposed to be cached in page cache
|
||||||
in normal RAM.
|
in normal RAM.
|
||||||
|
@ -6,15 +10,18 @@ When the device is constructed, the first sector should be zeroed or the
|
||||||
first sector should contain valid superblock from previous invocation.
|
first sector should contain valid superblock from previous invocation.
|
||||||
|
|
||||||
Constructor parameters:
|
Constructor parameters:
|
||||||
|
|
||||||
1. type of the cache device - "p" or "s"
|
1. type of the cache device - "p" or "s"
|
||||||
p - persistent memory
|
|
||||||
s - SSD
|
- p - persistent memory
|
||||||
|
- s - SSD
|
||||||
2. the underlying device that will be cached
|
2. the underlying device that will be cached
|
||||||
3. the cache device
|
3. the cache device
|
||||||
4. block size (4096 is recommended; the maximum block size is the page
|
4. block size (4096 is recommended; the maximum block size is the page
|
||||||
size)
|
size)
|
||||||
5. the number of optional parameters (the parameters with an argument
|
5. the number of optional parameters (the parameters with an argument
|
||||||
count as two)
|
count as two)
|
||||||
|
|
||||||
start_sector n (default: 0)
|
start_sector n (default: 0)
|
||||||
offset from the start of cache device in 512-byte sectors
|
offset from the start of cache device in 512-byte sectors
|
||||||
high_watermark n (default: 50)
|
high_watermark n (default: 50)
|
||||||
|
@ -43,6 +50,7 @@ Constructor parameters:
|
||||||
applicable only to persistent memory - don't use the FUA
|
applicable only to persistent memory - don't use the FUA
|
||||||
flag when writing back data and send the FLUSH request
|
flag when writing back data and send the FLUSH request
|
||||||
afterwards
|
afterwards
|
||||||
|
|
||||||
- some underlying devices perform better with fua, some
|
- some underlying devices perform better with fua, some
|
||||||
with nofua. The user should test it
|
with nofua. The user should test it
|
||||||
|
|
||||||
|
@ -60,6 +68,7 @@ Messages:
|
||||||
flush the cache device on next suspend. Use this message
|
flush the cache device on next suspend. Use this message
|
||||||
when you are going to remove the cache device. The proper
|
when you are going to remove the cache device. The proper
|
||||||
sequence for removing the cache device is:
|
sequence for removing the cache device is:
|
||||||
|
|
||||||
1. send the "flush_on_suspend" message
|
1. send the "flush_on_suspend" message
|
||||||
2. load an inactive table with a linear target that maps
|
2. load an inactive table with a linear target that maps
|
||||||
to the underlying device
|
to the underlying device
|
|
@ -1,3 +1,4 @@
|
||||||
|
=======
|
||||||
dm-zero
|
dm-zero
|
||||||
=======
|
=======
|
||||||
|
|
||||||
|
@ -18,20 +19,19 @@ filesystem limitations.
|
||||||
|
|
||||||
To create a sparse device, start by creating a dm-zero device that's the
|
To create a sparse device, start by creating a dm-zero device that's the
|
||||||
desired size of the sparse device. For this example, we'll assume a 10TB
|
desired size of the sparse device. For this example, we'll assume a 10TB
|
||||||
sparse device.
|
sparse device::
|
||||||
|
|
||||||
TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2` # 10 TB in sectors
|
TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2` # 10 TB in sectors
|
||||||
echo "0 $TEN_TERABYTES zero" | dmsetup create zero1
|
echo "0 $TEN_TERABYTES zero" | dmsetup create zero1
|
||||||
|
|
||||||
Then create a snapshot of the zero device, using any available block-device as
|
Then create a snapshot of the zero device, using any available block-device as
|
||||||
the COW device. The size of the COW device will determine the amount of real
|
the COW device. The size of the COW device will determine the amount of real
|
||||||
space available to the sparse device. For this example, we'll assume /dev/sdb1
|
space available to the sparse device. For this example, we'll assume /dev/sdb1
|
||||||
is an available 10GB partition.
|
is an available 10GB partition::
|
||||||
|
|
||||||
echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \
|
echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \
|
||||||
dmsetup create sparse1
|
dmsetup create sparse1
|
||||||
|
|
||||||
This will create a 10TB sparse device called /dev/mapper/sparse1 that has
|
This will create a 10TB sparse device called /dev/mapper/sparse1 that has
|
||||||
10GB of actual storage space available. If more than 10GB of data is written
|
10GB of actual storage space available. If more than 10GB of data is written
|
||||||
to this device, it will start returning I/O errors.
|
to this device, it will start returning I/O errors.
|
||||||
|
|
|
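Note on the dm-zero example above: the `expr` line computes 10 TB as a count of 512-byte sectors. As a small sketch, not part of the patch, the same value can be derived with POSIX shell arithmetic, assuming a shell with 64-bit integers. Only the table lines are printed here; nothing is passed to dmsetup. `SECTOR_SIZE` is a hypothetical name, and `/dev/sdb1` is the example's assumed COW partition.

```shell
#!/bin/sh
# Sketch only: size a 10 TB dm-zero device in 512-byte sectors.
SECTOR_SIZE=512
TEN_TERABYTES=$((10 * 1024 * 1024 * 1024 * 1024 / SECTOR_SIZE))   # bytes / sector size

# Table lines as used in the example; printed, not applied.
echo "0 $TEN_TERABYTES zero"
echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128"
```

The result is the same number of sectors that the `expr 10 \* 1024 \* 1024 \* 1024 \* 2` form produces, since multiplying by 2 is the same as dividing 1024 bytes by the 512-byte sector size.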
@ -16,8 +16,8 @@ Required properties:
|
||||||
In this case, the ENETC node should include a "mdio" sub-node
|
In this case, the ENETC node should include a "mdio" sub-node
|
||||||
that in turn should contain the "ethernet-phy" node describing the
|
that in turn should contain the "ethernet-phy" node describing the
|
||||||
external phy. Below properties are required, their bindings
|
external phy. Below properties are required, their bindings
|
||||||
already defined in ethernet.txt or phy.txt, under
|
already defined in Documentation/devicetree/bindings/net/ethernet.txt or
|
||||||
Documentation/devicetree/bindings/net/*.
|
Documentation/devicetree/bindings/net/phy.txt.
|
||||||
|
|
||||||
Required:
|
Required:
|
||||||
|
|
||||||
|
@ -51,8 +51,7 @@ Example:
|
||||||
connection:
|
connection:
|
||||||
|
|
||||||
In this case, the ENETC port node defines a fixed link connection,
|
In this case, the ENETC port node defines a fixed link connection,
|
||||||
as specified by "fixed-link.txt", under
|
as specified by Documentation/devicetree/bindings/net/fixed-link.txt.
|
||||||
Documentation/devicetree/bindings/net/*.
|
|
||||||
|
|
||||||
Required:
|
Required:
|
||||||
|
|
||||||
|
|
|
@ -3,7 +3,7 @@ Amlogic Meson AXG DWC PCIE SoC controller
|
||||||
Amlogic Meson PCIe host controller is based on the Synopsys DesignWare PCI core.
|
Amlogic Meson PCIe host controller is based on the Synopsys DesignWare PCI core.
|
||||||
It shares common functions with the PCIe DesignWare core driver and
|
It shares common functions with the PCIe DesignWare core driver and
|
||||||
inherits common properties defined in
|
inherits common properties defined in
|
||||||
Documentation/devicetree/bindings/pci/designware-pci.txt.
|
Documentation/devicetree/bindings/pci/designware-pcie.txt.
|
||||||
|
|
||||||
Additional properties are described here:
|
Additional properties are described here:
|
||||||
|
|
||||||
|
|
|
@ -97,7 +97,7 @@ Second Level Nodes - Regulators
|
||||||
sent for this regulator including those which are for a
|
sent for this regulator including those which are for a
|
||||||
strictly lower power state.
|
strictly lower power state.
|
||||||
|
|
||||||
Other properties defined in Documentation/devicetree/bindings/regulator.txt
|
Other properties defined in Documentation/devicetree/bindings/regulator/regulator.txt
|
||||||
may also be used. regulator-initial-mode and regulator-allowed-modes may be
|
may also be used. regulator-initial-mode and regulator-allowed-modes may be
|
||||||
specified for VRM regulators using mode values from
|
specified for VRM regulators using mode values from
|
||||||
include/dt-bindings/regulator/qcom,rpmh-regulator.h. regulator-allow-bypass
|
include/dt-bindings/regulator/qcom,rpmh-regulator.h. regulator-allow-bypass
|
||||||
|
|
|
@ -277,7 +277,7 @@ it with special cases.
|
||||||
the decompressor (the real mode entry point goes to the same 32bit
|
the decompressor (the real mode entry point goes to the same 32bit
|
||||||
entry point once it switched into protected mode). That entry point
|
entry point once it switched into protected mode). That entry point
|
||||||
supports one calling convention which is documented in
|
supports one calling convention which is documented in
|
||||||
Documentation/x86/boot.txt
|
Documentation/x86/boot.rst
|
||||||
The physical pointer to the device-tree block (defined in chapter II)
|
The physical pointer to the device-tree block (defined in chapter II)
|
||||||
is passed via setup_data which requires at least boot protocol 2.09.
|
is passed via setup_data which requires at least boot protocol 2.09.
|
||||||
The type filed is defined as
|
The type filed is defined as
|
||||||
|
|
|
@ -359,7 +359,7 @@ Domain`_ references.
|
||||||
``monospaced font``.
|
``monospaced font``.
|
||||||
|
|
||||||
Useful if you need to use special characters that would otherwise have some
|
Useful if you need to use special characters that would otherwise have some
|
||||||
meaning either by kernel-doc script of by reStructuredText.
|
meaning either by kernel-doc script or by reStructuredText.
|
||||||
|
|
||||||
This is particularly useful if you need to use things like ``%ph`` inside
|
This is particularly useful if you need to use things like ``%ph`` inside
|
||||||
a function description.
|
a function description.
|
||||||
|
|
|
@ -27,8 +27,7 @@ Sphinx Install
|
||||||
==============
|
==============
|
||||||
|
|
||||||
The ReST markups currently used by the Documentation/ files are meant to be
|
The ReST markups currently used by the Documentation/ files are meant to be
|
||||||
built with ``Sphinx`` version 1.3 or higher. If you desire to build
|
built with ``Sphinx`` version 1.3 or higher.
|
||||||
PDF output, it is recommended to use version 1.4.6 or higher.
|
|
||||||
|
|
||||||
There's a script that checks for the Sphinx requirements. Please see
|
There's a script that checks for the Sphinx requirements. Please see
|
||||||
:ref:`sphinx-pre-install` for further details.
|
:ref:`sphinx-pre-install` for further details.
|
||||||
|
@ -56,13 +55,13 @@ or ``virtualenv``, depending on how your distribution packaged Python 3.
|
||||||
those expressions are written using LaTeX notation. It needs texlive
|
those expressions are written using LaTeX notation. It needs texlive
|
||||||
installed with amdfonts and amsmath in order to evaluate them.
|
installed with amdfonts and amsmath in order to evaluate them.
|
||||||
|
|
||||||
In summary, if you want to install Sphinx version 1.4.9, you should do::
|
In summary, if you want to install Sphinx version 1.7.9, you should do::
|
||||||
|
|
||||||
$ virtualenv sphinx_1.4
|
$ virtualenv sphinx_1.7.9
|
||||||
$ . sphinx_1.4/bin/activate
|
$ . sphinx_1.7.9/bin/activate
|
||||||
(sphinx_1.4) $ pip install -r Documentation/sphinx/requirements.txt
|
(sphinx_1.7.9) $ pip install -r Documentation/sphinx/requirements.txt
|
||||||
|
|
||||||
After running ``. sphinx_1.4/bin/activate``, the prompt will change,
|
After running ``. sphinx_1.7.9/bin/activate``, the prompt will change,
|
||||||
in order to indicate that you're using the new environment. If you
|
in order to indicate that you're using the new environment. If you
|
||||||
open a new shell, you need to rerun this command to enter again at
|
open a new shell, you need to rerun this command to enter again at
|
||||||
the virtual environment before building the documentation.
|
the virtual environment before building the documentation.
|
||||||
|
@ -105,8 +104,8 @@ command line options for your distro::
|
||||||
You should run:
|
You should run:
|
||||||
|
|
||||||
sudo dnf install -y texlive-luatex85
|
sudo dnf install -y texlive-luatex85
|
||||||
/usr/bin/virtualenv sphinx_1.4
|
/usr/bin/virtualenv sphinx_1.7.9
|
||||||
. sphinx_1.4/bin/activate
|
. sphinx_1.7.9/bin/activate
|
||||||
pip install -r Documentation/sphinx/requirements.txt
|
pip install -r Documentation/sphinx/requirements.txt
|
||||||
|
|
||||||
Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
|
Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
|
||||||
|
@ -218,7 +217,7 @@ Here are some specific guidelines for the kernel documentation:
|
||||||
examples, etc.), use ``::`` for anything that doesn't really benefit
|
examples, etc.), use ``::`` for anything that doesn't really benefit
|
||||||
from syntax highlighting, especially short snippets. Use
|
from syntax highlighting, especially short snippets. Use
|
||||||
``.. code-block:: <language>`` for longer code blocks that benefit
|
``.. code-block:: <language>`` for longer code blocks that benefit
|
||||||
from highlighting.
|
from highlighting. For a short snippet of code embedded in the text, use \`\`.
|
||||||
|
|
||||||
|
|
||||||
the C domain
|
the C domain
|
||||||
|
@ -242,11 +241,14 @@ The C domain of the kernel-doc has some additional features. E.g. you can
|
||||||
|
|
||||||
The func-name (e.g. ioctl) remains in the output but the ref-name changed from
|
The func-name (e.g. ioctl) remains in the output but the ref-name changed from
|
||||||
``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
|
``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
|
||||||
changed to ``VIDIOC_LOG_STATUS`` and the function can now referenced by:
|
changed to ``VIDIOC_LOG_STATUS``.
|
||||||
|
|
||||||
.. code-block:: rst
|
Please note that there is no need to use ``c:func:`` to generate cross
|
||||||
|
references to function documentation. Due to some Sphinx extension magic,
|
||||||
:c:func:`VIDIOC_LOG_STATUS`
|
the documentation build system will automatically turn a reference to
|
||||||
|
``function()`` into a cross reference if an index entry for the given
|
||||||
|
function name exists. If you see ``c:func:`` use in a kernel document,
|
||||||
|
please feel free to remove it.
|
||||||
|
|
||||||
|
|
||||||
list tables
|
list tables
|
||||||
|
|
Some files were not shown because too many files have changed in this diff.