There has been a fair amount of activity in the docs tree this time
around.  Highlights include:
 
  - Conversion of a bunch of security documentation into RST
 
  - The conversion of the remaining DocBook templates by The Amazing
    Mauro Machine.  We can now drop the entire DocBook build chain.
 
  - The usual collection of fixes and minor updates.

Merge tag 'docs-4.13' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "There has been a fair amount of activity in the docs tree this time
  around. Highlights include:

   - Conversion of a bunch of security documentation into RST

   - The conversion of the remaining DocBook templates by The Amazing
     Mauro Machine. We can now drop the entire DocBook build chain.

   - The usual collection of fixes and minor updates"

* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
  scripts/kernel-doc: handle DECLARE_HASHTABLE
  Documentation: atomic_ops.txt is core-api/atomic_ops.rst
  Docs: clean up some DocBook loose ends
  Make the main documentation title less Geocities
  Docs: Use kernel-figure in vidioc-g-selection.rst
  Docs: fix table problems in ras.rst
  Docs: Fix breakage with Sphinx 1.5 and upper
  Docs: Include the Latex "ifthen" package
  doc/kokr/howto: Only send regression fixes after -rc1
  docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
  doc: Document suitability of IBM Verse for kernel development
  Doc: fix a markup error in coding-style.rst
  docs: driver-api: i2c: remove some outdated information
  Documentation: DMA API: fix a typo in a function name
  Docs: Insert missing space to separate link from text
  doc/ko_KR/memory-barriers: Update control-dependencies example
  Documentation, kbuild: fix typo "minimun" -> "minimum"
  docs: Fix some formatting issues in request-key.rst
  doc: ReSTify keys-trusted-encrypted.txt
  doc: ReSTify keys-request-key.txt
  ...
Linus Torvalds 2017-07-03 21:13:25 -07:00
commit 650fc870a2
181 changed files with 8404 additions and 12196 deletions

@ -24,8 +24,6 @@ DMA-ISA-LPC.txt
- How to do DMA with ISA (and LPC) devices.
DMA-attributes.txt
- listing of the various possible attributes a DMA region can have
DocBook/
- directory with DocBook templates etc. for kernel documentation.
EDID/
- directory with info on customizing EDID for broken gfx/displays.
IPMI.txt
@ -40,8 +38,6 @@ Intel-IOMMU.txt
- basic info on the Intel IOMMU virtualization support.
Makefile
- It's not of interest for those who aren't touching the build system.
Makefile.sphinx
- It's not of interest for those who aren't touching the build system.
PCI/
- info related to PCI drivers.
RCU/
@ -264,6 +260,8 @@ logo.gif
- full colour GIF image of Linux logo (penguin - Tux).
logo.txt
- info on creator of above logo & site to get additional images from.
lsm.txt
- Linux Security Modules: General Security Hooks for Linux
lzo.txt
- kernel LZO decompressor input formats
m68k/

@ -692,7 +692,7 @@ of preallocated entries is defined per architecture. If it is too low for you
boot with 'dma_debug_entries=<your_desired_number>' to overwrite the
architectural default.
void debug_dmap_mapping_error(struct device *dev, dma_addr_t dma_addr);
void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check DMA mapping errors on addresses returned by dma_map_single() and

@ -1,17 +0,0 @@
*.xml
*.ps
*.pdf
*.html
*.9.gz
*.9
*.aux
*.dvi
*.log
*.out
*.png
*.gif
*.svg
*.proc
*.db
media-indices.tmpl
media-entities.tmpl

@ -1,282 +0,0 @@
###
# This makefile is used to generate the kernel documentation,
# primarily based on in-line comments in various source files.
# See Documentation/kernel-doc-nano-HOWTO.txt for instruction in how
# to document the SRC - and how to read it.
# To add a new book the only step required is to add the book to the
# list of DOCBOOKS.
DOCBOOKS := z8530book.xml \
kernel-hacking.xml kernel-locking.xml \
networking.xml \
filesystems.xml lsm.xml kgdb.xml \
libata.xml mtdnand.xml librs.xml rapidio.xml \
s390-drivers.xml scsi.xml \
sh.xml w1.xml
ifeq ($(DOCBOOKS),)
# Skip DocBook build if the user explicitly requested no DOCBOOKS.
.DEFAULT:
@echo " SKIP DocBook $@ target (DOCBOOKS=\"\" specified)."
else
ifneq ($(SPHINXDIRS),)
# Skip DocBook build if the user explicitly requested a sphinx dir
.DEFAULT:
@echo " SKIP DocBook $@ target (SPHINXDIRS specified)."
else
###
# The build process is as follows (targets):
# (xmldocs) [by docproc]
# file.tmpl --> file.xml +--> file.ps (psdocs) [by db2ps or xmlto]
# +--> file.pdf (pdfdocs) [by db2pdf or xmlto]
# +--> DIR=file (htmldocs) [by xmlto]
# +--> man/ (mandocs) [by xmlto]
# for PDF and PS output you can choose between xmlto and docbook-utils tools
PDF_METHOD = $(prefer-db2x)
PS_METHOD = $(prefer-db2x)
targets += $(DOCBOOKS)
BOOKS := $(addprefix $(obj)/,$(DOCBOOKS))
xmldocs: $(BOOKS)
sgmldocs: xmldocs
PS := $(patsubst %.xml, %.ps, $(BOOKS))
psdocs: $(PS)
PDF := $(patsubst %.xml, %.pdf, $(BOOKS))
pdfdocs: $(PDF)
HTML := $(sort $(patsubst %.xml, %.html, $(BOOKS)))
htmldocs: $(HTML)
$(call cmd,build_main_index)
MAN := $(patsubst %.xml, %.9, $(BOOKS))
mandocs: $(MAN)
find $(obj)/man -name '*.9' | xargs gzip -nf
# Default location for installed man pages
export INSTALL_MAN_PATH = $(objtree)/usr
installmandocs: mandocs
mkdir -p $(INSTALL_MAN_PATH)/man/man9/
find $(obj)/man -name '*.9.gz' -printf '%h %f\n' | \
sort -k 2 -k 1 | uniq -f 1 | sed -e 's: :/:' | \
xargs install -m 644 -t $(INSTALL_MAN_PATH)/man/man9/
# no-op for the DocBook toolchain
epubdocs:
latexdocs:
linkcheckdocs:
###
#External programs used
KERNELDOCXMLREF = $(srctree)/scripts/kernel-doc-xml-ref
KERNELDOC = $(srctree)/scripts/kernel-doc
DOCPROC = $(objtree)/scripts/docproc
CHECK_LC_CTYPE = $(objtree)/scripts/check-lc_ctype
# Use a fixed encoding - UTF-8 if the C library has support built-in
# or ASCII if not
LC_CTYPE := $(call try-run, LC_CTYPE=C.UTF-8 $(CHECK_LC_CTYPE),C.UTF-8,C)
export LC_CTYPE
XMLTOFLAGS = -m $(srctree)/$(src)/stylesheet.xsl
XMLTOFLAGS += --skip-validation
###
# DOCPROC is used for two purposes:
# 1) To generate a dependency list for a .tmpl file
# 2) To preprocess a .tmpl file and call kernel-doc with
# appropriate parameters.
# The following rules are used to generate the .xml documentation
# required to generate the final targets. (ps, pdf, html).
quiet_cmd_docproc = DOCPROC $@
cmd_docproc = SRCTREE=$(srctree)/ $(DOCPROC) doc $< >$@
define rule_docproc
set -e; \
$(if $($(quiet)cmd_$(1)),echo ' $($(quiet)cmd_$(1))';) \
$(cmd_$(1)); \
( \
echo 'cmd_$@ := $(cmd_$(1))'; \
echo $@: `SRCTREE=$(srctree) $(DOCPROC) depend $<`; \
) > $(dir $@).$(notdir $@).cmd
endef
%.xml: %.tmpl $(KERNELDOC) $(DOCPROC) $(KERNELDOCXMLREF) FORCE
$(call if_changed_rule,docproc)
# Tell kbuild to always build the programs
always := $(hostprogs-y)
notfoundtemplate = echo "*** You have to install docbook-utils or xmlto ***"; \
exit 1
db2xtemplate = db2TYPE -o $(dir $@) $<
xmltotemplate = xmlto TYPE $(XMLTOFLAGS) -o $(dir $@) $<
# determine which methods are available
ifeq ($(shell which db2ps >/dev/null 2>&1 && echo found),found)
use-db2x = db2x
prefer-db2x = db2x
else
use-db2x = notfound
prefer-db2x = $(use-xmlto)
endif
ifeq ($(shell which xmlto >/dev/null 2>&1 && echo found),found)
use-xmlto = xmlto
prefer-xmlto = xmlto
else
use-xmlto = notfound
prefer-xmlto = $(use-db2x)
endif
# the commands, generated from the chosen template
quiet_cmd_db2ps = PS $@
cmd_db2ps = $(subst TYPE,ps, $($(PS_METHOD)template))
%.ps : %.xml
$(call cmd,db2ps)
quiet_cmd_db2pdf = PDF $@
cmd_db2pdf = $(subst TYPE,pdf, $($(PDF_METHOD)template))
%.pdf : %.xml
$(call cmd,db2pdf)
index = index.html
main_idx = $(obj)/$(index)
quiet_cmd_build_main_index = HTML $(main_idx)
cmd_build_main_index = rm -rf $(main_idx); \
echo '<h1>Linux Kernel HTML Documentation</h1>' >> $(main_idx) && \
echo '<h2>Kernel Version: $(KERNELVERSION)</h2>' >> $(main_idx) && \
cat $(HTML) >> $(main_idx)
quiet_cmd_db2html = HTML $@
cmd_db2html = xmlto html $(XMLTOFLAGS) -o $(patsubst %.html,%,$@) $< && \
echo '<a HREF="$(patsubst %.html,%,$(notdir $@))/index.html"> \
$(patsubst %.html,%,$(notdir $@))</a><p>' > $@
###
# Rules to create an aux XML and .db, and use them to re-process the DocBook XML
# to fill internal hyperlinks
gen_aux_xml = :
quiet_gen_aux_xml = echo ' XMLREF $@'
silent_gen_aux_xml = :
%.aux.xml: %.xml
@$($(quiet)gen_aux_xml)
@rm -rf $@
@(cat $< | egrep "^<refentry id" | egrep -o "\".*\"" | cut -f 2 -d \" > $<.db)
@$(KERNELDOCXMLREF) -db $<.db $< > $@
.PRECIOUS: %.aux.xml
%.html: %.aux.xml
@(which xmlto > /dev/null 2>&1) || \
(echo "*** You need to install xmlto ***"; \
exit 1)
@rm -rf $@ $(patsubst %.html,%,$@)
$(call cmd,db2html)
@if [ ! -z "$(PNG-$(basename $(notdir $@)))" ]; then \
cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi
quiet_cmd_db2man = MAN $@
cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man/$(*F) $< ; fi
%.9 : %.xml
@(which xmlto > /dev/null 2>&1) || \
(echo "*** You need to install xmlto ***"; \
exit 1)
$(Q)mkdir -p $(obj)/man/$(*F)
$(call cmd,db2man)
@touch $@
###
# Rules to generate postscripts and PNG images from .fig format files
quiet_cmd_fig2eps = FIG2EPS $@
cmd_fig2eps = fig2dev -Leps $< $@
%.eps: %.fig
@(which fig2dev > /dev/null 2>&1) || \
(echo "*** You need to install transfig ***"; \
exit 1)
$(call cmd,fig2eps)
quiet_cmd_fig2png = FIG2PNG $@
cmd_fig2png = fig2dev -Lpng $< $@
%.png: %.fig
@(which fig2dev > /dev/null 2>&1) || \
(echo "*** You need to install transfig ***"; \
exit 1)
$(call cmd,fig2png)
###
# Rule to convert a .c file to inline XML documentation
gen_xml = :
quiet_gen_xml = echo ' GEN $@'
silent_gen_xml = :
%.xml: %.c
@$($(quiet)gen_xml)
@( \
echo "<programlisting>"; \
expand --tabs=8 < $< | \
sed -e "s/&/\\&amp;/g" \
-e "s/</\\&lt;/g" \
-e "s/>/\\&gt;/g"; \
echo "</programlisting>") > $@
endif # DOCBOOKS=""
endif # SPHINXDIRS=...
###
# Help targets as used by the top-level makefile
dochelp:
@echo ' Linux kernel internal documentation in different formats (DocBook):'
@echo ' htmldocs - HTML'
@echo ' pdfdocs - PDF'
@echo ' psdocs - Postscript'
@echo ' xmldocs - XML DocBook'
@echo ' mandocs - man pages'
@echo ' installmandocs - install man pages generated by mandocs to INSTALL_MAN_PATH'; \
echo ' (default: $(INSTALL_MAN_PATH))'; \
echo ''
@echo ' cleandocs - clean all generated DocBook files'
@echo
@echo ' make DOCBOOKS="s1.xml s2.xml" [target] Generate only docs s1.xml s2.xml'
@echo ' valid values for DOCBOOKS are: $(DOCBOOKS)'
@echo
@echo " make DOCBOOKS=\"\" [target] Don't generate docs from Docbook"
@echo ' This is useful to generate only the ReST docs (Sphinx)'
###
# Temporary files left by various tools
clean-files := $(DOCBOOKS) \
$(patsubst %.xml, %.dvi, $(DOCBOOKS)) \
$(patsubst %.xml, %.aux, $(DOCBOOKS)) \
$(patsubst %.xml, %.tex, $(DOCBOOKS)) \
$(patsubst %.xml, %.log, $(DOCBOOKS)) \
$(patsubst %.xml, %.out, $(DOCBOOKS)) \
$(patsubst %.xml, %.ps, $(DOCBOOKS)) \
$(patsubst %.xml, %.pdf, $(DOCBOOKS)) \
$(patsubst %.xml, %.html, $(DOCBOOKS)) \
$(patsubst %.xml, %.9, $(DOCBOOKS)) \
$(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
$(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \
$(patsubst %.xml, %.xml, $(DOCBOOKS)) \
$(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \
$(index)
clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
cleandocs:
$(Q)rm -f $(call objectify, $(clean-files))
$(Q)rm -rf $(call objectify, $(clean-dirs))
# Declare the contents of the .PHONY variable as phony. We keep that
# information in a variable so we can use it in if_changed and friends.
.PHONY: $(PHONY)

@ -1,381 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Linux-filesystems-API">
<bookinfo>
<title>Linux Filesystems API</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="vfs">
<title>The Linux VFS</title>
<sect1 id="the_filesystem_types"><title>The Filesystem types</title>
!Iinclude/linux/fs.h
</sect1>
<sect1 id="the_directory_cache"><title>The Directory Cache</title>
!Efs/dcache.c
!Iinclude/linux/dcache.h
</sect1>
<sect1 id="inode_handling"><title>Inode Handling</title>
!Efs/inode.c
!Efs/bad_inode.c
</sect1>
<sect1 id="registration_and_superblocks"><title>Registration and Superblocks</title>
!Efs/super.c
</sect1>
<sect1 id="file_locks"><title>File Locks</title>
!Efs/locks.c
!Ifs/locks.c
</sect1>
<sect1 id="other_functions"><title>Other Functions</title>
!Efs/mpage.c
!Efs/namei.c
!Efs/buffer.c
!Eblock/bio.c
!Efs/seq_file.c
!Efs/filesystems.c
!Efs/fs-writeback.c
!Efs/block_dev.c
</sect1>
</chapter>
<chapter id="proc">
<title>The proc filesystem</title>
<sect1 id="sysctl_interface"><title>sysctl interface</title>
!Ekernel/sysctl.c
</sect1>
<sect1 id="proc_filesystem_interface"><title>proc filesystem interface</title>
!Ifs/proc/base.c
</sect1>
</chapter>
<chapter id="fs_events">
<title>Events based on file descriptors</title>
!Efs/eventfd.c
</chapter>
<chapter id="sysfs">
<title>The Filesystem for Exporting Kernel Objects</title>
!Efs/sysfs/file.c
!Efs/sysfs/symlink.c
</chapter>
<chapter id="debugfs">
<title>The debugfs filesystem</title>
<sect1 id="debugfs_interface"><title>debugfs interface</title>
!Efs/debugfs/inode.c
!Efs/debugfs/file.c
</sect1>
</chapter>
<chapter id="LinuxJDBAPI">
<chapterinfo>
<title>The Linux Journalling API</title>
<authorgroup>
<author>
<firstname>Roger</firstname>
<surname>Gammans</surname>
<affiliation>
<address>
<email>rgammans@computer-surgery.co.uk</email>
</address>
</affiliation>
</author>
</authorgroup>
<authorgroup>
<author>
<firstname>Stephen</firstname>
<surname>Tweedie</surname>
<affiliation>
<address>
<email>sct@redhat.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2002</year>
<holder>Roger Gammans</holder>
</copyright>
</chapterinfo>
<title>The Linux Journalling API</title>
<sect1 id="journaling_overview">
<title>Overview</title>
<sect2 id="journaling_details">
<title>Details</title>
<para>
The journalling layer is easy to use. You need to
first of all create a journal_t data structure. There are
two calls to do this dependent on how you decide to allocate the physical
media on which the journal resides. The jbd2_journal_init_inode() call
is for journals stored in filesystem inodes, or the jbd2_journal_init_dev()
call can be used for a journal stored on a raw device (in a contiguous range
of blocks). A journal_t is a typedef for a struct pointer, so when
you are finally finished make sure you call jbd2_journal_destroy() on it
to free up any used kernel memory.
</para>
<para>
Once you have got your journal_t object you need to 'mount' or load the journal
file. The journalling layer expects the space for the journal was already
allocated and initialized properly by the userspace tools. When loading the
journal you must call jbd2_journal_load() to process journal contents. If the
client file system detects that the journal contents do not need to be processed
(or need not even have valid contents), it may call jbd2_journal_wipe() to
clear the journal contents before calling jbd2_journal_load().
</para>
<para>
Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if
it detects any outstanding transactions in the journal and similarly
jbd2_journal_load() will call jbd2_journal_recover() if necessary. I would
advise reading ext4_load_journal() in fs/ext4/super.c for examples on this
stage.
</para>
<para>
Now you can go ahead and start modifying the underlying
filesystem. Almost.
</para>
<para>
You still need to actually journal your filesystem changes; this
is done by wrapping them into transactions. Additionally, you
also need to wrap the modification of each of the buffers
with calls to the journal layer, so it knows what modifications
you are actually making. To do this, use jbd2_journal_start(), which
returns a transaction handle.
</para>
<para>
jbd2_journal_start()
and its counterpart jbd2_journal_stop(), which indicates the end of a
transaction, are nestable calls, so you can reenter a transaction if necessary,
but remember you must call jbd2_journal_stop() the same number of times as
jbd2_journal_start() before the transaction is completed (or more accurately
leaves the update phase). Ext4/VFS makes use of this feature to simplify
handling of inode dirtying, quota support, etc.
</para>
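The nesting rule described above can be modelled in a few lines of userspace C. This is only a toy sketch: struct toy_handle and the toy_* functions are hypothetical names invented for illustration and say nothing about how jbd2 implements its handles. It merely shows that the transaction is completed only once the stop calls balance the start calls.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a transaction handle; NOT the kernel's handle_t. */
struct toy_handle {
	int refs;      /* number of start calls not yet matched by a stop */
	int completed; /* set once the outermost stop has run */
};

/* One outstanding handle per "task"; a deliberate simplification. */
static struct toy_handle *toy_current;

/* Start a transaction, or re-enter the one that is already running. */
static struct toy_handle *toy_journal_start(struct toy_handle *storage)
{
	assert(storage != NULL);
	if (toy_current) {
		/* Nested call: reuse the running handle, ignore storage. */
		toy_current->refs++;
		return toy_current;
	}
	storage->refs = 1;
	storage->completed = 0;
	toy_current = storage;
	return storage;
}

/* Only the stop that balances the first start completes the handle. */
static void toy_journal_stop(struct toy_handle *h)
{
	assert(h != NULL && h->refs > 0);
	if (--h->refs == 0) {
		h->completed = 1;
		toy_current = NULL;
	}
}
```

In this model a nested toy_journal_start() returns the handle that is already running, and the handle is marked completed only by the second of two toy_journal_stop() calls.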
<para>
Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call jbd2_journal_get_{create,write,undo}_access() as appropriate;
this allows the journalling layer to copy the unmodified data if it
needs to. After all, the buffer may be part of a previously uncommitted
transaction.
At this point you are at last ready to modify a buffer, and once
you have done so you need to call jbd2_journal_dirty_{meta,}data().
Or, if you've asked for access to a buffer that you now know is no longer
required to be pushed back on the device, you can call jbd2_journal_forget()
in much the same way as you might have used bforget() in the past.
</para>
<para>
A jbd2_journal_flush() may be called at any time to commit and checkpoint
all your transactions.
</para>
<para>
Then at umount time, in your put_super(), you can call jbd2_journal_destroy()
to clean up your in-core journal object.
</para>
<para>
Unfortunately there are a couple of ways the journal layer can cause a deadlock.
The first thing to note is that each task can only have
a single outstanding transaction at any one time; remember nothing
commits until the outermost jbd2_journal_stop(). This means
you must complete the transaction at the end of each file/inode/address
etc. operation you perform, so that the journalling system isn't re-entered
on another journal. This matters because transactions can't be nested/batched
across differing journals, and another filesystem other than
yours (say ext4) may be modified in a later syscall.
</para>
<para>
The second case to bear in mind is that jbd2_journal_start() can
block if there isn't enough space in the journal for your transaction
(based on the passed nblocks param) - when it blocks it merely(!) needs to
wait for transactions to complete and be committed from other tasks,
so essentially we are waiting for jbd2_journal_stop(). So to avoid
deadlocks you must treat jbd2_journal_start/stop() as if they
were semaphores and include them in your semaphore ordering rules to prevent
deadlocks. Note that jbd2_journal_extend() has similar blocking behaviour to
jbd2_journal_start() so you can deadlock here just as easily as on
jbd2_journal_start().
</para>
<para>
Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this transaction.
I advise having a look at least at ext4_jbd.h to see the basis on which
ext4 makes these decisions.
</para>
<para>
Another wriggle to watch out for is your on-disk block allocation strategy.
Why? Because, if you do a delete, you need to ensure you haven't reused any
of the freed blocks until the transaction freeing these blocks commits. If you
reused these blocks and a crash happens, there is no way to restore the contents
of the reallocated blocks at the end of the last fully committed transaction.
One simple way of doing this is to mark blocks as free in internal in-memory
block allocation structures only after the transaction freeing them commits.
Ext4 uses a journal commit callback for this purpose.
</para>
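The "free only after commit" strategy described above can be sketched with a toy in-memory block map. Everything here is an assumption made for illustration: the toy_* names, the three-state map and the fixed block count are hypothetical and are not taken from ext4 or jbd2. The point is only that a freed block stays unavailable until the commit callback runs.

```c
#include <assert.h>

#define TOY_NBLOCKS 8

/* Per-block allocation state in the toy in-memory map. */
enum toy_state { TOY_FREE, TOY_USED, TOY_PENDING_FREE };

/* Static storage is zero-initialised, so every block starts as TOY_FREE. */
static enum toy_state toy_map[TOY_NBLOCKS];

/* Allocate the lowest-numbered free block, or return -1 if none is free. */
static int toy_alloc(void)
{
	int i;

	for (i = 0; i < TOY_NBLOCKS; i++) {
		if (toy_map[i] == TOY_FREE) {
			toy_map[i] = TOY_USED;
			return i;
		}
	}
	return -1;
}

/* Freeing only marks the block; it cannot be handed out again yet. */
static void toy_free(int blk)
{
	assert(blk >= 0 && blk < TOY_NBLOCKS);
	assert(toy_map[blk] == TOY_USED);
	toy_map[blk] = TOY_PENDING_FREE;
}

/* Stand-in for a commit callback: the freeing transaction is now durable. */
static void toy_commit_callback(void)
{
	int i;

	for (i = 0; i < TOY_NBLOCKS; i++) {
		if (toy_map[i] == TOY_PENDING_FREE)
			toy_map[i] = TOY_FREE;
	}
}
```

In this model a block released with toy_free() is skipped by toy_alloc() until toy_commit_callback() has run, which mirrors the rule that freed blocks must not be reused before the freeing transaction commits.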
<para>
With journal commit callbacks you can ask the journalling layer to call a
callback function when the transaction is finally committed to disk, so that
you can do some of your own management. You ask the journalling layer to
call the callback by simply setting the journal->j_commit_callback function
pointer, and that function is called after each transaction commit. You can also
use transaction->t_private_list for attaching entries to a transaction that
need processing when the transaction commits.
</para>
<para>
JBD2 also provides a way to block all transaction updates via
jbd2_journal_{un,}lock_updates(). Ext4 uses this when it wants a window with a
clean and stable fs for a moment. E.g.
</para>
<programlisting>
jbd2_journal_lock_updates() //stop new stuff happening..
jbd2_journal_flush() // checkpoint everything.
..do stuff on stable fs
jbd2_journal_unlock_updates() // carry on with filesystem use.
</programlisting>
<para>
The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing these
calls.
</para>
</sect2>
<sect2 id="jbd_summary">
<title>Summary</title>
<para>
Using the journal is a matter of wrapping the different context changes,
being each mount, each modification (transaction) and each changed buffer
to tell the journalling layer about them.
</para>
</sect2>
</sect1>
<sect1 id="data_types">
<title>Data Types</title>
<para>
The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can
just rely on using the pointer as a magic cookie of some sort.
Obviously the hiding is not enforced as this is 'C'.
</para>
<sect2 id="structures"><title>Structures</title>
!Iinclude/linux/jbd2.h
</sect2>
</sect1>
<sect1 id="functions">
<title>Functions</title>
<para>
The functions here are split into two groups: those that
affect a journal as a whole, and those which are used to
manage transactions.
</para>
<sect2 id="journal_level"><title>Journal Level</title>
!Efs/jbd2/journal.c
!Ifs/jbd2/recovery.c
</sect2>
<sect2 id="transaction_level"><title>Transaction Level</title>
!Efs/jbd2/transaction.c
</sect2>
</sect1>
<sect1 id="see_also">
<title>See also</title>
<para>
<citation>
<ulink url="http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz">
Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen Tweedie
</ulink>
</citation>
</para>
<para>
<citation>
<ulink url="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen Tweedie
</ulink>
</citation>
</para>
</sect1>
</chapter>
<chapter id="splice">
<title>splice API</title>
<para>
splice is a method for moving blocks of data around inside the
kernel, without continually transferring them between the kernel
and user space.
</para>
!Ffs/splice.c
</chapter>
<chapter id="pipes">
<title>pipes API</title>
<para>
Pipe interfaces are all for in-kernel (builtin image) use.
They are not exported for use by modules.
</para>
!Iinclude/linux/pipe_fs_i.h
!Ffs/pipe.c
</chapter>
</book>

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,918 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="kgdbOnLinux">
<bookinfo>
<title>Using kgdb, kdb and the kernel debugger internals</title>
<authorgroup>
<author>
<firstname>Jason</firstname>
<surname>Wessel</surname>
<affiliation>
<address>
<email>jason.wessel@windriver.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2008,2010</year>
<holder>Wind River Systems, Inc.</holder>
</copyright>
<copyright>
<year>2004-2005</year>
<holder>MontaVista Software, Inc.</holder>
</copyright>
<copyright>
<year>2004</year>
<holder>Amit S. Kale</holder>
</copyright>
<legalnotice>
<para>
This file is licensed under the terms of the GNU General Public License
version 2. This program is licensed "as is" without any warranty of any
kind, whether express or implied.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="Introduction">
<title>Introduction</title>
<para>
The kernel has two different debugger front ends (kdb and kgdb)
which interface to the debug core. It is possible to use either
of the debugger front ends and dynamically transition between them
if you configure the kernel properly at compile and runtime.
</para>
<para>
Kdb is a simplistic shell-style interface which you can use on a
system console with a keyboard or serial console. You can use it
to inspect memory, registers, process lists, dmesg, and even set
breakpoints to stop in a certain location. Kdb is not a source
level debugger, although you can set breakpoints and execute some
basic kernel run control. Kdb is mainly aimed at doing some
analysis to aid in development or diagnosing kernel problems. You
can access some symbols by name in kernel built-ins or in kernel
modules if the code was built
with <symbol>CONFIG_KALLSYMS</symbol>.
</para>
<para>
Kgdb is intended to be used as a source level debugger for the
Linux kernel. It is used along with gdb to debug a Linux kernel.
The expectation is that gdb can be used to "break in" to the
kernel to inspect memory, variables and look through call stack
information similar to the way an application developer would use
gdb to debug an application. It is possible to place breakpoints
in kernel code and perform some limited execution stepping.
</para>
<para>
Two machines are required for using kgdb. One of these machines is
a development machine and the other is the target machine. The
kernel to be debugged runs on the target machine. The development
machine runs an instance of gdb against the vmlinux file which
contains the symbols (not a boot image such as bzImage, zImage,
uImage...). In gdb the developer specifies the connection
parameters and connects to kgdb. The type of connection a
developer makes with gdb depends on the availability of kgdb I/O
modules compiled as built-ins or loadable kernel modules in the test
machine's kernel.
</para>
</chapter>
<chapter id="CompilingAKernel">
<title>Compiling a kernel</title>
<para>
<itemizedlist>
<listitem><para>In order to enable compilation of kdb, you must first enable kgdb.</para></listitem>
<listitem><para>The kgdb test compile options are described in the kgdb test suite chapter.</para></listitem>
</itemizedlist>
</para>
<sect1 id="CompileKGDB">
<title>Kernel config options for kgdb</title>
<para>
To enable <symbol>CONFIG_KGDB</symbol> you should look under
"Kernel hacking" / "Kernel debugging" and select "KGDB: kernel debugger".
</para>
<para>
While it is not a hard requirement that you have symbols in your
vmlinux file, gdb tends not to be very useful without the symbolic
data, so you will want to turn
on <symbol>CONFIG_DEBUG_INFO</symbol> which is called "Compile the
kernel with debug info" in the config menu.
</para>
<para>
It is advised, but not required, that you turn on the
<symbol>CONFIG_FRAME_POINTER</symbol> kernel option which is called "Compile the
kernel with frame pointers" in the config menu. This option
inserts code into the compiled executable which saves the frame
information in registers or on the stack at different points which
allows a debugger such as gdb to more accurately construct
stack back traces while debugging the kernel.
</para>
<para>
If the architecture that you are using supports the kernel option
CONFIG_STRICT_KERNEL_RWX, you should consider turning it off. This
option will prevent the use of software breakpoints because it
marks certain regions of the kernel's memory space as read-only.
If kgdb supports it for the architecture you are using, you can
use hardware breakpoints if you desire to run with the
CONFIG_STRICT_KERNEL_RWX option turned on, else you need to turn off
this option.
</para>
<para>
Next you should choose one or more I/O drivers to interconnect
debugging host and debugged target. Early boot debugging requires
a KGDB I/O driver that supports early debugging and the driver
must be built into the kernel directly. Kgdb I/O driver
configuration takes place via kernel or module parameters which
you can learn more about in the section that describes the
parameter "kgdboc".
</para>
<para>Here is an example set of .config symbols to enable or
disable for kgdb:
<itemizedlist>
<listitem><para># CONFIG_STRICT_KERNEL_RWX is not set</para></listitem>
<listitem><para>CONFIG_FRAME_POINTER=y</para></listitem>
<listitem><para>CONFIG_KGDB=y</para></listitem>
<listitem><para>CONFIG_KGDB_SERIAL_CONSOLE=y</para></listitem>
</itemizedlist>
</para>
</sect1>
<sect1 id="CompileKDB">
<title>Kernel config options for kdb</title>
<para>Kdb is quite a bit more complex than the simple gdbstub
sitting on top of the kernel's debug core. Kdb must implement a
shell, and also adds some helper functions in other parts of the
kernel, responsible for printing out interesting data such as what
you would see if you ran "lsmod", or "ps". In order to build kdb
into the kernel you follow the same steps as you would for kgdb.
</para>
<para>The main config option for kdb
is <symbol>CONFIG_KGDB_KDB</symbol> which is called "KGDB_KDB:
include kdb frontend for kgdb" in the config menu. In theory you
would have already also selected an I/O driver such as the
CONFIG_KGDB_SERIAL_CONSOLE interface if you plan on using kdb on a
serial port, when you were configuring kgdb.
</para>
<para>If you want to use a PS/2-style keyboard with kdb, you would
select CONFIG_KDB_KEYBOARD which is called "KGDB_KDB: keyboard as
input device" in the config menu. The CONFIG_KDB_KEYBOARD option
is not used for anything in the gdb interface to kgdb. The
CONFIG_KDB_KEYBOARD option only works with kdb.
</para>
<para>Here is an example set of .config symbols to enable/disable kdb:
<itemizedlist>
<listitem><para># CONFIG_STRICT_KERNEL_RWX is not set</para></listitem>
<listitem><para>CONFIG_FRAME_POINTER=y</para></listitem>
<listitem><para>CONFIG_KGDB=y</para></listitem>
<listitem><para>CONFIG_KGDB_SERIAL_CONSOLE=y</para></listitem>
<listitem><para>CONFIG_KGDB_KDB=y</para></listitem>
<listitem><para>CONFIG_KDB_KEYBOARD=y</para></listitem>
</itemizedlist>
</para>
</sect1>
</chapter>
<chapter id="kgdbKernelArgs">
<title>Kernel Debugger Boot Arguments</title>
<para>This section describes the various runtime kernel
parameters that affect the configuration of the kernel debugger.
The following chapter covers using kdb and kgdb as well as
providing some examples of the configuration parameters.</para>
<sect1 id="kgdboc">
<title>Kernel parameter: kgdboc</title>
<para>The kgdboc driver was originally an abbreviation meant to
stand for "kgdb over console". Today it is the primary mechanism
to configure how to communicate from gdb to kgdb as well as the
devices you want to use to interact with the kdb shell.
</para>
<para>For kgdb/gdb, kgdboc is designed to work with a single serial
port. It is intended to cover the circumstance where you want to
use a serial console as your primary console as well as using it to
perform kernel debugging. It is also possible to use kgdb on a
serial port which is not designated as a system console. Kgdboc
may be configured as a kernel built-in or a kernel loadable module.
You can only make use of <constant>kgdbwait</constant> and early
debugging if you build kgdboc into the kernel as a built-in.
</para>
<para>Optionally you can elect to activate kms (Kernel Mode
Setting) integration. When you use kms with kgdboc and you have a
video driver that has atomic mode setting hooks, it is possible to
enter the debugger on the graphics console. When the kernel
execution is resumed, the previous graphics mode will be restored.
This integration can serve as a useful tool to aid in diagnosing
crashes or doing analysis of memory with kdb while allowing the
full graphics console applications to run.
</para>
<sect2 id="kgdbocArgs">
<title>kgdboc arguments</title>
<para>Usage: <constant>kgdboc=[kms][[,]kbd][[,]serial_device][,baud]</constant></para>
<para>The order listed above must be observed if you use any of the
optional configurations together.
</para>
<para>Abbreviations:
<itemizedlist>
<listitem><para>kms = Kernel Mode Setting</para></listitem>
<listitem><para>kbd = Keyboard</para></listitem>
</itemizedlist>
</para>
<para>You can configure kgdboc to use the keyboard, a serial device,
or both, depending on whether you are using kdb, kgdb, or both, as
described in the following scenarios. Using kms with only gdb is
generally not a useful combination.</para>
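<para>The argument order can be illustrated with a small shell sketch.
<constant>parse_kgdboc</constant> is a hypothetical helper written for
this example; it follows the documented usage string and is not the
kernel's own parser.</para>
<programlisting>
# Hypothetical helper: split a kgdboc= value into its documented parts,
# consuming them in the required order: kms, kbd, serial device, baud.
parse_kgdboc() {
    rest=$1 kms=no kbd=no tty= baud=
    case "$rest" in
        kms|kms,*) kms=yes; rest=${rest#kms}; rest=${rest#,} ;;
    esac
    case "$rest" in
        kbd|kbd,*) kbd=yes; rest=${rest#kbd}; rest=${rest#,} ;;
    esac
    case "$rest" in
        *,*) tty=${rest%%,*}; baud=${rest#*,} ;;
        *)   tty=$rest ;;
    esac
    echo "kms=$kms kbd=$kbd tty=$tty baud=$baud"
}

parse_kgdboc kms,kbd,ttyS0,115200
parse_kgdboc kbd
</programlisting>
<para>Because each part is only looked for at the front of what
remains, a value such as <constant>kbd,kms</constant> would not be
recognized as enabling kms in this sketch, which mirrors the ordering
rule stated above.</para>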
<sect3 id="kgdbocArgs1">
<title>Using loadable module or built-in</title>
<para>
<orderedlist>
<listitem><para>As a kernel built-in:</para>
<para>Use the kernel boot argument: <constant>kgdboc=&lt;tty-device&gt;,[baud]</constant></para></listitem>
<listitem>
<para>As a kernel loadable module:</para>
<para>Use the command: <constant>modprobe kgdboc kgdboc=&lt;tty-device&gt;,[baud]</constant></para>
<para>Here are two examples of how you might format the kgdboc
string. The first is for an x86 target using the first serial port.
The second example is for the ARM Versatile AB using the second
serial port.
<orderedlist>
<listitem><para><constant>kgdboc=ttyS0,115200</constant></para></listitem>
<listitem><para><constant>kgdboc=ttyAMA1,115200</constant></para></listitem>
</orderedlist>
</para>
</listitem>
</orderedlist></para>
</sect3>
<sect3 id="kgdbocArgs2">
<title>Configure kgdboc at runtime with sysfs</title>
<para>At run time you can enable or disable kgdboc by echoing a
parameter into sysfs. Here are two examples:</para>
<orderedlist>
<listitem><para>Enable kgdboc on ttyS0</para>
<para><constant>echo ttyS0 &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
<listitem><para>Disable kgdboc</para>
<para><constant>echo "" &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
</orderedlist>
<para>NOTE: You do not need to specify the baud if you are
configuring the console on a tty which is already configured or
open.</para>
</sect3>
<sect3 id="kgdbocArgs3">
<title>More examples</title>
<para>You can configure kgdboc to use the keyboard, a serial device,
or both, depending on whether you are using kdb, kgdb, or both, in one
of the following scenarios.
<orderedlist>
<listitem><para>kdb and kgdb over only a serial port</para>
<para><constant>kgdboc=&lt;serial_device&gt;[,baud]</constant></para>
<para>Example: <constant>kgdboc=ttyS0,115200</constant></para>
</listitem>
<listitem><para>kdb and kgdb with keyboard and a serial port</para>
<para><constant>kgdboc=kbd,&lt;serial_device&gt;[,baud]</constant></para>
<para>Example: <constant>kgdboc=kbd,ttyS0,115200</constant></para>
</listitem>
<listitem><para>kdb with a keyboard</para>
<para><constant>kgdboc=kbd</constant></para>
</listitem>
<listitem><para>kdb with kernel mode setting</para>
<para><constant>kgdboc=kms,kbd</constant></para>
</listitem>
<listitem><para>kdb with kernel mode setting and kgdb over a serial port</para>
<para><constant>kgdboc=kms,kbd,ttyS0,115200</constant></para>
</listitem>
</orderedlist>
</para>
<para>NOTE: Kgdboc does not support interrupting the target via the
gdb remote protocol. You must manually send a sysrq-g unless you
have a proxy that splits console output to a terminal program.
A console proxy has a separate TCP port for the debugger and a separate
TCP port for the "human" console. The proxy can take care of sending
the sysrq-g for you.
</para>
<para>When using kgdboc with no debugger proxy, you can end up
connecting the debugger at one of two entry points. If an
exception occurs after you have loaded kgdboc, a message should
print on the console stating it is waiting for the debugger. In
this case you disconnect your terminal program and then connect the
debugger in its place. If you want to interrupt the target system
and forcibly enter a debug session you have to issue a Sysrq
sequence and then type the letter <constant>g</constant>. Then
you disconnect the terminal session and connect gdb. Your options
if you don't like this are to hack gdb to send the sysrq-g for you
as well as on the initial connect, or to use a debugger proxy that
allows an unmodified gdb to do the debugging.
</para>
</sect3>
</sect2>
</sect1>
<sect1 id="kgdbwait">
<title>Kernel parameter: kgdbwait</title>
<para>
The kernel command line option <constant>kgdbwait</constant> makes
kgdb wait for a debugger connection during booting of a kernel. You
can only use this option if you compiled a kgdb I/O driver into the
kernel and you specified the I/O driver configuration as a kernel
command line option. The kgdbwait parameter should always follow the
configuration parameter for the kgdb I/O driver on the kernel
command line; otherwise the I/O driver will not be configured before
the kernel is asked to use it to wait.
</para>
<para>
The kernel will stop and wait as early as the I/O driver and
architecture allow when you use this option. If you build the
kgdb I/O driver as a loadable kernel module, kgdbwait will not do
anything.
</para>
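<para>The ordering requirement can be illustrated with a small shell
check. This is only a sketch: <constant>check_kgdbwait_order</constant>
is a hypothetical helper, not a kernel tool, and it assumes kgdboc is
the I/O driver in use.</para>
<programlisting>
# Hypothetical helper: report whether kgdbwait follows kgdboc= in the
# kernel command line string passed as the first argument.
check_kgdbwait_order() {
    case " $1 " in
        *" kgdboc="*" kgdbwait "*) echo "ok" ;;
        *" kgdbwait "*)            echo "kgdbwait without a preceding kgdboc=" ;;
        *)                         echo "kgdbwait not set" ;;
    esac
}

check_kgdbwait_order "console=ttyS0,115200 kgdboc=ttyS0,115200 kgdbwait"
check_kgdbwait_order "console=ttyS0,115200 kgdbwait kgdboc=ttyS0,115200"
</programlisting>
<para>To check a running system, pass the helper the contents of
<constant>/proc/cmdline</constant>.</para>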
</sect1>
<sect1 id="kgdbcon">
<title>Kernel parameter: kgdbcon</title>
<para> The kgdbcon feature allows you to see printk() messages
inside gdb while gdb is connected to the kernel. Kdb does not make
use of the kgdbcon feature.
</para>
<para>Kgdb supports using the gdb serial protocol to send console
messages to the debugger when the debugger is connected and running.
There are two ways to activate this feature.
<orderedlist>
<listitem><para>Activate with the kernel command line option:</para>
<para><constant>kgdbcon</constant></para>
</listitem>
<listitem><para>Use sysfs before configuring an I/O driver</para>
<para>
<constant>echo 1 &gt; /sys/module/kgdb/parameters/kgdb_use_con</constant>
</para>
<para>
NOTE: If you do this after you configure the kgdb I/O driver, the
setting will not take effect until the next time the I/O driver is
reconfigured.
</para>
</listitem>
</orderedlist>
</para>
<para>IMPORTANT NOTE: You cannot use kgdboc + kgdbcon on a tty that is an
active system console. An example of incorrect usage is <constant>console=ttyS0,115200 kgdboc=ttyS0 kgdbcon</constant>
</para>
<para>It is possible to use this option with kgdboc on a tty that is not a system console.
</para>
</sect1>
<sect1 id="kgdbreboot">
<title>Run time parameter: kgdbreboot</title>
<para> The kgdbreboot feature allows you to change how the debugger
deals with the reboot notification. You have 3 choices for the
behavior. The default behavior is always set to 0.</para>
<orderedlist>
<listitem><para><constant>echo -1 &gt; /sys/module/debug_core/parameters/kgdbreboot</constant></para>
<para>Ignore the reboot notification entirely.</para>
</listitem>
<listitem><para><constant>echo 0 &gt; /sys/module/debug_core/parameters/kgdbreboot</constant></para>
<para>Send the detach message to any attached debugger client.</para>
</listitem>
<listitem><para><constant>echo 1 &gt; /sys/module/debug_core/parameters/kgdbreboot</constant></para>
<para>Enter the debugger on reboot notify.</para>
</listitem>
</orderedlist>
</sect1>
</chapter>
<chapter id="usingKDB">
<title>Using kdb</title>
<para>
</para>
<sect1 id="quickKDBserial">
<title>Quick start for kdb on a serial port</title>
<para>This is a quick example of how to use kdb.</para>
<para><orderedlist>
<listitem><para>Configure kgdboc at boot using kernel parameters:
<itemizedlist>
<listitem><para><constant>console=ttyS0,115200 kgdboc=ttyS0,115200</constant></para></listitem>
</itemizedlist></para>
<para>OR</para>
<para>Configure kgdboc after the kernel has booted; assuming you are using a serial port console:
<itemizedlist>
<listitem><para><constant>echo ttyS0 &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
</itemizedlist>
</para>
</listitem>
<listitem><para>Enter the kernel debugger manually or by waiting for an oops or fault. There are several ways you can enter the kernel debugger manually; all involve using the sysrq-g, which means you must have enabled CONFIG_MAGIC_SYSRQ=y in your kernel config.</para>
<itemizedlist>
<listitem><para>When logged in as root or with a super user session you can run:</para>
<para><constant>echo g &gt; /proc/sysrq-trigger</constant></para></listitem>
<listitem><para>Example using minicom 2.2</para>
<para>Press: <constant>Control-a</constant></para>
<para>Press: <constant>f</constant></para>
<para>Press: <constant>g</constant></para>
</listitem>
<listitem><para>When you have telneted to a terminal server that supports sending a remote break</para>
<para>Press: <constant>Control-]</constant></para>
<para>Type in: <constant>send break</constant></para>
<para>Press: <constant>Enter</constant></para>
<para>Press: <constant>g</constant></para>
</listitem>
</itemizedlist>
</listitem>
<listitem><para>From the kdb prompt you can run the "help" command to see a complete list of the commands that are available.</para>
<para>Some useful commands in kdb include:
<itemizedlist>
<listitem><para>lsmod -- Shows where kernel modules are loaded</para></listitem>
<listitem><para>ps -- Displays only the active processes</para></listitem>
<listitem><para>ps A -- Shows all the processes</para></listitem>
<listitem><para>summary -- Shows kernel version info and memory usage</para></listitem>
<listitem><para>bt -- Get a backtrace of the current process using dump_stack()</para></listitem>
<listitem><para>dmesg -- View the kernel syslog buffer</para></listitem>
<listitem><para>go -- Continue the system</para></listitem>
</itemizedlist>
</para>
</listitem>
<listitem>
<para>When you are done using kdb you need to consider rebooting the
system or using the "go" command to resume normal kernel
execution. If you have paused the kernel for a lengthy period of
time, applications that rely on timely networking or anything to do
with real wall clock time could be adversely affected, so you
should take this into consideration when using the kernel
debugger.</para>
</listitem>
</orderedlist></para>
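<para>If the key-based sequences above do nothing, sysrq may be
restricted at run time. As a hedged sketch, the following commands
check the current setting and then enable all sysrq functions. The
<constant>/proc/sys/kernel/sysrq</constant> setting applies to
key-based invocation; writing to
<constant>/proc/sysrq-trigger</constant> as root is not subject to
it.</para>
<programlisting>
# Run as root. 0 disables sysrq, 1 enables all functions, and larger
# values are a bitmask of allowed functions.
cat /proc/sys/kernel/sysrq
echo 1 &gt; /proc/sys/kernel/sysrq
</programlisting>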
</sect1>
<sect1 id="quickKDBkeyboard">
<title>Quick start for kdb using a keyboard connected console</title>
<para>This is a quick example of how to use kdb with a keyboard.</para>
<para><orderedlist>
<listitem><para>Configure kgdboc at boot using kernel parameters:
<itemizedlist>
<listitem><para><constant>kgdboc=kbd</constant></para></listitem>
</itemizedlist></para>
<para>OR</para>
<para>Configure kgdboc after the kernel has booted:
<itemizedlist>
<listitem><para><constant>echo kbd &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
</itemizedlist>
</para>
</listitem>
<listitem><para>Enter the kernel debugger manually or by waiting for an oops or fault. There are several ways you can enter the kernel debugger manually; all involve using the sysrq-g, which means you must have enabled CONFIG_MAGIC_SYSRQ=y in your kernel config.</para>
<itemizedlist>
<listitem><para>When logged in as root or with a super user session you can run:</para>
<para><constant>echo g &gt; /proc/sysrq-trigger</constant></para></listitem>
<listitem><para>Example using a laptop keyboard</para>
<para>Press and hold down: <constant>Alt</constant></para>
<para>Press and hold down: <constant>Fn</constant></para>
<para>Press and release the key with the label: <constant>SysRq</constant></para>
<para>Release: <constant>Fn</constant></para>
<para>Press and release: <constant>g</constant></para>
<para>Release: <constant>Alt</constant></para>
</listitem>
<listitem><para>Example using a PS/2 101-key keyboard</para>
<para>Press and hold down: <constant>Alt</constant></para>
<para>Press and release the key with the label: <constant>SysRq</constant></para>
<para>Press and release: <constant>g</constant></para>
<para>Release: <constant>Alt</constant></para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>Now type in a kdb command such as "help", "dmesg", "bt" or "go" to continue kernel execution.</para>
</listitem>
</orderedlist></para>
</sect1>
</chapter>
<chapter id="EnableKGDB">
<title>Using kgdb / gdb</title>
<para>In order to use kgdb you must activate it by passing
configuration information to one of the kgdb I/O drivers. If you
do not pass any configuration information kgdb will not do anything
at all. Kgdb will only actively hook up to the kernel trap hooks
if a kgdb I/O driver is loaded and configured. If you unconfigure
a kgdb I/O driver, kgdb will unregister all the kernel hook points.
</para>
<para> All kgdb I/O drivers can be reconfigured at run time, if
<symbol>CONFIG_SYSFS</symbol> and <symbol>CONFIG_MODULES</symbol>
are enabled, by echo'ing a new config string to
<constant>/sys/module/&lt;driver&gt;/parameters/&lt;option&gt;</constant>.
The driver can be unconfigured by passing an empty string. You cannot
change the configuration while the debugger is attached. Make sure
to detach the debugger with the <constant>detach</constant> command
prior to trying to unconfigure a kgdb I/O driver.
</para>
<sect1 id="ConnectingGDB">
<title>Connecting with gdb to a serial port</title>
<orderedlist>
<listitem><para>Configure kgdboc</para>
<para>Configure kgdboc at boot using kernel parameters:
<itemizedlist>
<listitem><para><constant>kgdboc=ttyS0,115200</constant></para></listitem>
</itemizedlist></para>
<para>OR</para>
<para>Configure kgdboc after the kernel has booted:
<itemizedlist>
<listitem><para><constant>echo ttyS0 &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
</itemizedlist></para>
</listitem>
<listitem>
<para>Stop kernel execution (break into the debugger)</para>
<para>In order to connect to gdb via kgdboc, the kernel must
first be stopped. There are several ways to stop the kernel which
include using kgdbwait as a boot argument, via a sysrq-g, or running
the kernel until it takes an exception where it waits for the
debugger to attach.
<itemizedlist>
<listitem><para>When logged in as root or with a super user session you can run:</para>
<para><constant>echo g &gt; /proc/sysrq-trigger</constant></para></listitem>
<listitem><para>Example using minicom 2.2</para>
<para>Press: <constant>Control-a</constant></para>
<para>Press: <constant>f</constant></para>
<para>Press: <constant>g</constant></para>
</listitem>
<listitem><para>When you have telneted to a terminal server that supports sending a remote break</para>
<para>Press: <constant>Control-]</constant></para>
<para>Type in: <constant>send break</constant></para>
<para>Press: <constant>Enter</constant></para>
<para>Press: <constant>g</constant></para>
</listitem>
</itemizedlist>
</para>
</listitem>
<listitem>
<para>Connect from gdb</para>
<para>
Example (using a directly connected port):
</para>
<programlisting>
% gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyS0
</programlisting>
<para>
Example (kgdb to a terminal server on TCP port 2012):
</para>
<programlisting>
% gdb ./vmlinux
(gdb) target remote 192.168.2.2:2012
</programlisting>
<para>
Once connected, you can debug a kernel the way you would debug an
application program.
</para>
<para>
If you are having problems connecting or something is going
seriously wrong while debugging, it will most often be the case
that you want to enable gdb to be verbose about its target
communications. You do this prior to issuing the <constant>target
remote</constant> command by typing in: <constant>set debug remote 1</constant>
</para>
</listitem>
</orderedlist>
<para>Remember, if you continue in gdb and need to "break in" again,
you need to issue another sysrq-g. It is easy to create a simple
entry point by putting a breakpoint at <constant>sys_sync</constant>
and then you can run "sync" from a shell or script to break into the
debugger.</para>
</sect1>
</chapter>
<chapter id="switchKdbKgdb">
<title>kgdb and kdb interoperability</title>
<para>It is possible to transition between kdb and kgdb dynamically.
The debug core will remember which you used the last time and
automatically start in the same mode.</para>
<sect1>
<title>Switching between kdb and kgdb</title>
<sect2>
<title>Switching from kgdb to kdb</title>
<para>
There are two ways to switch from kgdb to kdb: you can use gdb to
issue a maintenance packet, or you can blindly type the command $3#33.
Whenever the kernel debugger stops in kgdb mode it will print the
message <constant>KGDB or $3#33 for KDB</constant>. It is important
to note that you have to type the sequence correctly in one pass.
You cannot type a backspace or delete because kgdb will interpret
that as part of the debug stream.
<orderedlist>
<listitem><para>Change from kgdb to kdb by blindly typing:</para>
<para><constant>$3#33</constant></para></listitem>
<listitem><para>Change from kgdb to kdb with gdb</para>
<para><constant>maintenance packet 3</constant></para>
<para>NOTE: Now you must kill gdb. Typically you press control-z and
issue the command: kill -9 %</para></listitem>
</orderedlist>
</para>
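<para>The <constant>$3#33</constant> sequence is an ordinary gdb
remote protocol packet: a payload of <constant>3</constant> followed
by a checksum, which is the sum of the payload bytes modulo 256
written as two hex digits. The sketch below, using a hypothetical
<constant>gdb_checksum</constant> helper, shows where the
<constant>33</constant> comes from.</para>
<programlisting>
# Hypothetical helper: print the two-hex-digit checksum of a gdb remote
# protocol payload, which is the sum of its bytes modulo 256.
gdb_checksum() {
    payload=$1
    sum=0
    while [ -n "$payload" ]; do
        rest=${payload#?}            # payload without its first character
        ch=${payload%"$rest"}        # the first character on its own
        sum=$(( (sum + $(printf '%d' "'$ch")) % 256 ))
        payload=$rest
    done
    printf '%02x\n' "$sum"
}

gdb_checksum 3    # payload of the packet that switches to kdb
</programlisting>
<para>The character <constant>3</constant> is ASCII 0x33, so a
one-byte payload of <constant>3</constant> has the checksum
<constant>33</constant>.</para>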
</sect2>
<sect2>
<title>Change from kdb to kgdb</title>
<para>There are two ways you can change from kdb to kgdb. You can
manually enter kgdb mode by issuing the kgdb command from the kdb
shell prompt, or you can connect gdb while the kdb shell prompt is
active. The kdb shell looks for the typical first commands that gdb
would issue with the gdb remote protocol and if it sees one of those
commands it automatically changes into kgdb mode.</para>
<orderedlist>
<listitem><para>From kdb issue the command:</para>
<para><constant>kgdb</constant></para>
<para>Now disconnect your terminal program and connect gdb in its place</para></listitem>
<listitem><para>At the kdb prompt, disconnect the terminal program and connect gdb in its place.</para></listitem>
</orderedlist>
</sect2>
</sect1>
<sect1>
<title>Running kdb commands from gdb</title>
<para>It is possible to run a limited set of kdb commands from gdb,
using the gdb monitor command. You don't want to execute any of the
run control or breakpoint operations, because it can disrupt the
state of the kernel debugger. You should be using gdb for
breakpoints and run control operations if you have gdb connected.
The more useful commands to run are things like lsmod, dmesg, ps or
possibly some of the memory information commands. To see all the kdb
commands you can run <constant>monitor help</constant>.</para>
<para>Example:
<informalexample><programlisting>
(gdb) monitor ps
1 idle process (state I) and
27 sleeping system daemon (state M) processes suppressed,
use 'ps A' to see all.
Task Addr Pid Parent [*] cpu State Thread Command
0xc78291d0 1 0 0 0 S 0xc7829404 init
0xc7954150 942 1 0 0 S 0xc7954384 dropbear
0xc78789c0 944 1 0 0 S 0xc7878bf4 sh
(gdb)
</programlisting></informalexample>
</para>
</sect1>
</chapter>
<chapter id="KGDBTestSuite">
<title>kgdb Test Suite</title>
<para>
When kgdb is enabled in the kernel config you can also elect to
enable the config parameter KGDB_TESTS. Turning this on will
enable a special kgdb I/O module which is designed to test the
kgdb internal functions.
</para>
<para>
The kgdb tests are mainly intended for developers to test the kgdb
internals as well as a tool for developing a new kgdb architecture
specific implementation. These tests are not really for end users
of the Linux kernel. The primary source of documentation would be
to look in the drivers/misc/kgdbts.c file.
</para>
<para>
The kgdb test suite can also be configured at compile time to run
the core set of tests by setting the kernel config parameter
KGDB_TESTS_ON_BOOT. This particular option is aimed at automated
regression testing and does not require modifying the kernel boot
config arguments. If this is turned on, the kgdb test suite can
be disabled by specifying "kgdbts=" as a kernel boot argument.
</para>
</chapter>
<chapter id="CommonBackEndReq">
<title>Kernel Debugger Internals</title>
<sect1 id="kgdbArchitecture">
<title>Architecture Specifics</title>
<para>
The kernel debugger is organized into a number of components:
<orderedlist>
<listitem><para>The debug core</para>
<para>
The debug core is found in kernel/debug/debug_core.c. It contains:
<itemizedlist>
<listitem><para>A generic OS exception handler which includes
sync'ing the processors into a stopped state on a multi-CPU
system.</para></listitem>
<listitem><para>The API to talk to the kgdb I/O drivers</para></listitem>
<listitem><para>The API to make calls to the arch-specific kgdb implementation</para></listitem>
<listitem><para>The logic to perform safe memory reads and writes while using the debugger</para></listitem>
<listitem><para>A full implementation for software breakpoints unless overridden by the arch</para></listitem>
<listitem><para>The API to invoke either the kdb or kgdb frontend to the debug core.</para></listitem>
<listitem><para>The structures and callback API for atomic kernel mode setting.</para>
<para>NOTE: kgdboc is where the kms callbacks are invoked.</para></listitem>
</itemizedlist>
</para>
</listitem>
<listitem><para>kgdb arch-specific implementation</para>
<para>
This implementation is generally found in arch/*/kernel/kgdb.c.
As an example, arch/x86/kernel/kgdb.c contains the specifics to
implement HW breakpoint as well as the initialization to
dynamically register and unregister for the trap handlers on
this architecture. The arch-specific portion implements:
<itemizedlist>
<listitem><para>An arch-specific trap catcher which invokes
kgdb_handle_exception() to start kgdb doing its
work</para></listitem>
<listitem><para>Translation between the gdb specific packet format and pt_regs</para></listitem>
<listitem><para>Registration and unregistration of architecture specific trap hooks</para></listitem>
<listitem><para>Any special exception handling and cleanup</para></listitem>
<listitem><para>NMI exception handling and cleanup</para></listitem>
<listitem><para>(optional) HW breakpoints</para></listitem>
</itemizedlist>
</para>
</listitem>
<listitem><para>gdbstub frontend (aka kgdb)</para>
<para>The gdbstub is located in kernel/debug/gdbstub.c. It contains:</para>
<itemizedlist>
<listitem><para>All the logic to implement the gdb serial protocol</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>kdb frontend</para>
<para>The kdb debugger shell is broken down into a number of
components. The kdb core is located in kernel/debug/kdb. There
are a number of helper functions in some of the other kernel
components to make it possible for kdb to examine and report
information about the kernel without taking locks that could
cause a kernel deadlock. The kdb core implements the following functionality:</para>
<itemizedlist>
<listitem><para>A simple shell</para></listitem>
<listitem><para>The kdb core command set</para></listitem>
<listitem><para>A registration API to register additional kdb shell commands.</para>
<itemizedlist>
<listitem><para>A good example of a self-contained kdb module
is the "ftdump" command for dumping the ftrace buffer. See:
kernel/trace/trace_kdb.c</para></listitem>
<listitem><para>For an example of how to dynamically register
a new kdb command you can build the kdb_hello.ko kernel module
from samples/kdb/kdb_hello.c. To build this example you can
set CONFIG_SAMPLES=y and CONFIG_SAMPLE_KDB=m in your kernel
config. Later run "modprobe kdb_hello" and the next time you
enter the kdb shell, you can run the "hello"
command.</para></listitem>
</itemizedlist></listitem>
<listitem><para>The implementation for kdb_printf() which
emits messages directly to I/O drivers, bypassing the kernel
log.</para></listitem>
<listitem><para>SW / HW breakpoint management for the kdb shell</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>kgdb I/O driver</para>
<para>
Each kgdb I/O driver has to provide an implementation for the following:
<itemizedlist>
<listitem><para>configuration via built-in or module</para></listitem>
<listitem><para>dynamic configuration and kgdb hook registration calls</para></listitem>
<listitem><para>read and write character interface</para></listitem>
<listitem><para>A cleanup handler for unconfiguring from the kgdb core</para></listitem>
<listitem><para>(optional) Early debug methodology</para></listitem>
</itemizedlist>
Any given kgdb I/O driver has to operate very closely with the
hardware and must do it in such a way that does not enable
interrupts or change other parts of the system context without
completely restoring them. The kgdb core will repeatedly "poll"
a kgdb I/O driver for characters when it needs input. The I/O
driver is expected to return immediately if there is no data
available. Doing so allows for the future possibility to touch
watchdog hardware in such a way as to have a target system not
reset when these are enabled.
</para>
</listitem>
</orderedlist>
</para>
<para>
If you are intent on adding kgdb architecture specific support
for a new architecture, the architecture should define
<constant>HAVE_ARCH_KGDB</constant> in the architecture specific
Kconfig file. This will enable kgdb for the architecture, and
at that point you must create an architecture specific kgdb
implementation.
</para>
<para>
There are a few flags which must be set on every architecture in
their &lt;asm/kgdb.h&gt; file. These are:
<itemizedlist>
<listitem>
<para>
NUMREGBYTES: The size in bytes of all of the registers, so
that we can ensure they will all fit into a packet.
</para>
</listitem>
<listitem>
<para>
BUFMAX: The size in bytes of the buffer GDB will read into.
This must be larger than NUMREGBYTES.
</para>
</listitem>
<listitem>
<para>
CACHE_FLUSH_IS_SAFE: Set to 1 if it is always safe to call
flush_cache_range or flush_icache_range. On some architectures,
these functions may not be safe to call on SMP since we keep other
CPUs in a holding pattern.
</para>
</listitem>
</itemizedlist>
</para>
<para>
There are also the following functions for the common backend,
found in kernel/debug/debug_core.c, that must be supplied by the
architecture-specific backend unless marked as (optional), in
which case a default function may be used if the architecture
does not need to provide a specific implementation.
</para>
!Iinclude/linux/kgdb.h
</sect1>
<sect1 id="kgdbocDesign">
<title>kgdboc internals</title>
<sect2>
<title>kgdboc and uarts</title>
<para>
The kgdboc driver is actually a very thin driver that relies on the
underlying low-level hardware driver having "polling hooks"
to which the tty driver is attached. In the initial
implementation of kgdboc the serial_core was changed to expose a
low level UART hook for doing polled mode reading and writing of a
single character while in an atomic context. When kgdb makes an I/O
request to the debugger, kgdboc invokes a callback in the serial
core which in turn uses the callback in the UART driver.</para>
<para>
When using kgdboc with a UART, the UART driver must implement two callbacks in the <constant>struct uart_ops</constant>. Example from drivers/8250.c:<programlisting>
#ifdef CONFIG_CONSOLE_POLL
.poll_get_char = serial8250_get_poll_char,
.poll_put_char = serial8250_put_poll_char,
#endif
</programlisting>
Any implementation specifics around creating a polling driver use the
<constant>#ifdef CONFIG_CONSOLE_POLL</constant>, as shown above.
Keep in mind that polling hooks have to be implemented in such a way
that they can be called from an atomic context and have to restore
the state of the UART chip on return such that the system can return
to normal when the debugger detaches. You need to be very careful
with any kind of lock you consider, because failing here is most likely
going to mean pressing the reset button.
</para>
</sect2>
<sect2 id="kgdbocKbd">
<title>kgdboc and keyboards</title>
<para>The kgdboc driver contains logic to configure communications
with an attached keyboard. The keyboard infrastructure is only
compiled into the kernel when CONFIG_KDB_KEYBOARD=y is set in the
kernel configuration.</para>
<para>The core polled keyboard driver for PS/2 type keyboards
is in drivers/char/kdb_keyboard.c. This driver is hooked into the
debug core when kgdboc populates the callback in the array
called <constant>kdb_poll_funcs[]</constant>. The
kdb_get_kbd_char() is the top-level function which polls hardware
for single character input.
</para>
</sect2>
<sect2 id="kgdbocKms">
<title>kgdboc and kms</title>
<para>The kgdboc driver contains logic to request the graphics
display to switch to a text context when you are using
"kgdboc=kms,kbd", provided that you have a video driver which has a
frame buffer console and atomic kernel mode setting support.</para>
<para>
Every time the kernel
debugger is entered it calls kgdboc_pre_exp_handler() which in turn
calls con_debug_enter() in the virtual console layer. On resuming kernel
execution, the kernel debugger calls kgdboc_post_exp_handler() which
in turn calls con_debug_leave().</para>
<para>Any video driver that wants to be compatible with the kernel
debugger and the atomic kms callbacks must implement the
mode_set_base_atomic, fb_debug_enter and fb_debug_leave operations.
For the fb_debug_enter and fb_debug_leave the option exists to use
the generic drm fb helper functions or implement something custom for
the hardware. The following example shows the initialization of the
.mode_set_base_atomic operation in
drivers/gpu/drm/i915/intel_display.c:
<informalexample>
<programlisting>
static const struct drm_crtc_helper_funcs intel_helper_funcs = {
[...]
.mode_set_base_atomic = intel_pipe_set_base_atomic,
[...]
};
</programlisting>
</informalexample>
</para>
<para>Here is an example of how the i915 driver initializes the fb_debug_enter and fb_debug_leave functions to use the generic drm helpers in
drivers/gpu/drm/i915/intel_fb.c:
<informalexample>
<programlisting>
static struct fb_ops intelfb_ops = {
[...]
.fb_debug_enter = drm_fb_helper_debug_enter,
.fb_debug_leave = drm_fb_helper_debug_leave,
[...]
};
</programlisting>
</informalexample>
</para>
</sect2>
</sect1>
</chapter>
<chapter id="credits">
<title>Credits</title>
<para>
The following people have contributed to this document:
<orderedlist>
<listitem><para>Amit Kale<email>amitkale@linsyssoft.com</email></para></listitem>
<listitem><para>Tom Rini<email>trini@kernel.crashing.org</email></para></listitem>
</orderedlist>
In March 2008 this document was completely rewritten by:
<itemizedlist>
<listitem><para>Jason Wessel<email>jason.wessel@windriver.com</email></para></listitem>
</itemizedlist>
In Jan 2010 this document was updated to include kdb.
<itemizedlist>
<listitem><para>Jason Wessel<email>jason.wessel@windriver.com</email></para></listitem>
</itemizedlist>
</para>
</chapter>
</book>


@ -1,289 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Reed-Solomon-Library-Guide">
<bookinfo>
<title>Reed-Solomon Library Programming Interface</title>
<authorgroup>
<author>
<firstname>Thomas</firstname>
<surname>Gleixner</surname>
<affiliation>
<address>
<email>tglx@linutronix.de</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2004</year>
<holder>Thomas Gleixner</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
The generic Reed-Solomon Library provides encoding, decoding
and error correction functions.
</para>
<para>
Reed-Solomon codes are used in communication and storage
applications to ensure data integrity.
</para>
<para>
This documentation is provided for developers who want to utilize
the functions provided by the library.
</para>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
None.
</para>
</chapter>
<chapter id="usage">
<title>Usage</title>
<para>
This chapter provides examples of how to use the library.
</para>
<sect1>
<title>Initializing</title>
<para>
The init function init_rs returns a pointer to an
rs decoder structure, which holds the necessary
information for encoding, decoding and error correction
with the given polynomial. It either uses an existing
matching decoder or creates a new one. On creation all
the lookup tables for fast en/decoding are created.
The function may take a while, so make sure not to
call it in critical code paths.
</para>
<programlisting>
/* the Reed Solomon control structure */
static struct rs_control *rs_decoder;
/* Symbolsize is 10 (bits)
* Primitive polynomial is x^10+x^3+1
* first consecutive root is 0
* primitive element to generate roots = 1
* generator polynomial degree (number of roots) = 6
*/
rs_decoder = init_rs (10, 0x409, 0, 1, 6);
</programlisting>
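<para>
As a rough illustration of what the init_rs arguments above imply,
the following standalone sketch derives the codeword length and the
number of correctable symbol errors from the symbol size and the
number of roots, and checks that 0x409 is the bitmask form of
x^10+x^3+1. It is ordinary userspace C and not part of the library;
all helper names are made up for this example.
</para>
<programlisting><![CDATA[
```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; rslib does not export them. */

/* Symbols per codeword for a given symbol size in bits: 2^symsize - 1. */
static unsigned int rs_codeword_symbols(unsigned int symsize)
{
	return (1u << symsize) - 1;
}

/*
 * A code with nroots parity symbols can correct up to nroots / 2
 * symbol errors when no erasure positions are supplied.
 */
static unsigned int rs_max_correctable(unsigned int nroots)
{
	return nroots / 2;
}

/*
 * Build the bitmask form of a polynomial from its exponents,
 * e.g. {10, 3, 0} stands for x^10 + x^3 + 1.
 */
static uint32_t poly_mask(const unsigned int *exps, unsigned int n)
{
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		mask |= 1u << exps[i];
	return mask;
}
```
]]></programlisting>
<para>
With the values used above, a symbol size of 10 gives 1023 symbols
per codeword and 6 roots allow up to 3 symbol errors to be corrected.
</para>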
</sect1>
<sect1>
<title>Encoding</title>
<para>
The encoder calculates the Reed-Solomon code over
the given data length and stores the result in
the parity buffer. Note that the parity buffer must
be initialized before calling the encoder.
</para>
<para>
The expanded data can be inverted on the fly by
providing a non-zero inversion mask. The expanded data is
XOR'ed with the mask. This is used e.g. for FLASH
ECC, where erased FLASH reads as all 0xFF and is inverted
to all 0x00. The Reed-Solomon code for all 0x00 is all 0x00.
The code is inverted before storing to FLASH so it is 0xFF
too. This prevents ECC errors when reading from an
erased FLASH.
</para>
<para>
The data bytes are expanded to the given symbol size
on the fly. There is no support for encoding continuous
bitstreams with a symbol size != 8 at the moment. If
needed, it should not be a big deal to implement
such functionality.
</para>
<programlisting>
/* Parity buffer. Size = number of roots */
uint16_t par[6];
/* Initialize the parity buffer */
memset(par, 0, sizeof(par));
/* Encode 512 bytes in data8. Store parity in buffer par */
encode_rs8 (rs_decoder, data8, 512, par, 0);
</programlisting>
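<para>
The inversion mask described above can be pictured with plain byte
buffers. The sketch below is standalone userspace C with made-up
helper names; it does not call the library and only models the XOR
step, assuming an 8-bit mask of 0xFF. An erased page of 0xFF bytes
becomes all 0x00 after inversion, and inverting an all-0x00 parity
yields 0xFF again.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: XOR every byte of buf with the inversion mask. */
static void xor_with_mask(uint8_t *buf, size_t len, uint8_t mask)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] ^= mask;
}

/* Return 1 if every byte of buf equals value, 0 otherwise. */
static int all_bytes_are(const uint8_t *buf, size_t len, uint8_t value)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != value)
			return 0;
	return 1;
}
```
]]></programlisting>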
</sect1>
<sect1>
<title>Decoding</title>
<para>
The decoder calculates the syndrome over
the given data length and the received parity symbols
and corrects errors in the data.
</para>
<para>
If a syndrome is available from a hardware decoder
then the syndrome calculation is skipped.
</para>
<para>
The correction of the data buffer can be suppressed
by providing a correction pattern buffer and an error
location buffer to the decoder. The decoder stores the
calculated error location and the correction bitmask
in the given buffers. This is useful for hardware
decoders which use a weird bit ordering scheme.
</para>
<para>
The data bytes are expanded to the given symbol size
on the fly. There is no support for decoding continuous
bitstreams with a symbol size != 8 at the moment. If
needed, it should not be a big deal to implement
such functionality.
</para>
<sect2>
<title>
Decoding with syndrome calculation, direct data correction
</title>
<programlisting>
/* Parity buffer. Size = number of roots */
uint16_t par[6];
uint8_t data8[512];
int numerr;
/* Receive data */
.....
/* Receive parity */
.....
/* Decode 512 bytes in data8. */
numerr = decode_rs8 (rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL);
</programlisting>
</sect2>
<sect2>
<title>
Decoding with syndrome given by hardware decoder, direct data correction
</title>
<programlisting>
/* Parity and syndrome buffers. Size = number of roots */
uint16_t par[6], syn[6];
uint8_t data8[512];
int numerr;
/* Receive data */
.....
/* Receive parity */
.....
/* Get syndrome from hardware decoder */
.....
/* Decode 512 bytes in data8. */
numerr = decode_rs8 (rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL);
</programlisting>
</sect2>
<sect2>
<title>
Decoding with syndrome given by hardware decoder, no direct data correction.
</title>
<para>
Note: It's not necessary to give data and received parity to the decoder.
</para>
<programlisting>
/* Parity and syndrome buffers. Size = number of roots */
uint16_t par[6], syn[6], corr[8];
uint8_t data8[512];
int numerr, errpos[8], i;
/* Receive data */
.....
/* Receive parity */
.....
/* Get syndrome from hardware decoder */
.....
/* Compute error positions and correction patterns for 512 bytes; data8 is not modified. */
numerr = decode_rs8 (rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr);
for (i = 0; i &lt; numerr; i++) {
do_error_correction_in_your_buffer(errpos[i], corr[i]);
}
</programlisting>
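<para>
The helper do_error_correction_in_your_buffer() in the example above
is left to the caller. One possible shape for it is sketched below as
standalone userspace C. The function name and its arguments are
hypothetical. The sketch assumes that each error position indexes a
symbol in the data buffer, that the correction pattern is XOR'ed into
that symbol, and that positions at or beyond the data length refer to
parity symbols and can be skipped. Hardware with an unusual bit
ordering would need its own mapping here.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_error_correction_in_your_buffer().
 * Assumes errpos[] indexes symbols in the data buffer and corr[] holds
 * the bit pattern to XOR into that symbol. Positions outside the data
 * buffer are assumed to refer to parity symbols and are skipped.
 * Returns the number of data symbols that were modified.
 */
static int apply_corrections(uint8_t *data, size_t len, const int *errpos,
			     const uint16_t *corr, int numerr)
{
	int i, applied = 0;

	/* A zero or negative numerr means there is nothing to apply. */
	for (i = 0; i < numerr; i++) {
		if (errpos[i] < 0 || (size_t)errpos[i] >= len)
			continue;
		data[errpos[i]] ^= (uint8_t)corr[i];
		applied++;
	}
	return applied;
}
```
]]></programlisting>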
</sect2>
</sect1>
<sect1>
<title>Cleanup</title>
<para>
The function free_rs frees the allocated resources,
if the caller is the last user of the decoder.
</para>
<programlisting>
/* Release resources */
free_rs(rs_decoder);
</programlisting>
</sect1>
</chapter>
<chapter id="structs">
<title>Structures</title>
<para>
This chapter contains the autogenerated documentation of the structures which are
used in the Reed-Solomon Library and are relevant for a developer.
</para>
!Iinclude/linux/rslib.h
</chapter>
<chapter id="pubfunctions">
<title>Public Functions Provided</title>
<para>
This chapter contains the autogenerated documentation of the Reed-Solomon functions
which are exported.
</para>
!Elib/reed_solomon/reed_solomon.c
</chapter>
<chapter id="credits">
<title>Credits</title>
<para>
The library code for encoding and decoding was written by Phil Karn.
</para>
<programlisting>
Copyright 2002, Phil Karn, KA9Q
May be used under the terms of the GNU General Public License (GPL)
</programlisting>
<para>
The wrapper functions and interfaces are written by Thomas Gleixner.
</para>
<para>
Many users have provided bugfixes, improvements and helping hands for testing.
Thanks a lot.
</para>
<para>
The following people have contributed to this document:
</para>
<para>
Thomas Gleixner<email>tglx@linutronix.de</email>
</para>
</chapter>
</book>


@ -1,265 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<article class="whitepaper" id="LinuxSecurityModule" lang="en">
<articleinfo>
<title>Linux Security Modules: General Security Hooks for Linux</title>
<authorgroup>
<author>
<firstname>Stephen</firstname>
<surname>Smalley</surname>
<affiliation>
<orgname>NAI Labs</orgname>
<address><email>ssmalley@nai.com</email></address>
</affiliation>
</author>
<author>
<firstname>Timothy</firstname>
<surname>Fraser</surname>
<affiliation>
<orgname>NAI Labs</orgname>
<address><email>tfraser@nai.com</email></address>
</affiliation>
</author>
<author>
<firstname>Chris</firstname>
<surname>Vance</surname>
<affiliation>
<orgname>NAI Labs</orgname>
<address><email>cvance@nai.com</email></address>
</affiliation>
</author>
</authorgroup>
</articleinfo>
<sect1 id="Introduction"><title>Introduction</title>
<para>
In March 2001, the National Security Agency (NSA) gave a presentation
about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel
Summit. SELinux is an implementation of flexible and fine-grained
nondiscretionary access controls in the Linux kernel, originally
implemented as its own particular kernel patch. Several other
security projects (e.g. RSBAC, Medusa) have also developed flexible
access control architectures for the Linux kernel, and various
projects have developed particular access control models for Linux
(e.g. LIDS, DTE, SubDomain). Each project has developed and
maintained its own kernel patch to support its security needs.
</para>
<para>
In response to the NSA presentation, Linus Torvalds made a set of
remarks that described a security framework he would be willing to
consider for inclusion in the mainstream Linux kernel. He described a
general framework that would provide a set of security hooks to
control operations on kernel objects and a set of opaque security
fields in kernel data structures for maintaining security attributes.
This framework could then be used by loadable kernel modules to
implement any desired model of security. Linus also suggested the
possibility of migrating the Linux capabilities code into such a
module.
</para>
<para>
The Linux Security Modules (LSM) project was started by WireX to
develop such a framework. LSM is a joint development effort by
several security projects, including Immunix, SELinux, SGI and Janus,
and several individuals, including Greg Kroah-Hartman and James
Morris, to develop a Linux kernel patch that implements this
framework. The patch is currently tracking the 2.4 series and is
targeted for integration into the 2.5 development series. This
technical report provides an overview of the framework and the example
capabilities security module provided by the LSM kernel patch.
</para>
</sect1>
<sect1 id="framework"><title>LSM Framework</title>
<para>
The LSM kernel patch provides a general kernel framework to support
security modules. In particular, the LSM framework is primarily
focused on supporting access control modules, although future
development is likely to address other security needs such as
auditing. By itself, the framework does not provide any additional
security; it merely provides the infrastructure to support security
modules. The LSM kernel patch also moves most of the capabilities
logic into an optional security module, with the system defaulting
to the traditional superuser logic. This capabilities module
is discussed further in <xref linkend="cap"/>.
</para>
<para>
The LSM kernel patch adds security fields to kernel data structures
and inserts calls to hook functions at critical points in the kernel
code to manage the security fields and to perform access control. It
also adds functions for registering and unregistering security
modules, and adds a general <function>security</function> system call
to support new system calls for security-aware applications.
</para>
<para>
The LSM security fields are simply <type>void*</type> pointers. For
process and program execution security information, security fields
were added to <structname>struct task_struct</structname> and
<structname>struct linux_binprm</structname>. For filesystem security
information, a security field was added to
<structname>struct super_block</structname>. For pipe, file, and socket
security information, security fields were added to
<structname>struct inode</structname> and
<structname>struct file</structname>. For packet and network device security
information, security fields were added to
<structname>struct sk_buff</structname> and
<structname>struct net_device</structname>. For System V IPC security
information, security fields were added to
<structname>struct kern_ipc_perm</structname> and
<structname>struct msg_msg</structname>; additionally, the definitions
for <structname>struct msg_msg</structname>, <structname>struct
msg_queue</structname>, and <structname>struct
shmid_kernel</structname> were moved to header files
(<filename>include/linux/msg.h</filename> and
<filename>include/linux/shm.h</filename> as appropriate) to allow
the security modules to use these definitions.
</para>
<para>
Each LSM hook is a function pointer in a global table,
security_ops. This table is a
<structname>security_operations</structname> structure as defined by
<filename>include/linux/security.h</filename>. Detailed documentation
for each hook is included in this header file. At present, this
structure consists of a collection of substructures that group related
hooks based on the kernel object (e.g. task, inode, file, sk_buff)
as well as some top-level hook function pointers for system
operations. This structure is likely to be flattened in the future
for performance. The placement of the hook calls in the kernel code
is described by the "called:" lines in the per-hook documentation in
the header file. The hook calls can also be easily found in the
kernel code by looking for the string "security_ops->".
</para>
<para>
Linus mentioned per-process security hooks in his original remarks as a
possible alternative to global security hooks. However, if LSM were
to start from the perspective of per-process hooks, then the base
framework would have to deal with how to handle operations that
involve multiple processes (e.g. kill), since each process might have
its own hook for controlling the operation. This would require a
general mechanism for composing hooks in the base framework.
Additionally, LSM would still need global hooks for operations that
have no process context (e.g. network input operations).
Consequently, LSM provides global security hooks, but a security
module is free to implement per-process hooks (where that makes sense)
by storing a security_ops table in each process' security field and
then invoking these per-process hooks from the global hooks.
The problem of composition is thus deferred to the module.
</para>
<para>
The global security_ops table is initialized to a set of hook
functions provided by a dummy security module that provides
traditional superuser logic. A <function>register_security</function>
function (in <filename>security/security.c</filename>) is provided to
allow a security module to set security_ops to refer to its own hook
functions, and an <function>unregister_security</function> function is
provided to revert security_ops to the dummy module hooks. This
mechanism is used to set the primary security module, which is
responsible for making the final decision for each hook.
</para>
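<para>
The registration mechanism described above can be modeled in a few
lines of userspace C. The sketch below is not the kernel code: the
structure is reduced to a single hypothetical hook, the dummy module
is modeled as "always allow", and the error returns are simplified
to -1. It only shows how a global table of function pointers is
swapped between the dummy module and one primary module.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Userspace model only; names mirror the text, not the kernel definitions. */
struct security_operations {
	int (*inode_permission)(int inode_id, int mask);
};

/* Dummy module hook, modeled here (as an assumption) as "always allow". */
static int dummy_inode_permission(int inode_id, int mask)
{
	(void)inode_id;
	(void)mask;
	return 0;
}

static struct security_operations dummy_security_ops = {
	.inode_permission = dummy_inode_permission,
};

/* Global hook table, initially pointing at the dummy module. */
static struct security_operations *security_ops = &dummy_security_ops;

/* Install a primary module; fails if one is already registered. */
static int register_security(struct security_operations *ops)
{
	if (ops == NULL || security_ops != &dummy_security_ops)
		return -1;
	security_ops = ops;
	return 0;
}

/* Revert to the dummy hooks; only the registered module may do so. */
static int unregister_security(struct security_operations *ops)
{
	if (ops != security_ops || ops == &dummy_security_ops)
		return -1;
	security_ops = &dummy_security_ops;
	return 0;
}

/* Example module hook: deny write access (mask bit 2) to inode 42. */
static int strict_inode_permission(int inode_id, int mask)
{
	return (inode_id == 42 && (mask & 2)) ? -1 : 0;
}

static struct security_operations strict_ops = {
	.inode_permission = strict_inode_permission,
};
```
]]></programlisting>
<para>
In this model, callers always go through security_ops, so registering
strict_ops changes the decision for inode 42 without touching any
call site.
</para>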
<para>
LSM also provides a simple mechanism for stacking additional security
modules with the primary security module. It defines
<function>register_security</function> and
<function>unregister_security</function> hooks in the
<structname>security_operations</structname> structure and provides
<function>mod_reg_security</function> and
<function>mod_unreg_security</function> functions that invoke these
hooks after performing some sanity checking. A security module can
call these functions in order to stack with other modules. However,
the actual details of how this stacking is handled are deferred to the
module, which can implement these hooks in any way it wishes
(including always returning an error if it does not wish to support
stacking). In this manner, LSM again defers the problem of
composition to the module.
</para>
<para>
Although the LSM hooks are organized into substructures based on
kernel object, all of the hooks can be viewed as falling into two
major categories: hooks that are used to manage the security fields
and hooks that are used to perform access control. Examples of the
first category of hooks include the
<function>alloc_security</function> and
<function>free_security</function> hooks defined for each kernel data
structure that has a security field. These hooks are used to allocate
and free security structures for kernel objects. The first category
of hooks also includes hooks that set information in the security
field after allocation, such as the <function>post_lookup</function>
hook in <structname>struct inode_security_ops</structname>. This hook
is used to set security information for inodes after successful lookup
operations. An example of the second category of hooks is the
<function>permission</function> hook in
<structname>struct inode_security_ops</structname>. This hook checks
permission when accessing an inode.
</para>
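<para>
The two categories of hooks can be illustrated with a small userspace
model. Everything in the sketch below is hypothetical (the structure
names, the label layout and the return values); it only shows a
module allocating a private structure behind an opaque security
field, filling it in after lookup, consulting it for an access
decision, and freeing it again.
</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Userspace model of a kernel object carrying an opaque security field. */
struct model_inode {
	unsigned long ino;
	void *i_security;	/* owned by the security module */
};

/* Module-private data hung off the security field (hypothetical). */
struct model_label {
	unsigned int sid;
};

/* Management hook: allocate and attach the module's security structure. */
static int model_alloc_security(struct model_inode *inode)
{
	struct model_label *label = calloc(1, sizeof(*label));

	if (!label)
		return -1;
	inode->i_security = label;
	return 0;
}

/* Management hook: set information after allocation (cf. post_lookup). */
static void model_post_lookup(struct model_inode *inode, unsigned int sid)
{
	struct model_label *label = inode->i_security;

	if (label)
		label->sid = sid;
}

/* Access control hook: allow only if the label matches the caller's. */
static int model_permission(const struct model_inode *inode,
			    unsigned int caller_sid)
{
	const struct model_label *label = inode->i_security;

	return (label && label->sid == caller_sid) ? 0 : -1;
}

/* Management hook: release the security structure. */
static void model_free_security(struct model_inode *inode)
{
	free(inode->i_security);
	inode->i_security = NULL;
}
```
]]></programlisting>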
</sect1>
<sect1 id="cap"><title>LSM Capabilities Module</title>
<para>
The LSM kernel patch moves most of the existing POSIX.1e capabilities
logic into an optional security module stored in the file
<filename>security/capability.c</filename>. This change allows
users who do not want to use capabilities to omit this code entirely
from their kernel, instead using the dummy module for traditional
superuser logic or any other module that they desire. This change
also allows the developers of the capabilities logic to maintain and
enhance their code more freely, without needing to integrate patches
back into the base kernel.
</para>
<para>
In addition to moving the capabilities logic, the LSM kernel patch
could move the capability-related fields from the kernel data
structures into the new security fields managed by the security
modules. However, at present, the LSM kernel patch leaves the
capability fields in the kernel data structures. In his original
remarks, Linus suggested that this might be preferable so that other
security modules can be easily stacked with the capabilities module
without needing to chain multiple security structures on the security field.
It also avoids imposing extra overhead on the capabilities module
to manage the security fields. However, the LSM framework could
certainly support such a move if it is determined to be desirable,
with only a few additional changes described below.
</para>
<para>
At present, the capabilities logic for computing process capabilities
on <function>execve</function> and <function>set*uid</function>,
checking capabilities for a particular process, saving and checking
capabilities for netlink messages, and handling the
<function>capget</function> and <function>capset</function> system
calls have been moved into the capabilities module. There are still a
few locations in the base kernel where capability-related fields are
directly examined or modified, but the current version of the LSM
patch does allow a security module to completely replace the
assignment and testing of capabilities. These few locations would
need to be changed if the capability-related fields were moved into
the security field. The following is a list of known locations that
still perform such direct examination or modification of
capability-related fields:
<itemizedlist>
<listitem><para><filename>fs/open.c</filename>:<function>sys_access</function></para></listitem>
<listitem><para><filename>fs/lockd/host.c</filename>:<function>nlm_bind_host</function></para></listitem>
<listitem><para><filename>fs/nfsd/auth.c</filename>:<function>nfsd_setuser</function></para></listitem>
<listitem><para><filename>fs/proc/array.c</filename>:<function>task_cap</function></para></listitem>
</itemizedlist>
</para>
</sect1>
</article>

File diff suppressed because it is too large


@ -1,111 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="LinuxNetworking">
<bookinfo>
<title>Linux Networking and Network Devices APIs</title>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="netcore">
<title>Linux Networking</title>
<sect1><title>Networking Base Types</title>
!Iinclude/linux/net.h
</sect1>
<sect1><title>Socket Buffer Functions</title>
!Iinclude/linux/skbuff.h
!Iinclude/net/sock.h
!Enet/socket.c
!Enet/core/skbuff.c
!Enet/core/sock.c
!Enet/core/datagram.c
!Enet/core/stream.c
</sect1>
<sect1><title>Socket Filter</title>
!Enet/core/filter.c
</sect1>
<sect1><title>Generic Network Statistics</title>
!Iinclude/uapi/linux/gen_stats.h
!Enet/core/gen_stats.c
!Enet/core/gen_estimator.c
</sect1>
<sect1><title>SUN RPC subsystem</title>
<!-- The !D functionality is not perfect, garbage has to be protected by comments
!Dnet/sunrpc/sunrpc_syms.c
-->
!Enet/sunrpc/xdr.c
!Enet/sunrpc/svc_xprt.c
!Enet/sunrpc/xprt.c
!Enet/sunrpc/sched.c
!Enet/sunrpc/socklib.c
!Enet/sunrpc/stats.c
!Enet/sunrpc/rpc_pipe.c
!Enet/sunrpc/rpcb_clnt.c
!Enet/sunrpc/clnt.c
</sect1>
<sect1><title>WiMAX</title>
!Enet/wimax/op-msg.c
!Enet/wimax/op-reset.c
!Enet/wimax/op-rfkill.c
!Enet/wimax/stack.c
!Iinclude/net/wimax.h
!Iinclude/uapi/linux/wimax.h
</sect1>
</chapter>
<chapter id="netdev">
<title>Network device support</title>
<sect1><title>Driver Support</title>
!Enet/core/dev.c
!Enet/ethernet/eth.c
!Enet/sched/sch_generic.c
!Iinclude/linux/etherdevice.h
!Iinclude/linux/netdevice.h
</sect1>
<sect1><title>PHY Support</title>
!Edrivers/net/phy/phy.c
!Idrivers/net/phy/phy.c
!Edrivers/net/phy/phy_device.c
!Idrivers/net/phy/phy_device.c
!Edrivers/net/phy/mdio_bus.c
!Idrivers/net/phy/mdio_bus.c
</sect1>
<!-- FIXME: Removed for now since no structured comments in source
<sect1><title>Wireless</title>
X!Enet/core/wireless.c
</sect1>
-->
</chapter>
</book>


@ -1,155 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
<!ENTITY rapidio SYSTEM "rapidio.xml">
]>
<book id="RapidIO-Guide">
<bookinfo>
<title>RapidIO Subsystem Guide</title>
<authorgroup>
<author>
<firstname>Matt</firstname>
<surname>Porter</surname>
<affiliation>
<address>
<email>mporter@kernel.crashing.org</email>
<email>mporter@mvista.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2005</year>
<holder>MontaVista Software, Inc.</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
RapidIO is a high speed switched fabric interconnect with
features aimed at the embedded market. RapidIO provides
support for memory-mapped I/O as well as message-based
transactions over the switched fabric network. RapidIO has
a standardized discovery mechanism not unlike the PCI bus
standard that allows simple detection of devices in a
network.
</para>
<para>
This documentation is provided for developers intending
to support RapidIO on new architectures, write new drivers,
or to understand the subsystem internals.
</para>
</chapter>
<chapter id="bugs">
<title>Known Bugs and Limitations</title>
<sect1 id="known_bugs">
<title>Bugs</title>
<para>None. ;)</para>
</sect1>
<sect1 id="Limitations">
<title>Limitations</title>
<para>
<orderedlist>
<listitem><para>Access/management of RapidIO memory regions is not supported</para></listitem>
<listitem><para>Multiple host enumeration is not supported</para></listitem>
</orderedlist>
</para>
</sect1>
</chapter>
<chapter id="drivers">
<title>RapidIO driver interface</title>
<para>
Drivers are provided a set of calls in order
to interface with the subsystem to gather info
on devices, request/map memory region resources,
and manage mailboxes/doorbells.
</para>
<sect1 id="Functions">
<title>Functions</title>
!Iinclude/linux/rio_drv.h
!Edrivers/rapidio/rio-driver.c
!Edrivers/rapidio/rio.c
</sect1>
</chapter>
<chapter id="internals">
<title>Internals</title>
<para>
This chapter contains the autogenerated documentation of the RapidIO
subsystem.
</para>
<sect1 id="Structures"><title>Structures</title>
!Iinclude/linux/rio.h
</sect1>
<sect1 id="Enumeration_and_Discovery"><title>Enumeration and Discovery</title>
!Idrivers/rapidio/rio-scan.c
</sect1>
<sect1 id="Driver_functionality"><title>Driver functionality</title>
!Idrivers/rapidio/rio.c
!Idrivers/rapidio/rio-access.c
</sect1>
<sect1 id="Device_model_support"><title>Device model support</title>
!Idrivers/rapidio/rio-driver.c
</sect1>
<sect1 id="PPC32_support"><title>PPC32 support</title>
!Iarch/powerpc/sysdev/fsl_rio.c
</sect1>
</chapter>
<chapter id="credits">
<title>Credits</title>
<para>
The following people have contributed to the RapidIO
subsystem directly or indirectly:
<orderedlist>
<listitem><para>Matt Porter<email>mporter@kernel.crashing.org</email></para></listitem>
<listitem><para>Randy Vinson<email>rvinson@mvista.com</email></para></listitem>
<listitem><para>Dan Malek<email>dan@embeddedalley.com</email></para></listitem>
</orderedlist>
</para>
<para>
The following people have contributed to this document:
<orderedlist>
<listitem><para>Matt Porter<email>mporter@kernel.crashing.org</email></para></listitem>
</orderedlist>
</para>
</chapter>
</book>


@ -1,161 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="s390drivers">
<bookinfo>
<title>Writing s390 channel device drivers</title>
<authorgroup>
<author>
<firstname>Cornelia</firstname>
<surname>Huck</surname>
<affiliation>
<address>
<email>cornelia.huck@de.ibm.com</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2007</year>
<holder>IBM Corp.</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
This document describes the interfaces available for device drivers that
drive s390 based channel attached I/O devices. This includes interfaces for
interaction with the hardware and interfaces for interacting with the
common driver core. Those interfaces are provided by the s390 common I/O
layer.
</para>
<para>
The document assumes a familiarity with the technical terms associated
with the s390 channel I/O architecture. For a description of this
architecture, please refer to the "z/Architecture: Principles of
Operation", IBM publication no. SA22-7832.
</para>
<para>
While most I/O devices on an s390 system are typically driven through the
channel I/O mechanism described here, there are various other methods
(like the diag interface). These are out of the scope of this document.
</para>
<para>
Some additional information can also be found in the kernel source
under Documentation/s390/driver-model.txt.
</para>
</chapter>
<chapter id="ccw">
<title>The ccw bus</title>
<para>
The ccw bus typically contains the majority of devices available to
an s390 system. Named after the channel command word (ccw), the basic
command structure used to address its devices, the ccw bus contains
so-called channel attached devices. They are addressed via I/O
subchannels, visible on the css bus. A device driver for
channel-attached devices, however, will never interact with the
subchannel directly, but only via the I/O device on the ccw bus,
the ccw device.
</para>
<sect1 id="channelIO">
<title>I/O functions for channel-attached devices</title>
<para>
Some hardware structures have been translated into C structures for use
by the common I/O layer and device drivers. For more information on
the hardware structures represented here, please consult the Principles
of Operation.
</para>
!Iarch/s390/include/asm/cio.h
</sect1>
<sect1 id="ccwdev">
<title>ccw devices</title>
<para>
Devices that want to initiate channel I/O need to attach to the ccw bus.
Interaction with the driver core is done via the common I/O layer, which
provides the abstractions of ccw devices and ccw device drivers.
</para>
<para>
The functions that initiate or terminate channel I/O all act upon a
ccw device structure. Device drivers must not bypass those functions
or strange side effects may happen.
</para>
!Iarch/s390/include/asm/ccwdev.h
!Edrivers/s390/cio/device.c
!Edrivers/s390/cio/device_ops.c
</sect1>
<sect1 id="cmf">
<title>The channel-measurement facility</title>
<para>
The channel-measurement facility provides a means to collect
measurement data which is made available by the channel subsystem
for each channel attached device.
</para>
!Iarch/s390/include/asm/cmb.h
!Edrivers/s390/cio/cmf.c
</sect1>
</chapter>
<chapter id="ccwgroup">
<title>The ccwgroup bus</title>
<para>
The ccwgroup bus only contains artificial devices, created by the user.
Many networking devices (e.g. qeth) are in fact composed of several
ccw devices (like read, write and data channel for qeth). The
ccwgroup bus provides a mechanism to create a meta-device which
contains those ccw devices as slave devices and can be associated
with the netdevice.
</para>
<sect1 id="ccwgroupdevices">
<title>ccw group devices</title>
!Iarch/s390/include/asm/ccwgroup.h
!Edrivers/s390/cio/ccwgroup.c
</sect1>
</chapter>
<chapter id="genericinterfaces">
<title>Generic interfaces</title>
<para>
Some interfaces are available to other drivers that do not necessarily
have anything to do with the busses described above, but still are
indirectly using basic infrastructure in the common I/O layer.
One example is the support for adapter interrupts.
</para>
!Edrivers/s390/cio/airq.c
</chapter>
</book>


@ -1,409 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="scsimid">
<bookinfo>
<title>SCSI Interfaces Guide</title>
<authorgroup>
<author>
<firstname>James</firstname>
<surname>Bottomley</surname>
<affiliation>
<address>
<email>James.Bottomley@hansenpartnership.com</email>
</address>
</affiliation>
</author>
<author>
<firstname>Rob</firstname>
<surname>Landley</surname>
<affiliation>
<address>
<email>rob@landley.net</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2007</year>
<holder>Linux Foundation</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<sect1 id="protocol_vs_bus">
<title>Protocol vs bus</title>
<para>
Once upon a time, the Small Computer Systems Interface defined both
a parallel I/O bus and a data protocol to connect a wide variety of
peripherals (disk drives, tape drives, modems, printers, scanners,
optical drives, test equipment, and medical devices) to a host
computer.
</para>
<para>
Although the old parallel (fast/wide/ultra) SCSI bus has largely
fallen out of use, the SCSI command set is more widely used than ever
to communicate with devices over a number of different busses.
</para>
<para>
The <ulink url='http://www.t10.org/scsi-3.htm'>SCSI protocol</ulink>
is a big-endian, peer-to-peer, packet-based protocol. SCSI commands
are 6, 10, 12, or 16 bytes long, often followed by an associated data
payload.
</para>
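<para>
As an illustration of the big-endian layout, the following minimal
sketch fills in a 10-byte READ(10) command. It is standalone userspace
C, not kernel code; the helper names (put_be16, put_be32, build_read10)
are made up for this example.
</para>
<programlisting><![CDATA[
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value most significant byte first (big-endian). */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

/* Store a 32-bit value most significant byte first (big-endian). */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Build a READ(10) command: opcode 0x28 in byte 0, the logical block
 * address in bytes 2-5 and the number of blocks in bytes 7-8.
 * All other bytes are left at zero in this sketch.
 */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
	memset(cdb, 0, 10);
	cdb[0] = 0x28;
	put_be32(&cdb[2], lba);
	put_be16(&cdb[7], nblocks);
}
]]></programlisting>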
<para>
SCSI commands can be transported over just about any kind of bus, and
are the default protocol for storage devices attached via USB, SATA,
SAS, Fibre Channel, FireWire, and ATAPI. SCSI packets are
also commonly exchanged over Infiniband,
<ulink url='http://i2o.shadowconnect.com/faq.php'>I2O</ulink>, TCP/IP
(<ulink url='https://en.wikipedia.org/wiki/ISCSI'>iSCSI</ulink>), and even
<ulink url='http://cyberelk.net/tim/parport/parscsi.html'>Parallel
ports</ulink>.
</para>
</sect1>
<sect1 id="subsystem_design">
<title>Design of the Linux SCSI subsystem</title>
<para>
The SCSI subsystem uses a three-layer design, with upper, mid, and low
layers. Every operation involving the SCSI subsystem (such as reading
a sector from a disk) uses one driver at each of the three levels: one
upper layer driver, one lower layer driver, and the SCSI midlayer.
</para>
<para>
The SCSI upper layer provides the interface between userspace and the
kernel, in the form of block and char device nodes for I/O and
ioctl(). The SCSI lower layer contains drivers for specific hardware
devices.
</para>
<para>
In between is the SCSI mid-layer, analogous to a network routing
layer such as the IPv4 stack. The SCSI mid-layer routes a packet
based data protocol between the upper layer's /dev nodes and the
corresponding devices in the lower layer. It manages command queues,
provides error handling and power management functions, and responds
to ioctl() requests.
</para>
</sect1>
</chapter>
<chapter id="upper_layer">
<title>SCSI upper layer</title>
<para>
The upper layer supports the user-kernel interface by providing
device nodes.
</para>
<sect1 id="sd">
<title>sd (SCSI Disk)</title>
<para>sd (sd_mod.o)</para>
<!-- !Idrivers/scsi/sd.c -->
</sect1>
<sect1 id="sr">
<title>sr (SCSI CD-ROM)</title>
<para>sr (sr_mod.o)</para>
</sect1>
<sect1 id="st">
<title>st (SCSI Tape)</title>
<para>st (st.o)</para>
</sect1>
<sect1 id="sg">
<title>sg (SCSI Generic)</title>
<para>sg (sg.o)</para>
</sect1>
<sect1 id="ch">
<title>ch (SCSI Media Changer)</title>
<para>ch (ch.c)</para>
</sect1>
</chapter>
<chapter id="mid_layer">
<title>SCSI mid layer</title>
<sect1 id="midlayer_implementation">
<title>SCSI midlayer implementation</title>
<sect2 id="scsi_device.h">
<title>include/scsi/scsi_device.h</title>
<para>
</para>
!Iinclude/scsi/scsi_device.h
</sect2>
<sect2 id="scsi.c">
<title>drivers/scsi/scsi.c</title>
<para>Main file for the SCSI midlayer.</para>
!Edrivers/scsi/scsi.c
</sect2>
<sect2 id="scsicam.c">
<title>drivers/scsi/scsicam.c</title>
<para>
<ulink url='http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf'>SCSI
Common Access Method</ulink> support functions, for use with
HDIO_GETGEO, etc.
</para>
!Edrivers/scsi/scsicam.c
</sect2>
<sect2 id="scsi_error.c">
<title>drivers/scsi/scsi_error.c</title>
<para>Common SCSI error/timeout handling routines.</para>
!Edrivers/scsi/scsi_error.c
</sect2>
<sect2 id="scsi_devinfo.c">
<title>drivers/scsi/scsi_devinfo.c</title>
<para>
Manage scsi_dev_info_list, which tracks blacklisted and whitelisted
devices.
</para>
!Idrivers/scsi/scsi_devinfo.c
</sect2>
<sect2 id="scsi_ioctl.c">
<title>drivers/scsi/scsi_ioctl.c</title>
<para>
Handle ioctl() calls for SCSI devices.
</para>
!Edrivers/scsi/scsi_ioctl.c
</sect2>
<sect2 id="scsi_lib.c">
<title>drivers/scsi/scsi_lib.c</title>
<para>
SCSI queuing library.
</para>
!Edrivers/scsi/scsi_lib.c
</sect2>
<sect2 id="scsi_lib_dma.c">
<title>drivers/scsi/scsi_lib_dma.c</title>
<para>
SCSI library functions depending on DMA
(map and unmap scatter-gather lists).
</para>
!Edrivers/scsi/scsi_lib_dma.c
</sect2>
<sect2 id="scsi_module.c">
<title>drivers/scsi/scsi_module.c</title>
<para>
The file drivers/scsi/scsi_module.c contains legacy support for
old-style host templates. It should never be used by any new driver.
</para>
</sect2>
<sect2 id="scsi_proc.c">
<title>drivers/scsi/scsi_proc.c</title>
<para>
The functions in this file provide an interface between
the proc file system and the SCSI device drivers.
It is mainly used for debugging, statistics and to pass
information directly to the lowlevel driver,
i.e. plumbing to manage /proc/scsi/*.
</para>
!Idrivers/scsi/scsi_proc.c
</sect2>
<sect2 id="scsi_netlink.c">
<title>drivers/scsi/scsi_netlink.c</title>
<para>
Infrastructure to provide async events from transports to userspace
via netlink, using a single NETLINK_SCSITRANSPORT protocol for all
transports.
See <ulink url='http://marc.info/?l=linux-scsi&amp;m=115507374832500&amp;w=2'>the
original patch submission</ulink> for more details.
</para>
!Idrivers/scsi/scsi_netlink.c
</sect2>
<sect2 id="scsi_scan.c">
<title>drivers/scsi/scsi_scan.c</title>
<para>
Scan a host to determine which (if any) devices are attached.
The general scanning/probing algorithm is as follows; exceptions are
made to it depending on device-specific flags, compilation options,
and global variable (boot or module load time) settings.
A specific LUN is scanned via an INQUIRY command; if the LUN has a
device attached, a scsi_device is allocated and set up for it.
For every id of every channel on the given host, start by scanning
LUN 0. Skip hosts that don't respond at all to a scan of LUN 0.
Otherwise, if LUN 0 has a device attached, allocate and set up a
scsi_device for it. If the target is SCSI-3 or later, issue a REPORT LUNS
command and scan all of the LUNs it returns; otherwise,
sequentially scan LUNs until some maximum is reached or a LUN is
seen that cannot have a device attached to it.
</para>
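<para>
The per-target part of this algorithm can be sketched as follows.
This is a simplified userspace model, not the code in scsi_scan.c:
struct mock_target and mock_scan_target() are hypothetical, the
INQUIRY and REPORT LUNS commands are replaced by lookups in a table,
and "a LUN that cannot have a device attached" is approximated by the
first LUN with no device.
</para>
<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one target (one id on one channel). */
struct mock_target {
	bool responds;        /* does LUN 0 answer INQUIRY at all? */
	int scsi_level;       /* 3 or more: REPORT LUNS is available */
	const bool *present;  /* present[lun]: device attached to that LUN */
	size_t nluns;         /* number of entries in present[] */
};

/* Return how many devices a scan of this target would set up. */
static size_t mock_scan_target(const struct mock_target *t,
			       size_t max_seq_luns)
{
	size_t found = 0;
	size_t lun;

	/* No answer to the scan of LUN 0: skip the target entirely. */
	if (!t->responds || t->nluns == 0)
		return 0;

	if (t->present[0])
		found++;

	if (t->scsi_level >= 3) {
		/* Stand-in for REPORT LUNS: visit every LUN the target lists. */
		for (lun = 1; lun < t->nluns; lun++)
			if (t->present[lun])
				found++;
	} else {
		/* Sequential scan: stop at the limit or at the first gap. */
		for (lun = 1; lun < t->nluns && lun < max_seq_luns; lun++) {
			if (!t->present[lun])
				break;
			found++;
		}
	}
	return found;
}
]]></programlisting>
<para>
Note how a target with a gap in its LUN numbering is only fully
discovered on the REPORT LUNS path; the sequential path stops at the
gap.
</para>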
!Idrivers/scsi/scsi_scan.c
</sect2>
<sect2 id="scsi_sysctl.c">
<title>drivers/scsi/scsi_sysctl.c</title>
<para>
Set up the sysctl entry: "/dev/scsi/logging_level"
(DEV_SCSI_LOGGING_LEVEL) which sets/returns scsi_logging_level.
</para>
</sect2>
<sect2 id="scsi_sysfs.c">
<title>drivers/scsi/scsi_sysfs.c</title>
<para>
SCSI sysfs interface routines.
</para>
!Edrivers/scsi/scsi_sysfs.c
</sect2>
<sect2 id="hosts.c">
<title>drivers/scsi/hosts.c</title>
<para>
mid to lowlevel SCSI driver interface
</para>
!Edrivers/scsi/hosts.c
</sect2>
<sect2 id="constants.c">
<title>drivers/scsi/constants.c</title>
<para>
mid to lowlevel SCSI driver interface
</para>
!Edrivers/scsi/constants.c
</sect2>
</sect1>
<sect1 id="Transport_classes">
<title>Transport classes</title>
<para>
Transport classes are service libraries for drivers in the SCSI
lower layer, which expose transport attributes in sysfs.
</para>
<sect2 id="Fibre_Channel_transport">
<title>Fibre Channel transport</title>
<para>
The file drivers/scsi/scsi_transport_fc.c defines transport attributes
for Fibre Channel.
</para>
!Edrivers/scsi/scsi_transport_fc.c
</sect2>
<sect2 id="iSCSI_transport">
<title>iSCSI transport class</title>
<para>
The file drivers/scsi/scsi_transport_iscsi.c defines transport
attributes for the iSCSI class, which sends SCSI packets over TCP/IP
connections.
</para>
!Edrivers/scsi/scsi_transport_iscsi.c
</sect2>
<sect2 id="SAS_transport">
<title>Serial Attached SCSI (SAS) transport class</title>
<para>
The file drivers/scsi/scsi_transport_sas.c defines transport
attributes for Serial Attached SCSI, a variant of SATA aimed at
large high-end systems.
</para>
<para>
The SAS transport class contains common code to deal with SAS HBAs,
an approximated representation of SAS topologies in the driver model,
and various sysfs attributes to expose these topologies and management
interfaces to userspace.
</para>
<para>
In addition to the basic SCSI core objects this transport class
introduces two additional intermediate objects: The SAS PHY
as represented by struct sas_phy defines an "outgoing" PHY on
a SAS HBA or Expander, and the SAS remote PHY represented by
struct sas_rphy defines an "incoming" PHY on a SAS Expander or
end device. Note that this is purely a software concept; the
underlying hardware for a PHY and a remote PHY is exactly
the same.
</para>
<para>
There is no concept of a SAS port in this code. Users can see
which PHYs form a wide port based on the port_identifier attribute,
which is the same for all PHYs in a port.
</para>
!Edrivers/scsi/scsi_transport_sas.c
</sect2>
<sect2 id="SATA_transport">
<title>SATA transport class</title>
<para>
The SATA transport is handled by libata, which has its own book of
documentation in this directory.
</para>
</sect2>
<sect2 id="SPI_transport">
<title>Parallel SCSI (SPI) transport class</title>
<para>
The file drivers/scsi/scsi_transport_spi.c defines transport
attributes for traditional (fast/wide/ultra) SCSI busses.
</para>
!Edrivers/scsi/scsi_transport_spi.c
</sect2>
<sect2 id="SRP_transport">
<title>SCSI RDMA (SRP) transport class</title>
<para>
The file drivers/scsi/scsi_transport_srp.c defines transport
attributes for SCSI over Remote Direct Memory Access.
</para>
!Edrivers/scsi/scsi_transport_srp.c
</sect2>
</sect1>
</chapter>
<chapter id="lower_layer">
<title>SCSI lower layer</title>
<sect1 id="hba_drivers">
<title>Host Bus Adapter transport types</title>
<para>
Many modern device controllers use the SCSI command set as a protocol to
communicate with their devices through many different types of physical
connections.
</para>
<para>
In SCSI language a bus capable of carrying SCSI commands is
called a "transport", and a controller connecting to such a bus is
called a "host bus adapter" (HBA).
</para>
<sect2 id="scsi_debug.c">
<title>Debug transport</title>
<para>
The file drivers/scsi/scsi_debug.c simulates a host adapter with a
variable number of disks (or disk-like devices) attached, sharing a
common amount of RAM. It does a lot of checking to make sure that we are
not getting blocks mixed up, and panics the kernel if anything out of
the ordinary is seen.
</para>
<para>
To be more realistic, the simulated devices have the transport
attributes of SAS disks.
</para>
<para>
For documentation see
<ulink url='http://sg.danny.cz/sg/sdebug26.html'>http://sg.danny.cz/sg/sdebug26.html</ulink>
</para>
<!-- !Edrivers/scsi/scsi_debug.c -->
</sect2>
<sect2 id="todo">
<title>todo</title>
<para>Parallel (fast/wide/ultra) SCSI, USB, SATA,
SAS, Fibre Channel, FireWire, ATAPI devices, Infiniband,
I2O, iSCSI, Parallel ports, netlink...
</para>
</sect2>
</sect1>
</chapter>
</book>

@ -1,105 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="sh-drivers">
<bookinfo>
<title>SuperH Interfaces Guide</title>
<authorgroup>
<author>
<firstname>Paul</firstname>
<surname>Mundt</surname>
<affiliation>
<address>
<email>lethal@linux-sh.org</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2008-2010</year>
<holder>Paul Mundt</holder>
</copyright>
<copyright>
<year>2008-2010</year>
<holder>Renesas Technology Corp.</holder>
</copyright>
<copyright>
<year>2010</year>
<holder>Renesas Electronics Corp.</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2 as published by the Free Software Foundation.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="mm">
<title>Memory Management</title>
<sect1 id="sh4">
<title>SH-4</title>
<sect2 id="sq">
<title>Store Queue API</title>
!Earch/sh/kernel/cpu/sh4/sq.c
</sect2>
</sect1>
<sect1 id="sh5">
<title>SH-5</title>
<sect2 id="tlb">
<title>TLB Interfaces</title>
!Iarch/sh/mm/tlb-sh5.c
!Iarch/sh/include/asm/tlb_64.h
</sect2>
</sect1>
</chapter>
<chapter id="mach">
<title>Machine Specific Interfaces</title>
<sect1 id="dreamcast">
<title>mach-dreamcast</title>
!Iarch/sh/boards/mach-dreamcast/rtc.c
</sect1>
<sect1 id="x3proto">
<title>mach-x3proto</title>
!Earch/sh/boards/mach-x3proto/ilsel.c
</sect1>
</chapter>
<chapter id="busses">
<title>Busses</title>
<sect1 id="superhyway">
<title>SuperHyway</title>
!Edrivers/sh/superhyway/superhyway.c
</sect1>
<sect1 id="maple">
<title>Maple</title>
!Edrivers/sh/maple/maple.c
</sect1>
</chapter>
</book>

@ -1,11 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
<param name="chunk.quietly">1</param>
<param name="funcsynopsis.style">ansi</param>
<param name="funcsynopsis.tabular.threshold">80</param>
<param name="callout.graphics">0</param>
<!-- <param name="paper.type">A4</param> -->
<param name="generate.consistent.ids">1</param>
<param name="generate.section.toc.level">2</param>
<param name="use.id.as.filename">1</param>
</stylesheet>

@ -1,101 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="w1id">
<bookinfo>
<title>W1: Dallas' 1-wire bus</title>
<authorgroup>
<author>
<firstname>David</firstname>
<surname>Fries</surname>
<affiliation>
<address>
<email>David@Fries.net</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2013</year>
<!--
<holder></holder>
-->
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License version 2.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="w1_internal">
<title>W1 API internal to the kernel</title>
<sect1 id="w1_internal_api">
<title>W1 API internal to the kernel</title>
<sect2 id="w1.h">
<title>include/linux/w1.h</title>
<para>W1 kernel API functions.</para>
!Iinclude/linux/w1.h
</sect2>
<sect2 id="w1.c">
<title>drivers/w1/w1.c</title>
<para>W1 core functions.</para>
!Idrivers/w1/w1.c
</sect2>
<sect2 id="w1_family.c">
<title>drivers/w1/w1_family.c</title>
<para>Allows registering device family operations.</para>
!Edrivers/w1/w1_family.c
</sect2>
<sect2 id="w1_internal.h">
<title>drivers/w1/w1_internal.h</title>
<para>W1 internal initialization for master devices.</para>
!Idrivers/w1/w1_internal.h
</sect2>
<sect2 id="w1_int.c">
<title>drivers/w1/w1_int.c</title>
<para>W1 internal initialization for master devices.</para>
!Edrivers/w1/w1_int.c
</sect2>
<sect2 id="w1_netlink.h">
<title>drivers/w1/w1_netlink.h</title>
<para>W1 external netlink API structures and commands.</para>
!Idrivers/w1/w1_netlink.h
</sect2>
<sect2 id="w1_io.c">
<title>drivers/w1/w1_io.c</title>
<para>W1 input/output.</para>
!Edrivers/w1/w1_io.c
!Idrivers/w1/w1_io.c
</sect2>
</sect1>
</chapter>
</book>

@ -1,371 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
<book id="Z85230Guide">
<bookinfo>
<title>Z8530 Programming Guide</title>
<authorgroup>
<author>
<firstname>Alan</firstname>
<surname>Cox</surname>
<affiliation>
<address>
<email>alan@lxorguk.ukuu.org.uk</email>
</address>
</affiliation>
</author>
</authorgroup>
<copyright>
<year>2000</year>
<holder>Alan Cox</holder>
</copyright>
<legalnotice>
<para>
This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.
</para>
<para>
This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
</para>
<para>
For more details see the file COPYING in the source
distribution of Linux.
</para>
</legalnotice>
</bookinfo>
<toc></toc>
<chapter id="intro">
<title>Introduction</title>
<para>
The Z85x30 family synchronous/asynchronous controller chips are
used on a large number of cheap network interface cards. The
kernel provides a core interface layer that is designed to make
it easy to provide WAN services using this chip.
</para>
<para>
The current driver only supports synchronous operation. Merging the
asynchronous driver support into this code to allow any Z85x30
device to be used as both a tty interface and as a synchronous
controller is a project for Linux after the 2.4 release.
</para>
</chapter>
<chapter id="Driver_Modes">
<title>Driver Modes</title>
<para>
The Z85230 driver layer can drive Z8530, Z85C30 and Z85230 devices
in three different modes. Each mode can be applied to an individual
channel on the chip (each chip has two channels).
</para>
<para>
The PIO synchronous mode supports the most common Z8530 wiring. Here
the chip is interfaced to the I/O and interrupt facilities of the
host machine but not to the DMA subsystem. When running PIO the
Z8530 has extremely tight timing requirements. Running at high speeds,
even with a Z85230, will be tricky. Typically you should expect to
achieve at best 9600 baud with a Z8C530 and 64Kbits with a Z85230.
</para>
<para>
The DMA mode supports the chip when it is configured to use dual DMA
channels on an ISA bus. The better cards tend to support this mode
of operation for a single channel. With DMA running the Z85230 tops
out when it starts to hit ISA DMA constraints at about 512Kbits. It
is worth noting here that many PC machines hang or crash when the
chip is driven fast enough to hold the ISA bus solid.
</para>
<para>
Transmit DMA mode uses a single DMA channel. The DMA channel is used
for transmission as the transmit FIFO is smaller than the receive
FIFO. It gives better performance than pure PIO mode but is nowhere
near as ideal as pure DMA mode.
</para>
</chapter>
<chapter id="Using_the_Z85230_driver">
<title>Using the Z85230 driver</title>
<para>
The Z85230 driver provides the back end interface to your board. To
configure a Z8530 interface you need to detect the board and to
identify its ports and interrupt resources. It is also your problem
to verify the resources are available.
</para>
<para>
Having identified the chip you need to fill in a struct z8530_dev,
which describes each chip. This object must exist until you finally
shut down the board. First, zero the active field. This ensures
nothing goes off without you intending it. The irq field should
be set to the interrupt number of the chip. (Each chip has a single
interrupt source rather than one per channel.) You are responsible
for allocating the interrupt line. The interrupt handler should be
set to <function>z8530_interrupt</function>. The device id should
be set to the z8530_dev structure pointer. Whether the interrupt can
be shared or not is board dependent, and up to you to initialise.
</para>
<para>
The structure holds two channel structures.
Initialise chanA.ctrlio and chanA.dataio with the addresses of the
control and data ports. You can OR these with Z8530_PORT_SLEEP to
indicate your interface needs the 5us delay for chip settling done
in software. The PORT_SLEEP option is architecture specific. Other
flags may become available on future platforms, e.g. for MMIO.
Initialise chanA.irqs to &amp;z8530_nop to start the chip up
as disabled and discarding interrupt events. This ensures that
stray interrupts will be mopped up and not hang the bus. Set
chanA.dev to point to the device structure itself. You may use the
private and name fields as you wish. The private field
is unused by the Z85230 layer. The name is used for error reporting
and it may thus make sense to make it match the network name.
</para>
<para>
Repeat the same operation with the B channel if your chip has
both channels wired to something useful. This isn't always the
case. If it is not wired then the I/O values do not matter, but
you must initialise chanB.dev.
</para>
<para>
If your board has DMA facilities then initialise the txdma and
rxdma fields for the relevant channels. You must also allocate the
ISA DMA channels and do any necessary board level initialisation
to configure them. The low level driver will do the Z8530 and
DMA controller programming but not board specific magic.
</para>
<para>
Having initialised the device you can then call
<function>z8530_init</function>. This will probe the chip and
reset it into a known state. An identification sequence is then
run to identify the chip type. If the checks fail to pass the
function returns a non zero error code. Typically this indicates
that the port given is not valid. After this call the
type field of the z8530_dev structure is initialised to either
Z8530, Z85C30 or Z85230 according to the chip found.
</para>
<para>
Once you have called z8530_init you can also make use of the utility
function <function>z8530_describe</function>. This provides a
consistent reporting format for the Z8530 devices, and allows all
the drivers to provide consistent reporting.
</para>
</chapter>
<chapter id="Attaching_Network_Interfaces">
<title>Attaching Network Interfaces</title>
<para>
If you wish to use the network interface facilities of the driver,
then you need to attach a network device to each channel that is
present and in use. In addition, to use the generic HDLC layer
you need to follow some additional plumbing rules. They may seem
complex but a look at the example hostess_sv11 driver should
reassure you.
</para>
<para>
The network device used for each channel should be pointed to by
the netdevice field of each channel. The hdlc-&gt;priv field of the
network device points to your private data - you will need to be
able to find your private data from this.
</para>
<para>
The way most drivers approach this particular problem is to
create a structure holding the Z8530 device definition and
put that into the private field of the network device. The
network device fields of the channels then point back to the
network devices.
</para>
<para>
If you wish to use the generic HDLC then you need to register
the HDLC device.
</para>
<para>
Before you register your network device you will also need to
provide suitable handlers for most of the network device callbacks.
See the network device documentation for more details on this.
</para>
</chapter>
<chapter id="Configuring_And_Activating_The_Port">
<title>Configuring And Activating The Port</title>
<para>
The Z85230 driver provides helper functions and tables to load the
port registers on the Z8530 chips. When programming the register
settings for a channel be aware that the documentation recommends
initialisation orders. Strange things happen when these are not
followed.
</para>
<para>
<function>z8530_channel_load</function> takes a table of
initialisation values as an array of u8 pairs. The first
value of each pair is the Z8530 register number and the second is
the value to load. Add 16 to the register number to indicate the alternate
register bank on the later chips. The array is terminated by a 255.
</para>
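<para>
The table format can be illustrated with a small standalone sketch.
It is not the driver's implementation: struct mock_regs and
mock_channel_load() are hypothetical, and the "chip" is just an array
in which entries 16 to 31 stand for the alternate register bank.
</para>
<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define MOCK_TABLE_END 255	/* terminator value described in the text */

/* Mock register file: 0-15 main bank, 16-31 alternate bank. */
struct mock_regs {
	uint8_t val[32];
};

/*
 * Walk a table of (register, value) pairs until the terminator and
 * store each value. Returns the number of pairs that were processed.
 */
static size_t mock_channel_load(struct mock_regs *regs, const uint8_t *table)
{
	size_t pairs = 0;

	while (table[0] != MOCK_TABLE_END) {
		uint8_t reg = table[0];

		if (reg < 32)		/* ignore out-of-range numbers */
			regs->val[reg] = table[1];
		table += 2;
		pairs++;
	}
	return pairs;
}
]]></programlisting>
<para>
A table such as { 1, 0x10, 16 + 7, 0x7e, 255 } would therefore load
register 1 in the main bank and register 7 in the alternate bank.
</para>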
<para>
The driver provides a pair of public tables. The
z8530_hdlc_kilostream table is for the UK 'Kilostream' service and
also happens to cover most other end host configurations. The
z8530_hdlc_kilostream_85230 table is the same configuration using
the enhancements of the 85230 chip. The configuration loaded is
standard NRZ encoded synchronous data with HDLC bitstuffing. All
of the timing is taken from the other end of the link.
</para>
<para>
When writing your own tables be aware that the driver internally
tracks register values. It may need to reload values. You should
therefore be sure to set registers 1-7, 9-11, 14 and 15 in all
configurations. Where the register settings depend on DMA selection
the driver will update the bits itself when you open or close.
Loading a new table with the interface open is not recommended.
</para>
<para>
There are three standard configurations supported by the core
code. In PIO mode the interface is programmed up to use
interrupt driven PIO. This places high demands on the host processor
to avoid latency. The driver is written to take account of latency
issues but it cannot avoid latencies caused by other drivers,
notably IDE in PIO mode. Because the drivers allocate buffers you
must also prevent MTU changes while the port is open.
</para>
<para>
Once the port is open it will call the rx_function of each channel
whenever a completed packet arrives. This is invoked from
interrupt context and passes you the channel and a network
buffer (struct sk_buff) holding the data. The data includes
the CRC bytes so most users will want to trim the last two
bytes before processing the data. This function is very timing
critical. When you wish to simply discard data the support
code provides the function <function>z8530_null_rx</function>
to discard the data.
</para>
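<para>
Trimming the CRC amounts to shortening the frame by two bytes before
it is passed on. The sketch below shows the arithmetic on a
hypothetical struct mock_frame; a real rx_function operates on a
struct sk_buff instead, which this example does not model.
</para>
<programlisting><![CDATA[
#include <stddef.h>

#define MOCK_CRC_LEN 2	/* trailing CRC bytes mentioned in the text */

/* Hypothetical stand-in for a received frame. */
struct mock_frame {
	const unsigned char *data;
	size_t len;
};

/* Drop the trailing CRC; frames too short to hold one become empty. */
static void mock_trim_crc(struct mock_frame *f)
{
	f->len = f->len > MOCK_CRC_LEN ? f->len - MOCK_CRC_LEN : 0;
}
]]></programlisting>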
<para>
To activate PIO mode sending and receiving, call
<function>z8530_sync_open</function>. This expects to be passed
the network device and the channel. Typically this is called from
your network device open callback. On failure a non-zero error
status is returned. The <function>z8530_sync_close</function>
function shuts down a PIO channel. This must be done before the
channel is opened again and before the driver shuts down
and unloads.
</para>
<para>
The ideal mode of operation is dual channel DMA mode. Here the
kernel driver will configure the board for DMA in both directions.
The driver also handles ISA DMA issues such as controller
programming and the memory range limit for you. This mode is
activated by calling the <function>z8530_sync_dma_open</function>
function. On failure a non zero error value is returned.
Once this mode is activated it can be shut down by calling the
<function>z8530_sync_dma_close</function>. You must call the close
function matching the open mode you used.
</para>
<para>
The final supported mode uses a single DMA channel to drive the
transmit side. As the Z85C30 has a larger FIFO on the receive
channel this tends to increase the maximum speed a little.
This is activated by calling the
<function>z8530_sync_txdma_open</function> function. This returns a
non-zero error code on failure. The
<function>z8530_sync_txdma_close</function> function closes down
the Z8530 interface from this mode.
</para>
</chapter>
<chapter id="Network_Layer_Functions">
<title>Network Layer Functions</title>
<para>
The Z8530 layer provides functions to queue packets for
transmission. The driver internally buffers the frame currently
being transmitted and one further frame (in order to keep back
to back transmission running). Any further buffering is up to
the caller.
</para>
<para>
The function <function>z8530_queue_xmit</function> takes a network
buffer in sk_buff format and queues it for transmission. The
caller must provide the entire packet with the exception of the
bitstuffing and CRC. This is normally done by the caller via
the generic HDLC interface layer. It returns 0 if the buffer has been
queued and non zero values for queue full. If the function accepts
the buffer it becomes property of the Z8530 layer and the caller
should not free it.
</para>
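<para>
The buffering rule (one frame in flight plus one waiting) and the
return convention can be modelled as below. This is only a sketch of
the behaviour described here, with hypothetical names (struct
mock_chan, mock_queue_xmit, mock_tx_done); it is not the driver code
and ignores locking and interrupts.
</para>
<programlisting><![CDATA[
#include <stddef.h>

/* Hypothetical channel state: at most two frames are held. */
struct mock_chan {
	const void *tx_cur;	/* frame currently being transmitted */
	const void *tx_next;	/* one further frame waiting behind it */
};

/* Returns 0 if the frame was accepted, non-zero if the queue is full. */
static int mock_queue_xmit(struct mock_chan *c, const void *frame)
{
	if (c->tx_next)
		return 1;		/* both slots in use: caller must retry */

	c->tx_next = frame;
	if (!c->tx_cur) {		/* transmitter idle: start right away */
		c->tx_cur = c->tx_next;
		c->tx_next = NULL;
	}
	return 0;
}

/* Called when the current frame has gone out: promote the waiting one. */
static void mock_tx_done(struct mock_chan *c)
{
	c->tx_cur = c->tx_next;
	c->tx_next = NULL;
}
]]></programlisting>
<para>
A caller that gets a non-zero return still owns the frame and has to
hold on to it until a slot becomes free.
</para>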
<para>
The function <function>z8530_get_stats</function> returns a pointer
to an internally maintained per interface statistics block. This
provides most of the interface code needed to implement the network
layer get_stats callback.
</para>
</chapter>
<chapter id="Porting_The_Z8530_Driver">
<title>Porting The Z8530 Driver</title>
<para>
The Z8530 driver is written to be portable. In DMA mode it makes
assumptions about the use of ISA DMA. These are probably warranted
in most cases as the Z85230 in particular was designed to glue to PC
type machines. The PIO mode makes no real assumptions.
</para>
<para>
Should you need to retarget the Z8530 driver to another architecture,
the only code that should need changing is the port I/O functions.
At the moment these assume PC I/O port accesses. This may not be
appropriate for all platforms. Replacing
<function>z8530_read_port</function> and
<function>z8530_write_port</function> is intended to be all that is
required to port this driver layer.
</para>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
<para>
<variablelist>
<varlistentry><term>Interrupt Locking</term>
<listitem>
<para>
The locking in the driver is done via the global cli/sti lock. This
makes for relatively poor SMP performance. Switching this to use a
per device spin lock would probably materially improve performance.
</para>
</listitem></varlistentry>
<varlistentry><term>Occasional Failures</term>
<listitem>
<para>
We have reports of occasional failures when run for very long
periods of time and the driver starts to receive junk frames. At
the moment the cause of this is not clear.
</para>
</listitem></varlistentry>
</variablelist>
</para>
</chapter>
<chapter id="pubfunctions">
<title>Public Functions Provided</title>
!Edrivers/net/wan/z85230.c
</chapter>
<chapter id="intfunctions">
<title>Internal Functions</title>
!Idrivers/net/wan/z85230.c
</chapter>
</book>

@ -1 +1,126 @@
# -*- makefile -*-
# Makefile for Sphinx documentation
#
subdir-y :=
# You can set these variables from the command line.
SPHINXBUILD = sphinx-build
SPHINXOPTS =
SPHINXDIRS = .
_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py))
SPHINX_CONF = conf.py
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode
# User-friendly check for sphinx-build
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
ifeq ($(HAVE_SPHINX),0)
.DEFAULT:
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
@echo " SKIP Sphinx $@ target."
else # HAVE_SPHINX
# User-friendly check for pdflatex
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
KERNELDOC = $(srctree)/scripts/kernel-doc
KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# commands; the 'cmd' from scripts/Kbuild.include is not *loopable*
loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
# $2 sphinx builder e.g. "html"
# $3 name of the build subfolder / e.g. "media", used as:
# * dest folder relative to $(BUILDDIR) and
# * cache folder relative to $(BUILDDIR)/.doctrees
# $4 dest subfolder e.g. "man" for man pages at media/man
# $5 reST source folder relative to $(srctree)/$(src),
# e.g. "media" for the linux-tv book-set at ./Documentation/media
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \
PYTHONDONTWRITEBYTECODE=1 \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
$(SPHINXBUILD) \
-b $2 \
-c $(abspath $(srctree)/$(src)) \
-d $(abspath $(BUILDDIR)/.doctrees/$3) \
-D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
$(ALLSPHINXOPTS) \
$(abspath $(srctree)/$(src)/$5) \
$(abspath $(BUILDDIR)/$3/$4)
htmldocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
linkcheckdocs:
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
latexdocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
ifeq ($(HAVE_PDFLATEX),0)
pdfdocs:
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
pdfdocs: latexdocs
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
endif # HAVE_PDFLATEX
epubdocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
xmldocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
endif # HAVE_SPHINX
# The following targets are independent of HAVE_SPHINX, and the rules should
# work or silently pass without Sphinx.
# no-ops for the Sphinx toolchain
sgmldocs:
@:
psdocs:
@:
mandocs:
@:
installmandocs:
@:
cleandocs:
$(Q)rm -rf $(BUILDDIR)
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media clean
dochelp:
@echo ' Linux kernel internal documentation in different formats from ReST:'
@echo ' htmldocs - HTML'
@echo ' latexdocs - LaTeX'
@echo ' pdfdocs - PDF'
@echo ' epubdocs - EPUB'
@echo ' xmldocs - XML'
@echo ' linkcheckdocs - check for broken external links (will connect to external hosts)'
@echo ' cleandocs - clean all generated files'
@echo
@echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'
@echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)'
@echo
@echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
@echo ' configuration. This is e.g. useful to build with nit-picking config.'
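
The two probes this Makefile performs (is a tool on PATH, and which
Documentation subfolders carry their own conf.py) can be mirrored outside of
make. The following is only a sketch for experimenting with the same logic;
the function names have_tool and sphinx_dirs are hypothetical and are not part
of the kernel build system.

```python
import shutil
from pathlib import Path


def have_tool(name):
    """Rough equivalent of the HAVE_SPHINX / HAVE_PDFLATEX 'which' probes."""
    return shutil.which(name) is not None


def sphinx_dirs(srctree):
    """Rough equivalent of _SPHINXDIRS: names of the direct subfolders of
    <srctree>/Documentation that contain a conf.py."""
    doc_root = Path(srctree) / "Documentation"
    return sorted(p.parent.name for p in doc_root.glob("*/conf.py"))
```

Calling sphinx_dirs() on a kernel tree should list the names that the dochelp
target prints as valid values for SPHINXDIRS.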

@ -1,130 +0,0 @@
# -*- makefile -*-
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXBUILD = sphinx-build
SPHINXOPTS =
SPHINXDIRS = .
_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py))
SPHINX_CONF = conf.py
PAPER =
BUILDDIR = $(obj)/output
PDFLATEX = xelatex
LATEXOPTS = -interaction=batchmode
# User-friendly check for sphinx-build
HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
ifeq ($(HAVE_SPHINX),0)
.DEFAULT:
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
@echo " SKIP Sphinx $@ target."
else ifneq ($(DOCBOOKS),)
# Skip Sphinx build if the user explicitly requested DOCBOOKS.
.DEFAULT:
@echo " SKIP Sphinx $@ target (DOCBOOKS specified)."
else # HAVE_SPHINX
# User-friendly check for pdflatex
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
KERNELDOC = $(srctree)/scripts/kernel-doc
KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# commands; the 'cmd' from scripts/Kbuild.include is not *loopable*
loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
# $2 sphinx builder e.g. "html"
# $3 name of the build subfolder / e.g. "media", used as:
# * dest folder relative to $(BUILDDIR) and
# * cache folder relative to $(BUILDDIR)/.doctrees
# $4 dest subfolder e.g. "man" for man pages at media/man
# $5 reST source folder relative to $(srctree)/$(src),
# e.g. "media" for the linux-tv book-set at ./Documentation/media
quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \
PYTHONDONTWRITEBYTECODE=1 \
BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
$(SPHINXBUILD) \
-b $2 \
-c $(abspath $(srctree)/$(src)) \
-d $(abspath $(BUILDDIR)/.doctrees/$3) \
-D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
$(ALLSPHINXOPTS) \
$(abspath $(srctree)/$(src)/$5) \
$(abspath $(BUILDDIR)/$3/$4)
htmldocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
linkcheckdocs:
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
latexdocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
ifeq ($(HAVE_PDFLATEX),0)
pdfdocs:
$(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
@echo " SKIP Sphinx $@ target."
else # HAVE_PDFLATEX
pdfdocs: latexdocs
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
endif # HAVE_PDFLATEX
epubdocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
xmldocs:
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
endif # HAVE_SPHINX
# The following targets are independent of HAVE_SPHINX, and the rules should
# work or silently pass without Sphinx.
# no-ops for the Sphinx toolchain
sgmldocs:
@:
psdocs:
@:
mandocs:
@:
installmandocs:
@:
cleandocs:
$(Q)rm -rf $(BUILDDIR)
$(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media clean
dochelp:
@echo ' Linux kernel internal documentation in different formats (Sphinx):'
@echo ' htmldocs - HTML'
@echo ' latexdocs - LaTeX'
@echo ' pdfdocs - PDF'
@echo ' epubdocs - EPUB'
@echo ' xmldocs - XML'
@echo ' linkcheckdocs - check for broken external links (will connect to external hosts)'
@echo ' cleandocs - clean all generated files'
@echo
@echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'
@echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)'
@echo
@echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
@echo ' configuration. This is e.g. useful to build with nit-picking config.'

@ -186,7 +186,7 @@ must disable interrupts while the lock is held. If the device sends
a different interrupt, the driver will deadlock trying to recursively
acquire the spinlock. Such deadlocks can be avoided by using
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
and acquire the lock (see Documentation/DocBook/kernel-locking).
and acquire the lock (see Documentation/kernel-hacking/locking.rst).
4.5 How to tell whether MSI/MSI-X is enabled on a device

@ -1,3 +1,7 @@
=======
LoadPin
=======
LoadPin is a Linux Security Module that ensures all kernel-loaded files
(modules, firmware, etc) all originate from the same filesystem, with
the expectation that such a filesystem is backed by a read-only device
@ -5,13 +9,13 @@ such as dm-verity or CDROM. This allows systems that have a verified
and/or unchangeable filesystem to enforce module and firmware loading
restrictions without needing to sign the files individually.
The LSM is selectable at build-time with CONFIG_SECURITY_LOADPIN, and
The LSM is selectable at build-time with ``CONFIG_SECURITY_LOADPIN``, and
can be controlled at boot-time with the kernel command line option
"loadpin.enabled". By default, it is enabled, but can be disabled at
boot ("loadpin.enabled=0").
"``loadpin.enabled``". By default, it is enabled, but can be disabled at
boot ("``loadpin.enabled=0``").
LoadPin starts pinning when it sees the first file loaded. If the
block device backing the filesystem is not read-only, a sysctl is
created to toggle pinning: /proc/sys/kernel/loadpin/enabled. (Having
created to toggle pinning: ``/proc/sys/kernel/loadpin/enabled``. (Having
a mutable filesystem means pinning is mutable too, but having the
sysctl allows for easy testing on systems with a mutable filesystem.)
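
As an illustration of the sysctl described above, the following sketch reads
the pinning state from user space. It assumes the file holds "1" when pinning
is enabled and "0" when it is disabled, matching the ``loadpin.enabled=0``
boot option; the function name loadpin_state is hypothetical.

```python
from pathlib import Path

# Path taken from the text above; it only exists when the sysctl was created.
LOADPIN_SYSCTL = "/proc/sys/kernel/loadpin/enabled"


def loadpin_state(path=LOADPIN_SYSCTL):
    """Return True/False for the pinning state, or None if the sysctl
    file does not exist (for example when no sysctl was created)."""
    try:
        text = Path(path).read_text().strip()
    except FileNotFoundError:
        return None
    # Assumption: "1" means pinning is enabled, anything else disabled.
    return text == "1"
```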

@ -1,27 +1,33 @@
=======
SELinux
=======
If you want to use SELinux, chances are you will want
to use the distro-provided policies, or install the
latest reference policy release from
http://oss.tresys.com/projects/refpolicy
However, if you want to install a dummy policy for
testing, you can do using 'mdp' provided under
testing, you can do using ``mdp`` provided under
scripts/selinux. Note that this requires the selinux
userspace to be installed - in particular you will
need checkpolicy to compile a kernel, and setfiles and
fixfiles to label the filesystem.
1. Compile the kernel with selinux enabled.
2. Type 'make' to compile mdp.
2. Type ``make`` to compile ``mdp``.
3. Make sure that you are not running with
SELinux enabled and a real policy. If
you are, reboot with selinux disabled
before continuing.
4. Run install_policy.sh:
4. Run install_policy.sh::
cd scripts/selinux
sh install_policy.sh
Step 4 will create a new dummy policy valid for your
kernel, with a single selinux user, role, and type.
It will compile the policy, will set your SELINUXTYPE to
dummy in /etc/selinux/config, install the compiled policy
as 'dummy', and relabel your filesystem.
It will compile the policy, will set your ``SELINUXTYPE`` to
``dummy`` in ``/etc/selinux/config``, install the compiled policy
as ``dummy``, and relabel your filesystem.

@ -1,3 +1,6 @@
=====
Smack
=====
"Good for you, you've decided to clean the elevator!"
@ -14,6 +17,7 @@ available to determine which is best suited to the problem
at hand.
Smack consists of three major components:
- The kernel
- Basic utilities, which are helpful but not required
- Configuration data
@ -39,16 +43,24 @@ The current git repository for Smack user space is:
This should make and install on most modern distributions.
There are five commands included in smackutil:
chsmack - display or set Smack extended attribute values
smackctl - load the Smack access rules
smackaccess - report if a process with one label has access
chsmack:
display or set Smack extended attribute values
smackctl:
load the Smack access rules
smackaccess:
report if a process with one label has access
to an object with another
These two commands are obsolete with the introduction of
the smackfs/load2 and smackfs/cipso2 interfaces.
smackload - properly formats data for writing to smackfs/load
smackcipso - properly formats data for writing to smackfs/cipso
smackload:
properly formats data for writing to smackfs/load
smackcipso:
properly formats data for writing to smackfs/cipso
In keeping with the intent of Smack, configuration data is
minimal and not strictly required. The most important
@ -56,15 +68,15 @@ configuration step is mounting the smackfs pseudo filesystem.
If smackutil is installed the startup script will take care
of this, but it can be done manually as well.
Add this line to /etc/fstab:
Add this line to ``/etc/fstab``::
smackfs /sys/fs/smackfs smackfs defaults 0 0
The /sys/fs/smackfs directory is created by the kernel.
The ``/sys/fs/smackfs`` directory is created by the kernel.
Smack uses extended attributes (xattrs) to store labels on filesystem
objects. The attributes are stored in the extended attribute security
name space. A process must have CAP_MAC_ADMIN to change any of these
name space. A process must have ``CAP_MAC_ADMIN`` to change any of these
attributes.
The extended attributes that Smack uses are:
@ -73,14 +85,17 @@ SMACK64
Used to make access control decisions. In almost all cases
the label given to a new filesystem object will be the label
of the process that created it.
SMACK64EXEC
The Smack label of a process that execs a program file with
this attribute set will run with this attribute's value.
SMACK64MMAP
Don't allow the file to be mmapped by a process whose Smack
label does not allow all of the access permitted to a process
with the label contained in this attribute. This is a very
specific use case for shared libraries.
SMACK64TRANSMUTE
Can only have the value "TRUE". If this attribute is present
on a directory when an object is created in the directory and
@ -89,27 +104,29 @@ SMACK64TRANSMUTE
gets the label of the directory instead of the label of the
creating process. If the object being created is a directory
the SMACK64TRANSMUTE attribute is set as well.
SMACK64IPIN
This attribute is only available on file descriptors for sockets.
Use the Smack label in this attribute for access control
decisions on packets being delivered to this socket.
SMACK64IPOUT
This attribute is only available on file descriptors for sockets.
Use the Smack label in this attribute for access control
decisions on packets coming from this socket.
There are multiple ways to set a Smack label on a file:
There are multiple ways to set a Smack label on a file::
# attr -S -s SMACK64 -V "value" path
# chsmack -a value path
A process can see the Smack label it is running with by
reading /proc/self/attr/current. A process with CAP_MAC_ADMIN
reading ``/proc/self/attr/current``. A process with ``CAP_MAC_ADMIN``
can set the process Smack by writing there.
Most Smack configuration is accomplished by writing to files
in the smackfs filesystem. This pseudo-filesystem is mounted
on /sys/fs/smackfs.
on ``/sys/fs/smackfs``.
access
Provided for backward compatibility. The access2 interface
@ -120,6 +137,7 @@ access
this file. The next read will indicate whether the access
would be permitted. The text will be either "1" indicating
access, or "0" indicating denial.
access2
This interface reports whether a subject with the specified
Smack label has a particular access to an object with a
@ -127,13 +145,17 @@ access2
this file. The next read will indicate whether the access
would be permitted. The text will be either "1" indicating
access, or "0" indicating denial.
ambient
This contains the Smack label applied to unlabeled network
packets.
change-rule
This interface allows modification of existing access control rules.
The format accepted on write is:
The format accepted on write is::
"%s %s %s %s"
where the first string is the subject label, the second the
object label, the third the access to allow and the fourth the
access to deny. The access strings may contain only the characters
@ -141,47 +163,63 @@ change-rule
modified by enabling the permissions in the third string and disabling
those in the fourth string. If there is no such rule it will be
created using the access specified in the third and the fourth strings.
cipso
Provided for backward compatibility. The cipso2 interface
is preferred and should be used instead.
This interface allows a specific CIPSO header to be assigned
to a Smack label. The format accepted on write is:
to a Smack label. The format accepted on write is::
"%24s%4d%4d"["%4d"]...
The first string is a fixed Smack label. The first number is
the level to use. The second number is the number of categories.
The following numbers are the categories.
The following numbers are the categories::
"level-3-cats-5-19 3 2 5 19"
cipso2
This interface allows a specific CIPSO header to be assigned
to a Smack label. The format accepted on write is:
to a Smack label. The format accepted on write is::
"%s%4d%4d"["%4d"]...
The first string is a long Smack label. The first number is
the level to use. The second number is the number of categories.
The following numbers are the categories.
The following numbers are the categories::
"level-3-cats-5-19 3 2 5 19"
direct
This contains the CIPSO level used for Smack direct label
representation in network packets.
doi
This contains the CIPSO domain of interpretation used in
network packets.
ipv6host
This interface allows specific IPv6 internet addresses to be
treated as single label hosts. Packets are sent to single
label hosts only from processes that have Smack write access
to the host label. All packets received from single label hosts
are given the specified label. The format accepted on write is:
are given the specified label. The format accepted on write is::
"%h:%h:%h:%h:%h:%h:%h:%h label" or
"%h:%h:%h:%h:%h:%h:%h:%h/%d label".
The "::" address shortcut is not supported.
If label is "-DELETE" a matched entry will be deleted.
load
Provided for backward compatibility. The load2 interface
is preferred and should be used instead.
This interface allows access control rules in addition to
the system defined rules to be specified. The format accepted
on write is:
on write is::
"%24s%24s%5s"
where the first string is the subject label, the second the
object label, and the third the requested access. The access
string may contain only the characters "rwxat-", and specifies
@ -189,17 +227,21 @@ load
permissions that are not allowed. The string "r-x--" would
specify read and execute access. Labels are limited to 23
characters in length.
load2
This interface allows access control rules in addition to
the system defined rules to be specified. The format accepted
on write is:
on write is::
"%s %s %s"
where the first string is the subject label, the second the
object label, and the third the requested access. The access
string may contain only the characters "rwxat-", and specifies
which sort of access is allowed. The "-" is a placeholder for
permissions that are not allowed. The string "r-x--" would
specify read and execute access.
load-self
Provided for backward compatibility. The load-self2 interface
is preferred and should be used instead.
@ -208,66 +250,83 @@ load-self
otherwise be permitted, and are intended to provide additional
restrictions on the process. The format is the same as for
the load interface.
load-self2
This interface allows process specific access rules to be
defined. These rules are only consulted if access would
otherwise be permitted, and are intended to provide additional
restrictions on the process. The format is the same as for
the load2 interface.
logging
This contains the Smack logging state.
mapped
This contains the CIPSO level used for Smack mapped label
representation in network packets.
netlabel
This interface allows specific internet addresses to be
treated as single label hosts. Packets are sent to single
label hosts without CIPSO headers, but only from processes
that have Smack write access to the host label. All packets
received from single label hosts are given the specified
label. The format accepted on write is:
label. The format accepted on write is::
"%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
If the label specified is "-CIPSO" the address is treated
as a host that supports CIPSO headers.
onlycap
This contains labels processes must have for CAP_MAC_ADMIN
and CAP_MAC_OVERRIDE to be effective. If this file is empty
and ``CAP_MAC_OVERRIDE`` to be effective. If this file is empty
these capabilities are effective for processes with any
label. The values are set by writing the desired labels, separated
by spaces, to the file or cleared by writing "-" to the file.
ptrace
This is used to define the current ptrace policy
0 - default: this is the policy that relies on Smack access rules.
For the PTRACE_READ a subject needs to have a read access on
object. For the PTRACE_ATTACH a read-write access is required.
1 - exact: this is the policy that limits PTRACE_ATTACH. Attach is
0 - default:
this is the policy that relies on Smack access rules.
For the ``PTRACE_READ`` a subject needs to have a read access on
object. For the ``PTRACE_ATTACH`` a read-write access is required.
1 - exact:
this is the policy that limits ``PTRACE_ATTACH``. Attach is
only allowed when subject's and object's labels are equal.
PTRACE_READ is not affected. Can be overridden with CAP_SYS_PTRACE.
2 - draconian: this policy behaves like the 'exact' above with an
exception that it can't be overridden with CAP_SYS_PTRACE.
``PTRACE_READ`` is not affected. Can be overridden with ``CAP_SYS_PTRACE``.
2 - draconian:
this policy behaves like the 'exact' above with an
exception that it can't be overridden with ``CAP_SYS_PTRACE``.
revoke-subject
Writing a Smack label here sets the access to '-' for all access
rules with that subject label.
unconfined
If the kernel is configured with CONFIG_SECURITY_SMACK_BRINGUP
a process with CAP_MAC_ADMIN can write a label into this interface.
If the kernel is configured with ``CONFIG_SECURITY_SMACK_BRINGUP``
a process with ``CAP_MAC_ADMIN`` can write a label into this interface.
Thereafter, accesses that involve that label will be logged and
the access permitted if it wouldn't be otherwise. Note that this
is dangerous and can ruin the proper labeling of your system.
It should never be used in production.
relabel-self
This interface contains a list of labels to which the process can
transition to, by writing to /proc/self/attr/current.
transition to, by writing to ``/proc/self/attr/current``.
Normally a process can change its own label to any legal value, but only
if it has CAP_MAC_ADMIN. This interface allows a process without
CAP_MAC_ADMIN to relabel itself to one of labels from predefined list.
A process without CAP_MAC_ADMIN can change its label only once. When it
if it has ``CAP_MAC_ADMIN``. This interface allows a process without
``CAP_MAC_ADMIN`` to relabel itself to one of labels from predefined list.
A process without ``CAP_MAC_ADMIN`` can change its label only once. When it
does, this list will be cleared.
The values are set by writing the desired labels, separated
by spaces, to the file or cleared by writing "-" to the file.
If you are using the smackload utility
you can add access rules in /etc/smack/accesses. They take the form:
you can add access rules in ``/etc/smack/accesses``. They take the form::
subjectlabel objectlabel access
@ -277,14 +336,14 @@ object with objectlabel. If there is no rule no access is allowed.
Look for additional programs on http://schaufler-ca.com
From the Smack Whitepaper:
The Simplified Mandatory Access Control Kernel
The Simplified Mandatory Access Control Kernel (Whitepaper)
===========================================================
Casey Schaufler
casey@schaufler-ca.com
Mandatory Access Control
------------------------
Computer systems employ a variety of schemes to constrain how information is
shared among the people and services using the machine. Some of these schemes
@ -297,6 +356,7 @@ access control mechanisms because you don't have a choice regarding the users
or programs that have access to pieces of data.
Bell & LaPadula
---------------
From the middle of the 1980's until the turn of the century Mandatory Access
Control (MAC) was very closely associated with the Bell & LaPadula security
@ -306,6 +366,7 @@ within the Capital Beltway and Scandinavian supercomputer centers but was
often cited as failing to address general needs.
Domain Type Enforcement
-----------------------
Around the turn of the century Domain Type Enforcement (DTE) became popular.
This scheme organizes users, programs, and data into domains that are
@ -316,6 +377,7 @@ necessary to provide a secure domain mapping leads to the scheme being
disabled or used in limited ways in the majority of cases.
Smack
-----
Smack is a Mandatory Access Control mechanism designed to provide useful MAC
while avoiding the pitfalls of its predecessors. The limitations of Bell &
@ -326,46 +388,55 @@ Enforcement and avoided by defining access controls in terms of the access
modes already in use.
Smack Terminology
-----------------
The jargon used to talk about Smack will be familiar to those who have dealt
with other MAC systems and shouldn't be too difficult for the uninitiated to
pick up. There are four terms that are used in a specific way and that are
especially important:
Subject: A subject is an active entity on the computer system.
Subject:
A subject is an active entity on the computer system.
On Smack a subject is a task, which is in turn the basic unit
of execution.
Object: An object is a passive entity on the computer system.
Object:
An object is a passive entity on the computer system.
On Smack files of all types, IPC, and tasks can be objects.
Access: Any attempt by a subject to put information into or get
Access:
Any attempt by a subject to put information into or get
information from an object is an access.
Label: Data that identifies the Mandatory Access Control
Label:
Data that identifies the Mandatory Access Control
characteristics of a subject or an object.
These definitions are consistent with the traditional use in the security
community. There are also some terms from Linux that are likely to crop up:
Capability: A task that possesses a capability has permission to
Capability:
A task that possesses a capability has permission to
violate an aspect of the system security policy, as identified by
the specific capability. A task that possesses one or more
capabilities is a privileged task, whereas a task with no
capabilities is an unprivileged task.
Privilege: A task that is allowed to violate the system security
Privilege:
A task that is allowed to violate the system security
policy is said to have privilege. As of this writing a task can
have privilege either by possessing capabilities or by having an
effective user of root.
Smack Basics
------------
Smack is an extension to a Linux system. It enforces additional restrictions
on what subjects can access which objects, based on the labels attached to
each of the subject and the object.
Labels
~~~~~~
Smack labels are ASCII character strings. They can be up to 255 characters
long, but keeping them to twenty-three characters is recommended.
@ -377,7 +448,7 @@ contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
(quote) and '"' (double-quote) characters.
Smack labels cannot begin with a '-'. This is reserved for special options.
There are some predefined labels:
There are some predefined labels::
_ Pronounced "floor", a single underscore character.
^ Pronounced "hat", a single circumflex character.
@ -390,14 +461,18 @@ of a process will usually be assigned by the system initialization
mechanism.
Access Rules
~~~~~~~~~~~~
Smack uses the traditional access modes of Linux. These modes are read,
execute, write, and occasionally append. There are a few cases where the
access mode may not be obvious. These include:
Signals: A signal is a write operation from the subject task to
Signals:
A signal is a write operation from the subject task to
the object task.
Internet Domain IPC: Transmission of a packet is considered a
Internet Domain IPC:
Transmission of a packet is considered a
write operation from the source task to the destination task.
Smack restricts access based on the label attached to a subject and the label
@ -417,6 +492,7 @@ order:
7. Any other access is denied.
Smack Access Rules
~~~~~~~~~~~~~~~~~~
With the isolation provided by Smack access separation is simple. There are
many interesting cases where limited access by subjects to objects with
@ -427,8 +503,9 @@ be "born" highly classified. To accommodate such schemes Smack includes a
mechanism for specifying rules allowing access between labels.
Access Rule Format
~~~~~~~~~~~~~~~~~~
The format of an access rule is:
The format of an access rule is::
subject-label object-label access
@ -446,7 +523,7 @@ describe access modes:
Uppercase values for the specification letters are allowed as well.
Access mode specifications can be in any order. Examples of acceptable rules
are:
are::
TopSecret Secret rx
Secret Unclass R
@ -456,7 +533,7 @@ are:
New Old rRrRr
Closed Off -
Examples of unacceptable rules are:
Examples of unacceptable rules are::
Top Secret Secret rx
Ace Ace r
@ -469,6 +546,7 @@ access specifications. The dash is a placeholder, so "a-r" is the same
as "ar". A lone dash is used to specify that no access should be allowed.
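
The rule format described in this section can be checked with a few lines of
code. What follows is a sketch only, not the parser used by the kernel or by
smackload. It assumes the access letters are limited to "rwxat-" as stated for
the load2 interface, and it applies the label restrictions given under Labels
(at most 255 ASCII characters, no leading '-', no slash, backslash or quote
characters). The names valid_label and parse_rule are hypothetical.

```python
ALLOWED_ACCESS = set("rwxat-")
FORBIDDEN_LABEL_CHARS = set("/\\'\"")


def valid_label(label):
    """Check a label against the restrictions listed under Labels."""
    if not label or len(label) > 255 or label.startswith("-"):
        return False
    return all(
        ch.isascii() and ch.isprintable() and ch not in FORBIDDEN_LABEL_CHARS
        for ch in label
    )


def parse_rule(line):
    """Split 'subject-label object-label access' into its three parts.

    The access string is lowercased, dashes are dropped as placeholders,
    and the remaining letters are returned in the fixed order "rwxat".
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("a rule needs exactly three fields: %r" % line)
    subject, obj, access = fields
    if not (valid_label(subject) and valid_label(obj)):
        raise ValueError("invalid label in rule: %r" % line)
    modes = set(access.lower())
    if not modes <= ALLOWED_ACCESS:
        raise ValueError("invalid access specification: %r" % access)
    return subject, obj, "".join(m for m in "rwxat" if m in modes)
```

With this sketch "a-r" and "ar" yield the same access string, and a lone dash
yields an empty one, meaning no access.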
Applying Access Rules
~~~~~~~~~~~~~~~~~~~~~
The developers of Linux rarely define new sorts of things, usually importing
schemes and concepts from other systems. Most often, the other systems are
@ -511,6 +589,7 @@ one process to another requires that the sender have write access to the
receiver. The receiver is not required to have read access to the sender.
Setting Access Rules
~~~~~~~~~~~~~~~~~~~~
The configuration file /etc/smack/accesses contains the rules to be set at
system startup. The contents are written to the special file
@ -520,6 +599,7 @@ one rule, with the most recently specified overriding any earlier
specification.
Task Attribute
~~~~~~~~~~~~~~
The Smack label of a process can be read from /proc/<pid>/attr/current. A
process can read its own Smack label from /proc/self/attr/current. A
@ -527,12 +607,14 @@ privileged process can change its own Smack label by writing to
/proc/self/attr/current but not the label of another process.
File Attribute
~~~~~~~~~~~~~~
The Smack label of a filesystem object is stored as an extended attribute
named SMACK64 on the file. This attribute is in the security namespace. It can
only be changed by a process with privilege.
Privilege
~~~~~~~~~
A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged.
CAP_MAC_OVERRIDE allows the process access to objects it would
@ -540,6 +622,7 @@ be denied otherwise. CAP_MAC_ADMIN allows a process to change
Smack data, including rules and attributes.
Smack Networking
~~~~~~~~~~~~~~~~
As mentioned before, Smack enforces access control on network protocol
transmissions. Every packet sent by a Smack process is tagged with its Smack
@ -551,6 +634,7 @@ packet has write access to the receiving process and if that is not the case
the packet is dropped.
CIPSO Configuration
~~~~~~~~~~~~~~~~~~~
It is normally unnecessary to specify the CIPSO configuration. The default
values used by the system handle all internal cases. Smack will compose CIPSO
@ -571,13 +655,13 @@ discarded. The DOI is 3 by default. The value can be read from
The label and category set are mapped to a Smack label as defined in
/etc/smack/cipso.
A Smack/CIPSO mapping has the form:
A Smack/CIPSO mapping has the form::
smack level [category [category]*]
Smack does not expect the level or category sets to be related in any
particular way and does not assume or assign accesses based on them. Some
examples of mappings:
examples of mappings::
TopSecret 7
TS:A,B 7 1 2
@ -597,23 +681,28 @@ value can be read from /sys/fs/smackfs/direct and changed by writing to
/sys/fs/smackfs/direct.
Socket Attributes
~~~~~~~~~~~~~~~~~
There are two attributes that are associated with sockets. These attributes
can only be set by privileged tasks, but any task can read them for their own
sockets.
SMACK64IPIN: The Smack label of the task object. A privileged
SMACK64IPIN:
The Smack label of the task object. A privileged
program that will enforce policy may set this to the star label.
SMACK64IPOUT: The Smack label transmitted with outgoing packets.
SMACK64IPOUT:
The Smack label transmitted with outgoing packets.
A privileged program may set this to match the label of another
task with which it hopes to communicate.
Smack Netlabel Exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~
You will often find that your labeled application has to talk to the outside,
unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel
where you can add some exceptions in the form of :
where you can add some exceptions in the form of::
@IP1 LABEL1 or
@IP2/MASK LABEL2
@ -624,28 +713,32 @@ access on LABEL2.
Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask
first, like in classless IPv4 routing.
A special label '@' and an option '-CIPSO' can be used there :
A special label '@' and an option '-CIPSO' can be used there::
@ means Internet, any application with any label has access to it
-CIPSO means standard CIPSO networking
If you don't know what CIPSO is and don't plan to use it, you can just do :
If you don't know what CIPSO is and don't plan to use it, you can just do::
echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
If you use CIPSO on your 192.168.0.0/16 local network and need also unlabeled
Internet access, you can have :
Internet access, you can have::
echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel
echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
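
The longest-mask-first matching described above can be sketched in a few lines of C. This is only an illustration of the lookup rule, not Smack's implementation: ``struct netlabel_entry``, ``netlabel_lookup`` and the ``IP4`` macro are hypothetical names, addresses are IPv4 in host byte order, and an entry written without a mask (such as ``127.0.0.1``) is assumed to mean a 32-bit mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from its four octets. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct netlabel_entry {
	uint32_t addr;		/* network address */
	int masklen;		/* mask length in bits, 0..32 */
	const char *label;	/* Smack label, "@" or "-CIPSO" */
};

static uint32_t prefix_mask(int masklen)
{
	/* A shift by 32 would be undefined, so handle /0 separately. */
	return masklen == 0 ? 0 : 0xffffffffu << (32 - masklen);
}

/* Return the label of the matching entry with the longest mask,
 * or NULL if no entry matches. */
const char *netlabel_lookup(const struct netlabel_entry *tab, size_t n,
			    uint32_t ip)
{
	const char *best = NULL;
	int best_len = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t mask = prefix_mask(tab[i].masklen);

		if ((ip & mask) == (tab[i].addr & mask) &&
		    tab[i].masklen > best_len) {
			best = tab[i].label;
			best_len = tab[i].masklen;
		}
	}
	return best;
}
```

With the three entries from the last example loaded, a lookup of 192.168.1.5 selects the ``/16`` entry and returns ``-CIPSO``, while 8.8.8.8 only matches ``0.0.0.0/0`` and returns ``@``. The order of the entries in the table does not matter.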
Writing Applications for Smack
------------------------------
There are three sorts of applications that will run on a Smack system. How an
application interacts with Smack will determine what it will have to do to
work properly under Smack.
Smack Ignorant Applications
---------------------------
By far the majority of applications have no reason whatever to care about the
unique properties of Smack. Since invoking a program has no impact on the
@ -653,12 +746,14 @@ Smack label associated with the process the only concern likely to arise is
whether the process has execute access to the program.
Smack Relevant Applications
---------------------------
Some programs can be improved by teaching them about Smack, but do not make
any security decisions themselves. The utility ls(1) is one example of such a
program.
Smack Enforcing Applications
----------------------------
These are special programs that not only know about Smack, but participate in
the enforcement of system policy. In most cases these are the programs that
@ -666,15 +761,16 @@ set up user sessions. There are also network services that provide information
to processes running with various labels.
File System Interfaces
----------------------
Smack maintains labels on file system objects using extended attributes. The
Smack label of a file, directory, or other file system object can be obtained
using getxattr(2).
using getxattr(2)::
len = getxattr("/", "security.SMACK64", value, sizeof (value));
will put the Smack label of the root directory into value. A privileged
process can set the Smack label of a file system object with setxattr(2).
process can set the Smack label of a file system object with setxattr(2)::
len = strlen("Rubble");
rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0);
@ -683,17 +779,18 @@ will set the Smack label of /foo to "Rubble" if the program has appropriate
privilege.
Socket Interfaces
-----------------
The socket attributes can be read using fgetxattr(2).
A privileged process can set the Smack label of outgoing packets with
fsetxattr(2).
fsetxattr(2)::
len = strlen("Rubble");
rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0);
will set the Smack label "Rubble" on packets going out from the socket if the
program has appropriate privilege.
program has appropriate privilege::
rc = fsetxattr(fd, "security.SMACK64IPIN", "*", strlen("*"), 0);
@ -701,29 +798,36 @@ will set the Smack label "*" as the object label against which incoming
packets will be checked if the program has appropriate privilege.
Administration
--------------
Smack supports some mount options:
smackfsdef=label: specifies the label to give files that lack
smackfsdef=label:
specifies the label to give files that lack
the Smack label extended attribute.
smackfsroot=label: specifies the label to assign the root of the
smackfsroot=label:
specifies the label to assign the root of the
file system if it lacks the Smack extended attribute.
smackfshat=label: specifies a label that must have read access to
smackfshat=label:
specifies a label that must have read access to
all labels set on the filesystem. Not yet enforced.
smackfsfloor=label: specifies a label to which all labels set on the
smackfsfloor=label:
specifies a label to which all labels set on the
filesystem must have read access. Not yet enforced.
These mount options apply to all file system types.
Smack auditing
--------------
If you want Smack auditing of security events, you need to set CONFIG_AUDIT
in your kernel configuration.
By default, all denied events will be audited. You can change this behavior by
writing a single character to the /sys/fs/smackfs/logging file :
writing a single character to the /sys/fs/smackfs/logging file::
0 : no logging
1 : log denied (default)
2 : log accepted
@ -735,6 +839,7 @@ that triggered the event, plus other pairs depending on the type of event
audited.
Bringup Mode
------------
Bringup mode provides logging features that can make application
configuration and system bringup easier. Configure the kernel with


@ -1,13 +1,14 @@
====
Yama
====
Yama is a Linux Security Module that collects system-wide DAC security
protections that are not handled by the core kernel itself. This is
selectable at build-time with CONFIG_SECURITY_YAMA, and can be controlled
at run-time through sysctls in /proc/sys/kernel/yama:
selectable at build-time with ``CONFIG_SECURITY_YAMA``, and can be controlled
at run-time through sysctls in ``/proc/sys/kernel/yama``:
- ptrace_scope
==============================================================
ptrace_scope:
ptrace_scope
============
As Linux grows in popularity, it will become a larger target for
malware. One particularly troubling weakness of the Linux process
@ -25,47 +26,49 @@ exist and remain possible if ptrace is allowed to operate as before.
Since ptrace is not commonly used by non-developers and non-admins, system
builders should be allowed the option to disable this debugging system.
For a solution, some applications use prctl(PR_SET_DUMPABLE, ...) to
For a solution, some applications use ``prctl(PR_SET_DUMPABLE, ...)`` to
specifically disallow such ptrace attachment (e.g. ssh-agent), but many
do not. A more general solution is to only allow ptrace directly from a
parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still
work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID"
work), or with ``CAP_SYS_PTRACE`` (i.e. "gdb --pid=PID", and "strace -p PID"
still work as root).
In mode 1, software that has defined application-specific relationships
between a debugging process and its inferior (crash handlers, etc),
prctl(PR_SET_PTRACER, pid, ...) can be used. An inferior can declare which
other process (and its descendants) are allowed to call PTRACE_ATTACH
``prctl(PR_SET_PTRACER, pid, ...)`` can be used. An inferior can declare which
other process (and its descendants) are allowed to call ``PTRACE_ATTACH``
against it. Only one such declared debugging process can exist for
each inferior at a time. For example, this is used by KDE, Chromium, and
Firefox's crash handlers, and by Wine for allowing only Wine processes
to ptrace each other. If a process wishes to entirely disable these ptrace
restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)
restrictions, it can call ``prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)``
so that any otherwise allowed process (even those in external pid namespaces)
may attach.
The sysctl settings (writable only with CAP_SYS_PTRACE) are:
The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are:
0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
0 - classic ptrace permissions:
a process can ``PTRACE_ATTACH`` to any other
process running under the same uid, as long as it is dumpable (i.e.
did not transition uids, start privileged, or have called
prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is
``prctl(PR_SET_DUMPABLE...)`` already). Similarly, ``PTRACE_TRACEME`` is
unchanged.
1 - restricted ptrace: a process must have a predefined relationship
with the inferior it wants to call PTRACE_ATTACH on. By default,
1 - restricted ptrace:
a process must have a predefined relationship
with the inferior it wants to call ``PTRACE_ATTACH`` on. By default,
this relationship is that of only its descendants when the above
classic criteria are also met. To change the relationship, an
inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
an allowed debugger PID to call PTRACE_ATTACH on the inferior.
Using PTRACE_TRACEME is unchanged.
inferior can call ``prctl(PR_SET_PTRACER, debugger, ...)`` to declare
an allowed debugger PID to call ``PTRACE_ATTACH`` on the inferior.
Using ``PTRACE_TRACEME`` is unchanged.
2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
with PTRACE_ATTACH, or through children calling PTRACE_TRACEME.
2 - admin-only attach:
only processes with ``CAP_SYS_PTRACE`` may use ptrace
with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``.
3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via
PTRACE_TRACEME. Once set, this sysctl value cannot be changed.
3 - no attach:
no processes may use ptrace with ``PTRACE_ATTACH`` nor via
``PTRACE_TRACEME``. Once set, this sysctl value cannot be changed.
The original children-only logic was based on the restrictions in grsecurity.
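
A program that wants to report the current setting can read the sysctl and map the value onto the four modes listed above. The sketch below assumes the sysctl is exposed as ``/proc/sys/kernel/yama/ptrace_scope``; the function names are hypothetical and not part of any Yama interface.

```c
#include <stddef.h>
#include <stdio.h>

/* Map a ptrace_scope value to the mode name used in this document.
 * Returns NULL for values outside 0..3. */
const char *yama_ptrace_scope_name(int scope)
{
	switch (scope) {
	case 0:
		return "classic ptrace permissions";
	case 1:
		return "restricted ptrace";
	case 2:
		return "admin-only attach";
	case 3:
		return "no attach";
	default:
		return NULL;
	}
}

/* Read the integer stored in a sysctl file such as
 * /proc/sys/kernel/yama/ptrace_scope.
 * Returns -1 if the file is missing or unreadable, for example when
 * the kernel was built without CONFIG_SECURITY_YAMA. */
int yama_read_ptrace_scope(const char *path)
{
	FILE *f = fopen(path, "r");
	int scope = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &scope) != 1)
		scope = -1;
	fclose(f);
	return scope;
}
```

Passing the result of ``yama_read_ptrace_scope()`` straight to ``yama_ptrace_scope_name()`` gives NULL both for a missing sysctl and for an unknown mode, so a caller only has one error case to handle.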
==============================================================


@ -1,4 +1,9 @@
--- What is AppArmor? ---
========
AppArmor
========
What is AppArmor?
=================
AppArmor is a MAC-style security extension for the Linux kernel. It implements
a task centered policy, with task "profiles" being created and loaded
@ -6,34 +11,41 @@ from user space. Tasks on the system that do not have a profile defined for
them run in an unconfined state which is equivalent to standard Linux DAC
permissions.
--- How to enable/disable ---
How to enable/disable
=====================
set CONFIG_SECURITY_APPARMOR=y
set ``CONFIG_SECURITY_APPARMOR=y``
If AppArmor should be selected as the default security module then
set CONFIG_DEFAULT_SECURITY="apparmor"
and CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
If AppArmor should be selected as the default security module then set::
CONFIG_DEFAULT_SECURITY="apparmor"
CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
Build the kernel
If AppArmor is not the default security module it can be enabled by passing
security=apparmor on the kernel's command line.
``security=apparmor`` on the kernel's command line.
If AppArmor is the default security module it can be disabled by passing
apparmor=0, security=XXXX (where XXX is valid security module), on the
kernel's command line
``apparmor=0, security=XXXX`` (where ``XXXX`` is a valid security module), on the
kernel's command line.
For AppArmor to enforce any restrictions beyond standard Linux DAC permissions
policy must be loaded into the kernel from user space (see the Documentation
and tools links).
--- Documentation ---
Documentation
=============
Documentation can be found on the wiki.
Documentation can be found on the wiki, linked below.
--- Links ---
Links
=====
Mailing List - apparmor@lists.ubuntu.com
Wiki - http://apparmor.wiki.kernel.org/
User space tools - https://launchpad.net/apparmor
Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git


@ -1,12 +1,13 @@
Linux Security Module framework
-------------------------------
===========================
Linux Security Module Usage
===========================
The Linux Security Module (LSM) framework provides a mechanism for
various security checks to be hooked by new kernel extensions. The name
"module" is a bit of a misnomer since these extensions are not actually
loadable kernel modules. Instead, they are selectable at build-time via
CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the
"security=..." kernel command line argument, in the case where multiple
``"security=..."`` kernel command line argument, in the case where multiple
LSMs were built into a given kernel.
The primary users of the LSM interface are Mandatory Access Control
@ -19,23 +20,22 @@ in the core functionality of Linux itself.
Without a specific LSM built into the kernel, the default LSM will be the
Linux capabilities system. Most LSMs choose to extend the capabilities
system, building their checks on top of the defined capability hooks.
For more details on capabilities, see capabilities(7) in the Linux
For more details on capabilities, see ``capabilities(7)`` in the Linux
man-pages project.
A list of the active security modules can be found by reading
/sys/kernel/security/lsm. This is a comma separated list, and
``/sys/kernel/security/lsm``. This is a comma separated list, and
will always include the capability module. The list reflects the
order in which checks are made. The capability module will always
be first, followed by any "minor" modules (e.g. Yama) and then
the one "major" module (e.g. SELinux) if there is one configured.
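
Because the list is a single comma separated string, checking whether a given module is active takes only a small amount of string handling. The helper below is a sketch with a hypothetical name, ``lsm_list_contains``; it works on a string the caller has already read from ``/sys/kernel/security/lsm`` and tolerates a trailing newline.

```c
#include <string.h>

/* Return 1 if name appears as a complete entry in the comma
 * separated list, 0 otherwise. Partial matches do not count. */
int lsm_list_contains(const char *list, const char *name)
{
	size_t nlen = strlen(name);
	const char *p = list;

	while (*p) {
		/* Length of the current entry, up to ',' or newline. */
		size_t len = strcspn(p, ",\n");

		if (len == nlen && strncmp(p, name, len) == 0)
			return 1;
		p += len;
		if (*p)
			p++;	/* skip the separator */
	}
	return 0;
}
```

For a list such as ``capability,yama,selinux`` the helper reports ``yama`` as present but not ``yam``, since only whole entries are compared.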
Based on https://lkml.org/lkml/2007/10/26/215,
a new LSM is accepted into the kernel when its intent (a description of
what it tries to protect against and in what cases one would expect to
use it) has been appropriately documented in Documentation/security/.
This allows an LSM's code to be easily compared to its goals, and so
that end users and distros can make a more informed decision about which
LSMs suit their requirements.
.. toctree::
:maxdepth: 1
For extensive documentation on the available LSM hook interfaces, please
see include/linux/security.h.
apparmor
LoadPin
SELinux
Smack
tomoyo
Yama


@ -1,21 +1,30 @@
--- What is TOMOYO? ---
======
TOMOYO
======
What is TOMOYO?
===============
TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel.
LiveCD-based tutorials are available at
http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/
http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/ .
http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/
Though these tutorials use non-LSM version of TOMOYO, they are useful for you
to know what TOMOYO is.
--- How to enable TOMOYO? ---
How to enable TOMOYO?
=====================
Build the kernel with CONFIG_SECURITY_TOMOYO=y and pass "security=tomoyo" on
Build the kernel with ``CONFIG_SECURITY_TOMOYO=y`` and pass ``security=tomoyo`` on
kernel's command line.
Please see http://tomoyo.sourceforge.jp/2.3/ for details.
--- Where is documentation? ---
Where is documentation?
=======================
User <-> Kernel interface documentation is available at
http://tomoyo.sourceforge.jp/2.3/policy-reference.html .
@ -42,7 +51,8 @@ History of TOMOYO?
Realities of Mainlining
http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf
--- What is future plan? ---
What is future plan?
====================
We believe that inode based security and name based security are complementary
and both should be used together. But unfortunately, so far, we cannot enable


@ -55,12 +55,6 @@ Documentation
contains information about the problems, which may result by upgrading
your kernel.
- The Documentation/DocBook/ subdirectory contains several guides for
kernel developers and users. These guides can be rendered in a
number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
or ``make mandocs`` will render the documentation in the requested format.
Installing the kernel source
----------------------------


@ -62,6 +62,7 @@ configure specific aspects of kernel behavior to your liking.
ras
pm/index
thunderbolt
LSM/index
.. only:: subproject and html


@ -727,7 +727,8 @@
See also Documentation/input/joystick-parport.txt
ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
time. See Documentation/dynamic-debug-howto.txt for
time. See
Documentation/admin-guide/dynamic-debug-howto.rst for
details. Deprecated, see dyndbg.
debug [KNL] Enable kernel debugging (events log level).
@ -890,7 +891,8 @@
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
module.dyndbg[="val"]
Enable debug messages at boot time. See
Documentation/dynamic-debug-howto.txt for details.
Documentation/admin-guide/dynamic-debug-howto.rst
for details.
nompx [X86] Disables Intel Memory Protection Extensions.
See Documentation/x86/intel_mpx.txt for more


@ -344,9 +344,9 @@ for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory
controllers. The following example will assume 2 channels:
+------------+-----------------------+
| Chip | Channels |
| Select +-----------+-----------+
| rows | ``ch0`` | ``ch1`` |
| CS Rows | Channels |
+------------+-----------+-----------+
| | ``ch0`` | ``ch1`` |
+============+===========+===========+
| ``csrow0`` | DIMM_A0 | DIMM_B0 |
+------------+ | |
@ -698,7 +698,7 @@ information indicating that errors have been detected::
The structure of the message is:
+---------------------------------------+-------------+
| Content + Example |
| Content | Example |
+=======================================+=============+
| The memory controller | MC0 |
+---------------------------------------+-------------+
@ -713,7 +713,7 @@ The structure of the message is:
+---------------------------------------+-------------+
| The error syndrome | 0xb741 |
+---------------------------------------+-------------+
| Memory row | row 0 +
| Memory row | row 0 |
+---------------------------------------+-------------+
| Memory channel | channel 1 |
+---------------------------------------+-------------+


@ -271,8 +271,7 @@ latex_elements = {
# Additional stuff for the LaTeX preamble.
'preamble': '''
% Adjust margins
\\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry}
\\usepackage{ifthen}
% Allow generate some pages in landscape
\\usepackage{lscape}
@ -281,6 +280,7 @@ latex_elements = {
\\definecolor{NoteColor}{RGB}{204,255,255}
\\definecolor{WarningColor}{RGB}{255,204,204}
\\definecolor{AttentionColor}{RGB}{255,255,204}
\\definecolor{ImportantColor}{RGB}{192,255,204}
\\definecolor{OtherColor}{RGB}{204,204,204}
\\newlength{\\mynoticelength}
\\makeatletter\\newenvironment{coloredbox}[1]{%
@ -301,9 +301,14 @@ latex_elements = {
\\ifthenelse%
{\\equal{\\py@noticetype}{attention}}%
{\\colorbox{AttentionColor}{\\usebox{\\@tempboxa}}}%
{%
\\ifthenelse%
{\\equal{\\py@noticetype}{important}}%
{\\colorbox{ImportantColor}{\\usebox{\\@tempboxa}}}%
{\\colorbox{OtherColor}{\\usebox{\\@tempboxa}}}%
}%
}%
}%
}\\makeatother
\\makeatletter
@ -336,30 +341,51 @@ latex_elements = {
if major == 1 and minor > 3:
latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n'
if major == 1 and minor <= 4:
latex_elements['preamble'] += '\\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry}'
elif major == 1 and (minor > 5 or (minor == 5 and patch >= 3)):
latex_elements['sphinxsetup'] = 'hmargin=0.5in, vmargin=0.5in'
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
# Sorted in alphabetical order
latex_documents = [
('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
'The kernel development community', 'manual'),
('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
'The kernel development community', 'manual'),
('core-api/index', 'core-api.tex', 'The kernel core API manual',
'The kernel development community', 'manual'),
('crypto/index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
'The kernel development community', 'manual'),
('dev-tools/index', 'dev-tools.tex', 'Development tools for the Kernel',
'The kernel development community', 'manual'),
('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
'The kernel development community', 'manual'),
('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
'The kernel development community', 'manual'),
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
'The kernel development community', 'manual'),
('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation',
'The kernel development community', 'manual'),
('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
'The kernel development community', 'manual'),
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
'The kernel development community', 'manual'),
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
'The kernel development community', 'manual'),
('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
'The kernel development community', 'manual'),
('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
'The kernel development community', 'manual'),
('networking/index', 'networking.tex', 'Linux Networking Documentation',
'The kernel development community', 'manual'),
('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
'The kernel development community', 'manual'),
('security/index', 'security.tex', 'The kernel security subsystem manual',
'The kernel development community', 'manual'),
('sh/index', 'sh.tex', 'SuperH architecture implementation manual',
'The kernel development community', 'manual'),
('sound/index', 'sound.tex', 'Linux Sound Subsystem Documentation',
'The kernel development community', 'manual'),
('userspace-api/index', 'userspace-api.tex', 'The Linux kernel user-space API guide',
'The kernel development community', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of


@ -10,7 +10,10 @@ properties:
1. Objects are opaque pointers. The implementation does not care where they
point (if anywhere) or what they point to (if anything).
.. note:: Pointers to objects _must_ be zero in the least significant bit.
.. note::
Pointers to objects _must_ be zero in the least significant bit.
2. Objects do not need to contain linkage blocks for use by the array. This
permits an object to be located in multiple arrays simultaneously.


@ -19,6 +19,7 @@ Core utilities
workqueue
genericirq
flexible-arrays
librs
Interfaces for kernel debugging
===============================


@ -0,0 +1,212 @@
==========================================
Reed-Solomon Library Programming Interface
==========================================
:Author: Thomas Gleixner
Introduction
============
The generic Reed-Solomon Library provides encoding, decoding and error
correction functions.
Reed-Solomon codes are used in communication and storage applications to
ensure data integrity.
This documentation is provided for developers who want to utilize the
functions provided by the library.
Known Bugs And Assumptions
==========================
None.
Usage
=====
This chapter provides examples of how to use the library.
Initializing
------------
The init function init_rs returns a pointer to an rs decoder structure,
which holds the necessary information for encoding, decoding and error
correction with the given polynomial. It either uses an existing
matching decoder or creates a new one. On creation all the lookup tables
for fast en/decoding are created. The function may take a while, so make
sure not to call it in critical code paths.
::
/* the Reed Solomon control structure */
static struct rs_control *rs_decoder;
/* Symbolsize is 10 (bits)
* Primitive polynomial is x^10+x^3+1
* first consecutive root is 0
* primitive element to generate roots = 1
* generator polynomial degree (number of roots) = 6
*/
rs_decoder = init_rs (10, 0x409, 0, 1, 6);
Encoding
--------
The encoder calculates the Reed-Solomon code over the given data length
and stores the result in the parity buffer. Note that the parity buffer
must be initialized before calling the encoder.
The expanded data can be inverted on the fly by providing a non-zero
inversion mask. The expanded data is XOR'ed with the mask. This is used
e.g. for FLASH ECC, where the all 0xFF is inverted to an all 0x00. The
Reed-Solomon code for all 0x00 is all 0x00. The code is inverted before
storing to FLASH so it is 0xFF too. This prevents reads from an
erased FLASH from producing ECC errors.
The databytes are expanded to the given symbol size on the fly. There is
no support for encoding continuous bitstreams with a symbol size != 8 at
the moment. If it is necessary it should not be a big deal to implement
such functionality.
::
/* Parity buffer. Size = number of roots */
uint16_t par[6];
/* Initialize the parity buffer */
memset(par, 0, sizeof(par));
/* Encode 512 byte in data8. Store parity in buffer par */
encode_rs8 (rs_decoder, data8, 512, par, 0);
Decoding
--------
The decoder calculates the syndrome over the given data length and the
received parity symbols and corrects errors in the data.
If a syndrome is available from a hardware decoder then the syndrome
calculation is skipped.
The correction of the data buffer can be suppressed by providing a
correction pattern buffer and an error location buffer to the decoder.
The decoder stores the calculated error location and the correction
bitmask in the given buffers. This is useful for hardware decoders which
use a weird bit ordering scheme.
The databytes are expanded to the given symbol size on the fly. There is
no support for decoding continuous bitstreams with a symbolsize != 8 at
the moment. If it is necessary it should not be a big deal to implement
such functionality.
Decoding with syndrome calculation, direct data correction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
/* Parity buffer. Size = number of roots */
uint16_t par[6];
uint8_t data8[512];
int numerr;
/* Receive data */
.....
/* Receive parity */
.....
/* Decode 512 byte in data8.*/
numerr = decode_rs8 (rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL);
Decoding with syndrome given by hardware decoder, direct data correction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
/* Parity buffer. Size = number of roots */
uint16_t par[6], syn[6];
uint8_t data8[512];
int numerr;
/* Receive data */
.....
/* Receive parity */
.....
/* Get syndrome from hardware decoder */
.....
/* Decode 512 byte in data8.*/
numerr = decode_rs8 (rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL);
Decoding with syndrome given by hardware decoder, no direct data correction.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: It's not necessary to give data and received parity to the
decoder.
::
/* Parity buffer. Size = number of roots */
uint16_t par[6], syn[6], corr[8];
uint8_t data8[512];
int numerr, errpos[8], i;
/* Receive data */
.....
/* Receive parity */
.....
/* Get syndrome from hardware decoder */
.....
/* Decode 512 byte in data8.*/
numerr = decode_rs8 (rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr);
for (i = 0; i < numerr; i++) {
do_error_correction_in_your_buffer(errpos[i], corr[i]);
}
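
The helper ``do_error_correction_in_your_buffer()`` in the loop above is left to the caller. One plausible shape for it, applied to the whole result at once, is sketched below. It rests on two assumptions that are not stated in this document: that each reported position indexes a symbol in the caller's data buffer, and that the correction bitmask is applied by XOR. The name ``apply_rs_corrections`` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_error_correction_in_your_buffer():
 * XOR each correction bitmask into the data symbol at the reported
 * position. Positions outside the data buffer (assumed here to refer
 * to parity symbols) are skipped.
 * Returns the number of data symbols that were changed.
 */
int apply_rs_corrections(uint8_t *data, size_t len, const int *errpos,
			 const uint16_t *corr, int numerr)
{
	int i, fixed = 0;

	/* A negative numerr (decode failure) skips the loop entirely. */
	for (i = 0; i < numerr; i++) {
		if (errpos[i] < 0 || (size_t)errpos[i] >= len)
			continue;
		data[errpos[i]] ^= (uint8_t)corr[i];
		fixed++;
	}
	return fixed;
}
```

A hardware decoder with an unusual bit ordering would translate ``errpos[i]`` and ``corr[i]`` into its own layout before applying them, which is the reason the library can hand back positions and bitmasks instead of correcting the buffer itself.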
Cleanup
-------
The function free_rs frees the allocated resources, if the caller is
the last user of the decoder.
::
/* Release resources */
free_rs(rs_decoder);
Structures
==========
This chapter contains the autogenerated documentation of the structures
which are used in the Reed-Solomon Library and are relevant for a
developer.
.. kernel-doc:: include/linux/rslib.h
:internal:
Public Functions Provided
=========================
This chapter contains the autogenerated documentation of the
Reed-Solomon functions which are exported.
.. kernel-doc:: lib/reed_solomon/reed_solomon.c
:export:
Credits
=======
The library code for encoding and decoding was written by Phil Karn.
::
Copyright 2002, Phil Karn, KA9Q
May be used under the terms of the GNU General Public License (GPL)
The wrapper functions and interfaces are written by Thomas Gleixner.
Many users have provided bugfixes, improvements and helping hands for
testing. Thanks a lot.
The following people have contributed to this document:
Thomas Gleixner\ tglx@linutronix.de


@ -265,7 +265,7 @@ mandatory:
The caller passes a pointer to the following struct with all of the fields
cleared, except for data, datalen and quotalen [see
Documentation/security/keys.txt].
Documentation/security/keys/core.rst].
struct key_preparsed_payload {
char *description;


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = 'Linux Kernel Crypto API'
tags.add("subproject")
latex_documents = [
('index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
'The kernel development community', 'manual'),
]


@ -23,6 +23,7 @@ whole; patches welcome!
kmemleak
kmemcheck
gdb-kernel-debugging
kgdb
.. only:: subproject and html


@ -0,0 +1,907 @@
=================================================
Using kgdb, kdb and the kernel debugger internals
=================================================
:Author: Jason Wessel
Introduction
============
The kernel has two different debugger front ends (kdb and kgdb) which
interface to the debug core. It is possible to use either of the
debugger front ends and dynamically transition between them if you
configure the kernel properly at compile and runtime.
Kdb is a simplistic shell-style interface which you can use on a system
console with a keyboard or serial console. You can use it to inspect
memory, registers, process lists, dmesg, and even set breakpoints to
stop in a certain location. Kdb is not a source level debugger, although
you can set breakpoints and execute some basic kernel run control. Kdb
is mainly aimed at doing some analysis to aid in development or
diagnosing kernel problems. You can access some symbols by name in
kernel built-ins or in kernel modules if the code was built with
``CONFIG_KALLSYMS``.
Kgdb is intended to be used as a source level debugger for the Linux
kernel. It is used along with gdb to debug a Linux kernel. The
expectation is that gdb can be used to "break in" to the kernel to
inspect memory, variables and look through call stack information
similar to the way an application developer would use gdb to debug an
application. It is possible to place breakpoints in kernel code and
perform some limited execution stepping.
Two machines are required for using kgdb. One of these machines is a
development machine and the other is the target machine. The kernel to
be debugged runs on the target machine. The development machine runs an
instance of gdb against the vmlinux file which contains the symbols (not
a boot image such as bzImage, zImage, uImage...). In gdb the developer
specifies the connection parameters and connects to kgdb. The type of
connection a developer makes with gdb depends on the availability of
kgdb I/O modules compiled as built-ins or loadable kernel modules in the
test machine's kernel.
Compiling a kernel
==================
- In order to enable compilation of kdb, you must first enable kgdb.
- The kgdb test compile options are described in the kgdb test suite
chapter.
Kernel config options for kgdb
------------------------------
To enable ``CONFIG_KGDB`` you should look under
:menuselection:`Kernel hacking --> Kernel debugging` and select
:menuselection:`KGDB: kernel debugger`.
While it is not a hard requirement that you have symbols in your vmlinux
file, gdb tends not to be very useful without the symbolic data, so you
will want to turn on ``CONFIG_DEBUG_INFO`` which is called
:menuselection:`Compile the kernel with debug info` in the config menu.
It is advised, but not required, that you turn on the
``CONFIG_FRAME_POINTER`` kernel option which is called :menuselection:`Compile
the kernel with frame pointers` in the config menu. This option inserts code
into the compiled executable which saves the frame information in
registers or on the stack at different points which allows a debugger
such as gdb to more accurately construct stack back traces while
debugging the kernel.
If the architecture that you are using supports the kernel option
``CONFIG_STRICT_KERNEL_RWX``, you should consider turning it off. This
option will prevent the use of software breakpoints because it marks
certain regions of the kernel's memory space as read-only. If kgdb
supports hardware breakpoints for the architecture you are using, you can
use them instead and leave ``CONFIG_STRICT_KERNEL_RWX`` turned on;
otherwise you need to turn this option off.
Next you should choose one or more I/O drivers to interconnect the debugging
host and the debugged target. Early boot debugging requires a KGDB I/O
driver that supports early debugging, and the driver must be built into
the kernel directly. Kgdb I/O driver configuration takes place via
kernel or module parameters, which you can learn more about in the
section that describes the parameter kgdboc.
Here is an example set of ``.config`` symbols to enable or disable for kgdb::
# CONFIG_STRICT_KERNEL_RWX is not set
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
Kernel config options for kdb
-----------------------------
Kdb is quite a bit more complex than the simple gdbstub sitting on top
of the kernel's debug core. Kdb must implement a shell, and also adds
some helper functions in other parts of the kernel, responsible for
printing out interesting data such as what you would see if you ran
``lsmod``, or ``ps``. In order to build kdb into the kernel you follow the
same steps as you would for kgdb.
The main config option for kdb is ``CONFIG_KGDB_KDB`` which is called
:menuselection:`KGDB_KDB: include kdb frontend for kgdb` in the config menu.
When configuring kgdb you will normally already have selected an I/O
driver, such as the ``CONFIG_KGDB_SERIAL_CONSOLE`` interface if you plan on
using kdb on a serial port.
If you want to use a PS/2-style keyboard with kdb, you would select
``CONFIG_KDB_KEYBOARD`` which is called :menuselection:`KGDB_KDB: keyboard as
input device` in the config menu. The ``CONFIG_KDB_KEYBOARD`` option only
affects kdb; the gdb interface to kgdb does not use it.
Here is an example set of ``.config`` symbols to enable/disable kdb::
# CONFIG_STRICT_KERNEL_RWX is not set
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
CONFIG_KGDB_KDB=y
CONFIG_KDB_KEYBOARD=y
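The example symbols above can be checked mechanically. The following is a minimal sketch, not part of the kernel tree: the helper names ``parse_config`` and ``check_config`` are hypothetical, and the sketch treats the example set as strict even though ``CONFIG_FRAME_POINTER`` is only advised and turning off ``CONFIG_STRICT_KERNEL_RWX`` is only a recommendation.

```python
# Hypothetical helper (not in the kernel tree): compare the text of a
# kernel .config against the example symbol set shown above.
REQUIRED_KDB = {
    "CONFIG_FRAME_POINTER": "y",
    "CONFIG_KGDB": "y",
    "CONFIG_KGDB_SERIAL_CONSOLE": "y",
    "CONFIG_KGDB_KDB": "y",
    "CONFIG_KDB_KEYBOARD": "y",
}
MUST_BE_UNSET = {"CONFIG_STRICT_KERNEL_RWX"}


def parse_config(text):
    """Return {symbol: value} for every 'CONFIG_X=value' line."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this also skips '# CONFIG_X is not set'
        name, sep, value = line.partition("=")
        if sep:
            settings[name] = value
    return settings


def check_config(text):
    """Return a sorted list of problems; an empty list means all is well."""
    settings = parse_config(text)
    problems = []
    for name, wanted in REQUIRED_KDB.items():
        if settings.get(name) != wanted:
            problems.append(f"{name} should be {wanted}")
    for name in MUST_BE_UNSET:
        if name in settings and settings[name] != "n":
            problems.append(f"{name} should not be set")
    return sorted(problems)
```

Passing the contents of a ``.config`` file to ``check_config`` returns an empty list when every symbol matches the example.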
Kernel Debugger Boot Arguments
==============================
This section describes the various runtime kernel parameters that affect
the configuration of the kernel debugger. The following chapter covers
using kdb and kgdb as well as providing some examples of the
configuration parameters.
Kernel parameter: kgdboc
------------------------
The kgdboc driver was originally an abbreviation meant to stand for
"kgdb over console". Today it is the primary mechanism to configure how
to communicate from gdb to kgdb as well as the devices you want to use
to interact with the kdb shell.
For kgdb/gdb, kgdboc is designed to work with a single serial port. It
is intended to cover the circumstance where you want to use a serial
console as your primary console as well as using it to perform kernel
debugging. It is also possible to use kgdb on a serial port which is not
designated as a system console. Kgdboc may be configured as a kernel
built-in or a kernel loadable module. You can only make use of
``kgdbwait`` and early debugging if you build kgdboc into the kernel as
a built-in.
Optionally you can elect to activate kms (Kernel Mode Setting)
integration. When you use kms with kgdboc and you have a video driver
that has atomic mode setting hooks, it is possible to enter the debugger
on the graphics console. When the kernel execution is resumed, the
previous graphics mode will be restored. This integration can serve as a
useful tool to aid in diagnosing crashes or doing analysis of memory
with kdb while allowing the full graphics console applications to run.
kgdboc arguments
~~~~~~~~~~~~~~~~
Usage::
kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
The order listed above must be observed if you use any of the optional
configurations together.
Abbreviations:
- kms = Kernel Mode Setting
- kbd = Keyboard
You can configure kgdboc to use the keyboard, and/or a serial device
depending on if you are using kdb and/or kgdb. Using kms + only gdb is
generally not a useful combination.
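To make the ordering rule concrete, here is a small sketch that splits a kgdboc string into its optional parts. It illustrates the documented usage line only; it is not the kernel's own parser, and the function name ``parse_kgdboc`` and its error messages are hypothetical.

```python
def parse_kgdboc(value):
    """Split a kgdboc string into its parts, enforcing the documented order.

    Illustrative only: follows the usage line
    kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
    """
    result = {"kms": False, "kbd": False, "serial": None, "baud": None}
    parts = value.split(",") if value else []
    if parts and parts[0] == "kms":
        result["kms"] = True
        parts.pop(0)
    if parts and parts[0] == "kbd":
        result["kbd"] = True
        parts.pop(0)
    if parts and parts[0] in ("kms", "kbd"):
        raise ValueError("kms must come before kbd, and neither may repeat")
    if parts:
        result["serial"] = parts.pop(0)
        if not result["serial"]:
            raise ValueError("empty serial device name")
    if parts:
        result["baud"] = int(parts.pop(0))  # raises ValueError if not a number
    if parts:
        raise ValueError("unexpected trailing fields: " + ",".join(parts))
    return result
```

For example, ``parse_kgdboc("kms,kbd,ttyS0,115200")`` reports all four parts, while ``parse_kgdboc("kbd,kms")`` raises ``ValueError`` because the order is wrong.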
Using loadable module or built-in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. As a kernel built-in:
Use the kernel boot argument::
kgdboc=<tty-device>,[baud]
2. As a kernel loadable module:
Use the command::
modprobe kgdboc kgdboc=<tty-device>,[baud]
Here are two examples of how you might format the kgdboc string. The
first is for an x86 target using the first serial port. The second
example is for the ARM Versatile AB using the second serial port.
1. ``kgdboc=ttyS0,115200``
2. ``kgdboc=ttyAMA1,115200``
Configure kgdboc at runtime with sysfs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
At run time you can enable or disable kgdboc by echoing a parameter
into sysfs. Here are two examples:
1. Enable kgdboc on ttyS0::
echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
2. Disable kgdboc::
echo "" > /sys/module/kgdboc/parameters/kgdboc
.. note::
You do not need to specify the baud if you are configuring the
console on a tty which is already configured or open.
More examples
^^^^^^^^^^^^^
You can configure kgdboc to use the keyboard, and/or a serial device
depending on if you are using kdb and/or kgdb, in one of the following
scenarios.
1. kdb and kgdb over only a serial port::
kgdboc=<serial_device>[,baud]
Example::
kgdboc=ttyS0,115200
2. kdb and kgdb with keyboard and a serial port::
kgdboc=kbd,<serial_device>[,baud]
Example::
kgdboc=kbd,ttyS0,115200
3. kdb with a keyboard::
kgdboc=kbd
4. kdb with kernel mode setting::
kgdboc=kms,kbd
5. kdb with kernel mode setting and kgdb over a serial port::
kgdboc=kms,kbd,ttyS0,115200
.. note::
Kgdboc does not support interrupting the target via the gdb remote
protocol. You must manually send a :kbd:`SysRq-G` unless you have a proxy
that splits console output to a terminal program. A console proxy has a
separate TCP port for the debugger and a separate TCP port for the
"human" console. The proxy can take care of sending the :kbd:`SysRq-G`
for you.
When using kgdboc with no debugger proxy, you can end up connecting the
debugger at one of two entry points. If an exception occurs after you
have loaded kgdboc, a message should print on the console stating it is
waiting for the debugger. In this case you disconnect your terminal
program and then connect the debugger in its place. If you want to
interrupt the target system and forcibly enter a debug session you have
to issue a :kbd:`SysRq` sequence and then type the letter :kbd:`g`. Then you
disconnect the terminal session and connect gdb. If you don't like this,
your options are to hack gdb to send the :kbd:`SysRq-G` for you, including
on the initial connect, or to use a debugger proxy that allows an
unmodified gdb to do the debugging.
Kernel parameter: ``kgdbwait``
------------------------------
The kernel command line option ``kgdbwait`` makes kgdb wait for a
debugger connection during booting of a kernel. You can only use this
option if you compiled a kgdb I/O driver into the kernel and you
specified the I/O driver configuration as a kernel command line option.
The kgdbwait parameter should always follow the configuration parameter
for the kgdb I/O driver in the kernel command line else the I/O driver
will not be configured prior to asking the kernel to use it to wait.
The kernel will stop and wait as early as the I/O driver and
architecture allow when you use this option. If you build the kgdb I/O
driver as a loadable kernel module, kgdbwait will not do anything.
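The ordering requirement can be expressed as a short check over a command line string. This is a sketch under one stated assumption: kgdboc is the only kgdb I/O driver considered. The function name ``kgdbwait_is_effective`` is hypothetical.

```python
def kgdbwait_is_effective(cmdline):
    """True if kgdbwait appears after a kgdboc= option on the command line.

    Assumption: kgdboc is the only kgdb I/O driver taken into account.
    """
    seen_io_driver = False
    for token in cmdline.split():
        if token.startswith("kgdboc="):
            seen_io_driver = True
        elif token == "kgdbwait":
            # Only useful if the I/O driver was configured earlier.
            return seen_io_driver
    return False  # kgdbwait was not given at all
```

With this check, ``kgdboc=ttyS0,115200 kgdbwait`` passes and ``kgdbwait kgdboc=ttyS0,115200`` does not.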
Kernel parameter: ``kgdbcon``
-----------------------------
The ``kgdbcon`` feature allows you to see :c:func:`printk` messages inside gdb
while gdb is connected to the kernel. Kdb does not make use of the kgdbcon
feature.
Kgdb supports using the gdb serial protocol to send console messages to
the debugger when the debugger is connected and running. There are two
ways to activate this feature.
1. Activate with the kernel command line option::
kgdbcon
2. Use sysfs before configuring an I/O driver::
echo 1 > /sys/module/kgdb/parameters/kgdb_use_con
.. note::
If you do this after you configure the kgdb I/O driver, the
setting will not take effect until the next point the I/O is
reconfigured.
.. important::
You cannot use kgdboc + kgdbcon on a tty that is an
active system console. An example of incorrect usage is::
console=ttyS0,115200 kgdboc=ttyS0 kgdbcon
It is possible to use this option with kgdboc on a tty that is not a
system console.
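The restriction above lends itself to a simple lint over the kernel command line. The sketch below is illustrative only; the name ``kgdbcon_conflicts`` is hypothetical, and it assumes that serial device names in the kgdboc string start with ``tty``.

```python
def kgdbcon_conflicts(cmdline):
    """Return the ttys used by both console= and kgdboc= when kgdbcon is set.

    Assumption: serial device names in the kgdboc string start with 'tty'.
    """
    tokens = cmdline.split()
    if "kgdbcon" not in tokens:
        return set()
    consoles = set()
    debug_ttys = set()
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            continue
        fields = value.split(",")
        if key == "console":
            consoles.add(fields[0])  # device name precedes the baud rate
        elif key == "kgdboc":
            debug_ttys.update(f for f in fields if f.startswith("tty"))
    return consoles & debug_ttys
```

A non-empty result flags the incorrect usage shown above; an empty set means no tty is shared.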
Run time parameter: ``kgdbreboot``
----------------------------------
The kgdbreboot feature allows you to change how the debugger deals with
the reboot notification. You have 3 choices for the behavior. The
default behavior is always set to 0.
.. tabularcolumns:: |p{0.4cm}|p{11.5cm}|p{5.6cm}|

.. flat-table::
  :widths: 1 10 8

  * - 1
    - ``echo -1 > /sys/module/debug_core/parameters/kgdbreboot``
    - Ignore the reboot notification entirely.

  * - 2
    - ``echo 0 > /sys/module/debug_core/parameters/kgdbreboot``
    - Send the detach message to any attached debugger client.

  * - 3
    - ``echo 1 > /sys/module/debug_core/parameters/kgdbreboot``
    - Enter the debugger on reboot notify.

Using kdb
=========
Quick start for kdb on a serial port
------------------------------------
This is a quick example of how to use kdb.
1. Configure kgdboc at boot using kernel parameters::
console=ttyS0,115200 kgdboc=ttyS0,115200
OR
Configure kgdboc after the kernel has booted; assuming you are using
a serial port console::
echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
2. Enter the kernel debugger manually or by waiting for an oops or
fault. There are several ways you can enter the kernel debugger
manually; all involve using the :kbd:`SysRq-G`, which means you must have
enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
- When logged in as root or with a super user session you can run::
echo g > /proc/sysrq-trigger
- Example using minicom 2.2
Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g`
- When you have telneted to a terminal server that supports sending
a remote break
Press: :kbd:`CTRL-]`
Type in: ``send break``
Press: :kbd:`Enter` :kbd:`g`
3. From the kdb prompt you can run the ``help`` command to see a complete
list of the commands that are available.
Some useful commands in kdb include:
=========== =================================================================
``lsmod``   Shows where kernel modules are loaded
``ps``      Displays only the active processes
``ps A``    Shows all the processes
``summary`` Shows kernel version info and memory usage
``bt``      Get a backtrace of the current process using :c:func:`dump_stack`
``dmesg``   View the kernel syslog buffer
``go``      Continue the system
=========== =================================================================
4. When you are done using kdb you need to consider rebooting the system
or using the ``go`` command to resume normal kernel execution. If you
have paused the kernel for a lengthy period of time, applications
that rely on timely networking or anything to do with real wall clock
time could be adversely affected, so you should take this into
consideration when using the kernel debugger.
Quick start for kdb using a keyboard connected console
------------------------------------------------------
This is a quick example of how to use kdb with a keyboard.
1. Configure kgdboc at boot using kernel parameters::
kgdboc=kbd
OR
Configure kgdboc after the kernel has booted::
echo kbd > /sys/module/kgdboc/parameters/kgdboc
2. Enter the kernel debugger manually or by waiting for an oops or
fault. There are several ways you can enter the kernel debugger
manually; all involve using the :kbd:`SysRq-G`, which means you must have
enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
- When logged in as root or with a super user session you can run::
echo g > /proc/sysrq-trigger
- Example using a laptop keyboard:
Press and hold down: :kbd:`Alt`
Press and hold down: :kbd:`Fn`
Press and release the key with the label: :kbd:`SysRq`
Release: :kbd:`Fn`
Press and release: :kbd:`g`
Release: :kbd:`Alt`
- Example using a PS/2 101-key keyboard
Press and hold down: :kbd:`Alt`
Press and release the key with the label: :kbd:`SysRq`
Press and release: :kbd:`g`
Release: :kbd:`Alt`
3. Now type in a kdb command such as ``help``, ``dmesg``, ``bt`` or ``go`` to
continue kernel execution.
Using kgdb / gdb
================
In order to use kgdb you must activate it by passing configuration
information to one of the kgdb I/O drivers. If you do not pass any
configuration information kgdb will not do anything at all. Kgdb will
only actively hook up to the kernel trap hooks if a kgdb I/O driver is
loaded and configured. If you unconfigure a kgdb I/O driver, kgdb will
unregister all the kernel hook points.
All kgdb I/O drivers can be reconfigured at run time, if
``CONFIG_SYSFS`` and ``CONFIG_MODULES`` are enabled, by echo'ing a new
config string to ``/sys/module/<driver>/parameters/<option>``. The driver
can be unconfigured by passing an empty string. You cannot change the
configuration while the debugger is attached. Make sure to detach the
debugger with the ``detach`` command prior to trying to unconfigure a
kgdb I/O driver.
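As a sketch of the run time reconfiguration described above, the helper below builds the sysfs path for a driver parameter and writes a config string to it. The names ``param_path`` and ``configure`` are hypothetical, and the ``sysfs_root`` argument exists only so the code can be tried against a scratch directory instead of a live system. Writing to the real sysfs requires root privileges.

```python
from pathlib import Path


def param_path(driver, option, sysfs_root="/sys"):
    """Build the sysfs path of a module parameter."""
    return Path(sysfs_root) / "module" / driver / "parameters" / option


def configure(driver, option, value, sysfs_root="/sys"):
    """Write a config string; an empty string unconfigures the driver.

    Mirrors what 'echo <value> > <path>' does, including the newline.
    """
    path = param_path(driver, option, sysfs_root)
    path.write_text(value + "\n")
    return path
```

Calling ``configure("kgdboc", "kgdboc", "ttyS0")`` corresponds to the ``echo ttyS0`` example used elsewhere in this document, and passing ``""`` corresponds to unconfiguring the driver.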
Connecting with gdb to a serial port
------------------------------------
1. Configure kgdboc
Configure kgdboc at boot using kernel parameters::
kgdboc=ttyS0,115200
OR
Configure kgdboc after the kernel has booted::
echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
2. Stop kernel execution (break into the debugger)
In order to connect to gdb via kgdboc, the kernel must first be
stopped. There are several ways to stop the kernel which include
using kgdbwait as a boot argument, via a :kbd:`SysRq-G`, or running the
kernel until it takes an exception where it waits for the debugger to
attach.
- When logged in as root or with a super user session you can run::
echo g > /proc/sysrq-trigger
- Example using minicom 2.2
Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g`
- When you have telneted to a terminal server that supports sending
a remote break
Press: :kbd:`CTRL-]`
Type in: ``send break``
Press: :kbd:`Enter` :kbd:`g`
3. Connect from gdb
Example (using a directly connected port)::
% gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyS0
Example (kgdb to a terminal server on TCP port 2012)::
% gdb ./vmlinux
(gdb) target remote 192.168.2.2:2012
Once connected, you can debug a kernel the way you would debug an
application program.
If you are having problems connecting or something is going seriously
wrong while debugging, it will most often be the case that you want
to enable gdb to be verbose about its target communications. You do
this prior to issuing the ``target remote`` command by typing in::
set debug remote 1
Remember if you continue in gdb, and need to "break in" again, you need
to issue another :kbd:`SysRq-G`. It is easy to create a simple entry point by
putting a breakpoint at ``sys_sync`` and then you can run ``sync`` from a
shell or script to break into the debugger.
kgdb and kdb interoperability
=============================
It is possible to transition between kdb and kgdb dynamically. The debug
core will remember which you used the last time and automatically start
in the same mode.
Switching between kdb and kgdb
------------------------------
Switching from kgdb to kdb
~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two ways to switch from kgdb to kdb: you can use gdb to issue
a maintenance packet, or you can blindly type the command ``$3#33``.
Whenever the kernel debugger stops in kgdb mode it will print the
message ``KGDB or $3#33 for KDB``. It is important to note that you have
to type the sequence correctly in one pass. You cannot type a backspace
or delete because kgdb will interpret that as part of the debug stream.
1. Change from kgdb to kdb by blindly typing::
$3#33
2. Change from kgdb to kdb with gdb::
maintenance packet 3
.. note::
Now you must kill gdb. Typically you press :kbd:`CTRL-Z` and issue
the command::
kill -9 %
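The string ``$3#33`` follows the framing of the gdb remote serial protocol: a ``$``, the payload, a ``#``, and a two-digit hexadecimal checksum that is the sum of the payload bytes modulo 256. The sketch below shows why the payload ``3`` yields the checksum ``33``; the function names are hypothetical.

```python
def gdb_checksum(payload):
    """Modulo-256 sum of the payload bytes, as two lowercase hex digits."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def gdb_packet(payload):
    """Frame a payload as $<payload>#<checksum>."""
    return "$" + payload + "#" + gdb_checksum(payload)


# The ASCII code of the character '3' is 0x33, hence "$3#33".
print(gdb_packet("3"))
```

This is also why the sequence must be typed correctly in one pass: any extra character changes the byte stream that the stub sees.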
Change from kdb to kgdb
~~~~~~~~~~~~~~~~~~~~~~~
There are two ways you can change from kdb to kgdb. You can manually
enter kgdb mode by issuing the kgdb command from the kdb shell prompt,
or you can connect gdb while the kdb shell prompt is active. The kdb
shell looks for the typical first commands that gdb would issue with the
gdb remote protocol and if it sees one of those commands it
automatically changes into kgdb mode.
1. From kdb issue the command::
kgdb
Now disconnect your terminal program and connect gdb in its place
2. At the kdb prompt, disconnect the terminal program and connect gdb in
its place.
Running kdb commands from gdb
-----------------------------
It is possible to run a limited set of kdb commands from gdb, using the
gdb monitor command. You don't want to execute any of the run control or
breakpoint operations, because it can disrupt the state of the kernel
debugger. You should be using gdb for breakpoints and run control
operations if you have gdb connected. The more useful commands to run
are things like lsmod, dmesg, ps or possibly some of the memory
information commands. To see all the kdb commands you can run
``monitor help``.
Example::
(gdb) monitor ps
1 idle process (state I) and
27 sleeping system daemon (state M) processes suppressed,
use 'ps A' to see all.
Task Addr       Pid   Parent [*] cpu State Thread     Command
0xc78291d0        1        0  0    0   S  0xc7829404  init
0xc7954150      942        1  0    0   S  0xc7954384  dropbear
0xc78789c0      944        1  0    0   S  0xc7878bf4  sh
(gdb)
kgdb Test Suite
===============
When kgdb is enabled in the kernel config you can also elect to enable
the config parameter ``KGDB_TESTS``. Turning this on will enable a special
kgdb I/O module which is designed to test the kgdb internal functions.
The kgdb tests are mainly intended for developers to test the kgdb
internals as well as a tool for developing a new kgdb architecture
specific implementation. These tests are not really for end users of the
Linux kernel. The primary source of documentation would be to look in
the ``drivers/misc/kgdbts.c`` file.
The kgdb test suite can also be configured at compile time to run the
core set of tests by setting the kernel config parameter
``KGDB_TESTS_ON_BOOT``. This particular option is aimed at automated
regression testing and does not require modifying the kernel boot config
arguments. If this is turned on, the kgdb test suite can be disabled by
specifying ``kgdbts=`` as a kernel boot argument.
Kernel Debugger Internals
=========================
Architecture Specifics
----------------------
The kernel debugger is organized into a number of components:
1. The debug core
The debug core is found in ``kernel/debug/debug_core.c``. It
contains:
- A generic OS exception handler which includes sync'ing the
processors into a stopped state on a multi-CPU system.
- The API to talk to the kgdb I/O drivers
- The API to make calls to the arch-specific kgdb implementation
- The logic to perform safe memory reads and writes to memory while
using the debugger
- A full implementation for software breakpoints unless overridden
by the arch
- The API to invoke either the kdb or kgdb frontend to the debug
core.
- The structures and callback API for atomic kernel mode setting.
.. note:: kgdboc is where the kms callbacks are invoked.
2. kgdb arch-specific implementation
This implementation is generally found in ``arch/*/kernel/kgdb.c``. As
an example, ``arch/x86/kernel/kgdb.c`` contains the specifics to
implement HW breakpoint as well as the initialization to dynamically
register and unregister for the trap handlers on this architecture.
The arch-specific portion implements:
- An arch-specific trap catcher which invokes
:c:func:`kgdb_handle_exception` to start kgdb doing its work
- Translation between the gdb-specific packet format and :c:type:`pt_regs`
- Registration and unregistration of architecture specific trap
hooks
- Any special exception handling and cleanup
- NMI exception handling and cleanup
- (optional) HW breakpoints
3. gdbstub frontend (aka kgdb)
The gdbstub is located in ``kernel/debug/gdbstub.c``. It contains:
- All the logic to implement the gdb serial protocol
4. kdb frontend
The kdb debugger shell is broken down into a number of components.
The kdb core is located in ``kernel/debug/kdb``. There are a number of
helper functions in some of the other kernel components to make it
possible for kdb to examine and report information about the kernel
without taking locks that could cause a kernel deadlock. The kdb core
implements the following functionality:
- A simple shell
- The kdb core command set
- A registration API to register additional kdb shell commands.
- A good example of a self-contained kdb module is the ``ftdump``
command for dumping the ftrace buffer. See:
``kernel/trace/trace_kdb.c``
- For an example of how to dynamically register a new kdb command
you can build the kdb_hello.ko kernel module from
``samples/kdb/kdb_hello.c``. To build this example you can set
``CONFIG_SAMPLES=y`` and ``CONFIG_SAMPLE_KDB=m`` in your kernel
config. Later run ``modprobe kdb_hello`` and the next time you
enter the kdb shell, you can run the ``hello`` command.
- The implementation for :c:func:`kdb_printf` which emits messages directly
to I/O drivers, bypassing the kernel log.
- SW / HW breakpoint management for the kdb shell
5. kgdb I/O driver
Each kgdb I/O driver has to provide an implementation for the
following:
- configuration via built-in or module
- dynamic configuration and kgdb hook registration calls
- read and write character interface
- A cleanup handler for unconfiguring from the kgdb core
- (optional) Early debug methodology
Any given kgdb I/O driver has to operate very closely with the
hardware and must do it in such a way that does not enable interrupts
or change other parts of the system context without completely
restoring them. The kgdb core will repeatedly "poll" a kgdb I/O
driver for characters when it needs input. The I/O driver is expected
to return immediately if there is no data available. Doing so allows
for the future possibility to touch watchdog hardware in such a way
as to have a target system not reset when these are enabled.
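The polling contract described above can be modelled in a few lines. This is a toy model in Python, not kernel code: ``FakeIoDriver``, ``poll_for_char``, the ``NO_CHAR`` marker and the ``on_idle`` hook are all hypothetical names chosen for illustration.

```python
NO_CHAR = -1  # hypothetical "nothing available" marker


class FakeIoDriver:
    """Stand-in for an I/O driver whose read never blocks."""

    def __init__(self, pending=b""):
        self._rx = list(pending)

    def read_char(self):
        """Return the next byte, or NO_CHAR immediately if none is queued."""
        return self._rx.pop(0) if self._rx else NO_CHAR


def poll_for_char(driver, max_polls, on_idle=lambda: None):
    """Poll the driver repeatedly, as the debug core is described to do."""
    for _ in range(max_polls):
        ch = driver.read_char()
        if ch != NO_CHAR:
            return ch
        # Because read_char() returned at once, the caller gets a chance to
        # do other work here, such as touching watchdog hardware.
        on_idle()
    return NO_CHAR
```

The design point is that the driver never waits for input itself; the caller owns the loop, which is what leaves room for work between polls.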
If you are intent on adding kgdb architecture specific support for a new
architecture, the architecture should define ``HAVE_ARCH_KGDB`` in the
architecture specific Kconfig file. This will enable kgdb for the
architecture, and at that point you must create an architecture specific
kgdb implementation.
There are a few flags which must be set on every architecture in their
``asm/kgdb.h`` file. These are:
- ``NUMREGBYTES``:
The size in bytes of all of the registers, so that we
can ensure they will all fit into a packet.
- ``BUFMAX``:
The size in bytes of the buffer GDB will read into. This must
be larger than NUMREGBYTES.
- ``CACHE_FLUSH_IS_SAFE``:
Set to 1 if it is always safe to call
flush_cache_range or flush_icache_range. On some architectures,
these functions may not be safe to call on SMP since we keep other
CPUs in a holding pattern.
There are also the following functions for the common backend, found in
``kernel/kgdb.c``, that must be supplied by the architecture-specific
backend unless marked as (optional), in which case a default function
may be used if the architecture does not need to provide a specific
implementation.
.. kernel-doc:: include/linux/kgdb.h
:internal:
kgdboc internals
----------------
kgdboc and uarts
~~~~~~~~~~~~~~~~
The kgdboc driver is actually a very thin driver that relies on the
underlying low-level hardware driver having "polling hooks" to
which the tty driver is attached. In the initial implementation of
kgdboc the serial_core was changed to expose a low level UART hook for
doing polled mode reading and writing of a single character while in an
atomic context. When kgdb makes an I/O request to the debugger, kgdboc
invokes a callback in the serial core which in turn uses the callback in
the UART driver.
When using kgdboc with a UART, the UART driver must implement two
callbacks in the :c:type:`struct uart_ops <uart_ops>`.
Example from ``drivers/8250.c``::
#ifdef CONFIG_CONSOLE_POLL
.poll_get_char = serial8250_get_poll_char,
.poll_put_char = serial8250_put_poll_char,
#endif
Any implementation specifics around creating a polling driver use the
``#ifdef CONFIG_CONSOLE_POLL``, as shown above. Keep in mind that
polling hooks have to be implemented in such a way that they can be
called from an atomic context and have to restore the state of the UART
chip on return such that the system can return to normal when the
debugger detaches. You need to be very careful with any kind of lock you
consider, because failing here is most likely going to mean pressing the
reset button.
kgdboc and keyboards
~~~~~~~~~~~~~~~~~~~~~~~~
The kgdboc driver contains logic to configure communications with an
attached keyboard. The keyboard infrastructure is only compiled into the
kernel when ``CONFIG_KDB_KEYBOARD=y`` is set in the kernel configuration.
The core polled keyboard driver for PS/2 type keyboards is in
``drivers/char/kdb_keyboard.c``. This driver is hooked into the debug core
when kgdboc populates the callback in the array called
:c:type:`kdb_poll_funcs[]`. The :c:func:`kdb_get_kbd_char` is the top-level
function which polls hardware for single character input.
kgdboc and kms
~~~~~~~~~~~~~~~~~~
The kgdboc driver contains logic to request the graphics display to
switch to a text context when you are using ``kgdboc=kms,kbd``, provided
that you have a video driver which has a frame buffer console and atomic
kernel mode setting support.
Every time the kernel debugger is entered it calls
:c:func:`kgdboc_pre_exp_handler` which in turn calls :c:func:`con_debug_enter`
in the virtual console layer. On resuming kernel execution, the kernel
debugger calls :c:func:`kgdboc_post_exp_handler` which in turn calls
:c:func:`con_debug_leave`.
Any video driver that wants to be compatible with the kernel debugger
and the atomic kms callbacks must implement the ``mode_set_base_atomic``,
``fb_debug_enter`` and ``fb_debug_leave`` operations. For the
``fb_debug_enter`` and ``fb_debug_leave`` the option exists to use the
generic drm fb helper functions or implement something custom for the
hardware. The following example shows the initialization of the
``.mode_set_base_atomic`` operation in
``drivers/gpu/drm/i915/intel_display.c``::
static const struct drm_crtc_helper_funcs intel_helper_funcs = {
[...]
.mode_set_base_atomic = intel_pipe_set_base_atomic,
[...]
};
Here is an example of how the i915 driver initializes the
``fb_debug_enter`` and ``fb_debug_leave`` functions to use the generic drm
helpers in ``drivers/gpu/drm/i915/intel_fb.c``::
static struct fb_ops intelfb_ops = {
[...]
.fb_debug_enter = drm_fb_helper_debug_enter,
.fb_debug_leave = drm_fb_helper_debug_leave,
[...]
};
Credits
=======
The following people have contributed to this document:
1. Amit Kale <amitkale@linsyssoft.com>
2. Tom Rini <trini@kernel.crashing.org>
In March 2008 this document was completely rewritten by:
- Jason Wessel <jason.wessel@windriver.com>
In Jan 2010 this document was updated to include kdb.
- Jason Wessel <jason.wessel@windriver.com>

@ -1,90 +0,0 @@
DocBook XML [DEPRECATED]
========================
.. attention::
This section describes the deprecated DocBook XML toolchain. Please do not
create new DocBook XML template files. Please consider converting existing
DocBook XML templates files to Sphinx/reStructuredText.
Converting DocBook to Sphinx
----------------------------
Over time, we expect all of the documents under ``Documentation/DocBook`` to be
converted to Sphinx and reStructuredText. For most DocBook XML documents, a good
enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script,
which uses ``pandoc`` under the hood. For example::
$ cd Documentation/sphinx
$ ./tmplcvt ../DocBook/in.tmpl ../out.rst
Then edit the resulting rst files to fix any remaining issues, and add the
document in the ``toctree`` in ``Documentation/index.rst``.
Components of the kernel-doc system
-----------------------------------
Many places in the source tree have extractable documentation in the form of
block comments above functions. The components of this system are:
- ``scripts/kernel-doc``
This is a perl script that hunts for the block comments and can mark them up
directly into reStructuredText, DocBook, man, text, and HTML. (No, not
texinfo.)
- ``Documentation/DocBook/*.tmpl``
These are XML template files, which are normal XML files with special
place-holders for where the extracted documentation should go.
- ``scripts/docproc.c``
This is a program for converting XML template files into XML files. When a
file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be
able to distinguish between internal and external functions.
It invokes kernel-doc, giving it the list of functions that are to be
documented.
Additionally it is used to scan the XML template files to locate all the files
referenced herein. This is used to generate dependency information as used by
make.
- ``Makefile``
The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build
DocBook XML files, PostScript files, PDF files, and html files in
Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'.
- ``Documentation/DocBook/Makefile``
This is where C files are associated with SGML templates.
How to use kernel-doc comments in DocBook XML template files
------------------------------------------------------------
DocBook XML template files (\*.tmpl) are like normal XML files, except that they
can contain escape sequences where extracted documentation should be inserted.
``!E<filename>`` is replaced by the documentation, in ``<filename>``, for
functions that are exported using ``EXPORT_SYMBOL``: the function list is
collected from files listed in ``Documentation/DocBook/Makefile``.
``!I<filename>`` is replaced by the documentation for functions that are **not**
exported using ``EXPORT_SYMBOL``.
``!D<filename>`` is used to name additional files to search for functions
exported using ``EXPORT_SYMBOL``.
``!F<filename> <function [functions...]>`` is replaced by the documentation, in
``<filename>``, for the functions listed.
``!P<filename> <section title>`` is replaced by the contents of the ``DOC:``
section titled ``<section title>`` from ``<filename>``. Spaces are allowed in
``<section title>``; do not quote the ``<section title>``.
``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC:
sections and documented functions, symbols, etc. are used. This makes sense to
use when you use ``!F`` or ``!P`` only and want to verify that all documentation
is included.

@ -10,7 +10,6 @@ How to write kernel documentation
sphinx.rst
kernel-doc.rst
parse-headers.rst
docbook.rst
.. only:: subproject and html

@ -149,6 +149,16 @@ Domain`_ references.
``%CONST``
Name of a constant. (No cross-referencing, just formatting.)
````literal````
A literal block that should be handled as-is. The output will use a
``monospaced font``.
Useful if you need to use special characters that would otherwise have some
meaning either by the kernel-doc script or by reStructuredText.
This is particularly useful if you need to use things like ``%ph`` inside
a function description.
``$ENVVAR``
Name of an environment variable. (No cross-referencing, just formatting.)
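As an illustration of the markup described above, a kernel-doc comment might look like the sketch below. The function ``demo_dump_bytes()`` and the constant ``DEMO_MAX_LEN`` are hypothetical names invented for this example; they do not exist in the kernel, and the code is plain userspace C rather than kernel code.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX_LEN 16	/* hypothetical limit, for illustration only */

/**
 * demo_dump_bytes() - Format a buffer as hexadecimal text.
 * @out: Destination string.
 * @out_size: Size of @out in bytes.
 * @buf: Bytes to format.
 * @len: Number of bytes in @buf; at most %DEMO_MAX_LEN are formatted.
 *
 * In-kernel code would usually print such a buffer with the ``%ph``
 * format specifier. The double backquotes keep the percent sign from
 * being read as constant markup. The output does not depend on any
 * environment variable such as $LANG.
 *
 * Return: Number of bytes from @buf that were formatted.
 */
size_t demo_dump_bytes(char *out, size_t out_size,
		       const unsigned char *buf, size_t len)
{
	size_t i, pos = 0;

	if (out_size == 0)
		return 0;
	out[0] = '\0';
	if (len > DEMO_MAX_LEN)
		len = DEMO_MAX_LEN;

	for (i = 0; i < len; i++) {
		/* two hex digits, a leading space after the first byte, and the NUL */
		size_t need = i ? 4 : 3;

		if (pos + need > out_size)
			break;
		pos += (size_t)snprintf(out + pos, out_size - pos,
					i ? " %02x" : "%02x",
					(unsigned int)buf[i]);
	}
	return i;
}
```

Here ``%DEMO_MAX_LEN`` is rendered as a constant, ``$LANG`` as an environment variable, and the double-backquoted ``%ph`` is passed through untouched.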


@ -15,11 +15,6 @@ are used to describe the functions and types and design of the code. The
kernel-doc comments have some special structure and formatting, but beyond that
they are also treated as reStructuredText.
There is also the deprecated DocBook toolchain to generate documentation from
DocBook XML template files under ``Documentation/DocBook``. The DocBook files
are to be converted to reStructuredText, and the toolchain is slated to be
removed.
Finally, there are thousands of plain text documentation files scattered around
``Documentation``. Some of these will likely be converted to reStructuredText
over time, but the bulk of them will remain in plain text.


@ -118,7 +118,6 @@ defkeymap.c
devlist.h*
devicetable-offsets.h
dnotify_test
docproc
dslm
dtc
elf2ecoff


@ -13,8 +13,8 @@ I2C is a multi-master bus; open drain signaling is used to arbitrate
between masters, as well as to handshake and to synchronize clocks from
slower clients.
The Linux I2C programming interfaces support only the master side of bus
interactions, not the slave side. The programming interface is
The Linux I2C programming interfaces support the master side of bus
interactions and the slave side. The programming interface is
structured around two kinds of driver, and two kinds of device. An I2C
"Adapter Driver" abstracts the controller hardware; it binds to a
physical device (perhaps a PCI device or platform_device) and exposes a
@ -22,9 +22,8 @@ physical device (perhaps a PCI device or platform_device) and exposes a
I2C bus segment it manages. On each I2C bus segment will be I2C devices
represented by a :c:type:`struct i2c_client <i2c_client>`.
Those devices will be bound to a :c:type:`struct i2c_driver
<i2c_driver>`, which should follow the standard Linux driver
model. (At this writing, a legacy model is more widely used.) There are
functions to perform various I2C protocol operations; at this writing
<i2c_driver>`, which should follow the standard Linux driver model. There
are functions to perform various I2C protocol operations; at this writing
all such functions are usable only from task context.
The System Management Bus (SMBus) is a sibling protocol. Most SMBus


@ -32,7 +32,13 @@ available subsections can be seen below.
i2c
hsi
edac
scsi
libata
mtdnand
miscellaneous
w1
rapidio
s390-drivers
vme
80211/index
uio-howto

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -0,0 +1,107 @@
=======================
RapidIO Subsystem Guide
=======================
:Author: Matt Porter
Introduction
============
RapidIO is a high speed switched fabric interconnect with features aimed
at the embedded market. RapidIO provides support for memory-mapped I/O
as well as message-based transactions over the switched fabric network.
RapidIO has a standardized discovery mechanism not unlike the PCI bus
standard that allows simple detection of devices in a network.
This documentation is provided for developers intending to support
RapidIO on new architectures, write new drivers, or to understand the
subsystem internals.
Known Bugs and Limitations
==========================
Bugs
----
None. ;)
Limitations
-----------
1. Access/management of RapidIO memory regions is not supported
2. Multiple host enumeration is not supported
RapidIO driver interface
========================
Drivers are provided a set of calls in order to interface with the
subsystem to gather info on devices, request/map memory region
resources, and manage mailboxes/doorbells.
Functions
---------
.. kernel-doc:: include/linux/rio_drv.h
:internal:
.. kernel-doc:: drivers/rapidio/rio-driver.c
:export:
.. kernel-doc:: drivers/rapidio/rio.c
:export:
Internals
=========
This chapter contains the autogenerated documentation of the RapidIO
subsystem.
Structures
----------
.. kernel-doc:: include/linux/rio.h
:internal:
Enumeration and Discovery
-------------------------
.. kernel-doc:: drivers/rapidio/rio-scan.c
:internal:
Driver functionality
--------------------
.. kernel-doc:: drivers/rapidio/rio.c
:internal:
.. kernel-doc:: drivers/rapidio/rio-access.c
:internal:
Device model support
--------------------
.. kernel-doc:: drivers/rapidio/rio-driver.c
:internal:
PPC32 support
-------------
.. kernel-doc:: arch/powerpc/sysdev/fsl_rio.c
:internal:
Credits
=======
The following people have contributed to the RapidIO subsystem directly
or indirectly:
1. Matt Porter\ mporter@kernel.crashing.org
2. Randy Vinson\ rvinson@mvista.com
3. Dan Malek\ dan@embeddedalley.com
The following people have contributed to this document:
1. Matt Porter\ mporter@kernel.crashing.org


@ -0,0 +1,111 @@
===================================
Writing s390 channel device drivers
===================================
:Author: Cornelia Huck
Introduction
============
This document describes the interfaces available for device drivers that
drive s390 based channel attached I/O devices. This includes interfaces
for interaction with the hardware and interfaces for interacting with
the common driver core. Those interfaces are provided by the s390 common
I/O layer.
The document assumes a familiarity with the technical terms associated
with the s390 channel I/O architecture. For a description of this
architecture, please refer to the "z/Architecture: Principles of
Operation", IBM publication no. SA22-7832.
While most I/O devices on a s390 system are typically driven through the
channel I/O mechanism described here, there are various other methods
(like the diag interface). These are out of the scope of this document.
Some additional information can also be found in the kernel source under
Documentation/s390/driver-model.txt.
The ccw bus
===========
The ccw bus typically contains the majority of devices available to a
s390 system. Named after the channel command word (ccw), the basic
command structure used to address its devices, the ccw bus contains
so-called channel attached devices. They are addressed via I/O
subchannels, visible on the css bus. A device driver for
channel-attached devices, however, will never interact with the
subchannel directly, but only via the I/O device on the ccw bus, the ccw
device.
I/O functions for channel-attached devices
------------------------------------------
Some hardware structures have been translated into C structures for use
by the common I/O layer and device drivers. For more information on the
hardware structures represented here, please consult the Principles of
Operation.
.. kernel-doc:: arch/s390/include/asm/cio.h
:internal:
ccw devices
-----------
Devices that want to initiate channel I/O need to attach to the ccw bus.
Interaction with the driver core is done via the common I/O layer, which
provides the abstractions of ccw devices and ccw device drivers.
The functions that initiate or terminate channel I/O all act upon a ccw
device structure. Device drivers must not bypass those functions or
strange side effects may happen.
.. kernel-doc:: arch/s390/include/asm/ccwdev.h
:internal:
.. kernel-doc:: drivers/s390/cio/device.c
:export:
.. kernel-doc:: drivers/s390/cio/device_ops.c
:export:
The channel-measurement facility
--------------------------------
The channel-measurement facility provides a means to collect measurement
data which is made available by the channel subsystem for each channel
attached device.
.. kernel-doc:: arch/s390/include/asm/cmb.h
:internal:
.. kernel-doc:: drivers/s390/cio/cmf.c
:export:
The ccwgroup bus
================
The ccwgroup bus only contains artificial devices, created by the user.
Many networking devices (e.g. qeth) are in fact composed of several ccw
devices (like read, write and data channel for qeth). The ccwgroup bus
provides a mechanism to create a meta-device which contains those ccw
devices as slave devices and can be associated with the netdevice.
ccw group devices
-----------------
.. kernel-doc:: arch/s390/include/asm/ccwgroup.h
:internal:
.. kernel-doc:: drivers/s390/cio/ccwgroup.c
:export:
Generic interfaces
==================
Some interfaces are available to other drivers that do not necessarily
have anything to do with the busses described above, but still are
indirectly using basic infrastructure in the common I/O layer. One
example is the support for adapter interrupts.
.. kernel-doc:: drivers/s390/cio/airq.c
:export:


@ -0,0 +1,344 @@
=====================
SCSI Interfaces Guide
=====================
:Author: James Bottomley
:Author: Rob Landley
Introduction
============
Protocol vs bus
---------------
Once upon a time, the Small Computer Systems Interface defined both a
parallel I/O bus and a data protocol to connect a wide variety of
peripherals (disk drives, tape drives, modems, printers, scanners,
optical drives, test equipment, and medical devices) to a host computer.
Although the old parallel (fast/wide/ultra) SCSI bus has largely fallen
out of use, the SCSI command set is more widely used than ever to
communicate with devices over a number of different busses.
The `SCSI protocol <http://www.t10.org/scsi-3.htm>`__ is a big-endian
peer-to-peer packet based protocol. SCSI commands are 6, 10, 12, or 16
bytes long, often followed by an associated data payload.
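To make the big-endian layout concrete, here is a minimal userspace sketch that fills in a 10-byte READ(10) command block. The helper names (``put_be16()``, ``put_be32()``, ``build_read10()``) are made up for this example and are not kernel interfaces; only the byte layout of READ(10) is taken from the SCSI command set.

```c
#include <stdint.h>
#include <string.h>

#define READ_10_OPCODE	0x28	/* operation code of READ(10) */
#define READ_10_CDB_LEN	10	/* one of the 6/10/12/16 byte command sizes */

/* Store a 16-bit value most significant byte first (big-endian). */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

/* Store a 32-bit value most significant byte first (big-endian). */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Fill cdb[] with a READ(10) for nblocks blocks starting at lba. */
void build_read10(uint8_t cdb[READ_10_CDB_LEN], uint32_t lba, uint16_t nblocks)
{
	/* byte 1 (flags), byte 6 (group number) and byte 9 (control) stay zero */
	memset(cdb, 0, READ_10_CDB_LEN);
	cdb[0] = READ_10_OPCODE;
	put_be32(&cdb[2], lba);		/* bytes 2..5: logical block address */
	put_be16(&cdb[7], nblocks);	/* bytes 7..8: transfer length */
}
```

The data payload mentioned above is not part of the command block; it travels separately once the command has been accepted.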
SCSI commands can be transported over just about any kind of bus, and
are the default protocol for storage devices attached to USB, SATA, SAS,
Fibre Channel, FireWire, and ATAPI devices. SCSI packets are also
commonly exchanged over Infiniband,
`I2O <http://i2o.shadowconnect.com/faq.php>`__, TCP/IP
(`iSCSI <https://en.wikipedia.org/wiki/ISCSI>`__), even `Parallel
ports <http://cyberelk.net/tim/parport/parscsi.html>`__.
Design of the Linux SCSI subsystem
----------------------------------
The SCSI subsystem uses a three layer design, with upper, mid, and low
layers. Every operation involving the SCSI subsystem (such as reading a
sector from a disk) uses one driver at each of the 3 levels: one upper
layer driver, one lower layer driver, and the SCSI midlayer.
The SCSI upper layer provides the interface between userspace and the
kernel, in the form of block and char device nodes for I/O and ioctl().
The SCSI lower layer contains drivers for specific hardware devices.
In between is the SCSI mid-layer, analogous to a network routing layer
such as the IPv4 stack. The SCSI mid-layer routes a packet based data
protocol between the upper layer's /dev nodes and the corresponding
devices in the lower layer. It manages command queues, provides error
handling and power management functions, and responds to ioctl()
requests.
SCSI upper layer
================
The upper layer supports the user-kernel interface by providing device
nodes.
sd (SCSI Disk)
--------------
sd (sd_mod.o)
sr (SCSI CD-ROM)
----------------
sr (sr_mod.o)
st (SCSI Tape)
--------------
st (st.o)
sg (SCSI Generic)
-----------------
sg (sg.o)
ch (SCSI Media Changer)
-----------------------
ch (ch.c)
SCSI mid layer
==============
SCSI midlayer implementation
----------------------------
include/scsi/scsi_device.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. kernel-doc:: include/scsi/scsi_device.h
:internal:
drivers/scsi/scsi.c
~~~~~~~~~~~~~~~~~~~
Main file for the SCSI midlayer.
.. kernel-doc:: drivers/scsi/scsi.c
:export:
drivers/scsi/scsicam.c
~~~~~~~~~~~~~~~~~~~~~~
`SCSI Common Access
Method <http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf>`__ support
functions, for use with HDIO_GETGEO, etc.
.. kernel-doc:: drivers/scsi/scsicam.c
:export:
drivers/scsi/scsi_error.c
~~~~~~~~~~~~~~~~~~~~~~~~~~
Common SCSI error/timeout handling routines.
.. kernel-doc:: drivers/scsi/scsi_error.c
:export:
drivers/scsi/scsi_devinfo.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Manage scsi_dev_info_list, which tracks blacklisted and whitelisted
devices.
.. kernel-doc:: drivers/scsi/scsi_devinfo.c
:internal:
drivers/scsi/scsi_ioctl.c
~~~~~~~~~~~~~~~~~~~~~~~~~~
Handle ioctl() calls for SCSI devices.
.. kernel-doc:: drivers/scsi/scsi_ioctl.c
:export:
drivers/scsi/scsi_lib.c
~~~~~~~~~~~~~~~~~~~~~~~~
SCSI queuing library.
.. kernel-doc:: drivers/scsi/scsi_lib.c
:export:
drivers/scsi/scsi_lib_dma.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SCSI library functions depending on DMA (map and unmap scatter-gather
lists).
.. kernel-doc:: drivers/scsi/scsi_lib_dma.c
:export:
drivers/scsi/scsi_module.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_module.c contains legacy support for
old-style host templates. It should never be used by any new driver.
drivers/scsi/scsi_proc.c
~~~~~~~~~~~~~~~~~~~~~~~~~
The functions in this file provide an interface between the PROC file
system and the SCSI device drivers. It is mainly used for debugging,
statistics and to pass information directly to the lowlevel driver, i.e.
the plumbing to manage /proc/scsi/\*
.. kernel-doc:: drivers/scsi/scsi_proc.c
:internal:
drivers/scsi/scsi_netlink.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Infrastructure to provide async events from transports to userspace via
netlink, using a single NETLINK_SCSITRANSPORT protocol for all
transports. See `the original patch
submission <http://marc.info/?l=linux-scsi&m=115507374832500&w=2>`__ for
more details.
.. kernel-doc:: drivers/scsi/scsi_netlink.c
:internal:
drivers/scsi/scsi_scan.c
~~~~~~~~~~~~~~~~~~~~~~~~~
Scan a host to determine which (if any) devices are attached. The
general scanning/probing algorithm is as follows; exceptions are made to
it depending on device specific flags, compilation options, and global
variable (boot or module load time) settings. A specific LUN is scanned
via an INQUIRY command; if the LUN has a device attached, a scsi_device
is allocated and setup for it. For every id of every channel on the
given host, start by scanning LUN 0. Skip hosts that don't respond at
all to a scan of LUN 0. Otherwise, if LUN 0 has a device attached,
allocate and setup a scsi_device for it. If target is SCSI-3 or up,
issue a REPORT LUN, and scan all of the LUNs returned by the REPORT LUN;
else, sequentially scan LUNs up until some maximum is reached, or a LUN
is seen that cannot have a device attached to it.
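The control flow described above can be sketched as a small userspace simulation. Everything here is a toy model: ``struct toy_target``, ``toy_inquiry()`` and ``toy_scan_target()`` are hypothetical names, the INQUIRY and REPORT LUN commands are replaced by table lookups, and none of the device specific exceptions are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LUN 8	/* hypothetical cap for the sequential scan */

struct toy_target {
	bool responds;			/* false: no answer at all to a scan of LUN 0 */
	int scsi_level;			/* 3 or higher: REPORT LUN is available */
	bool lun_present[TOY_MAX_LUN];	/* stands in for a successful INQUIRY */
};

/* Stand-in for sending an INQUIRY command to one LUN. */
static bool toy_inquiry(const struct toy_target *t, unsigned int lun)
{
	return t->responds && lun < TOY_MAX_LUN && t->lun_present[lun];
}

/* Scan one target; store the LUNs with devices in found[], return their count. */
size_t toy_scan_target(const struct toy_target *t,
		       unsigned int found[TOY_MAX_LUN])
{
	size_t n = 0;
	unsigned int lun;

	if (!t->responds)
		return 0;		/* skip: nothing answered the LUN 0 scan */
	if (toy_inquiry(t, 0))
		found[n++] = 0;		/* a device would be allocated here */

	if (t->scsi_level >= 3) {
		/* REPORT LUN: the target lists its LUNs, scan exactly those */
		for (lun = 1; lun < TOY_MAX_LUN; lun++)
			if (toy_inquiry(t, lun))
				found[n++] = lun;
	} else {
		/* sequential scan: stop at the first LUN without a device */
		for (lun = 1; lun < TOY_MAX_LUN; lun++) {
			if (!toy_inquiry(t, lun))
				break;
			found[n++] = lun;
		}
	}
	return n;
}
```

Note how a gap in the LUN numbering hides later LUNs from the sequential scan, while the REPORT LUN path still finds them.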
.. kernel-doc:: drivers/scsi/scsi_scan.c
:internal:
drivers/scsi/scsi_sysctl.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Set up the sysctl entry: "/dev/scsi/logging_level"
(DEV_SCSI_LOGGING_LEVEL) which sets/returns scsi_logging_level.
drivers/scsi/scsi_sysfs.c
~~~~~~~~~~~~~~~~~~~~~~~~~~
SCSI sysfs interface routines.
.. kernel-doc:: drivers/scsi/scsi_sysfs.c
:export:
drivers/scsi/hosts.c
~~~~~~~~~~~~~~~~~~~~
mid to lowlevel SCSI driver interface
.. kernel-doc:: drivers/scsi/hosts.c
:export:
drivers/scsi/constants.c
~~~~~~~~~~~~~~~~~~~~~~~~
mid to lowlevel SCSI driver interface
.. kernel-doc:: drivers/scsi/constants.c
:export:
Transport classes
-----------------
Transport classes are service libraries for drivers in the SCSI lower
layer, which expose transport attributes in sysfs.
Fibre Channel transport
~~~~~~~~~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_transport_fc.c defines transport attributes
for Fibre Channel.
.. kernel-doc:: drivers/scsi/scsi_transport_fc.c
:export:
iSCSI transport class
~~~~~~~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_transport_iscsi.c defines transport
attributes for the iSCSI class, which sends SCSI packets over TCP/IP
connections.
.. kernel-doc:: drivers/scsi/scsi_transport_iscsi.c
:export:
Serial Attached SCSI (SAS) transport class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_transport_sas.c defines transport
attributes for Serial Attached SCSI, a variant of SATA aimed at large
high-end systems.
The SAS transport class contains common code to deal with SAS HBAs, an
approximated representation of SAS topologies in the driver model, and
various sysfs attributes to expose these topologies and management
interfaces to userspace.
In addition to the basic SCSI core objects this transport class
introduces two additional intermediate objects: The SAS PHY as
represented by struct sas_phy defines an "outgoing" PHY on a SAS HBA or
Expander, and the SAS remote PHY represented by struct sas_rphy defines
an "incoming" PHY on a SAS Expander or end device. Note that this is
purely a software concept; the underlying hardware for a PHY and a
remote PHY is exactly the same.
There is no concept of a SAS port in this code; users can see what PHYs
form a wide port based on the port_identifier attribute, which is the
same for all PHYs in a port.
.. kernel-doc:: drivers/scsi/scsi_transport_sas.c
:export:
SATA transport class
~~~~~~~~~~~~~~~~~~~~
The SATA transport is handled by libata, which has its own book of
documentation in this directory.
Parallel SCSI (SPI) transport class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_transport_spi.c defines transport
attributes for traditional (fast/wide/ultra) SCSI busses.
.. kernel-doc:: drivers/scsi/scsi_transport_spi.c
:export:
SCSI RDMA (SRP) transport class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_transport_srp.c defines transport
attributes for SCSI over Remote Direct Memory Access.
.. kernel-doc:: drivers/scsi/scsi_transport_srp.c
:export:
SCSI lower layer
================
Host Bus Adapter transport types
--------------------------------
Many modern device controllers use the SCSI command set as a protocol to
communicate with their devices through many different types of physical
connections.
In SCSI language a bus capable of carrying SCSI commands is called a
"transport", and a controller connecting to such a bus is called a "host
bus adapter" (HBA).
Debug transport
~~~~~~~~~~~~~~~
The file drivers/scsi/scsi_debug.c simulates a host adapter with a
variable number of disks (or disk like devices) attached, sharing a
common amount of RAM. Does a lot of checking to make sure that we are
not getting blocks mixed up, and panics the kernel if anything out of
the ordinary is seen.
To be more realistic, the simulated devices have the transport
attributes of SAS disks.
For documentation see http://sg.danny.cz/sg/sdebug26.html
todo
~~~~
Parallel (fast/wide/ultra) SCSI, USB, SATA, SAS, Fibre Channel,
FireWire, ATAPI devices, Infiniband, I2O, iSCSI, Parallel ports,
netlink...


@ -0,0 +1,70 @@
======================
W1: Dallas' 1-wire bus
======================
:Author: David Fries
W1 API internal to the kernel
=============================
W1 API internal to the kernel
-----------------------------
include/linux/w1.h
~~~~~~~~~~~~~~~~~~
W1 kernel API functions.
.. kernel-doc:: include/linux/w1.h
:internal:
drivers/w1/w1.c
~~~~~~~~~~~~~~~
W1 core functions.
.. kernel-doc:: drivers/w1/w1.c
:internal:
drivers/w1/w1_family.c
~~~~~~~~~~~~~~~~~~~~~~~
Allows registering device family operations.
.. kernel-doc:: drivers/w1/w1_family.c
:export:
drivers/w1/w1_internal.h
~~~~~~~~~~~~~~~~~~~~~~~~
W1 internal initialization for master devices.
.. kernel-doc:: drivers/w1/w1_internal.h
:internal:
drivers/w1/w1_int.c
~~~~~~~~~~~~~~~~~~~~
W1 internal initialization for master devices.
.. kernel-doc:: drivers/w1/w1_int.c
:export:
drivers/w1/w1_netlink.h
~~~~~~~~~~~~~~~~~~~~~~~~
W1 external netlink API structures and commands.
.. kernel-doc:: drivers/w1/w1_netlink.h
:internal:
drivers/w1/w1_io.c
~~~~~~~~~~~~~~~~~~~
W1 input/output.
.. kernel-doc:: drivers/w1/w1_io.c
:export:
.. kernel-doc:: drivers/w1/w1_io.c
:internal:


@ -289,12 +289,12 @@ the FB_CAP_FOURCC bit in the fb_fix_screeninfo capabilities field.
FOURCC definitions are located in the linux/videodev2.h header. However, and
despite starting with the V4L2_PIX_FMT_prefix, they are not restricted to V4L2
and don't require usage of the V4L2 subsystem. FOURCC documentation is
available in Documentation/DocBook/v4l/pixfmt.xml.
available in Documentation/media/uapi/v4l/pixfmt.rst.
To select a format, applications set the grayscale field to the desired FOURCC.
For YUV formats, they should also select the appropriate colorspace by setting
the colorspace field to one of the colorspaces listed in linux/videodev2.h and
documented in Documentation/DocBook/v4l/colorspaces.xml.
documented in Documentation/media/uapi/v4l/colorspaces.rst.
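As a rough illustration of the selection step, the sketch below packs a FOURCC code and stores it in a trimmed-down stand-in for the variable screen information structure. ``demo_fourcc()``, ``struct demo_var_screeninfo`` and ``demo_select_format()`` are hypothetical names for this example; the packing (first character in the least significant byte) is assumed to mirror the ``v4l2_fourcc()`` macro from linux/videodev2.h, and real code would use the definitions from that header and from linux/fb.h instead.

```c
#include <stdint.h>

/* Pack four characters, first character in the least significant byte. */
static inline uint32_t demo_fourcc(char a, char b, char c, char d)
{
	return (uint32_t)(unsigned char)a |
	       ((uint32_t)(unsigned char)b << 8) |
	       ((uint32_t)(unsigned char)c << 16) |
	       ((uint32_t)(unsigned char)d << 24);
}

/* Hypothetical stand-in holding only the two fields discussed here. */
struct demo_var_screeninfo {
	uint32_t grayscale;	/* carries the FOURCC in the FOURCC-based API */
	uint32_t colorspace;	/* colorspace identifier for YUV formats */
};

/* Record the requested format and colorspace in the structure. */
void demo_select_format(struct demo_var_screeninfo *var,
			uint32_t fourcc, uint32_t colorspace)
{
	var->grayscale = fourcc;
	var->colorspace = colorspace;
}
```

With this packing, the characters 'Y', 'U', 'Y', 'V' yield the value 0x56595559.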
The red, green, blue and transp fields are not used with the FOURCC-based API.
For forward compatibility reasons applications must zero those fields, and


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "Linux Filesystems API"
tags.add("subproject")
latex_documents = [
('index', 'filesystems.tex', project,
'The kernel development community', 'manual'),
]


@ -0,0 +1,317 @@
=====================
Linux Filesystems API
=====================
The Linux VFS
=============
The Filesystem types
--------------------
.. kernel-doc:: include/linux/fs.h
:internal:
The Directory Cache
-------------------
.. kernel-doc:: fs/dcache.c
:export:
.. kernel-doc:: include/linux/dcache.h
:internal:
Inode Handling
--------------
.. kernel-doc:: fs/inode.c
:export:
.. kernel-doc:: fs/bad_inode.c
:export:
Registration and Superblocks
----------------------------
.. kernel-doc:: fs/super.c
:export:
File Locks
----------
.. kernel-doc:: fs/locks.c
:export:
.. kernel-doc:: fs/locks.c
:internal:
Other Functions
---------------
.. kernel-doc:: fs/mpage.c
:export:
.. kernel-doc:: fs/namei.c
:export:
.. kernel-doc:: fs/buffer.c
:export:
.. kernel-doc:: block/bio.c
:export:
.. kernel-doc:: fs/seq_file.c
:export:
.. kernel-doc:: fs/filesystems.c
:export:
.. kernel-doc:: fs/fs-writeback.c
:export:
.. kernel-doc:: fs/block_dev.c
:export:
The proc filesystem
===================
sysctl interface
----------------
.. kernel-doc:: kernel/sysctl.c
:export:
proc filesystem interface
-------------------------
.. kernel-doc:: fs/proc/base.c
:internal:
Events based on file descriptors
================================
.. kernel-doc:: fs/eventfd.c
:export:
The Filesystem for Exporting Kernel Objects
===========================================
.. kernel-doc:: fs/sysfs/file.c
:export:
.. kernel-doc:: fs/sysfs/symlink.c
:export:
The debugfs filesystem
======================
debugfs interface
-----------------
.. kernel-doc:: fs/debugfs/inode.c
:export:
.. kernel-doc:: fs/debugfs/file.c
:export:
The Linux Journalling API
=========================
Overview
--------
Details
~~~~~~~
The journalling layer is easy to use. You need to first of all create a
journal_t data structure. There are two calls to do this dependent on
how you decide to allocate the physical media on which the journal
resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used
for journals stored on a raw device (in a continuous range of blocks). A
journal_t is a typedef for a struct pointer, so when you are finally
finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up
any used kernel memory.
Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects the space for the journal
was already allocated and initialized properly by the userspace tools.
When loading the journal you must call :c:func:`jbd2_journal_load` to process
journal contents. If the client file system detects the journal contents
do not need to be processed (or even need not have valid contents), it
may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
calling :c:func:`jbd2_journal_load`.
Note that jbd2_journal_wipe(..,0) calls
:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
transactions in the journal and similarly :c:func:`jbd2_journal_load` will
call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage.
Now you can go ahead and start modifying the underlying filesystem.
Almost.
You still need to actually journal your filesystem changes, this is done
by wrapping them into transactions. Additionally you also need to wrap
the modification of each of the buffers with calls to the journal layer,
so it knows what the modifications you are actually making are. To do
this use :c:func:`jbd2_journal_start` which returns a transaction handle.
:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
which indicates the end of a transaction are nestable calls, so you can
reenter a transaction if necessary, but remember you must call
:c:func:`jbd2_journal_stop` the same number of times as
:c:func:`jbd2_journal_start` before the transaction is completed (or more
accurately leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.
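The nesting rule can be pictured with a small counter. The sketch below is a toy userspace model only: ``struct toy_handle``, ``toy_start()`` and ``toy_stop()`` are hypothetical names and are not the JBD2 API. It merely shows why every start needs a matching stop before the transaction can leave the update phase.

```c
#include <stdbool.h>

/* Toy model only: this is NOT the JBD2 handle structure. */
struct toy_handle {
	int depth;	/* number of start calls not yet matched by a stop */
	bool updating;	/* true while the transaction is in its update phase */
};

/* Enter the transaction, or re-enter it if it is already running. */
void toy_start(struct toy_handle *h)
{
	if (h->depth++ == 0)
		h->updating = true;
}

/*
 * Leave one nesting level. Returns true once the outermost level has
 * been left, i.e. when the transaction has left the update phase.
 */
bool toy_stop(struct toy_handle *h)
{
	if (h->depth > 0 && --h->depth == 0)
		h->updating = false;
	return h->depth == 0;
}
```

A caller that starts twice and stops once still holds the transaction open; only the second stop lets it complete.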
Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call :c:func:`jbd2_journal_get_create_access()` /
:c:func:`jbd2_journal_get_write_access()` /
:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the
journalling layer to copy the unmodified
data if it needs to. After all, the buffer may be part of a previously
uncommitted transaction. At this point you are at last ready to modify a
buffer, and once you have done so you need to call
:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a
buffer you now know is no longer required to be pushed back on the
device you can call :c:func:`jbd2_journal_forget` in much the same way as you
might have used :c:func:`bforget` in the past.
A :c:func:`jbd2_journal_flush` may be called at any time to commit and
checkpoint all your transactions.
Then at umount time, in your :c:func:`put_super`, you can call
:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.
Unfortunately there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember nothing commits
until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
the transaction at the end of each file/inode/address etc. operation you
perform, so that the journalling system isn't re-entered on another
journal. This matters because transactions can't be nested/batched across
differing journals, and another filesystem other than yours (say ext4) may
be modified in a later syscall.
The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param) - when it blocks it merely(!) needs to wait
for transactions to complete and be committed from other tasks, so
essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
deadlocks you must treat :c:func:`jbd2_journal_start` /
:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
your semaphore ordering rules to prevent
deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking
behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as
easily as on :c:func:`jbd2_journal_start`.
Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at ext4_jbd.h, at least, to see the
basis on which ext4 makes these decisions.
Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reused these blocks and a crash happens,
there is no way to restore the contents of the reallocated blocks at the
end of the last fully committed transaction. One simple way of doing
this is to mark blocks as free in internal in-memory block allocation
structures only after the transaction freeing them commits. Ext4 uses
a journal commit callback for this purpose.
With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You ask the journalling
layer to call the callback by simply setting the
``journal->j_commit_callback`` function pointer, and that function is
called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.
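The idea of releasing freed blocks only at commit time can be sketched with a toy allocator. This is an assumption-laden userspace model, not ext4 or JBD2 code: ``struct toy_fs``, ``toy_free_block()`` and ``toy_commit_callback()`` are hypothetical names, and the pending array merely plays the role of entries attached to a transaction.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS	8
#define TOY_MAX_PENDING	8

/* Toy filesystem state: NOT an ext4 or JBD2 structure. */
struct toy_fs {
	bool in_use[TOY_NBLOCKS];		/* in-memory allocation map */
	unsigned int pending[TOY_MAX_PENDING];	/* freed by the running transaction */
	size_t npending;
};

/* Delete path: remember the block, but keep it marked as in use. */
bool toy_free_block(struct toy_fs *fs, unsigned int blk)
{
	if (blk >= TOY_NBLOCKS || fs->npending == TOY_MAX_PENDING)
		return false;
	fs->pending[fs->npending++] = blk;
	return true;
}

/* Stand-in for the commit callback: now the blocks may be reused. */
void toy_commit_callback(struct toy_fs *fs)
{
	size_t i;

	for (i = 0; i < fs->npending; i++)
		fs->in_use[fs->pending[i]] = false;
	fs->npending = 0;
}
```

Until the callback runs, an allocator consulting ``in_use`` cannot hand out a block that the uncommitted transaction has freed.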
JBD2 also provides a way to block all transaction updates via
:c:func:`jbd2_journal_lock_updates()` /
:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.
::
jbd2_journal_lock_updates() //stop new stuff happening..
jbd2_journal_flush() // checkpoint everything.
..do stuff on stable fs
jbd2_journal_unlock_updates() // carry on with filesystem use.
The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing
these calls.
Summary
~~~~~~~
Using the journal is a matter of wrapping the different context changes,
being each mount, each modification (transaction) and each changed
buffer to tell the journalling layer about them.
Data Types
----------
The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.
Structures
~~~~~~~~~~
.. kernel-doc:: include/linux/jbd2.h
:internal:
Functions
---------
The functions here are split into two groups: those that affect a journal
as a whole, and those which are used to manage transactions.
Journal Level
~~~~~~~~~~~~~
.. kernel-doc:: fs/jbd2/journal.c
:export:
.. kernel-doc:: fs/jbd2/recovery.c
:internal:
Transaction Level
~~~~~~~~~~~~~~~~~
.. kernel-doc:: fs/jbd2/transaction.c
See also
--------
`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__
`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__
splice API
==========
splice is a method for moving blocks of data around inside the kernel,
without continually transferring them between the kernel and user space.
.. kernel-doc:: fs/splice.c
pipes API
=========
Pipe interfaces are all for in-kernel (builtin image) use. They are not
exported for use by modules.
.. kernel-doc:: include/linux/pipe_fs_i.h
:internal:
.. kernel-doc:: fs/pipe.c


@ -55,7 +55,7 @@ request-key will find the first matching line and corresponding program. In
this case, /some/other/program will handle all uid lookups and
/usr/sbin/nfs.idmap will handle gid, user, and group lookups.
See <file:Documentation/security/keys-request-key.txt> for more information
See <file:Documentation/security/keys/request-key.rst> for more information
about the request-key function.


@ -228,7 +228,7 @@ The DRM reference documentation is still lacking kerneldoc in a few areas. The
task would be to clean up interfaces like moving functions around between
files to better group them and improving the interfaces like dropping return
values for functions that never fail. Then write kerneldoc for all exported
functions and an overview section and integrate it all into the drm DocBook.
functions and an overview section and integrate it all into the drm book.
See https://dri.freedesktop.org/docs/drm/ for what's there already.


@ -3,8 +3,8 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to The Linux Kernel's documentation
===========================================
The Linux Kernel documentation
==============================
This is the top level of the kernel's documentation tree. Kernel
documentation, like the kernel itself, is very much a work in progress;
@@ -51,6 +51,7 @@ merged much easier.
process/index
dev-tools/index
doc-guide/index
kernel-hacking/index
Kernel API documentation
------------------------
@@ -67,11 +68,24 @@ needed).
driver-api/index
core-api/index
media/index
networking/index
input/index
gpu/index
security/index
sound/index
crypto/index
filesystems/index
Architecture-specific documentation
-----------------------------------
These books provide programming details about architecture-specific
implementation.
.. toctree::
:maxdepth: 2
sh/index
Korean translations
-------------------


@@ -1331,7 +1331,7 @@ See subsequent chapter for the syntax of the Kbuild file.
--- 7.5 mandatory-y
mandatory-y is essentially used by include/uapi/asm-generic/Kbuild.asm
to define the minimun set of headers that must be exported in
to define the minimum set of headers that must be exported in
include/asm.
The convention is to list one subdir per line and


@@ -17,8 +17,8 @@ The format for this documentation is called the kernel-doc format.
It is documented in this Documentation/kernel-doc-nano-HOWTO.txt file.
This style embeds the documentation within the source files, using
a few simple conventions. The scripts/kernel-doc perl script, some
SGML templates in Documentation/DocBook, and other tools understand
a few simple conventions. The scripts/kernel-doc perl script, the
Documentation/sphinx/kerneldoc.py Sphinx extension and other tools understand
these conventions, and are used to extract this embedded documentation
into various documents.
@@ -122,15 +122,9 @@ are:
- scripts/kernel-doc
This is a perl script that hunts for the block comments and can mark
them up directly into DocBook, man, text, and HTML. (No, not
them up directly into DocBook, ReST, man, text, and HTML. (No, not
texinfo.)
- Documentation/DocBook/*.tmpl
These are SGML template files, which are normal SGML files with
special place-holders for where the extracted documentation should
go.
- scripts/docproc.c
This is a program for converting SGML template files into SGML
@@ -145,25 +139,18 @@ are:
- Makefile
The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used
to build XML DocBook files, PostScript files, PDF files, and html files
in Documentation/DocBook. The older target 'sgmldocs' is equivalent
to 'xmldocs'.
- Documentation/DocBook/Makefile
This is where C files are associated with SGML templates.
The targets 'xmldocs', 'latexdocs', 'pdfdocs', 'epubdocs' and 'htmldocs'
are used to build XML DocBook files, LaTeX files, PDF files,
ePub files and html files in Documentation/.
How to extract the documentation
--------------------------------
If you just want to read the ready-made books on the various
subsystems (see Documentation/DocBook/*.tmpl), just type 'make
psdocs', or 'make pdfdocs', or 'make htmldocs', depending on your
preference. If you would rather read a different format, you can type
'make xmldocs' and then use DocBook tools to convert
Documentation/DocBook/*.xml to a format of your choice (for example,
subsystems, just type 'make epubdocs', or 'make pdfdocs', or 'make htmldocs',
depending on your preference. If you would rather read a different format,
you can type 'make xmldocs' and then use DocBook tools to convert
Documentation/output/*.xml to a format of your choice (for example,
'db2html ...' if 'make htmldocs' was not defined).
If you want to see man pages instead, you can do this:
@@ -329,37 +316,7 @@ This is done by using a DOC: section keyword with a section title. E.g.:
* hardware, software, or its subject(s).
*/
DOC: sections are used in SGML templates files as indicated below.
How to make new SGML template files
-----------------------------------
SGML template files (*.tmpl) are like normal SGML files, except that
they can contain escape sequences where extracted documentation should
be inserted.
!E<filename> is replaced by the documentation, in <filename>, for
functions that are exported using EXPORT_SYMBOL: the function list is
collected from files listed in Documentation/DocBook/Makefile.
!I<filename> is replaced by the documentation for functions that are
_not_ exported using EXPORT_SYMBOL.
!D<filename> is used to name additional files to search for functions
exported using EXPORT_SYMBOL.
!F<filename> <function [functions...]> is replaced by the
documentation, in <filename>, for the functions listed.
!P<filename> <section title> is replaced by the contents of the DOC:
section titled <section title> from <filename>.
Spaces are allowed in <section title>; do not quote the <section title>.
!C<filename> is replaced by nothing, but makes the tools check that
all DOC: sections and documented functions, symbols, etc. are used.
This makes sense to use when you use !F/!P only and want to verify
that all documentation is included.
DOC: sections are used in ReST files.
Tim.
*/ <twaugh@redhat.com>


@@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "Kernel Hacking Guides"
tags.add("subproject")
latex_documents = [
('index', 'kernel-hacking.tex', project,
'The kernel development community', 'manual'),
]


@@ -0,0 +1,811 @@
============================================
Unreliable Guide To Hacking The Linux Kernel
============================================
:Author: Rusty Russell
Introduction
============
Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
Kernel Hacking. This document describes the common routines and general
requirements for kernel code: its goal is to serve as a primer for Linux
kernel development for experienced C programmers. I avoid implementation
details: that's what the code is for, and I ignore whole tracts of
useful routines.
Before you read this, please understand that I never wanted to write
this document, being grossly under-qualified, but I always wanted to
read it, and this was the only way. I hope it will grow into a
compendium of best practice, common starting points and random
information.
The Players
===========
At any time each of the CPUs in a system can be:
- not associated with any process, serving a hardware interrupt;
- not associated with any process, serving a softirq or tasklet;
- running in kernel space, associated with a process (user context);
- running a process in user space.
There is an ordering between these. The bottom two can preempt each
other, but above that is a strict hierarchy: each can only be preempted
by the ones above it. For example, while a softirq is running on a CPU,
no other softirq will preempt it, but a hardware interrupt can. However,
any other CPUs in the system execute independently.
We'll see a number of ways that the user context can block interrupts,
to become truly non-preemptable.
User Context
------------
User context is when you are coming in from a system call or other trap:
like userspace, you can be preempted by more important tasks and by
interrupts. You can sleep, by calling :c:func:`schedule()`.
.. note::
You are always in user context on module load and unload, and on
operations on the block device layer.
In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()`
(``include/linux/preempt.h``) is false.
.. warning::
Beware that if you have preemption or softirqs disabled (see below),
:c:func:`in_interrupt()` will return a false positive.
Hardware Interrupts (Hard IRQs)
-------------------------------
Timer ticks, network cards and keyboard are examples of real hardware
which produce interrupts at any time. The kernel runs interrupt
handlers, which service the hardware. The kernel guarantees that this
handler is never re-entered: if the same interrupt arrives, it is queued
(or dropped). Because it disables interrupts, this handler has to be
fast: frequently it simply acknowledges the interrupt, marks a 'software
interrupt' for execution and exits.
You can tell you are in a hardware interrupt, because
:c:func:`in_irq()` returns true.
.. warning::
Beware that this will return a false positive if interrupts are
disabled (see below).
Software Interrupt Context: Softirqs and Tasklets
-------------------------------------------------
Whenever a system call is about to return to userspace, or a hardware
interrupt handler exits, any 'software interrupts' which are marked
pending (usually by hardware interrupts) are run (``kernel/softirq.c``).
Much of the real interrupt handling work is done here. Early in the
transition to SMP, there were only 'bottom halves' (BHs), which didn't
take advantage of multiple CPUs. Shortly after we switched from wind-up
computers made of match-sticks and snot, we abandoned this limitation
and switched to 'softirqs'.
``include/linux/interrupt.h`` lists the different softirqs. A very
important softirq is the timer softirq (``include/linux/timer.h``): you
can register to have it call functions for you in a given length of
time.
Softirqs are often a pain to deal with, since the same softirq will run
simultaneously on more than one CPU. For this reason, tasklets
(``include/linux/interrupt.h``) are more often used: they are
dynamically-registrable (meaning you can have as many as you want), and
they also guarantee that any tasklet will only run on one CPU at any
time, although different tasklets can run simultaneously.
.. warning::
The name 'tasklet' is misleading: they have nothing to do with
'tasks', and probably more to do with some bad vodka Alexey
Kuznetsov had at the time.
You can tell you are in a softirq (or tasklet) using the
:c:func:`in_softirq()` macro (``include/linux/preempt.h``).
.. warning::
Beware that this will return a false positive if a
:ref:`bottom half lock <local_bh_disable>` is held.
Some Basic Rules
================
No memory protection
If you corrupt memory, whether in user context or interrupt context,
the whole machine will crash. Are you sure you can't do what you
want in userspace?
No floating point or MMX
The FPU context is not saved; even in user context the FPU state
probably won't correspond with the current process: you would mess
with some user process' FPU state. If you really want to do this,
you would have to explicitly save/restore the full FPU state (and
avoid context switches). It is generally a bad idea; use fixed point
arithmetic first.
A rigid stack limit
Depending on configuration options the kernel stack is about 3K to
6K for most 32-bit architectures: it's about 14K on most 64-bit
archs, and often shared with interrupts so you can't use it all.
Avoid deep recursion and huge local arrays on the stack (allocate
them dynamically instead).
The Linux kernel is portable
Let's keep it that way. Your code should be 64-bit clean, and
endian-independent. You should also minimize CPU specific stuff,
e.g. inline assembly should be cleanly encapsulated and minimized to
ease porting. Generally it should be restricted to the
architecture-dependent part of the kernel tree.
ioctls: Not writing a new system call
=====================================
A system call generally looks like this::
asmlinkage long sys_mycall(int arg)
{
return 0;
}
First, in most cases you don't want to create a new system call. You
create a character device and implement an appropriate ioctl for it.
This is much more flexible than system calls, doesn't have to be entered
in every architecture's ``include/asm/unistd.h`` and
``arch/kernel/entry.S`` file, and is much more likely to be accepted by
Linus.
If all your routine does is read or write some parameter, consider
implementing a sysfs interface instead.
Inside the ioctl you're in user context to a process. When an error
occurs you return a negated errno (see
``include/uapi/asm-generic/errno-base.h``,
``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0.
After you have slept, you should check if a signal occurred: the Unix/Linux
way of handling signals is to temporarily exit the system call with the
``-ERESTARTSYS`` error. The system call entry code will switch back to
user context, process the signal handler and then your system call will
be restarted (unless the user disabled that). So you should be prepared
to process the restart, e.g. if you're in the middle of manipulating
some data structure.
::
if (signal_pending(current))
return -ERESTARTSYS;
If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU).
Idiom::
cond_resched(); /* Will sleep */
A short note on interface design: the UNIX system call motto is "Provide
mechanism not policy".
Recipes for Deadlock
====================
You cannot call any routines which may sleep, unless:
- You are in user context.
- You do not own any spinlocks.
- You have interrupts enabled (actually, Andi Kleen says that the
scheduling code will enable them for you, but that's probably not
what you wanted).
Note that some functions may sleep implicitly: common ones are the user
space access functions (\*_user) and memory allocation functions
without ``GFP_ATOMIC``.
You should always compile your kernel with ``CONFIG_DEBUG_ATOMIC_SLEEP`` on,
and it will warn you if you break these rules. If you **do** break the
rules, you will eventually lock up your box.
Really.
Common Routines
===============
:c:func:`printk()`
------------------
Defined in ``include/linux/printk.h``
:c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon. It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string
concatenation to give it a first "priority" argument::
printk(KERN_INFO "i = %u\n", i);
See ``include/linux/kern_levels.h`` for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP
address use::
__be32 ipaddress;
printk(KERN_INFO "my ip: %pI4\n", &ipaddress);
:c:func:`printk()` internally uses a 1K buffer and does not catch
overruns. Make sure that will be enough.
.. note::
You will know when you are a real kernel hacker when you start
typoing printf as printk in your user programs :)
.. note::
Another sidenote: the original Unix Version 6 sources had a comment
on top of its printf function: "Printf should not be used for
chit-chat". You should follow that advice.
:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
---------------------------------------------------------------------------------------------------
Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``
**[SLEEPS]**
:c:func:`put_user()` and :c:func:`get_user()` are used to get
and put single values (such as an int, char, or long) from and to
userspace. A pointer into userspace should never be simply dereferenced:
data should be copied using these routines. Both return ``-EFAULT`` or
0.
:c:func:`copy_to_user()` and :c:func:`copy_from_user()` are
more general: they copy an arbitrary amount of data to and from
userspace.
.. warning::
Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
return the amount of uncopied data (ie. 0 still means success).
[Yes, this moronic interface makes me cringe. The flamewar comes up
every year or so. --RR.]
These functions may sleep implicitly. They should never be called outside
user context (it makes no sense), with interrupts disabled, or with a
spinlock held.
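The "returns the amount of uncopied data" convention is easy to get wrong,
so here is a minimal userspace sketch of the usual calling idiom.
``fake_copy_to_user()`` and ``sketch_report_value()`` are hypothetical names
invented for this example; the stand-in only mimics the return convention and
does none of the real address checking.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for copy_to_user(): copies at most
 * 'limit' bytes and returns the number of bytes NOT copied, mimicking
 * the kernel convention (0 means complete success).
 */
unsigned long fake_copy_to_user(void *to, const void *from,
                                unsigned long n, unsigned long limit)
{
    unsigned long done = n < limit ? n : limit;

    memcpy(to, from, done);
    return n - done;
}

/* Typical caller: any uncopied byte is reported as -EFAULT. */
long sketch_report_value(void *ubuf, unsigned long limit)
{
    int value = 42;

    if (fake_copy_to_user(ubuf, &value, sizeof(value), limit))
        return -EFAULT;
    return 0;
}
```

The caller never passes the raw byte count upwards: it collapses any non-zero
result into ``-EFAULT``, which matches what :c:func:`put_user()` and
:c:func:`get_user()` report directly.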
:c:func:`kmalloc()`/:c:func:`kfree()`
-------------------------------------
Defined in ``include/linux/slab.h``
**[MAY SLEEP: SEE BELOW]**
These routines are used to dynamically request pointer-aligned chunks of
memory, like malloc and free do in userspace, but
:c:func:`kmalloc()` takes an extra flag word. Important values:
``GFP_KERNEL``
May sleep and swap to free memory. Only allowed in user context, but
is the most reliable way to allocate memory.
``GFP_ATOMIC``
Don't sleep. Less reliable than ``GFP_KERNEL``, but may be called
from interrupt context. You should **really** have a good
out-of-memory error-handling strategy.
``GFP_DMA``
Allocate ISA DMA lower than 16MB. If you don't know what that is you
don't need it. Very unreliable.
If you see a ``sleeping function called from invalid context`` warning
message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk.
If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above.
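The order argument is the base-2 logarithm of the number of pages. A small
userspace sketch of the rounding follows; ``sketch_order()`` and the
4096-byte ``SKETCH_PAGE_SIZE`` are assumptions made for illustration (the
real ``PAGE_SIZE`` depends on the architecture, and the kernel has its own
``get_order()`` helper for this job).

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: PAGE_SIZE varies by arch */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
unsigned int sketch_order(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;     /* each step doubles the number of pages */
        order++;
    }
    return order;
}
```

Note how a request just over a power-of-two boundary doubles the allocation:
with the page size assumed here, 8193 bytes already needs order 2 (four
pages).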
If you are allocating more than a page worth of bytes you can use
:c:func:`vmalloc()`. It'll allocate virtual memory in the kernel
map. This block is not contiguous in physical memory, but the MMU makes
it look like it is for you (so it'll only look contiguous to the CPUs,
not to external device drivers). If you really need large physically
contiguous memory for some weird device, you have a problem: it is
poorly supported in Linux because after some time memory fragmentation
in a running kernel makes it hard. The best way is to allocate the block
early in the boot process via the :c:func:`alloc_bootmem()`
routine.
Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h``.
:c:func:`current()`
-------------------
Defined in ``include/asm/current.h``
This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context.
:c:func:`mdelay()`/:c:func:`udelay()`
-------------------------------------
Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``
The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk
overflow - the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`.
:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
-----------------------------------------------------------------------------------------------
Defined in ``include/asm/byteorder.h``
The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") are the general way
to do endian conversions in the kernel: they return the converted value.
All variations supply the reverse as well:
:c:func:`be32_to_cpu()`, etc.
There are two major variations of these functions: the pointer
variation, such as :c:func:`cpu_to_be32p()`, which take a pointer
to the given type, and return the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
convert the value referred to by the pointer, and return void.
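To make the three calling styles concrete, here is a userspace sketch. The
``sketch_`` functions are hypothetical stand-ins written for this example,
not the kernel implementations; they only show the shape of the
value-returning, pointer and in-situ variants.

```c
#include <stdint.h>
#include <string.h>

/* Value-returning form, in the style of cpu_to_be32(). */
uint32_t sketch_cpu_to_be32(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v,
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));   /* bytes now sit in big-endian order */
    return out;
}

/* Reverse direction, in the style of be32_to_cpu(). */
uint32_t sketch_be32_to_cpu(uint32_t v)
{
    unsigned char b[4];

    memcpy(b, &v, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Pointer form, in the style of cpu_to_be32p(): reads through the pointer. */
uint32_t sketch_cpu_to_be32p(const uint32_t *p)
{
    return sketch_cpu_to_be32(*p);
}

/* In-situ form, in the style of cpu_to_be32s(): converts in place. */
void sketch_cpu_to_be32s(uint32_t *p)
{
    *p = sketch_cpu_to_be32(*p);
}
```

Because the sketch builds the result byte by byte, it behaves the same on
little-endian and big-endian hosts.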
:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
--------------------------------------------------------
Defined in ``include/linux/irqflags.h``
These routines disable hard interrupts on the local CPU, and restore
them. They are reentrant; saving the previous state in their one
``unsigned long flags`` argument. If you know that interrupts are
enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`.
.. _local_bh_disable:
:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
--------------------------------------------------------
Defined in ``include/linux/bottom_half.h``
These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU.
:c:func:`smp_processor_id()`
----------------------------
Defined in ``include/linux/smp.h``
:c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between
0 and ``NR_CPUS``. Note that the CPU numbers are not necessarily
continuous. You return it again with :c:func:`put_cpu()` when you
are done.
If you know you cannot be preempted by another task (ie. you are in
interrupt context, or have preemption disabled) you can use
smp_processor_id().
``__init``/``__exit``/``__initdata``
------------------------------------
Defined in ``include/linux/init.h``
After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped
after boot is complete: similarly modules discard this memory after
initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules
with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()` - this
will break.
:c:func:`__initcall()`/:c:func:`module_init()`
----------------------------------------------
Defined in ``include/linux/init.h`` / ``include/linux/module.h``
Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel). Using the
:c:func:`module_init()` and :c:func:`module_exit()` macros it
is easy to write code without #ifdefs which can operate both as a module
or built into the kernel.
The :c:func:`module_init()` macro defines which function is to be
called at module insertion time (if the file is compiled as a module),
or at boot time: if the file is not compiled as a module the
:c:func:`module_init()` macro becomes equivalent to
:c:func:`__initcall()`, which through linker magic ensures that
the function is called on boot.
The function can return a negative error number to cause module loading
to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with
interrupts enabled, so it can sleep.
:c:func:`module_exit()`
-----------------------
Defined in ``include/linux/module.h``
This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only
be called if the module usage count has reached zero. This function can
also sleep, but cannot fail: everything must be cleaned up by the time
it returns.
Note that this macro is optional: if it is not present, your module will
not be removable (except for 'rmmod -f').
:c:func:`try_module_get()`/:c:func:`module_put()`
-------------------------------------------------
Defined in ``include/linux/module.h``
These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported
symbols: see below). Before calling into module code, you should call
:c:func:`try_module_get()` on that module: if it fails, then the
module is being removed and you should act as if it wasn't there.
Otherwise, you can safely enter the module, and call
:c:func:`module_put()` when you're finished.
Most registerable structures have an owner field, such as in the
:c:type:`struct file_operations <file_operations>` structure.
Set this field to the macro ``THIS_MODULE``.
Wait Queues ``include/linux/wait.h``
====================================
**[SLEEPS]**
A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no
race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
which want to wait for that condition declare a :c:type:`wait_queue_entry_t`
referring to themselves, and place that in the queue.
Declaring
---------
You declare a ``wait_queue_head_t`` using the
:c:func:`DECLARE_WAIT_QUEUE_HEAD()` macro, or using the
:c:func:`init_waitqueue_head()` routine in your initialization
code.
Queuing
-------
Placing yourself in the waitqueue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()`
(``include/linux/wait.h``). The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when
this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals.
Waking Up Queued Tasks
----------------------
Call :c:func:`wake_up()` (``include/linux/wait.h``), which will wake
up every process in the queue. The exception is if one has
``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will
not be woken. There are other variants of this basic function available
in the same header.
Atomic Operations
=================
Certain operations are guaranteed atomic on all platforms. The first
class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
this contains a signed integer (at least 32 bits long), and you must use
these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
:c:func:`atomic_dec_and_test()` (returns true if it was
decremented to zero).
Yes. It returns true (i.e. != 0) if the atomic variable is zero.
Note that these functions are slower than normal arithmetic, and so
should not be used unnecessarily.
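The "returns true when it hits zero" sense of
:c:func:`atomic_dec_and_test()` is what makes it handy for reference
counting. The sketch below shows the same pattern in userspace with C11
``<stdatomic.h>`` rather than the kernel's ``atomic_t``; ``struct sketch_obj``
and the ``sketch_`` functions are hypothetical names for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object for this sketch. */
struct sketch_obj {
    atomic_int refs;
};

void sketch_get(struct sketch_obj *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/*
 * Returns true when this call dropped the last reference, which is
 * the same "true means it reached zero" sense as atomic_dec_and_test().
 */
bool sketch_put(struct sketch_obj *o)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    return atomic_fetch_sub(&o->refs, 1) == 1;
}
```

Only the caller that sees ``true`` may free the object; every other caller
must assume someone else still holds a reference.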
The second class of atomic operations is atomic bit operations on an
``unsigned long``, defined in ``include/linux/bitops.h``. These
operations generally take a pointer to the bit pattern, and a bit
number: 0 is the least significant bit. :c:func:`set_bit()`,
:c:func:`clear_bit()` and :c:func:`change_bit()` set, clear,
and flip the given bit. :c:func:`test_and_set_bit()`,
:c:func:`test_and_clear_bit()` and
:c:func:`test_and_change_bit()` do the same thing, except return
true if the bit was previously set; these are particularly useful for
atomically setting flags.
It is possible to call these operations with bit indices greater than
``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms though so it is a good idea not to do this.
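As a rough userspace analogy of the test-and-modify operations, the sketch
below uses C11 ``atomic_fetch_or()`` and ``atomic_fetch_and()`` on a single
word. The ``sketch_`` names are hypothetical and this is not how the kernel
implements its bitops; it assumes the bit number stays within one
``unsigned long``.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit nr; return true if it was already set. */
bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;   /* bit 0 is the least significant */

    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Atomically clear bit nr; return true if it was previously set. */
bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_and(word, ~mask) & mask) != 0;
}
```

The returned old value is what makes these useful as flags: of several
callers racing to set the same bit, exactly one sees ``false`` and so knows
it was first.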
Symbols
=======
Within the kernel proper, the normal linking rules apply (ie. unless a
symbol is declared to be file scope with the ``static`` keyword, it can
be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols.
:c:func:`EXPORT_SYMBOL()`
-------------------------
Defined in ``include/linux/export.h``
This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal.
:c:func:`EXPORT_SYMBOL_GPL()`
-----------------------------
Defined in ``include/linux/export.h``
Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
modules with a :c:func:`MODULE_LICENSE()` that specifies a GPL
compatible license. It implies that the function is considered an
internal implementation issue, and not really an interface. Some
maintainers and developers may however require EXPORT_SYMBOL_GPL()
when adding any new APIs or functionality.
Routines and Conventions
========================
Double-linked lists ``include/linux/list.h``
--------------------------------------------
There used to be three sets of linked-list routines in the kernel
headers, but this one is the winner. If you don't have some particular
pressing need for a single list, it's a good choice.
In particular, :c:func:`list_for_each_entry()` is useful.
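For readers who have not met intrusive lists before, the following userspace
sketch shows the idea behind ``list.h``: the list node is embedded in your
structure, and the enclosing structure is recovered from the node pointer.
All ``sketch_`` names are hypothetical; the real header offers far more
(and :c:func:`list_for_each_entry()` hides the loop written out here).

```c
#include <stddef.h>

/* Userspace sketch of an intrusive doubly-linked list. */
struct sketch_list {
    struct sketch_list *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded node. */
#define sketch_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void sketch_list_init(struct sketch_list *head)
{
    head->next = head->prev = head;   /* an empty list points at itself */
}

void sketch_list_add_tail(struct sketch_list *node, struct sketch_list *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

struct sketch_item {
    int value;
    struct sketch_list link;          /* the list node lives inside the item */
};

/* Walk every item on the list and add up the values. */
int sketch_sum(struct sketch_list *head)
{
    int sum = 0;

    for (struct sketch_list *p = head->next; p != head; p = p->next)
        sum += sketch_entry(p, struct sketch_item, link)->value;
    return sum;
}
```

Because the head is itself a node, insertion needs no special case for the
empty list.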
Return Conventions
------------------
For code called in user context, it's very common to defy C convention,
and return 0 for success, and a negative error number (eg. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in
the kernel.
Use :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again: this avoids a separate
pointer parameter for the error number. Icky, but in a good way.
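The trick can be tried out in userspace. The sketch below is only an
illustration under stated assumptions: the ``sketch_`` names are
hypothetical, and the 4095 bound (``SKETCH_MAX_ERRNO``) is a value picked
for this example on the assumption that the topmost addresses are never
valid pointers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095   /* assumption: top 4095 addresses never valid */

/* Encode a negative errno value as a pointer. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno value again. */
long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical lookup: a pointer on success, an encoded errno on failure. */
int *sketch_lookup(int *table, int len, int idx)
{
    if (idx < 0 || idx >= len)
        return sketch_err_ptr(-EINVAL);
    return &table[idx];
}
```

A caller checks the result with ``sketch_is_err()`` before dereferencing it.
Note that a null pointer is not an error pointer under this scheme.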
Breaking Compilation
--------------------
Linus and the other developers sometimes change function or structure
names in development kernels; this is not done just to keep everyone on
their toes: it reflects a fundamental change (eg. can no longer be
called with interrupts on, or does extra checks, or doesn't do checks
which were caught before). Usually this is accompanied by a fairly
complete note to the linux-kernel mailing list; search the archive.
Simply doing a global replace on the file usually makes things **worse**.
Initializing structure members
------------------------------
The preferred method of initializing structures is to use designated
initialisers, as defined by ISO C99, eg::
static struct block_device_operations opt_fops = {
.open = opt_open,
.release = opt_release,
.ioctl = opt_ioctl,
.check_media_change = opt_media_change,
};
This makes it easy to grep for, and makes it clear which structure
fields are set. You should do this because it looks cool.
GNU Extensions
--------------
GNU Extensions are explicitly allowed in the Linux kernel. Note that
some of the more complex ones are not very well supported, due to lack
of general use, but the following are considered standard (see the GCC
info page section "C Extensions" for more details - Yes, really the info
page, the man page is only a short summary of the stuff in info).
- Inline functions
- Statement expressions (ie. the ({ and }) constructs).
- Declaring attributes of a function / variable / type
(__attribute__)
- typeof
- Zero length arrays
- Macro varargs
- Arithmetic on void pointers
- Non-Constant initializers
- Assembler Instructions (not outside arch/ and include/asm/)
- Function names as strings (__func__).
- __builtin_constant_p()
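As an illustration of two items from this list, the sketch below combines a
statement expression with ``__typeof__`` so that a macro evaluates each
argument exactly once. ``sketch_min()`` is a hypothetical name for this
example, and the code needs a compiler that accepts GNU C (GCC or Clang).

```c
/*
 * GNU C only: the ({ ... }) statement expression yields the value of
 * its last statement, and __typeof__ declares temporaries of the
 * arguments' own types, so side effects in (a) or (b) happen once.
 */
#define sketch_min(a, b) ({            \
    __typeof__(a) _a = (a);            \
    __typeof__(b) _b = (b);            \
    _a < _b ? _a : _b;                 \
})
```

With a naive ``((a) < (b) ? (a) : (b))`` macro, ``sketch_min(i++, 10)``
would increment ``i`` twice whenever ``i`` is the smaller value; with the
form above it is incremented once.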
Be wary when using long long in the kernel: the code gcc generates for
it is horrible and, worse, division and multiplication do not work on
i386 because the GCC runtime functions for them are missing from the
kernel environment.
C++
---
Using C++ in the kernel is usually a bad idea, because the kernel does
not provide the necessary runtime environment and the include files are
not tested for it. It is still possible, but not recommended. If you
really want to do this, forget about exceptions at least.
#if
---
It is generally considered cleaner to use macros in header files (or at
the top of .c files) to abstract away functions rather than using ``#if``
pre-processor statements throughout the source code.
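A common shape for this is a stub in the header, so that callers need no
conditional compilation at all. The sketch below uses a hypothetical config
symbol ``CONFIG_SKETCH_FEATURE`` and hypothetical function names; it is one
way to arrange things, not a prescribed layout.

```c
/* Header-style part: the only place where the preprocessor test appears. */
#ifdef CONFIG_SKETCH_FEATURE            /* hypothetical config symbol */
int sketch_feature_setup(int id);       /* real version lives elsewhere */
#else
static inline int sketch_feature_setup(int id)
{
    (void)id;                           /* feature compiled out: nothing to do */
    return 0;
}
#endif

/* Caller: no #if needed, the stub compiles away when the feature is off. */
int sketch_probe(int id)
{
    int err = sketch_feature_setup(id);

    if (err)
        return err;
    return 0;
}
```

The caller reads the same whether or not the feature is built, and the
compiler still type-checks the call in both configurations.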
Putting Your Stuff in the Kernel
================================
In order to get your stuff into shape for official inclusion, or even to
make a neat patch, there's administrative work to be done:
- Figure out whose pond you've been pissing in. Look at the top of the
source files, inside the ``MAINTAINERS`` file, and last of all in the
``CREDITS`` file. You should coordinate with this person to make sure
you're not duplicating effort, or trying something that's already
been rejected.
Make sure you put your name and EMail address at the top of any files
you create or mangle significantly. This is the first place people
will look when they find a bug, or when **they** want to make a change.
- Usually you want a configuration option for your kernel hack. Edit
``Kconfig`` in the appropriate directory. The Config language is
simple to use by cut and paste, and there's complete documentation in
``Documentation/kbuild/kconfig-language.txt``.
In your description of the option, make sure you address both the
expert user and the user who knows nothing about your feature.
Mention incompatibilities and issues here. **Definitely** end your
description with “if in doubt, say N” (or, occasionally, \`Y'); this
is for people who have no idea what you are talking about.
- Edit the ``Makefile``: the CONFIG variables are exported here so you
can usually just add an "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
is documented in ``Documentation/kbuild/makefiles.txt``.
- Put yourself in ``CREDITS`` if you've done something noteworthy,
usually beyond a single file (your name should be at the top of the
source files anyway). ``MAINTAINERS`` means you want to be consulted
when changes are made to a subsystem, and hear about bugs; it implies
a more-than-passing commitment to some part of the code.
- Finally, don't forget to read
``Documentation/process/submitting-patches.rst`` and possibly
``Documentation/process/submitting-drivers.rst``.
Kernel Cantrips
===============
Some favorites from browsing the source. Feel free to add to this list.
``arch/x86/include/asm/delay.h``::
#define ndelay(n) (__builtin_constant_p(n) ? \
((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
__ndelay(n))
``include/linux/fs.h``::
/*
* Kernel pointers have redundant information, so we can use a
* scheme where we can return either an error code or a dentry
* pointer with the same return value.
*
* This should be a per-architecture thing, to allow different
* error and pointer decisions.
*/
#define ERR_PTR(err) ((void *)((long)(err)))
#define PTR_ERR(ptr) ((long)(ptr))
#define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
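For readers new to this trick, here is a user-space sketch of how the three macros quoted above are meant to be used together. The macros are repeated verbatim so the block compiles on its own; ``table`` and ``lookup_slot`` are invented for the example, and the sketch assumes a platform where ``long`` is pointer-sized.

```c
#include <errno.h>

/* Same definitions as the quoted snippet. */
#define ERR_PTR(err)	((void *)((long)(err)))
#define PTR_ERR(ptr)	((long)(ptr))
#define IS_ERR(ptr)	((unsigned long)(ptr) > (unsigned long)(-1000))

static int table[4] = { 10, 20, 30, 40 };

/* Returns a pointer to the slot, or a negative errno encoded as a pointer. */
static int *lookup_slot(int idx)
{
	if (idx < 0 || idx >= 4)
		return ERR_PTR(-EINVAL);
	return &table[idx];
}
```

A caller checks ``IS_ERR()`` first and only then either dereferences the pointer or recovers the error code with ``PTR_ERR()``.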
``arch/x86/include/asm/uaccess_32.h:``::
#define copy_to_user(to,from,n) \
(__builtin_constant_p(n) ? \
__constant_copy_to_user((to),(from),(n)) : \
__generic_copy_to_user((to),(from),(n)))
``arch/sparc/kernel/head.S:``::
/*
* Sun people can't spell worth damn. "compatability" indeed.
* At least we *know* we can't spell, and use a spell-checker.
*/
/* Uh, actually Linus it is I who cannot spell. Too much murky
* Sparc assembly will do this to ya.
*/
C_LABEL(cputypvar):
.asciz "compatibility"
/* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
.align 4
C_LABEL(cputypvar_sun4m):
.asciz "compatible"
``arch/sparc/lib/checksum.S:``::
/* Sun, you just can't beat me, you just can't. Stop trying,
* give up. I'm serious, I am going to kick the living shit
* out of you, game over, lights out.
*/
Thanks
======
Thanks to Andi Kleen for the idea, answering my questions, fixing my
mistakes, filling content, etc. Philipp Rumpf for more spelling and
clarity fixes, and some excellent non-obvious points. Werner Almesberger
for giving me a great summary of :c:func:`disable_irq()`, and Jes
Sorensen and Andrea Arcangeli added caveats. Michael Elizabeth Chastain
for checking and adding to the Configure section. Telsa Gwynne for
teaching me DocBook.


@ -0,0 +1,9 @@
=====================
Kernel Hacking Guides
=====================
.. toctree::
:maxdepth: 2
hacking
locking

File diff suppressed because it is too large

Documentation/lsm.txt Normal file

@ -0,0 +1,201 @@
========================================================
Linux Security Modules: General Security Hooks for Linux
========================================================
:Author: Stephen Smalley
:Author: Timothy Fraser
:Author: Chris Vance
.. note::
The APIs described in this book are outdated.
Introduction
============
In March 2001, the National Security Agency (NSA) gave a presentation
about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit.
SELinux is an implementation of flexible and fine-grained
nondiscretionary access controls in the Linux kernel, originally
implemented as its own particular kernel patch. Several other security
projects (e.g. RSBAC, Medusa) have also developed flexible access
control architectures for the Linux kernel, and various projects have
developed particular access control models for Linux (e.g. LIDS, DTE,
SubDomain). Each project has developed and maintained its own kernel
patch to support its security needs.
In response to the NSA presentation, Linus Torvalds made a set of
remarks that described a security framework he would be willing to
consider for inclusion in the mainstream Linux kernel. He described a
general framework that would provide a set of security hooks to control
operations on kernel objects and a set of opaque security fields in
kernel data structures for maintaining security attributes. This
framework could then be used by loadable kernel modules to implement any
desired model of security. Linus also suggested the possibility of
migrating the Linux capabilities code into such a module.
The Linux Security Modules (LSM) project was started by WireX to develop
such a framework. LSM is a joint development effort by several security
projects, including Immunix, SELinux, SGI and Janus, and several
individuals, including Greg Kroah-Hartman and James Morris, to develop a
Linux kernel patch that implements this framework. The patch is
currently tracking the 2.4 series and is targeted for integration into
the 2.5 development series. This technical report provides an overview
of the framework and the example capabilities security module provided
by the LSM kernel patch.
LSM Framework
=============
The LSM kernel patch provides a general kernel framework to support
security modules. In particular, the LSM framework is primarily focused
on supporting access control modules, although future development is
likely to address other security needs such as auditing. By itself, the
framework does not provide any additional security; it merely provides
the infrastructure to support security modules. The LSM kernel patch
also moves most of the capabilities logic into an optional security
module, with the system defaulting to the traditional superuser logic.
This capabilities module is discussed further in
`LSM Capabilities Module`_.
The LSM kernel patch adds security fields to kernel data structures and
inserts calls to hook functions at critical points in the kernel code to
manage the security fields and to perform access control. It also adds
functions for registering and unregistering security modules, and adds a
general :c:func:`security()` system call to support new system calls
for security-aware applications.
The LSM security fields are simply ``void*`` pointers. For process and
program execution security information, security fields were added to
:c:type:`struct task_struct <task_struct>` and
:c:type:`struct linux_binprm <linux_binprm>`. For filesystem
security information, a security field was added to :c:type:`struct
super_block <super_block>`. For pipe, file, and socket security
information, security fields were added to :c:type:`struct inode
<inode>` and :c:type:`struct file <file>`. For packet and
network device security information, security fields were added to
:c:type:`struct sk_buff <sk_buff>` and :c:type:`struct
net_device <net_device>`. For System V IPC security information,
security fields were added to :c:type:`struct kern_ipc_perm
<kern_ipc_perm>` and :c:type:`struct msg_msg
<msg_msg>`; additionally, the definitions for :c:type:`struct
msg_msg <msg_msg>`, struct msg_queue, and struct shmid_kernel
were moved to header files (``include/linux/msg.h`` and
``include/linux/shm.h`` as appropriate) to allow the security modules to
use these definitions.
Each LSM hook is a function pointer in a global table, security_ops.
This table is a :c:type:`struct security_operations
<security_operations>` structure as defined by
``include/linux/security.h``. Detailed documentation for each hook is
included in this header file. At present, this structure consists of a
collection of substructures that group related hooks based on the kernel
object (e.g. task, inode, file, sk_buff, etc) as well as some top-level
hook function pointers for system operations. This structure is likely
to be flattened in the future for performance. The placement of the hook
calls in the kernel code is described by the "called:" lines in the
per-hook documentation in the header file. The hook calls can also be
easily found in the kernel code by looking for the string
"security_ops->".
Linus mentioned per-process security hooks in his original remarks as a
possible alternative to global security hooks. However, if LSM were to
start from the perspective of per-process hooks, then the base framework
would have to deal with how to handle operations that involve multiple
processes (e.g. kill), since each process might have its own hook for
controlling the operation. This would require a general mechanism for
composing hooks in the base framework. Additionally, LSM would still
need global hooks for operations that have no process context (e.g.
network input operations). Consequently, LSM provides global security
hooks, but a security module is free to implement per-process hooks
(where that makes sense) by storing a security_ops table in each
process' security field and then invoking these per-process hooks from
the global hooks. The problem of composition is thus deferred to the
module.
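The arrangement described above, a global hook that dispatches to per-process hooks stored in each process' security field, can be sketched in user space as follows. All ``demo_`` names are hypothetical stand-ins, and the composition policy (deny if either party's hook denies) is just one choice a module might make.

```c
#include <errno.h>
#include <stddef.h>

struct demo_task;

/* Per-process hook table, kept behind the task's opaque security field. */
struct demo_task_hooks {
	int (*kill)(struct demo_task *sender, struct demo_task *target,
		    int sig);
};

struct demo_task {
	int pid;
	void *security;	/* struct demo_task_hooks *, or NULL for none */
};

/* A per-process hook that refuses every signal. */
static int demo_deny_kill(struct demo_task *sender, struct demo_task *target,
			  int sig)
{
	(void)sender; (void)target; (void)sig;
	return -EPERM;
}

/*
 * The global hook.  It consults the sender's and then the target's
 * per-process hook, which is how the module resolves the
 * "operation involves multiple processes" problem.
 */
static int demo_global_task_kill(struct demo_task *sender,
				 struct demo_task *target, int sig)
{
	struct demo_task *parties[2] = { sender, target };
	int i;

	for (i = 0; i < 2; i++) {
		struct demo_task_hooks *h = parties[i]->security;

		if (h && h->kill) {
			int err = h->kill(sender, target, sig);

			if (err)
				return err;
		}
	}
	return 0;
}
```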
The global security_ops table is initialized to a set of hook functions
provided by a dummy security module that provides traditional superuser
logic. A :c:func:`register_security()` function (in
``security/security.c``) is provided to allow a security module to set
security_ops to refer to its own hook functions, and an
:c:func:`unregister_security()` function is provided to revert
security_ops to the dummy module hooks. This mechanism is used to set
the primary security module, which is responsible for making the final
decision for each hook.
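A user-space sketch of this mechanism: a global table of function pointers that starts out pointing at a dummy module, plus register and unregister helpers. The ``demo_`` names, the single-hook table and the specific error codes are assumptions made for illustration; they are not taken from the LSM patch.

```c
#include <errno.h>
#include <stddef.h>

/* A one-hook table; the real structure groups many hooks. */
struct demo_security_operations {
	int (*task_kill)(int sender_pid, int target_pid, int sig);
};

/* Dummy module: traditional behaviour, nothing extra is denied. */
static int dummy_task_kill(int sender_pid, int target_pid, int sig)
{
	(void)sender_pid; (void)target_pid; (void)sig;
	return 0;
}

static struct demo_security_operations dummy_ops = {
	.task_kill = dummy_task_kill,
};

/* The global table every hook call site goes through. */
static struct demo_security_operations *demo_security_ops = &dummy_ops;

/* Installs ops as the primary module; refuses if one is already set. */
static int demo_register_security(struct demo_security_operations *ops)
{
	if (!ops || !ops->task_kill)
		return -EINVAL;
	if (demo_security_ops != &dummy_ops)
		return -EBUSY;
	demo_security_ops = ops;
	return 0;
}

/* Reverts to the dummy hooks, but only for the registered module. */
static int demo_unregister_security(struct demo_security_operations *ops)
{
	if (ops != demo_security_ops)
		return -EINVAL;
	demo_security_ops = &dummy_ops;
	return 0;
}

/* An example module that denies every signal. */
static int strict_task_kill(int sender_pid, int target_pid, int sig)
{
	(void)sender_pid; (void)target_pid; (void)sig;
	return -EPERM;
}

static struct demo_security_operations strict_ops = {
	.task_kill = strict_task_kill,
};
```

Call sites always go through ``demo_security_ops->task_kill(...)``, so swapping the pointer changes policy without touching them.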
LSM also provides a simple mechanism for stacking additional security
modules with the primary security module. It defines
:c:func:`register_security()` and
:c:func:`unregister_security()` hooks in the :c:type:`struct
security_operations <security_operations>` structure and
provides :c:func:`mod_reg_security()` and
:c:func:`mod_unreg_security()` functions that invoke these hooks
after performing some sanity checking. A security module can call these
functions in order to stack with other modules. However, the actual
details of how this stacking is handled are deferred to the module,
which can implement these hooks in any way it wishes (including always
returning an error if it does not wish to support stacking). In this
manner, LSM again defers the problem of composition to the module.
Although the LSM hooks are organized into substructures based on kernel
object, all of the hooks can be viewed as falling into two major
categories: hooks that are used to manage the security fields and hooks
that are used to perform access control. Examples of the first category
of hooks include the :c:func:`alloc_security()` and
:c:func:`free_security()` hooks defined for each kernel data
structure that has a security field. These hooks are used to allocate
and free security structures for kernel objects. The first category of
hooks also includes hooks that set information in the security field
after allocation, such as the :c:func:`post_lookup()` hook in
:c:type:`struct inode_security_ops <inode_security_ops>`.
This hook is used to set security information for inodes after
successful lookup operations. An example of the second category of hooks
is the :c:func:`permission()` hook in :c:type:`struct
inode_security_ops <inode_security_ops>`. This hook checks
permission when accessing an inode.
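The two categories of hooks can be sketched in user space like this. The ``demo_`` structures, the integer "level" label and the comparison rule are invented for the example; only the overall shape (allocate, fill in, check, free) follows the description above.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for a kernel object with a security field. */
struct demo_inode {
	int owner_uid;
	void *security;		/* opaque to everyone but the module */
};

/* Module-private data stored behind the opaque field. */
struct demo_inode_label {
	int level;
};

/* Category 1: manage the security field. */
static int demo_inode_alloc_security(struct demo_inode *inode)
{
	struct demo_inode_label *label = calloc(1, sizeof(*label));

	if (!label)
		return -ENOMEM;
	inode->security = label;
	return 0;
}

static void demo_inode_free_security(struct demo_inode *inode)
{
	free(inode->security);
	inode->security = NULL;
}

/* Category 1 as well: set information after allocation. */
static void demo_inode_post_lookup(struct demo_inode *inode, int level)
{
	struct demo_inode_label *label = inode->security;

	if (label)
		label->level = level;
}

/* Category 2: access control based on the stored information. */
static int demo_inode_permission(struct demo_inode *inode, int subject_level)
{
	struct demo_inode_label *label = inode->security;

	if (!label)
		return 0;	/* unlabelled object: allow */
	return subject_level >= label->level ? 0 : -EACCES;
}
```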
LSM Capabilities Module
=======================
The LSM kernel patch moves most of the existing POSIX.1e capabilities
logic into an optional security module stored in the file
``security/capability.c``. This change allows users who do not want to
use capabilities to omit this code entirely from their kernel, instead
using the dummy module for traditional superuser logic or any other
module that they desire. This change also allows the developers of the
capabilities logic to maintain and enhance their code more freely,
without needing to integrate patches back into the base kernel.
In addition to moving the capabilities logic, the LSM kernel patch could
move the capability-related fields from the kernel data structures into
the new security fields managed by the security modules. However, at
present, the LSM kernel patch leaves the capability fields in the kernel
data structures. In his original remarks, Linus suggested that this
might be preferable so that other security modules can be easily stacked
with the capabilities module without needing to chain multiple security
structures on the security field. It also avoids imposing extra overhead
on the capabilities module to manage the security fields. However, the
LSM framework could certainly support such a move if it is determined to
be desirable, with only a few additional changes described below.
At present, the capabilities logic for computing process capabilities on
:c:func:`execve()` and :c:func:`set\*uid()`, checking
capabilities for a particular process, saving and checking capabilities
for netlink messages, and handling the :c:func:`capget()` and
:c:func:`capset()` system calls have been moved into the
capabilities module. There are still a few locations in the base kernel
where capability-related fields are directly examined or modified, but
the current version of the LSM patch does allow a security module to
completely replace the assignment and testing of capabilities. These few
locations would need to be changed if the capability-related fields were
moved into the security field. The following is a list of known
locations that still perform such direct examination or modification of
capability-related fields:
- ``fs/open.c``::c:func:`sys_access()`
- ``fs/lockd/host.c``::c:func:`nlm_bind_host()`
- ``fs/nfsd/auth.c``::c:func:`nfsd_setuser()`
- ``fs/proc/array.c``::c:func:`task_cap()`


@ -129,8 +129,8 @@ Selection targets and flags are documented in
.. _sel-const-adjust:
.. figure:: constraints.*
:alt: constraints.pdf / constraints.svg
.. kernel-figure:: constraints.svg
:alt: constraints.svg
:align: center
Size adjustments with constraint flags.


@ -498,11 +498,11 @@ And a couple of implicit varieties:
This means that ACQUIRE acts as a minimal "acquire" operation and
RELEASE acts as a minimal "release" operation.
A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
and RELEASE variants in addition to fully-ordered and relaxed (no barrier
semantics) definitions. For compound atomics performing both a load and a
store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
only to the store portion of the operation.
A subset of the atomic operations described in core-api/atomic_ops.rst have
ACQUIRE and RELEASE variants in addition to fully-ordered and relaxed (no
barrier semantics) definitions. For compound atomics performing both a load
and a store, ACQUIRE semantics apply only to the load and RELEASE semantics
apply only to the store portion of the operation.
Memory barriers are only required where there's a possibility of interaction
between two CPUs or between a CPU and a device. If it can be guaranteed that


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "Linux Networking Documentation"
tags.add("subproject")
latex_documents = [
('index', 'networking.tex', project,
'The kernel development community', 'manual'),
]


@ -143,7 +143,7 @@ the key will be discarded and recreated when the data it holds has expired.
dns_query() returns a copy of the value attached to the key, or an error if
that is indicated instead.
See <file:Documentation/security/keys-request-key.txt> for further
See <file:Documentation/security/keys/request-key.rst> for further
information about request-key function.


@ -0,0 +1,18 @@
Linux Networking Documentation
==============================
Contents:
.. toctree::
:maxdepth: 2
kapi
z8530book
.. only:: subproject
Indices
=======
* :ref:`genindex`


@ -0,0 +1,147 @@
=========================================
Linux Networking and Network Devices APIs
=========================================
Linux Networking
================
Networking Base Types
---------------------
.. kernel-doc:: include/linux/net.h
:internal:
Socket Buffer Functions
-----------------------
.. kernel-doc:: include/linux/skbuff.h
:internal:
.. kernel-doc:: include/net/sock.h
:internal:
.. kernel-doc:: net/socket.c
:export:
.. kernel-doc:: net/core/skbuff.c
:export:
.. kernel-doc:: net/core/sock.c
:export:
.. kernel-doc:: net/core/datagram.c
:export:
.. kernel-doc:: net/core/stream.c
:export:
Socket Filter
-------------
.. kernel-doc:: net/core/filter.c
:export:
Generic Network Statistics
--------------------------
.. kernel-doc:: include/uapi/linux/gen_stats.h
:internal:
.. kernel-doc:: net/core/gen_stats.c
:export:
.. kernel-doc:: net/core/gen_estimator.c
:export:
SUN RPC subsystem
-----------------
.. kernel-doc:: net/sunrpc/xdr.c
:export:
.. kernel-doc:: net/sunrpc/svc_xprt.c
:export:
.. kernel-doc:: net/sunrpc/xprt.c
:export:
.. kernel-doc:: net/sunrpc/sched.c
:export:
.. kernel-doc:: net/sunrpc/socklib.c
:export:
.. kernel-doc:: net/sunrpc/stats.c
:export:
.. kernel-doc:: net/sunrpc/rpc_pipe.c
:export:
.. kernel-doc:: net/sunrpc/rpcb_clnt.c
:export:
.. kernel-doc:: net/sunrpc/clnt.c
:export:
WiMAX
-----
.. kernel-doc:: net/wimax/op-msg.c
:export:
.. kernel-doc:: net/wimax/op-reset.c
:export:
.. kernel-doc:: net/wimax/op-rfkill.c
:export:
.. kernel-doc:: net/wimax/stack.c
:export:
.. kernel-doc:: include/net/wimax.h
:internal:
.. kernel-doc:: include/uapi/linux/wimax.h
:internal:
Network device support
======================
Driver Support
--------------
.. kernel-doc:: net/core/dev.c
:export:
.. kernel-doc:: net/ethernet/eth.c
:export:
.. kernel-doc:: net/sched/sch_generic.c
:export:
.. kernel-doc:: include/linux/etherdevice.h
:internal:
.. kernel-doc:: include/linux/netdevice.h
:internal:
PHY Support
-----------
.. kernel-doc:: drivers/net/phy/phy.c
:export:
.. kernel-doc:: drivers/net/phy/phy.c
:internal:
.. kernel-doc:: drivers/net/phy/phy_device.c
:export:
.. kernel-doc:: drivers/net/phy/phy_device.c
:internal:
.. kernel-doc:: drivers/net/phy/mdio_bus.c
:export:
.. kernel-doc:: drivers/net/phy/mdio_bus.c
:internal:


@ -0,0 +1,256 @@
=======================
Z8530 Programming Guide
=======================
:Author: Alan Cox
Introduction
============
The Z85x30 family synchronous/asynchronous controller chips are used on
a large number of cheap network interface cards. The kernel provides a
core interface layer that is designed to make it easy to provide WAN
services using this chip.
The current driver only supports synchronous operation. Merging the
asynchronous driver support into this code to allow any Z85x30 device to
be used as both a tty interface and as a synchronous controller is a
project for Linux after the 2.4 release.
Driver Modes
============
The Z85230 driver layer can drive Z8530, Z85C30 and Z85230 devices in
three different modes. Each mode can be applied to an individual channel
on the chip (each chip has two channels).
The PIO synchronous mode supports the most common Z8530 wiring. Here the
chip is interfaced to the I/O and interrupt facilities of the host
machine but not to the DMA subsystem. When running PIO the Z8530 has
extremely tight timing requirements. Doing high speeds, even with a
Z85230, will be tricky. Typically you should expect to achieve at best
9600 baud with a Z8C530 and 64Kbits with a Z85230.
The DMA mode supports the chip when it is configured to use dual DMA
channels on an ISA bus. The better cards tend to support this mode of
operation for a single channel. With DMA running the Z85230 tops out
when it starts to hit ISA DMA constraints at about 512Kbits. It is worth
noting here that many PC machines hang or crash when the chip is driven
fast enough to hold the ISA bus solid.
Transmit DMA mode uses a single DMA channel. The DMA channel is used for
transmission as the transmit FIFO is smaller than the receive FIFO. It
gives better performance than pure PIO mode but is nowhere near as ideal
as pure DMA mode.
Using the Z85230 driver
=======================
The Z85230 driver provides the back end interface to your board. To
configure a Z8530 interface you need to detect the board and to identify
its ports and interrupt resources. It is also your problem to verify the
resources are available.
Having identified the chip you need to fill in a struct z8530_dev,
which describes each chip. This object must exist until you finally
shut down the board. First, zero the active field. This ensures nothing
goes off without you intending it. The irq field should be set to the
interrupt number of the chip. (Each chip has a single interrupt source
rather than one per channel.) You are responsible for allocating the
interrupt line. The interrupt handler should be set to
:c:func:`z8530_interrupt()`. The device id should be set to the
z8530_dev structure pointer. Whether the interrupt can be shared or not
is board dependent, and up to you to initialise.
The structure holds two channel structures. Initialise chanA.ctrlio and
chanA.dataio with the addresses of the control and data ports. You can OR
this with Z8530_PORT_SLEEP to indicate your interface needs the 5uS
delay for chip settling done in software. The PORT_SLEEP option is
architecture specific. Other flags may become available on future
platforms, e.g. for MMIO. Initialise chanA.irqs to &z8530_nop to
start the chip up as disabled and discarding interrupt events. This
ensures that stray interrupts will be mopped up and not hang the bus.
Set chanA.dev to point to the device structure itself. The private and
name fields you may use as you wish. The private field is unused by the
Z85230 layer. The name is used for error reporting and it may thus make
sense to make it match the network name.
Repeat the same operation with the B channel if your chip has both
channels wired to something useful. This isn't always the case. If it is
not wired then the I/O values do not matter, but you must initialise
chanB.dev.
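The initialisation steps above can be sketched roughly as follows. This is a user-space mock: the ``demo_`` structures only mirror the field names mentioned in the text, the flag value and port offsets are placeholders, and ``demo_nop`` stands in for ``z8530_nop``. A real driver would use the definitions from the Z85230 driver's own header instead.

```c
#include <string.h>

#define DEMO_PORT_SLEEP	0x80000000UL	/* placeholder flag value */

struct demo_z8530_dev;

struct demo_irqhandler {
	int unused;
};

struct demo_z8530_channel {
	unsigned long ctrlio;		/* control port, plus flags */
	unsigned long dataio;		/* data port, plus flags */
	struct demo_irqhandler *irqs;	/* current interrupt handler set */
	struct demo_z8530_dev *dev;	/* back pointer to the chip */
	void *private;			/* free for the board driver */
};

struct demo_z8530_dev {
	const char *name;
	struct demo_z8530_channel chanA, chanB;
	int irq;
	int active;
};

static struct demo_irqhandler demo_nop;	/* stands in for z8530_nop */

/* Fills in the chip description for an imaginary one-channel board. */
static void demo_board_setup(struct demo_z8530_dev *dev,
			     unsigned long iobase, int irq)
{
	memset(dev, 0, sizeof(*dev));	/* this also zeroes ->active */
	dev->name = "demo0";
	dev->irq = irq;

	/* Channel A: ports ORed with the settling-delay flag. */
	dev->chanA.ctrlio = (iobase + 1) | DEMO_PORT_SLEEP;
	dev->chanA.dataio = (iobase + 3) | DEMO_PORT_SLEEP;
	dev->chanA.irqs = &demo_nop;	/* start disabled */
	dev->chanA.dev = dev;

	/* Channel B is not wired here, but ->dev must still be set. */
	dev->chanB.dev = dev;
}
```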
If your board has DMA facilities then initialise the txdma and rxdma
fields for the relevant channels. You must also allocate the ISA DMA
channels and do any necessary board level initialisation to configure
them. The low level driver will do the Z8530 and DMA controller
programming but not board specific magic.
Having initialised the device you can then call
:c:func:`z8530_init()`. This will probe the chip and reset it into
a known state. An identification sequence is then run to identify the
chip type. If the checks fail, the function returns a non-zero
error code. Typically this indicates that the port given is not valid.
After this call the type field of the z8530_dev structure is
initialised to either Z8530, Z85C30 or Z85230 according to the chip
found.
Once you have called z8530_init you can also make use of the utility
function :c:func:`z8530_describe()`. This provides a consistent
reporting format for the Z8530 devices, and allows all the drivers to
provide consistent reporting.
Attaching Network Interfaces
============================
If you wish to use the network interface facilities of the driver, then
you need to attach a network device to each channel that is present and
in use. In addition, to use the generic HDLC you need to follow some
additional plumbing rules. They may seem complex but a look at the
example hostess_sv11 driver should reassure you.
The network device used for each channel should be pointed to by the
netdevice field of each channel. The hdlc->priv field of the network
device points to your private data - you will need to be able to find
your private data from this.
The way most drivers approach this particular problem is to create a
structure holding the Z8530 device definition and put that into the
private field of the network device. The network device fields of the
channels then point back to the network devices.
If you wish to use the generic HDLC then you need to register the HDLC
device.
Before you register your network device you will also need to provide
suitable handlers for most of the network device callbacks. See the
network device documentation for more details on this.
Configuring And Activating The Port
===================================
The Z85230 driver provides helper functions and tables to load the port
registers on the Z8530 chips. When programming the register settings for
a channel be aware that the documentation recommends initialisation
orders. Strange things happen when these are not followed.
:c:func:`z8530_channel_load()` takes an array of pairs of
initialisation values in an array of u8 type. The first value is the
Z8530 register number. Add 16 to indicate the alternate register bank on
the later chips. The array is terminated by a 255.
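To illustrate the table format only, here is a user-space sketch that walks such an array and applies it to a shadow register file. ``demo_channel_load`` and ``demo_table`` are hypothetical, and the register values are arbitrary; they do not form a working Z8530 configuration.

```c
#include <stdint.h>

#define DEMO_NREGS	32	/* 16 base registers + 16 alternate bank */

/*
 * Applies a {register, value} table terminated by 255 to a shadow
 * array.  Returns the number of pairs loaded, or -1 if a register
 * number is out of range.
 */
static int demo_channel_load(uint8_t shadow[DEMO_NREGS], const uint8_t *table)
{
	int n = 0;

	while (*table != 255) {
		uint8_t reg = table[0];

		if (reg >= DEMO_NREGS)
			return -1;
		shadow[reg] = table[1];
		table += 2;
		n++;
	}
	return n;
}

static const uint8_t demo_table[] = {
	1, 0x00,
	4, 0x20,
	16 + 7, 0x5F,	/* register 7 in the alternate bank */
	255		/* terminator */
};
```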
The driver provides a pair of public tables. The z8530_hdlc_kilostream
table is for the UK 'Kilostream' service and also happens to cover most
other end host configurations. The z8530_hdlc_kilostream_85230 table
is the same configuration using the enhancements of the 85230 chip. The
configuration loaded is standard NRZ encoded synchronous data with HDLC
bitstuffing. All of the timing is taken from the other end of the link.
When writing your own tables be aware that the driver internally tracks
register values. It may need to reload values. You should therefore be
sure to set registers 1-7, 9-11, 14 and 15 in all configurations. Where
the register settings depend on DMA selection the driver will update the
bits itself when you open or close. Loading a new table with the
interface open is not recommended.
There are three standard configurations supported by the core code. In
PIO mode the interface is programmed up to use interrupt driven PIO.
This places high demands on the host processor to avoid latency. The
driver is written to take account of latency issues but it cannot avoid
latencies caused by other drivers, notably IDE in PIO mode. Because the
drivers allocate buffers you must also prevent MTU changes while the
port is open.
Once the port is open it will call the rx_function of each channel
whenever a completed packet arrives. This is invoked from interrupt
context and passes you the channel and a network buffer (struct
sk_buff) holding the data. The data includes the CRC bytes so most
users will want to trim the last two bytes before processing the data.
This function is very timing critical. When you wish to simply discard
data the support code provides the function
:c:func:`z8530_null_rx()` to discard the data.
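The CRC trimming step is small but easy to get wrong on short frames. The sketch below shows it on a made-up ``demo_frame`` type in user space; in a driver the same effect would typically be had by shortening the sk_buff with ``skb_trim()``.

```c
#include <stddef.h>

/* Hypothetical stand-in for a received buffer. */
struct demo_frame {
	const unsigned char *data;
	size_t len;
};

/*
 * Drops the two trailing CRC bytes.  Returns -1 if the frame is too
 * short to contain a CRC at all, in which case it is left untouched.
 */
static int demo_trim_crc(struct demo_frame *f)
{
	if (f->len < 2)
		return -1;
	f->len -= 2;
	return 0;
}
```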
To activate PIO mode sending and receiving, ``z8530_sync_open`` is called.
This expects to be passed the network device and the channel. Typically
this is called from your network device open callback. On failure a
non-zero error status is returned.
The :c:func:`z8530_sync_close()` function shuts down a PIO
channel. This must be done before the channel is opened again and before
the driver shuts down and unloads.
The ideal mode of operation is dual channel DMA mode. Here the kernel
driver will configure the board for DMA in both directions. The driver
also handles ISA DMA issues such as controller programming and the
memory range limit for you. This mode is activated by calling the
:c:func:`z8530_sync_dma_open()` function. On failure a non zero
error value is returned. Once this mode is activated it can be shut down
by calling the :c:func:`z8530_sync_dma_close()`. You must call
the close function matching the open mode you used.
The final supported mode uses a single DMA channel to drive the transmit
side. As the Z85C30 has a larger FIFO on the receive channel this tends
to increase the maximum speed a little. This is activated by calling
``z8530_sync_txdma_open``. This returns a non-zero error code on failure. The
:c:func:`z8530_sync_txdma_close()` function closes down the Z8530
interface from this mode.
Network Layer Functions
=======================
The Z8530 layer provides functions to queue packets for transmission.
The driver internally buffers the frame currently being transmitted and
one further frame (in order to keep back to back transmission running).
Any further buffering is up to the caller.
The function :c:func:`z8530_queue_xmit()` takes a network buffer
in sk_buff format and queues it for transmission. The caller must
provide the entire packet with the exception of the bitstuffing and CRC.
This is normally done by the caller via the generic HDLC interface
layer. It returns 0 if the buffer has been queued and non zero values
for queue full. If the function accepts the buffer it becomes property
of the Z8530 layer and the caller should not free it.
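The calling contract described above (one frame in flight, one more held back, non-zero on queue full, ownership passing on success) can be modelled in user space as below. The ``demo_`` names are hypothetical and the sketch says nothing about how the real driver stores its buffers.

```c
#include <stddef.h>

/* Two-slot transmit queue: the frame on the wire plus one waiting. */
struct demo_txq {
	void *in_flight;
	void *pending;
};

/*
 * Returns 0 if buf was accepted (the queue now owns it) and 1 if the
 * queue is full (the caller keeps ownership and must retry later).
 */
static int demo_queue_xmit(struct demo_txq *q, void *buf)
{
	if (!q->in_flight) {
		q->in_flight = buf;
		return 0;
	}
	if (!q->pending) {
		q->pending = buf;
		return 0;
	}
	return 1;
}

/*
 * Called when the in-flight frame has been sent: hands back the
 * finished buffer and promotes the pending one, if any.
 */
static void *demo_tx_done(struct demo_txq *q)
{
	void *done = q->in_flight;

	q->in_flight = q->pending;
	q->pending = NULL;
	return done;
}
```

A caller that gets a non-zero return would normally stop its own queue and try again once a transmit-complete event has freed a slot.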
The function :c:func:`z8530_get_stats()` returns a pointer to an
internally maintained per interface statistics block. This provides most
of the interface code needed to implement the network layer get_stats
callback.
Porting The Z8530 Driver
========================
The Z8530 driver is written to be portable. In DMA mode it makes
assumptions about the use of ISA DMA. These are probably warranted in
most cases as the Z85230 in particular was designed to glue to PC type
machines. The PIO mode makes no real assumptions.
Should you need to retarget the Z8530 driver to another architecture the
only code that should need changing is the port I/O functions. At the
moment these assume PC I/O port accesses. This may not be appropriate
for all platforms. Replacing :c:func:`z8530_read_port()` and
``z8530_write_port`` is intended to be all that is required to port
this driver layer.
Known Bugs And Assumptions
==========================
Interrupt Locking
The locking in the driver is done via the global cli/sti lock. This
makes for relatively poor SMP performance. Switching this to use a
per device spin lock would probably materially improve performance.
Occasional Failures
We have reports of occasional failures when run for very long
periods of time and the driver starts to receive junk frames. At the
moment the cause of this is not clear.
Public Functions Provided
=========================
.. kernel-doc:: drivers/net/wan/z85230.c
:export:
Internal Functions
==================
.. kernel-doc:: drivers/net/wan/z85230.c
:internal:


@ -116,12 +116,11 @@ DevFS has been obsoleted in favour of udev
Linux documentation for functions is transitioning to inline
documentation via specially-formatted comments near their
definitions in the source. These comments can be combined with the
SGML templates in the Documentation/DocBook directory to make DocBook
files, which can then be converted by DocBook stylesheets to PostScript,
HTML, PDF files, and several other formats. In order to convert from
DocBook format to a format of your choice, you'll need to install Jade as
well as the desired DocBook stylesheets.
definitions in the source. These comments can be combined with ReST
files in the Documentation/ directory to make enriched documentation, which can
then be converted to PostScript, HTML, LaTeX, ePUB and PDF files.
In order to convert from ReST format to a format of your choice, you'll need
Sphinx.
Util-linux
----------
@ -323,12 +322,6 @@ PDF outputs, it is recommended to use version 1.4.6.
functionalities required for ``XeLaTex`` to work. For PDF output you'll also
need ``convert(1)`` from ImageMagick (https://www.imagemagick.org).
Other tools
-----------
In order to produce documentation from DocBook, you'll also need ``xmlto``.
Please notice, however, that we're currently migrating all documents to use
``Sphinx``.
Getting updated software
========================
@ -409,15 +402,6 @@ Quota-tools
- <http://sourceforge.net/projects/linuxquota/>
DocBook Stylesheets
-------------------
- <http://sourceforge.net/projects/docbook/files/docbook-dsssl/>
XMLTO XSLT Frontend
-------------------
- <http://cyberelk.net/tim/xmlto/>
Intel P6 microcode
------------------

@ -980,8 +980,8 @@ do so, though, and doing so unnecessarily can limit optimization.
When writing a single inline assembly statement containing multiple
instructions, put each instruction on a separate line in a separate quoted
string, and end each string except the last with \n\t to properly indent the
next instruction in the assembly output:
string, and end each string except the last with ``\n\t`` to properly indent
the next instruction in the assembly output:
.. code-block:: c

@ -167,6 +167,11 @@ Lotus Notes (GUI)
Run away from it.
IBM Verse (Web GUI)
*******************
See Lotus Notes.
Mutt (TUI)
**********

@ -180,14 +180,6 @@ They can also be generated on LaTeX and ePub formats with::
make latexdocs
make epubdocs
Currently, there are some documents written on DocBook that are in
the process of conversion to ReST. Such documents will be created in the
Documentation/DocBook/ directory and can be generated also as
Postscript or man pages by running::
make psdocs
make mandocs
Becoming A Kernel Developer
---------------------------

@ -40,50 +40,18 @@ Enjoy!
Docs at the Linux Kernel tree
-----------------------------
The DocBook books should be built with ``make {htmldocs | psdocs | pdfdocs}``.
The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``.
* Name: **linux/Documentation**
:Author: Many.
:Location: Documentation/
:Keywords: text files, Sphinx, DocBook.
:Keywords: text files, Sphinx.
:Description: Documentation that comes with the kernel sources,
inside the Documentation directory. Some pages from this document
(including this document itself) have been moved there, and might
be more up to date than the web version.
* Title: **The Kernel Hacking HOWTO**
:Author: Various Talented People, and Rusty.
:Location: Documentation/DocBook/kernel-hacking.tmpl
:Keywords: HOWTO, kernel contexts, deadlock, locking, modules,
symbols, return conventions.
:Description: From the Introduction: "Please understand that I
never wanted to write this document, being grossly underqualified,
but I always wanted to read it, and this was the only way. I
simply explain some best practices, and give reading entry-points
into the kernel sources. I avoid implementation details: that's
what the code is for, and I ignore whole tracts of useful
routines. This document assumes familiarity with C, and an
understanding of what the kernel is, and how it is used. It was
originally written for the 2.3 kernels, but nearly all of it
applies to 2.2 too; 2.0 is slightly different".
* Title: **Linux Kernel Locking HOWTO**
:Author: Various Talented People, and Rusty.
:Location: Documentation/DocBook/kernel-locking.tmpl
:Keywords: locks, locking, spinlock, semaphore, atomic, race
condition, bottom halves, tasklets, softirqs.
:Description: The title says it all: document describing the
locking system in the Linux Kernel either in uniprocessor or SMP
systems.
:Notes: "It was originally written for the later (>2.3.47) 2.3
kernels, but most of it applies to 2.2 too; 2.0 is slightly
different". Freely redistributable under the conditions of the GNU
General Public License.
On-line docs
------------

@ -1,26 +0,0 @@
00-INDEX
- this file.
LSM.txt
- description of the Linux Security Module framework.
SELinux.txt
- how to get started with the SELinux security enhancement.
Smack.txt
- documentation on the Smack Linux Security Module.
Yama.txt
- documentation on the Yama Linux Security Module.
apparmor.txt
- documentation on the AppArmor security extension.
credentials.txt
- documentation about credentials in Linux.
keys-ecryptfs.txt
- description of the encryption keys for the ecryptfs filesystem.
keys-request-key.txt
- description of the kernel key request service.
keys-trusted-encrypted.txt
- info on the Trusted and Encrypted keys in the kernel key ring service.
keys.txt
- description of the kernel key retention service.
tomoyo.txt
- documentation on the TOMOYO Linux Security Module.
IMA-templates.txt
- documentation on the template management mechanism for IMA.

@ -1,9 +1,12 @@
=================================
IMA Template Management Mechanism
=================================
==== INTRODUCTION ====
Introduction
============
The original 'ima' template is fixed length, containing the filedata hash
The original ``ima`` template is fixed length, containing the filedata hash
and pathname. The filedata hash is limited to 20 bytes (md5/sha1).
The pathname is a null terminated string, limited to 255 characters.
To overcome these limitations and to add additional file metadata, it is
@ -28,33 +31,34 @@ a new data type, developers define the field identifier and implement
two functions, init() and show(), respectively to generate and display
measurement entries. Defining a new template descriptor requires
specifying the template format (a string of field identifiers separated
by the '|' character) through the 'ima_template_fmt' kernel command line
by the ``|`` character) through the ``ima_template_fmt`` kernel command line
parameter. At boot time, IMA initializes the chosen template descriptor
by translating the format into an array of template fields structures taken
from the set of the supported ones.
After the initialization step, IMA will call ima_alloc_init_template()
After the initialization step, IMA will call ``ima_alloc_init_template()``
(new function defined within the patches for the new template management
mechanism) to generate a new measurement entry by using the template
descriptor chosen through the kernel configuration or through the newly
introduced 'ima_template' and 'ima_template_fmt' kernel command line parameters.
introduced ``ima_template`` and ``ima_template_fmt`` kernel command line parameters.
It is during this phase that the advantages of the new architecture are
clearly shown: the latter function will not contain specific code to handle
a given template but, instead, it simply calls the init() method of the template
a given template but, instead, it simply calls the ``init()`` method of the template
fields associated with the chosen template descriptor and stores the result
(pointer to allocated data and data length) in the measurement entry structure.
The same mechanism is employed to display measurement entries.
The functions ima[_ascii]_measurements_show() retrieve, for each entry,
The functions ``ima[_ascii]_measurements_show()`` retrieve, for each entry,
the template descriptor used to produce that entry and call the show()
method for each item of the array of template fields structures.
==== SUPPORTED TEMPLATE FIELDS AND DESCRIPTORS ====
Supported Template Fields and Descriptors
=========================================
In the following, there is the list of supported template fields
('<identifier>': description), that can be used to define new template
``('<identifier>': description)``, that can be used to define new template
descriptors by adding their identifier to the format string
(support for more data types will be added later):
@ -69,20 +73,22 @@ descriptors by adding their identifier to the format string
Below, there is the list of defined template descriptors:
- "ima": its format is 'd|n';
- "ima-ng" (default): its format is 'd-ng|n-ng';
- "ima-sig": its format is 'd-ng|n-ng|sig'.
- "ima": its format is ``d|n``;
- "ima-ng" (default): its format is ``d-ng|n-ng``;
- "ima-sig": its format is ``d-ng|n-ng|sig``.
==== USE ====
Use
===
To specify the template descriptor to be used to generate measurement entries,
currently the following methods are supported:
- select a template descriptor among those supported in the kernel
configuration ('ima-ng' is the default choice);
configuration (``ima-ng`` is the default choice);
- specify a template descriptor name from the kernel command line through
the 'ima_template=' parameter;
the ``ima_template=`` parameter;
- register a new template descriptor with custom format through the kernel
command line parameter 'ima_template_fmt='.
command line parameter ``ima_template_fmt=``.
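
The format strings above (``d|n``, ``d-ng|n-ng|sig``) are lists of field
identifiers separated by ``|``. As a rough illustration only, the splitting
step could look like the userspace sketch below. It is not the kernel's
implementation: ``split_template_fmt()``, ``MAX_FIELDS`` and ``MAX_ID_LEN``
are hypothetical names and limits chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits for this sketch; the kernel uses its own. */
#define MAX_FIELDS 8
#define MAX_ID_LEN 16

/*
 * Hypothetical userspace helper, not a kernel API: split a template
 * format string such as "d-ng|n-ng|sig" into its field identifiers.
 * Returns the number of identifiers, or -1 on a malformed string.
 */
static int split_template_fmt(const char *fmt,
			      char ids[MAX_FIELDS][MAX_ID_LEN])
{
	int count = 0;

	assert(fmt != NULL);
	while (*fmt) {
		/* Length of the next identifier, up to '|' or end of string. */
		size_t len = strcspn(fmt, "|");

		/* Reject empty or oversized identifiers and overlong lists. */
		if (len == 0 || len >= MAX_ID_LEN || count == MAX_FIELDS)
			return -1;
		memcpy(ids[count], fmt, len);
		ids[count][len] = '\0';
		count++;
		fmt += len;
		if (*fmt == '|') {
			fmt++;
			if (*fmt == '\0')
				return -1;	/* trailing separator */
		}
	}
	return count;
}
```

In a real implementation each identifier would then be looked up in the set
of supported template fields; that lookup is left out here.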

@ -0,0 +1,14 @@
=================================
Linux Security Module Development
=================================
Based on https://lkml.org/lkml/2007/10/26/215,
a new LSM is accepted into the kernel when its intent (a description of
what it tries to protect against and in what cases one would expect to
use it) has been appropriately documented in ``Documentation/security/LSM``.
This allows an LSM's code to be easily compared to its goals, and lets
end users and distros make a more informed decision about which
LSMs suit their requirements.
For extensive documentation on the available LSM hook interfaces, please
see ``include/linux/lsm_hooks.h``.

@ -1,8 +0,0 @@
project = "The kernel security subsystem manual"
tags.add("subproject")
latex_documents = [
('index', 'security.tex', project,
'The kernel development community', 'manual'),
]

@ -1,38 +1,18 @@
====================
CREDENTIALS IN LINUX
Credentials in Linux
====================
By: David Howells <dhowells@redhat.com>
Contents:
.. contents:: :local:
(*) Overview.
(*) Types of credentials.
(*) File markings.
(*) Task credentials.
- Immutable credentials.
- Accessing task credentials.
- Accessing another task's credentials.
- Altering credentials.
- Managing credentials.
(*) Open file credentials.
(*) Overriding the VFS's use of credentials.
========
OVERVIEW
Overview
========
There are several parts to the security check performed by Linux when one
object acts upon another:
(1) Objects.
1. Objects.
Objects are things in the system that may be acted upon directly by
userspace programs. Linux has a variety of actionable objects, including:
@ -48,7 +28,7 @@ object acts upon another:
As a part of the description of all these objects there is a set of
credentials. What's in the set depends on the type of object.
(2) Object ownership.
2. Object ownership.
Amongst the credentials of most objects, there will be a subset that
indicates the ownership of that object. This is used for resource
@ -57,7 +37,7 @@ object acts upon another:
In a standard UNIX filesystem, for instance, this will be defined by the
UID marked on the inode.
(3) The objective context.
3. The objective context.
Also amongst the credentials of those objects, there will be a subset that
indicates the 'objective context' of that object. This may or may not be
@ -67,7 +47,7 @@ object acts upon another:
The objective context is used as part of the security calculation that is
carried out when an object is acted upon.
(4) Subjects.
4. Subjects.
A subject is an object that is acting upon another object.
@ -77,10 +57,10 @@ object acts upon another:
Objects other than tasks may under some circumstances also be subjects.
For instance an open file may send SIGIO to a task using the UID and EUID
given to it by a task that called fcntl(F_SETOWN) upon it. In this case,
given to it by a task that called ``fcntl(F_SETOWN)`` upon it. In this case,
the file struct will have a subjective context too.
(5) The subjective context.
5. The subjective context.
A subject has an additional interpretation of its credentials. A subset
of its credentials forms the 'subjective context'. The subjective context
@ -92,7 +72,7 @@ object acts upon another:
from the real UID and GID that normally form the objective context of the
task.
(6) Actions.
6. Actions.
Linux has a number of actions available that a subject may perform upon an
object. The set of actions available depends on the nature of the subject
@ -101,7 +81,7 @@ object acts upon another:
Actions include reading, writing, creating and deleting files; forking or
signalling and tracing tasks.
(7) Rules, access control lists and security calculations.
7. Rules, access control lists and security calculations.
When a subject acts upon an object, a security calculation is made. This
involves taking the subjective context, the objective context and the
@ -111,7 +91,7 @@ object acts upon another:
There are two main sources of rules:
(a) Discretionary access control (DAC):
a. Discretionary access control (DAC):
Sometimes the object will include sets of rules as part of its
description. This is an 'Access Control List' or 'ACL'. A Linux
@ -127,7 +107,7 @@ object acts upon another:
A Linux file might also sport a POSIX ACL. This is a list of rules
that grants various permissions to arbitrary subjects.
(b) Mandatory access control (MAC):
b. Mandatory access control (MAC):
The system as a whole may have one or more sets of rules that get
applied to all subjects and objects, regardless of their source.
@ -139,65 +119,65 @@ object acts upon another:
that says that this action is either granted or denied.
====================
TYPES OF CREDENTIALS
Types of Credentials
====================
The Linux kernel supports the following types of credentials:
(1) Traditional UNIX credentials.
1. Traditional UNIX credentials.
Real User ID
Real Group ID
- Real User ID
- Real Group ID
The UID and GID are carried by most, if not all, Linux objects, even if in
some cases it has to be invented (FAT or CIFS files for example, which are
derived from Windows). These (mostly) define the objective context of
that object, with tasks being slightly different in some cases.
Effective, Saved and FS User ID
Effective, Saved and FS Group ID
Supplementary groups
- Effective, Saved and FS User ID
- Effective, Saved and FS Group ID
- Supplementary groups
These are additional credentials used by tasks only. Usually, an
EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
will be used as the objective. For tasks, it should be noted that this is
not always true.
(2) Capabilities.
2. Capabilities.
Set of permitted capabilities
Set of inheritable capabilities
Set of effective capabilities
Capability bounding set
- Set of permitted capabilities
- Set of inheritable capabilities
- Set of effective capabilities
- Capability bounding set
These are only carried by tasks. They indicate superior capabilities
granted piecemeal to a task that an ordinary task wouldn't otherwise have.
These are manipulated implicitly by changes to the traditional UNIX
credentials, but can also be manipulated directly by the capset() system
call.
credentials, but can also be manipulated directly by the ``capset()``
system call.
The permitted capabilities are those caps that the process might grant
itself to its effective or permitted sets through capset(). This
itself to its effective or permitted sets through ``capset()``. This
inheritable set might also be so constrained.
The effective capabilities are the ones that a task is actually allowed to
make use of itself.
The inheritable capabilities are the ones that may get passed across
execve().
``execve()``.
The bounding set limits the capabilities that may be inherited across
execve(), especially when a binary is executed that will execute as UID 0.
``execve()``, especially when a binary is executed that will execute as
UID 0.
(3) Secure management flags (securebits).
3. Secure management flags (securebits).
These are only carried by tasks. These govern the way the above
credentials are manipulated and inherited over certain operations such as
execve(). They aren't used directly as objective or subjective
credentials.
(4) Keys and keyrings.
4. Keys and keyrings.
These are only carried by tasks. They carry and cache security tokens
that don't fit into the other standard UNIX credentials. They are for
@ -218,7 +198,7 @@ The Linux kernel supports the following types of credentials:
For more information on using keys, see Documentation/security/keys.txt.
(5) LSM
5. LSM
The Linux Security Module allows extra controls to be placed over the
operations that a task may do. Currently Linux supports several LSM
@ -228,7 +208,7 @@ The Linux kernel supports the following types of credentials:
rules (policies) that say what operations a task with one label may do to
an object with another label.
(6) AF_KEY
6. AF_KEY
This is a socket-based approach to credential management for networking
stacks [RFC 2367]. It isn't discussed by this document as it doesn't
@ -244,25 +224,19 @@ network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.
=============
FILE MARKINGS
File Markings
=============
Files on disk or obtained over the network may have annotations that form the
objective security context of that file. Depending on the type of filesystem,
this may include one or more of the following:
(*) UNIX UID, GID, mode;
(*) Windows user ID;
(*) Access control list;
(*) LSM security label;
(*) UNIX exec privilege escalation bits (SUID/SGID);
(*) File capabilities exec privilege escalation bits.
* UNIX UID, GID, mode;
* Windows user ID;
* Access control list;
* LSM security label;
* UNIX exec privilege escalation bits (SUID/SGID);
* File capabilities exec privilege escalation bits.
These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result. In the case of execve(), the
@ -270,8 +244,7 @@ privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.
================
TASK CREDENTIALS
Task Credentials
================
In Linux, all of a task's credentials are held in (uid, gid) or through
@ -282,18 +255,18 @@ task_struct.
Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:
(1) its reference count may be changed;
1. its reference count may be changed;
(2) the reference count on the group_info struct it points to may be changed;
2. the reference count on the group_info struct it points to may be changed;
(3) the reference count on the security data it points to may be changed;
3. the reference count on the security data it points to may be changed;
(4) the reference count on any keyrings it points to may be changed;
4. the reference count on any keyrings it points to may be changed;
(5) any keyrings it points to may be revoked, expired or have their security
5. any keyrings it points to may be revoked, expired or have their security
attributes changed; and
(6) the contents of any keyrings to which it points may be changed (the whole
6. the contents of any keyrings to which it points may be changed (the whole
point of keyrings being a shared set of credentials, modifiable by anyone
with appropriate access).
@ -303,37 +276,37 @@ the task pointer to make it point to the new copy. There are wrappers to aid
with this (see below).
A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the capset() system call is no
longer permitted to take any PID other than the one of the current process.
Also keyctl_instantiate() and keyctl_negate() functions no longer permit
attachment to process-specific keyrings in the requesting process as the
instantiating process may need to create them.
task to alter another's credentials. This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process. Also ``keyctl_instantiate()`` and ``keyctl_negate()`` functions no
longer permit attachment to process-specific keyrings in the requesting
process as the instantiating process may need to create them.
IMMUTABLE CREDENTIALS
Immutable Credentials
---------------------
Once a set of credentials has been made public (by calling commit_creds() for
example), it must be considered immutable, barring two exceptions:
Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:
(1) The reference count may be altered.
1. The reference count may be altered.
(2) Whilst the keyring subscriptions of a set of credentials may not be
2. Whilst the keyring subscriptions of a set of credentials may not be
changed, the keyrings subscribed to may have their contents altered.
To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as get_cred() and put_cred() operate on const pointers,
thus rendering casts unnecessary, but require to temporarily ditch the const
qualification to be able to alter the reference count.
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, thus rendering casts unnecessary, but require to temporarily ditch
the const qualification to be able to alter the reference count.
ACCESSING TASK CREDENTIALS
Accessing Task Credentials
--------------------------
A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
- which simplifies things greatly. It can just call:
-- which simplifies things greatly. It can just call::
const struct cred *current_cred()
@ -341,7 +314,7 @@ to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.
There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case):
credentials (the value is simply returned in each case)::
uid_t current_uid(void) Current's real UID
gid_t current_gid(void) Current's real GID
@ -354,7 +327,7 @@ credentials (the value is simply returned in each case):
struct user_struct *current_user(void) Current's user account
There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials:
a task's credentials::
void current_uid_gid(uid_t *, gid_t *);
void current_euid_egid(uid_t *, gid_t *);
@ -365,12 +338,12 @@ them from the current task's credentials.
In addition, there is a function for obtaining a reference on the current
process's current set of credentials:
process's current set of credentials::
const struct cred *get_current_cred(void);
and functions for getting references to one of the credentials that don't
actually live in struct cred:
actually live in struct cred::
struct user_struct *get_current_user(void);
struct group_info *get_current_groups(void);
@ -378,22 +351,22 @@ actually live in struct cred:
which get references to the current process's user accounting structure and
supplementary groups list respectively.
Once a reference has been obtained, it must be released with put_cred(),
free_uid() or put_group_info() as appropriate.
Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.
ACCESSING ANOTHER TASK'S CREDENTIALS
Accessing Another Task's Credentials
------------------------------------
Whilst a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials. It
must use the RCU read lock and rcu_dereference().
must use the RCU read lock and ``rcu_dereference()``.
The rcu_dereference() is wrapped by:
The ``rcu_dereference()`` is wrapped by::
const struct cred *__task_cred(struct task_struct *task);
This should be used inside the RCU read lock, as in the following example:
This should be used inside the RCU read lock, as in the following example::
void foo(struct task_struct *t, struct foo_data *f)
{
@ -410,39 +383,40 @@ This should be used inside the RCU read lock, as in the following example:
Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep whilst doing so, then the caller should get a
reference on them using:
reference on them using::
const struct cred *get_task_cred(struct task_struct *task);
This does all the RCU magic inside of it. The caller must call put_cred() on
the credentials so obtained when they're finished with.
[*] Note: The result of __task_cred() should not be passed directly to
get_cred() as this may race with commit_cred().
.. note::
The result of ``__task_cred()`` should not be passed directly to
``get_cred()`` as this may race with ``commit_creds()``.
There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller:
credentials, hiding the RCU magic from the caller::
uid_t task_uid(task) Task's real UID
uid_t task_euid(task) Task's effective UID
If the caller is holding the RCU read lock at the time anyway, then:
If the caller is holding the RCU read lock at the time anyway, then::
__task_cred(task)->uid
__task_cred(task)->euid
should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, RCU read lock should be used, __task_cred() called, the
result stored in a temporary pointer and then the credential aspects called
need to be accessed, RCU read lock should be used, ``__task_cred()`` called,
the result stored in a temporary pointer and then the credential aspects called
from that before dropping the lock. This prevents the potentially expensive
RCU magic from being invoked multiple times.
Should some other single aspect of another task's credentials need to be
accessed, then this can be used:
accessed, then this can be used::
task_cred_xxx(task, member)
where 'member' is a non-pointer member of the cred struct. For instance:
where 'member' is a non-pointer member of the cred struct. For instance::
uid_t task_cred_xxx(task, suid);
@ -451,7 +425,7 @@ magic. This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.
ALTERING CREDENTIALS
Altering Credentials
--------------------
As previously mentioned, a task may only alter its own credentials, and may not
@ -459,7 +433,7 @@ alter those of another task. This means that it doesn't need to use any
locking to alter its own credentials.
To alter the current process's credentials, a function should first prepare a
new set of credentials by calling:
new set of credentials by calling::
struct cred *prepare_creds(void);
@ -467,9 +441,10 @@ this locks current->cred_replace_mutex and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful. It returns NULL if not successful (out of memory).
The mutex prevents ptrace() from altering the ptrace state of a process whilst
security checks on credentials construction and changing is taking place as
the ptrace state may alter the outcome, particularly in the case of execve().
The mutex prevents ``ptrace()`` from altering the ptrace state of a process
whilst security checks on credentials construction and changing are taking place
as the ptrace state may alter the outcome, particularly in the case of
``execve()``.
The new credentials set should be altered appropriately, and any security
checks and hooks done. Both the current and the proposed sets of credentials
@ -478,36 +453,37 @@ still at this point.
When the credential set is ready, it should be committed to the current process
by calling:
by calling::
int commit_creds(struct cred *new);
This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually
commit the new credentials to current->cred, it will release
current->cred_replace_mutex to allow ptrace() to take place, and it will notify
the scheduler and others of the changes.
LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, it will release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
will notify the scheduler and others of the changes.
This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as sys_setresuid().
end of such functions as ``sys_setresuid()``.
Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call put_cred() on the new credentials afterwards.
The caller should _not_ call ``put_cred()`` on the new credentials afterwards.
Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.
Should the security checks fail or some other error occur after prepare_creds()
has been called, then the following function should be invoked:
Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::
void abort_creds(struct cred *new);
This releases the lock on current->cred_replace_mutex that prepare_creds() got
and then releases the new credentials.
This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.
A typical credentials alteration function would look something like this:
A typical credentials alteration function would look something like this::
int alter_suid(uid_t suid)
{
@ -529,53 +505,50 @@ A typical credentials alteration function would look something like this:
}
MANAGING CREDENTIALS
Managing Credentials
--------------------
There are some functions to help manage credentials:
(*) void put_cred(const struct cred *cred);
- ``void put_cred(const struct cred *cred);``
This releases a reference to the given set of credentials. If the
reference count reaches zero, the credentials will be scheduled for
destruction by the RCU system.
(*) const struct cred *get_cred(const struct cred *cred);
- ``const struct cred *get_cred(const struct cred *cred);``
This gets a reference on a live set of credentials, returning a pointer to
that set of credentials.
(*) struct cred *get_new_cred(struct cred *cred);
- ``struct cred *get_new_cred(struct cred *cred);``
This gets a reference on a set of credentials that is under construction
and is thus still mutable, returning a pointer to that set of credentials.
=====================
OPEN FILE CREDENTIALS
Open File Credentials
=====================
When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as 'f_cred' in place of
'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid
should now access file->f_cred->fsuid and file->f_cred->fsgid.
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``. Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.
It is safe to access f_cred without the use of RCU or locking because the
It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).
=======================================
OVERRIDING THE VFS'S USE OF CREDENTIALS
Overriding the VFS's Use of Credentials
=======================================
Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into such as vfs_mkdir() with a
the VFS, and that can be done by calling into functions such as ``vfs_mkdir()`` with a
different set of credentials. This is done in the following places:
(*) sys_faccessat().
(*) do_coredump().
(*) nfs4recover.c.
* ``sys_faccessat()``.
* ``do_coredump()``.
* nfs4recover.c.
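
The prepare/commit/abort cycle described under Altering Credentials is a
copy-then-swap pattern on a reference-counted object. The toy userspace model
below tries to show only that shape. Everything in it is an assumption made
for illustration: the ``toy_`` names are hypothetical, there is no locking,
RCU or LSM hook, and the real ``struct cred`` carries far more state.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct cred; not the kernel structure. */
struct toy_cred {
	int usage;		/* reference count */
	unsigned int suid;	/* the one field this sketch alters */
};

/* Stands in for current->cred in this single-threaded model. */
static struct toy_cred *current_cred_ptr;

/* Duplicate the current credentials; the copy is private and mutable. */
static struct toy_cred *toy_prepare_creds(void)
{
	struct toy_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *current_cred_ptr;
	new->usage = 1;
	return new;
}

/* Drop one reference and free the set when the count reaches zero. */
static void toy_put_cred(struct toy_cred *cred)
{
	assert(cred->usage > 0);
	if (--cred->usage == 0)
		free(cred);
}

/* Publish the new set; this consumes the caller's reference to it. */
static int toy_commit_creds(struct toy_cred *new)
{
	struct toy_cred *old = current_cred_ptr;

	current_cred_ptr = new;
	toy_put_cred(old);
	return 0;
}

/* Discard a prepared set that will not be committed. */
static void toy_abort_creds(struct toy_cred *new)
{
	toy_put_cred(new);
}
```

As in the text, the caller of ``toy_commit_creds()`` must not drop its own
reference afterwards, because the commit has already taken it over.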

@ -1,7 +1,13 @@
======================
Security documentation
Security Documentation
======================
.. toctree::
:maxdepth: 1
credentials
IMA-templates
keys/index
LSM
self-protection
tpm/index

@ -1,4 +1,6 @@
==========================================
Encrypted keys for the eCryptfs filesystem
==========================================
ECryptfs is a stacked filesystem which transparently encrypts and decrypts each
file using a randomly generated File Encryption Key (FEK).
@ -35,11 +37,14 @@ controlled environment. Another advantage is that the key is not exposed to
threats of malicious software, because it is available in clear form only at
kernel level.
Usage:
Usage::
keyctl add encrypted name "new ecryptfs key-type:master-key-name keylen" ring
keyctl add encrypted name "load hex_blob" ring
keyctl update keyid "update key-type:master-key-name"
Where::
name:= '<16 hexadecimal characters>'
key-type:= 'trusted' | 'user'
keylen:= 64
@ -48,7 +53,7 @@ keylen:= 64
Example of encrypted key usage with the eCryptfs filesystem:
Create an encrypted key "1000100010001000" of length 64 bytes with format
'ecryptfs' and save it using a previously loaded user key "test":
'ecryptfs' and save it using a previously loaded user key "test"::
$ keyctl add encrypted 1000100010001000 "new ecryptfs user:test 64" @u
19184530
@ -62,7 +67,7 @@ Create an encrypted key "1000100010001000" of length 64 bytes with format
$ keyctl pipe 19184530 > ecryptfs.blob
Mount an eCryptfs filesystem using the created encrypted key "1000100010001000"
into the '/secret' directory:
into the '/secret' directory::
$ mount -i -t ecryptfs -oecryptfs_sig=1000100010001000,\
ecryptfs_cipher=aes,ecryptfs_key_bytes=32 /secret /secret


@ -0,0 +1,11 @@
===========
Kernel Keys
===========
.. toctree::
:maxdepth: 1
core
ecryptfs
request-key
trusted-encrypted


@ -1,5 +1,5 @@
===================
KEY REQUEST SERVICE
Key Request Service
===================
The key request service is part of the key retention service (refer to
@ -7,13 +7,13 @@ Documentation/security/keys.txt). This document explains more fully how
the requesting algorithm works.
The process starts by either the kernel requesting a service by calling
request_key*():
``request_key*()``::
struct key *request_key(const struct key_type *type,
const char *description,
const char *callout_info);
or:
or::
struct key *request_key_with_auxdata(const struct key_type *type,
const char *description,
@ -21,14 +21,14 @@ or:
size_t callout_len,
void *aux);
or:
or::
struct key *request_key_async(const struct key_type *type,
const char *description,
const char *callout_info,
size_t callout_len);
or:
or::
struct key *request_key_async_with_auxdata(const struct key_type *type,
const char *description,
@ -36,7 +36,7 @@ or:
size_t callout_len,
void *aux);
Or by userspace invoking the request_key system call:
Or by userspace invoking the request_key system call::
key_serial_t request_key(const char *type,
const char *description,
@ -67,38 +67,37 @@ own upcall mechanisms. If they do, then those should be substituted for the
forking and execution of /sbin/request-key.
===========
THE PROCESS
The Process
===========
A request proceeds in the following manner:
(1) Process A calls request_key() [the userspace syscall calls the kernel
1) Process A calls request_key() [the userspace syscall calls the kernel
interface].
(2) request_key() searches the process's subscribed keyrings to see if there's
2) request_key() searches the process's subscribed keyrings to see if there's
a suitable key there. If there is, it returns the key. If there isn't,
and callout_info is not set, an error is returned. Otherwise the process
proceeds to the next step.
(3) request_key() sees that A doesn't have the desired key yet, so it creates
3) request_key() sees that A doesn't have the desired key yet, so it creates
two things:
(a) An uninstantiated key U of requested type and description.
a) An uninstantiated key U of requested type and description.
(b) An authorisation key V that refers to key U and notes that process A
b) An authorisation key V that refers to key U and notes that process A
is the context in which key U should be instantiated and secured, and
from which associated key requests may be satisfied.
(4) request_key() then forks and executes /sbin/request-key with a new session
4) request_key() then forks and executes /sbin/request-key with a new session
keyring that contains a link to auth key V.
(5) /sbin/request-key assumes the authority associated with key U.
5) /sbin/request-key assumes the authority associated with key U.
(6) /sbin/request-key execs an appropriate program to perform the actual
6) /sbin/request-key execs an appropriate program to perform the actual
instantiation.
(7) The program may want to access another key from A's context (say a
7) The program may want to access another key from A's context (say a
Kerberos TGT key). It just requests the appropriate key, and the keyring
search notes that the session keyring has auth key V in its bottom level.
@ -106,14 +105,14 @@ A request proceeds in the following manner:
UID, GID, groups and security info of process A as if it was process A,
and come up with key W.
(8) The program then does what it must to get the data with which to
8) The program then does what it must to get the data with which to
instantiate key U, using key W as a reference (perhaps it contacts a
Kerberos server using the TGT) and then instantiates key U.
(9) Upon instantiating key U, auth key V is automatically revoked so that it
9) Upon instantiating key U, auth key V is automatically revoked so that it
may not be used again.
(10) The program then exits 0 and request_key() deletes key V and returns key
10) The program then exits 0 and request_key() deletes key V and returns key
U to the caller.
This also extends further. If key W (step 7 above) didn't exist, key W would
@ -127,8 +126,7 @@ This is because process A's keyrings can't simply be attached to
of them, and (b) it requires the same UID/GID/Groups all the way through.
====================================
NEGATIVE INSTANTIATION AND REJECTION
Negative Instantiation And Rejection
====================================
Rather than instantiating a key, it is possible for the possessor of an
@ -145,23 +143,22 @@ signal, the key under construction will be automatically negatively
instantiated for a short amount of time.
====================
THE SEARCH ALGORITHM
The Search Algorithm
====================
A search of any particular keyring proceeds in the following fashion:
(1) When the key management code searches for a key (keyring_search_aux) it
1) When the key management code searches for a key (keyring_search_aux) it
firstly calls key_permission(SEARCH) on the keyring it's starting with,
if this denies permission, it doesn't search further.
(2) It considers all the non-keyring keys within that keyring and, if any key
2) It considers all the non-keyring keys within that keyring and, if any key
matches the criteria specified, calls key_permission(SEARCH) on it to see
if the key is allowed to be found. If it is, that key is returned; if
not, the search continues, and the error code is retained if of higher
priority than the one currently set.
(3) It then considers all the keyring-type keys in the keyring it's currently
3) It then considers all the keyring-type keys in the keyring it's currently
searching. It calls key_permission(SEARCH) on each keyring, and if this
grants permission, it recurses, executing steps (2) and (3) on that
keyring.
@ -173,20 +170,20 @@ returned.
When search_process_keyrings() is invoked, it performs the following searches
until one succeeds:
(1) If extant, the process's thread keyring is searched.
1) If extant, the process's thread keyring is searched.
(2) If extant, the process's process keyring is searched.
2) If extant, the process's process keyring is searched.
(3) The process's session keyring is searched.
3) The process's session keyring is searched.
(4) If the process has assumed the authority associated with a request_key()
4) If the process has assumed the authority associated with a request_key()
authorisation key then:
(a) If extant, the calling process's thread keyring is searched.
a) If extant, the calling process's thread keyring is searched.
(b) If extant, the calling process's process keyring is searched.
b) If extant, the calling process's process keyring is searched.
(c) The calling process's session keyring is searched.
c) The calling process's session keyring is searched.
The moment one succeeds, all pending errors are discarded and the found key is
returned.
@ -194,7 +191,7 @@ returned.
Only if all these fail does the whole thing fail with the highest priority
error. Note that several errors may have come from LSM.
The error priority is:
The error priority is::
EKEYREVOKED > EKEYEXPIRED > ENOKEY
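
The "first success wins, otherwise report the highest-priority error" rule described in request-key.rst can be sketched as follows. This is a simplified Python model, not kernel code: the function name `search_in_order`, the use of error names as strings, and the `ENOKEY` default when nothing is found are assumptions made for illustration only.

```python
# Error names ordered from lowest to highest priority, following the
# "EKEYREVOKED > EKEYEXPIRED > ENOKEY" rule quoted above.
ERROR_PRIORITY = ("ENOKEY", "EKEYEXPIRED", "EKEYREVOKED")


def search_in_order(searches):
    """Run each search in turn and return (key, error_name).

    Each element of `searches` is a callable (hypothetical stand-in for
    searching one keyring) returning either (key, None) on success or
    (None, error_name) on failure.
    """
    pending = "ENOKEY"  # assumed result when no search reports anything worse
    for search in searches:
        key, error = search()
        if key is not None:
            # The moment one search succeeds, pending errors are discarded.
            return key, None
        # Retain the error only if it outranks the one currently held.
        if ERROR_PRIORITY.index(error) > ERROR_PRIORITY.index(pending):
            pending = error
    return None, pending
```

Passing the thread, process and session keyring searches in that order mirrors the sequence in the text; the model leaves out permission checks and the recursion into nested keyrings.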


@ -1,4 +1,6 @@
==========================
Trusted and Encrypted Keys
==========================
Trusted and Encrypted Keys are two new key types added to the existing kernel
key ring service. Both of these new types are variable length symmetric keys,
@ -20,7 +22,8 @@ By default, trusted keys are sealed under the SRK, which has the default
authorization value (20 zeros). This can be set at takeownership time with the
trouser's utility: "tpm_takeownership -u -z".
Usage:
Usage::
keyctl add trusted name "new keylen [options]" ring
keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring
keyctl update key "update [options]"
@ -64,19 +67,22 @@ The decrypted portion of encrypted keys can contain either a simple symmetric
key or a more complex structure. The format of the more complex structure is
application specific, which is identified by 'format'.
Usage:
Usage::
keyctl add encrypted name "new [format] key-type:master-key-name keylen"
ring
keyctl add encrypted name "load hex_blob" ring
keyctl update keyid "update key-type:master-key-name"
Where::
format:= 'default | ecryptfs'
key-type:= 'trusted' | 'user'
Examples of trusted and encrypted key usage:
Create and save a trusted key named "kmk" of length 32 bytes:
Create and save a trusted key named "kmk" of length 32 bytes::
$ keyctl add trusted kmk "new 32" @u
440502848
@ -99,7 +105,7 @@ Create and save a trusted key named "kmk" of length 32 bytes:
$ keyctl pipe 440502848 > kmk.blob
Load a trusted key from the saved blob:
Load a trusted key from the saved blob::
$ keyctl add trusted kmk "load `cat kmk.blob`" @u
268728824
@ -114,7 +120,7 @@ Load a trusted key from the saved blob:
f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
e4a8aea2b607ec96931e6f4d4fe563ba
Reseal a trusted key under new pcr values:
Reseal a trusted key under new pcr values::
$ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`"
$ keyctl print 268728824
@ -135,11 +141,13 @@ compromised by a user level problem, and when sealed to specific boot PCR
values, protects against boot and offline attacks. Create and save an
encrypted key "evm" using the above trusted key "kmk":
option 1: omitting 'format'
option 1: omitting 'format'::
$ keyctl add encrypted evm "new trusted:kmk 32" @u
159771175
option 2: explicitly defining 'format' as 'default'
option 2: explicitly defining 'format' as 'default'::
$ keyctl add encrypted evm "new default trusted:kmk 32" @u
159771175
@ -150,7 +158,7 @@ option 2: explicitly defining 'format' as 'default'
$ keyctl pipe 159771175 > evm.blob
Load an encrypted key "evm" from saved blob:
Load an encrypted key "evm" from saved blob::
$ keyctl add encrypted evm "load `cat evm.blob`" @u
831684262
@ -164,4 +172,4 @@ Other uses for trusted and encrypted keys, such as for disk and file encryption
are anticipated. In particular the new format 'ecryptfs' has been defined in
in order to use encrypted keys to mount an eCryptfs filesystem. More details
about the usage can be found in the file
'Documentation/security/keys-ecryptfs.txt'.
``Documentation/security/keys-ecryptfs.txt``.


@ -1,4 +1,6 @@
# Kernel Self-Protection
======================
Kernel Self-Protection
======================
Kernel self-protection is the design and implementation of systems and
structures within the Linux kernel to protect against security flaws in
@ -26,7 +28,8 @@ mentioning them, since these aspects need to be explored, dealt with,
and/or accepted.
## Attack Surface Reduction
Attack Surface Reduction
========================
The most fundamental defense against security exploits is to reduce the
areas of the kernel that can be used to redirect execution. This ranges
@ -34,13 +37,15 @@ from limiting the exposed APIs available to userspace, making in-kernel
APIs hard to use incorrectly, minimizing the areas of writable kernel
memory, etc.
### Strict kernel memory permissions
Strict kernel memory permissions
--------------------------------
When all of kernel memory is writable, it becomes trivial for attacks
to redirect execution flow. To reduce the availability of these targets
the kernel needs to protect its memory with a tight set of permissions.
#### Executable code and read-only data must not be writable
Executable code and read-only data must not be writable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Any areas of the kernel with executable memory must not be writable.
While this obviously includes the kernel text itself, we must consider
@ -51,18 +56,19 @@ kernel, they are implemented in a way where the memory is temporarily
made writable during the update, and then returned to the original
permissions.)
In support of this are CONFIG_STRICT_KERNEL_RWX and
CONFIG_STRICT_MODULE_RWX, which seek to make sure that code is not
In support of this are ``CONFIG_STRICT_KERNEL_RWX`` and
``CONFIG_STRICT_MODULE_RWX``, which seek to make sure that code is not
writable, data is not executable, and read-only data is neither writable
nor executable.
Most architectures have these options on by default and not user selectable.
For some architectures like arm that wish to have these be selectable,
the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable
a Kconfig prompt. CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT determines
a Kconfig prompt. ``CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT`` determines
the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled.
#### Function pointers and sensitive variables must not be writable
Function pointers and sensitive variables must not be writable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Vast areas of kernel memory contain function pointers that are looked
up by the kernel and used to continue execution (e.g. descriptor/vector
@ -74,8 +80,8 @@ so that they live in the .rodata section instead of the .data section
of the kernel, gaining the protection of the kernel's strict memory
permissions as described above.
For variables that are initialized once at __init time, these can
be marked with the (new and under development) __ro_after_init
For variables that are initialized once at ``__init`` time, these can
be marked with the (new and under development) ``__ro_after_init``
attribute.
What remains are variables that are updated rarely (e.g. GDT). These
@ -85,7 +91,8 @@ of their lifetime read-only. (For example, when being updated, only the
CPU thread performing the update would be given uninterruptible write
access to the memory.)
#### Segregation of kernel memory from userspace memory
Segregation of kernel memory from userspace memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The kernel must never execute userspace memory. The kernel must also never
access userspace memory without explicit expectation to do so. These
@ -95,10 +102,11 @@ By blocking userspace memory in this way, execution and data parsing
cannot be passed to trivially-controlled userspace memory, forcing
attacks to operate entirely in kernel memory.
### Reduced access to syscalls
Reduced access to syscalls
--------------------------
One trivial way to eliminate many syscalls for 64-bit systems is building
without CONFIG_COMPAT. However, this is rarely a feasible scenario.
without ``CONFIG_COMPAT``. However, this is rarely a feasible scenario.
The "seccomp" system provides an opt-in feature made available to
userspace, which provides a way to reduce the number of kernel entry
@ -112,7 +120,8 @@ to trusted processes. This would keep the scope of kernel entry points
restricted to the more regular set of normally available to unprivileged
userspace.
### Restricting access to kernel modules
Restricting access to kernel modules
------------------------------------
The kernel should never allow an unprivileged user the ability to
load specific kernel modules, since that would provide a facility to
@ -127,11 +136,12 @@ for debate in some scenarios.)
To protect against even privileged users, systems may need to either
disable module loading entirely (e.g. monolithic kernel builds or
modules_disabled sysctl), or provide signed modules (e.g.
CONFIG_MODULE_SIG_FORCE, or dm-crypt with LoadPin), to keep from having
``CONFIG_MODULE_SIG_FORCE``, or dm-crypt with LoadPin), to keep from having
root load arbitrary kernel code via the module loader interface.
## Memory integrity
Memory integrity
================
There are many memory structures in the kernel that are regularly abused
to gain execution control during an attack, By far the most commonly
@ -139,16 +149,18 @@ understood is that of the stack buffer overflow in which the return
address stored on the stack is overwritten. Many other examples of this
kind of attack exist, and protections exist to defend against them.
### Stack buffer overflow
Stack buffer overflow
---------------------
The classic stack buffer overflow involves writing past the expected end
of a variable stored on the stack, ultimately writing a controlled value
to the stack frame's stored return address. The most widely used defense
is the presence of a stack canary between the stack variables and the
return address (CONFIG_CC_STACKPROTECTOR), which is verified just before
return address (``CONFIG_CC_STACKPROTECTOR``), which is verified just before
the function returns. Other defenses include things like shadow stacks.
### Stack depth overflow
Stack depth overflow
--------------------
A less well understood attack is using a bug that triggers the
kernel to consume stack memory with deep function calls or large stack
@ -158,27 +170,31 @@ important changes need to be made for better protections: moving the
sensitive thread_info structure elsewhere, and adding a faulting memory
hole at the bottom of the stack to catch these overflows.
### Heap memory integrity
Heap memory integrity
---------------------
The structures used to track heap free lists can be sanity-checked during
allocation and freeing to make sure they aren't being used to manipulate
other memory areas.
### Counter integrity
Counter integrity
-----------------
Many places in the kernel use atomic counters to track object references
or perform similar lifetime management. When these counters can be made
to wrap (over or under) this traditionally exposes a use-after-free
flaw. By trapping atomic wrapping, this class of bug vanishes.
### Size calculation overflow detection
Size calculation overflow detection
-----------------------------------
Similar to counter overflow, integer overflows (usually size calculations)
need to be detected at runtime to kill this class of bug, which
traditionally leads to being able to write past the end of kernel buffers.
## Statistical defenses
Probabilistic defenses
======================
While many protections can be considered deterministic (e.g. read-only
memory cannot be written to), some protections provide only statistical
@ -186,7 +202,8 @@ defense, in that an attack must gather enough information about a
running system to overcome the defense. While not perfect, these do
provide meaningful defenses.
### Canaries, blinding, and other secrets
Canaries, blinding, and other secrets
-------------------------------------
It should be noted that things like the stack canary discussed earlier
are technically statistical defenses, since they rely on a secret value,
@ -201,7 +218,8 @@ It is critical that the secret values used must be separate (e.g.
different canary per stack) and high entropy (e.g. is the RNG actually
working?) in order to maximize their success.
### Kernel Address Space Layout Randomization (KASLR)
Kernel Address Space Layout Randomization (KASLR)
-------------------------------------------------
Since the location of kernel memory is almost always instrumental in
mounting a successful attack, making the location non-deterministic
@ -209,22 +227,25 @@ raises the difficulty of an exploit. (Note that this in turn makes
the value of information exposures higher, since they may be used to
discover desired memory locations.)
#### Text and module base
Text and module base
~~~~~~~~~~~~~~~~~~~~
By relocating the physical and virtual base address of the kernel at
boot-time (CONFIG_RANDOMIZE_BASE), attacks needing kernel code will be
boot-time (``CONFIG_RANDOMIZE_BASE``), attacks needing kernel code will be
frustrated. Additionally, offsetting the module loading base address
means that even systems that load the same set of modules in the same
order every boot will not share a common base address with the rest of
the kernel text.
#### Stack base
Stack base
~~~~~~~~~~
If the base address of the kernel stack is not the same between processes,
or even not the same between syscalls, targets on or beyond the stack
become more difficult to locate.
#### Dynamic memory base
Dynamic memory base
~~~~~~~~~~~~~~~~~~~
Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up
being relatively deterministic in layout due to the order of early-boot
@ -232,7 +253,8 @@ initializations. If the base address of these areas is not the same
between boots, targeting them is frustrated, requiring an information
exposure specific to the region.
#### Structure layout
Structure layout
~~~~~~~~~~~~~~~~
By performing a per-build randomization of the layout of sensitive
structures, attacks must either be tuned to known kernel builds or expose
@ -240,26 +262,30 @@ enough kernel memory to determine structure layouts before manipulating
them.
## Preventing Information Exposures
Preventing Information Exposures
================================
Since the locations of sensitive structures are the primary target for
attacks, it is important to defend against exposure of both kernel memory
addresses and kernel memory contents (since they may contain kernel
addresses or other sensitive things like canary values).
### Unique identifiers
Unique identifiers
------------------
Kernel memory addresses must never be used as identifiers exposed to
userspace. Instead, use an atomic counter, an idr, or similar unique
identifier.
### Memory initialization
Memory initialization
---------------------
Memory copied to userspace must always be fully initialized. If not
explicitly memset(), this will require changes to the compiler to make
sure structure holes are cleared.
### Memory poisoning
Memory poisoning
----------------
When releasing memory, it is best to poison the contents (clear stack on
syscall return, wipe heap memory on a free), to avoid reuse attacks that
@ -267,9 +293,10 @@ rely on the old contents of memory. This frustrates many uninitialized
variable attacks, stack content exposures, heap content exposures, and
use-after-free attacks.
### Destination tracking
Destination tracking
--------------------
To help kill classes of bugs that result in kernel addresses being
written to userspace, the destination of writes needs to be tracked. If
the buffer is destined for userspace (e.g. seq_file backed /proc files),
the buffer is destined for userspace (e.g. seq_file backed ``/proc`` files),
it should automatically censor sensitive values.

Documentation/sh/conf.py Normal file

@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "SuperH architecture implementation manual"
tags.add("subproject")
latex_documents = [
('index', 'sh.tex', project,
'The kernel development community', 'manual'),
]


@ -0,0 +1,59 @@
=======================
SuperH Interfaces Guide
=======================
:Author: Paul Mundt
Memory Management
=================
SH-4
----
Store Queue API
~~~~~~~~~~~~~~~
.. kernel-doc:: arch/sh/kernel/cpu/sh4/sq.c
:export:
SH-5
----
TLB Interfaces
~~~~~~~~~~~~~~
.. kernel-doc:: arch/sh/mm/tlb-sh5.c
:internal:
.. kernel-doc:: arch/sh/include/asm/tlb_64.h
:internal:
Machine Specific Interfaces
===========================
mach-dreamcast
--------------
.. kernel-doc:: arch/sh/boards/mach-dreamcast/rtc.c
:internal:
mach-x3proto
------------
.. kernel-doc:: arch/sh/boards/mach-x3proto/ilsel.c
:export:
Busses
======
SuperHyway
----------
.. kernel-doc:: drivers/sh/superhyway/superhyway.c
:export:
Maple
-----
.. kernel-doc:: drivers/sh/maple/maple.c
:export:


@ -0,0 +1,10 @@
# -*- coding: utf-8; mode: python -*-
project = "Linux Sound Subsystem Documentation"
tags.add("subproject")
latex_documents = [
('index', 'sound.tex', project,
'The kernel development community', 'manual'),
]


@ -1,18 +0,0 @@
#
# Pandoc doesn't grok <function> or <structname>, so convert them
# ahead of time.
#
# Use the following escapes to pass through pandoc:
# $bq = "`"
# $lt = "<"
# $gt = ">"
#
s%<function>\([^<(]\+\)()</function>%:c:func:$bq\1()$bq%g
s%<function>\([^<(]\+\)</function>%:c:func:$bq\1()$bq%g
s%<structname>struct *\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g
s%struct <structname>\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g
s%<structname>\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g
#
# Wrap docproc directives in para and code blocks.
#
s%^\(!.*\)$%<para><code>DOCPROC: \1</code></para>%


@ -1,23 +0,0 @@
#
# Unescape.
#
s/$bq/`/g
s/$lt/</g
s/$gt/>/g
#
# pandoc thinks that both "_" needs to be escaped. Remove the extra
# backslashes.
#
s/\\_/_/g
#
# Unwrap docproc directives.
#
s/^``DOCPROC: !E\(.*\)``$/.. kernel-doc:: \1\n :export:/
s/^``DOCPROC: !I\(.*\)``$/.. kernel-doc:: \1\n :internal:/
s/^``DOCPROC: !F\([^ ]*\) \(.*\)``$/.. kernel-doc:: \1\n :functions: \2/
s/^``DOCPROC: !P\([^ ]*\) \(.*\)``$/.. kernel-doc:: \1\n :doc: \2/
s/^``DOCPROC: \(!.*\)``$/.. WARNING: DOCPROC directive not supported: \1/
#
# Trim trailing whitespace.
#
s/[[:space:]]*$//
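
For readers less familiar with sed, the DOCPROC unwrapping rules in the removed script above can be approximated in Python. This is a rough sketch covering only the `\_` unescaping, the DOCPROC rules and the whitespace trim; the function name `unwrap_docproc` and the three-space indentation of the directive options are assumptions, since the indentation is not recoverable from this flattened view.

```python
import re

# (pattern, replacement) pairs in the same order as the sed rules.
# The final rule only fires for directives none of the earlier rules handled.
DOCPROC_RULES = [
    (r"^``DOCPROC: !E(.*)``$", r".. kernel-doc:: \1\n   :export:"),
    (r"^``DOCPROC: !I(.*)``$", r".. kernel-doc:: \1\n   :internal:"),
    (r"^``DOCPROC: !F([^ ]*) (.*)``$", r".. kernel-doc:: \1\n   :functions: \2"),
    (r"^``DOCPROC: !P([^ ]*) (.*)``$", r".. kernel-doc:: \1\n   :doc: \2"),
    (r"^``DOCPROC: (!.*)``$", r".. WARNING: DOCPROC directive not supported: \1"),
]


def unwrap_docproc(line):
    """Apply the post-conversion fixes to a single line of pandoc output."""
    # Remove the extra backslashes pandoc puts in front of underscores.
    line = line.replace("\\_", "_")
    for pattern, replacement in DOCPROC_RULES:
        line = re.sub(pattern, replacement, line)
    # Trim trailing whitespace, like the last sed rule.
    return line.rstrip()
```

Once a rule has rewritten a line it no longer starts with the DOCPROC marker, so the later rules leave it alone, which is why the catch-all warning can safely come last.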


@ -1,28 +0,0 @@
#!/bin/bash
#
# Convert a template file into something like RST
#
# fix <function>
# feed to pandoc
# fix \_
# title line?
#
set -eu
if [ "$#" != "2" ]; then
echo "$0 <docbook file> <rst file>"
exit
fi
DIR=$(dirname $0)
in=$1
rst=$2
tmp=$rst.tmp
cp $in $tmp
sed --in-place -f $DIR/convert_template.sed $tmp
pandoc -s -S -f docbook -t rst -o $rst $tmp
sed --in-place -f $DIR/post_convert.sed $rst
rm $tmp
echo "book writen to $rst"


@ -197,13 +197,6 @@ ReSTマークアップを使ったドキュメントは Documentation/outputに
make latexdocs
make epubdocs
現在、幾つかの DocBook形式で書かれたドキュメントは ReST形式に転換中で
す。それらのドキュメントはDocumentation/DocBook ディレクトリに生成され、
Postscript または man ページの形式を生成するには以下のようにします - ::
make psdocs
make mandocs
カーネル開発者になるには
------------------------


@ -191,13 +191,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된
make latexdocs
make epubdocs
현재, ReST 로의 변환이 진행중인, DocBook 으로 쓰인 문서들이 존재한다. 그런
문서들은 Documentation/DocBook/ 디렉토리 안에 생성될 것이고 다음 커맨드를 통해
Postscript 나 man page 로도 만들어질 수 있다::
make psdocs
make mandocs
커널 개발자가 되는 것
---------------------
@ -270,15 +263,17 @@ pub/linux/kernel/v4.x/ 디렉토리에서 참조될 수 있다.개발 프로세
선호되는 방법은 git(커널의 소스 관리 툴, 더 많은 정보들은
https://git-scm.com/ 에서 참조할 수 있다)를 사용하는 것이지만 순수한
패치파일의 형식으로 보내는 것도 무관하다.
- 2주 후에 -rc1 커널이 배포되며 지금부터는 전체 커널의 안정성에 영향을
미칠수 있는 새로운 기능들을 포함하지 않는 패치들만이 추가될 수 있다.
완전히 새로운 드라이버(혹은 파일시스템)는 -rc1 이후에만 받아들여진다는
것을 기억해라. 왜냐하면 변경이 자체내에서만 발생하고 추가된 코드가
드라이버 외부의 다른 부분에는 영향을 주지 않으므로 그런 변경은
- 2주 후에 -rc1 커널이 릴리즈되며 여기서부터의 주안점은 새로운 커널을
가능한한 안정되게 하는 것이다. 이 시점에서의 대부분의 패치들은
회귀(역자주: 이전에는 존재하지 않았지만 새로운 기능추가나 변경으로 인해
생겨난 버그)를 일으킬 만한 위험을 가지고 있지 않기 때문이다. -rc1이
배포된 이후에 git를 사용하여 패치들을 Linus에게 보낼수 있지만 패치들은
공식적인 메일링 리스트로 보내서 검토를 받을 필요가 있다.
생겨난 버그)를 고쳐야 한다. 이전부터 존재한 버그는 회귀가 아니므로, 그런
버그에 대한 수정사항은 중요한 경우에만 보내져야 한다. 완전히 새로운
드라이버(혹은 파일시스템)는 -rc1 이후에만 받아들여진다는 것을 기억해라.
왜냐하면 변경이 자체내에서만 발생하고 추가된 코드가 드라이버 외부의 다른
부분에는 영향을 주지 않으므로 그런 변경은 회귀를 일으킬 만한 위험을 가지고
있지 않기 때문이다. -rc1이 배포된 이후에 git를 사용하여 패치들을 Linus에게
보낼수 있지만 패치들은 공식적인 메일링 리스트로 보내서 검토를 받을 필요가
있다.
- 새로운 -rc는 Linus가 현재 git tree가 테스트 하기에 충분히 안정된 상태에
있다고 판단될 때마다 배포된다. 목표는 새로운 -rc 커널을 매주 배포하는
것이다.

Some files were not shown because too many files have changed in this diff