Pull poll annotations from Al Viro:
"This introduces a __bitwise type for POLL### bitmap, and propagates
the annotations through the tree. Most of that stuff is as simple as
'make ->poll() instances return __poll_t and do the same to local
variables used to hold the future return value'.
Some of the obvious brainos found in process are fixed (e.g. POLLIN
misspelled as POLL_IN). At that point the amount of sparse warnings is
low and most of them are for genuine bugs - e.g. ->poll() instance
deciding to return -EINVAL instead of a bitmap. I hadn't touched those
in this series - it's large enough as it is.
Another problem it has caught was eventpoll() ABI mess; select.c and
eventpoll.c assumed that corresponding POLL### and EPOLL### were
equal. That's true for some, but not all of them - EPOLL### are
arch-independent, but POLL### are not.
The last commit in this series separates userland POLL### values from
the (now arch-independent) kernel-side ones, converting between them
in the few places where they are copied to/from userland. AFAICS, this
is the least disruptive fix preserving poll(2) ABI and making epoll()
work on all architectures.
As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
it will trigger only on what would've triggered EPOLLWRBAND on other
architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
at all on sparc. With this patch they should work consistently on all
architectures"
* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
make kernel-side POLL... arch-independent
eventpoll: no need to mask the result of epi_item_poll() again
eventpoll: constify struct epoll_event pointers
debugging printk in sg_poll() uses %x to print POLL... bitmap
annotate poll(2) guts
9p: untangle ->poll() mess
->si_band gets POLL... bitmap stored into a user-visible long field
ring_buffer_poll_wait() return value used as return value of ->poll()
the rest of drivers/*: annotate ->poll() instances
media: annotate ->poll() instances
fs: annotate ->poll() instances
ipc, kernel, mm: annotate ->poll() instances
net: annotate ->poll() instances
apparmor: annotate ->poll() instances
tomoyo: annotate ->poll() instances
sound: annotate ->poll() instances
acpi: annotate ->poll() instances
crypto: annotate ->poll() instances
block: annotate ->poll() instances
x86: annotate ->poll() instances
...
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 651e28c553.
This caused a regression:
"The specific problem is that dnsmasq refuses to start on openSUSE Leap
42.2. The specific cause is that and attempt to open a PF_LOCAL socket
gets EACCES. This means that networking doesn't function on a system
with a 4.14-rc2 system."
Sadly, the developers involved seemed to be in denial for several weeks
about this, delaying the revert. This has not been a good release for
the security subsystem, and this area needs to change development
practices.
Reported-and-bisected-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Seth Arnold <seth.arnold@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The DAC access permissions for several apparmorfs files are wrong.
.access - needs to be writable by all tasks to perform queries
the others in the set only provide a read fn so should be read only.
With policy namespace virtualization all apparmor needs to control
the permission and visibility checks directly which means DAC
access has to be allowed for all user, group, and other.
BugLink: http://bugs.launchpad.net/bugs/1713103
Fixes: c97204baf8 ("apparmor: rename apparmor file fns and data to indicate use")
Signed-off-by: John Johansen <john.johansen@canonical.com>
Add signal mediation where the signal can be mediated based on the
signal, direction, or the label or the peer/target. The signal perms
are verified on a cross check to ensure policy consistency in the case
of incremental policy load/replacement.
The optimization of skipping the cross check when policy is guaranteed
to be consistent (single compile unit) remains to be done.
policy rules have the form of
SIGNAL_RULE = [ QUALIFIERS ] 'signal' [ SIGNAL ACCESS PERMISSIONS ]
[ SIGNAL SET ] [ SIGNAL PEER ]
SIGNAL ACCESS PERMISSIONS = SIGNAL ACCESS | SIGNAL ACCESS LIST
SIGNAL ACCESS LIST = '(' Comma or space separated list of SIGNAL
ACCESS ')'
SIGNAL ACCESS = ( 'r' | 'w' | 'rw' | 'read' | 'write' | 'send' |
'receive' )
SIGNAL SET = 'set' '=' '(' SIGNAL LIST ')'
SIGNAL LIST = Comma or space separated list of SIGNALS
SIGNALS = ( 'hup' | 'int' | 'quit' | 'ill' | 'trap' | 'abrt' |
'bus' | 'fpe' | 'kill' | 'usr1' | 'segv' | 'usr2' |
'pipe' | 'alrm' | 'term' | 'stkflt' | 'chld' | 'cont' |
'stop' | 'stp' | 'ttin' | 'ttou' | 'urg' | 'xcpu' |
'xfsz' | 'vtalrm' | 'prof' | 'winch' | 'io' | 'pwr' |
'sys' | 'emt' | 'exists' | 'rtmin+0' ... 'rtmin+32'
)
SIGNAL PEER = 'peer' '=' AARE
eg.
signal, # allow all signals
signal send set=(hup, kill) peer=foo,
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
We accidentally forgot to set the error code on this path. It means we
return NULL instead of an error pointer. I looked through a bunch of
callers and I don't think it really causes a big issue, but the
documentation says we're supposed to return error pointers here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Begin the actual switch to using domain labels by storing them on
the context and converting the label to a singular profile where
possible.
Signed-off-by: John Johansen <john.johansen@canonical.com>
There are still a few places where profile replacement fails to update
and a stale profile is used for mediation. Fix this by moving to
accessing the current label through a critical section that will
always ensure mediation is using the current label regardless of
whether the tasks cred has been updated or not.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The data being queried isn't always the current profile and a lookup
relative to the current profile should be done.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The namespace being passed into the replace/remove profiles fns() is
not the view, but the namespace specified by the inode from the
file hook (if present) or the loading tasks ns, if accessing the
top level virtualized load/replace file interface.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Allow userspace to query a profile about permissions, through the
transaction interface that is already used to allow userspace to
query about key,value data.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The simple_transaction interface is slow. It requires 4 syscalls
(open, write, read, close) per query and shares a single lock for each
queries.
So replace its use with a compatible in multi_transaction interface.
It allows for a faster 2 syscall pattern per query. After an initial
open, an arbitrary number of writes and reads can be issued. Each
write will reset the query with new data that can be read. Reads do
not clear the data, and can be issued multiple times, and used with
seek, until a new write is performed which will reset the data
available and the seek position.
Note: this keeps the single lock design, if needed moving to a per
file lock will have to come later.
Signed-off-by: John Johansen <john.johansen@canonical.com>
gsettings mediation needs to be able to determine if apparmor supports
label data queries. A label data query can be done to test for support
but its failure is indistinguishable from other failures, making it an
unreliable indicator.
Fix by making support of label data queries available as a flag in the
apparmorfs features dir tree.
Signed-off-by: John Johansen <john.johansen@canonical.com>
When setting up namespaces for containers its easier for them to use
an fs interface to create the namespace for the containers
policy. Allow mkdir/rmdir under the policy/namespaces/ dir to be used
to create and remove namespaces.
BugLink: http://bugs.launchpad.net/bugs/1611078
Signed-off-by: John Johansen <john.johansen@canonical.com>
Add a policy revision file to find the current revision of a ns's policy.
There is a revision file per ns, as well as a virtualized global revision
file in the base apparmor fs directory. The global revision file when
opened will provide the revision of the opening task namespace.
The revision file can be waited on via select/poll to detect apparmor
policy changes from the last read revision of the opened file. This
means that the revision file must be read after the select/poll other
wise update data will remain ready for reading.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Virtualize the apparmor policy/ directory so that the current
namespace affects what part of policy is seen. To do this convert to
using apparmorfs for policy namespace files and setup a magic symlink
in the securityfs apparmor dir to access those files.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
prefixes are used for fns/data that are not static to apparmorfs.c
with the prefixes being
aafs - special magic apparmorfs for policy namespace data
aa_sfs - for fns/data that go into securityfs
aa_fs - for fns/data that may be used in the either of aafs or
securityfs
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
AppArmor policy needs to be able to be resolved based on the policy
namespace a task is confined by. Add a base apparmorfs filesystem that
(like nsfs) will exist as a kern mount and be accessed via jump_link
through a securityfs file.
Setup the base apparmorfs fns and data, but don't use it yet.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
The loaddata sets cover more than just a single profile and should
be tracked at the ns level. Move the load data files under the namespace
and reference the files from the profiles via a symlink.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
We can either return PTR_ERR(NULL) or a PTR_ERR(a valid pointer) here.
Returning NULL is probably not good, but since this happens at boot
then we are probably already toasted if we were to hit this bug in real
life. In other words, it seems like a very low severity bug to me.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Two single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: John Johansen <john.johansen@canonical.com>
A bit of data was put into a sequence by two separate function calls.
Print the same data by a single function call instead.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: John Johansen <john.johansen@canonical.com>
CURRENT_TIME macro is not y2038 safe on 32 bit systems.
The patch replaces all the uses of CURRENT_TIME by current_time().
This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.
CURRENT_TIME macro will be deleted before merging the aforementioned
change.
Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kvmalloc", v5.
There are many open coded kmalloc with vmalloc fallback instances in the
tree. Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.
As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.
Most callers, I could find, have been converted to use the helper
instead. This is patch 6. There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.
[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
This patch (of 9):
Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.
This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL). Those need to be
fixed separately.
While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers. Existing ones would have to die
slowly.
[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow a profile to carry extra data that can be queried via userspace.
This provides a means to store extra data in a profile that a trusted
helper can extract and use from live policy.
Signed-off-by: William Hua <william.hua@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Moving the use of fqname to later allows learning profiles to be based
on the fqname request instead of just the hname. It also allows cleaning
up some of the name parsing and lookup by allowing the use of
the fqlookupn_profile() lib fn.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Having ops be an integer that is an index into an op name table is
awkward and brittle. Every op change requires an edit for both the
op constant and a string in the table. Instead switch to using const
strings directly, eliminating the need for the table that needs to
be kept in sync.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Store loaded policy and allow introspecting it through apparmorfs. This
has several uses from debugging, policy validation, and policy checkpoint
and restore for containers.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Policy management will be expanded beyond traditional unconfined root.
This will require knowning the profile of the task doing the management
and the ns view.
Signed-off-by: John Johansen <john.johansen@canonical.com>