This completes the matrix of interfield comparisons between uid/gid
information for the current task and the uid/gid information for inodes.
aka I can audit based on differences between the euid of the process and
the uid of fs objects.
Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than code the same loop over and over implement a helper function which
uses some pointer magic to make it generic enough to be used numerous places
as we implement more audit interfield comparisons
Signed-off-by: Eric Paris <eparis@redhat.com>
We wish to be able to audit when a uid=500 task accesses a file which is
uid=0. Or vice versa. This patch introduces a new audit filter type
AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields
should be compared. At this point we only define the task->uid vs
inode->uid, but other comparisons can be added.
Signed-off-by: Eric Paris <eparis@redhat.com>
Just a code cleanup really. We don't need to make a function call just for
it to return on error. This also makes the VFS function even easier to follow
and removes a conditional on a hot path.
Signed-off-by: Eric Paris <eparis@redhat.com>
At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they
log in and it be impossible to ever reset. We had to make it mutable even
after it was once set (with the CAP) because on update and admin might have
to restart sshd. Now sshd would get his loginuid and the next user which
logged in using ssh would not be able to set his loginuid.
Systemd has changed how userspace works and allowed us to make the kernel
work the way it should. With systemd users (even admins) are not supposed
to restart services directly. The system will restart the service for
them. Thus since systemd is going to loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.
If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used! Since we have old systems I make this a
Kconfig option.
Signed-off-by: Eric Paris <eparis@redhat.com>
The function always deals with current. Don't expose an option
pretending one can use it for something. You can't.
Signed-off-by: Eric Paris <eparis@redhat.com>
Much like the ability to filter audit on the uid of an inode collected, we
should be able to filter on the gid of the inode.
Signed-off-by: Eric Paris <eparis@redhat.com>
Allow syscall exit filter matching based on the uid of the owner of an
inode used in a syscall. aka:
auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa
Signed-off-by: Eric Paris <eparis@redhat.com>
Audit entry,always rules are not allowed and are automatically changed in
exit,always rules in userspace. The kernel refuses to load such rules.
Thus a task in the middle of a syscall (and thus in audit_finish_fork())
can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every
going to actually use the code in audit_finish_fork() since it will
return without doing anything. Thus drop the code.
Signed-off-by: Eric Paris <eparis@redhat.com>
A number of audit hooks make function calls before they determine that
auxilary records do not need to be collected. Do those checks as static
inlines since the most common case is going to be that records are not
needed and we can skip the function call overhead.
Signed-off-by: Eric Paris <eparis@redhat.com>
The audit code makes heavy use of likely() and unlikely() macros, but they
don't always make sense. Drop any that seem questionable and let the
computer do it's thing.
Signed-off-by: Eric Paris <eparis@redhat.com>
Every arch calls:
if (unlikely(current->audit_context))
audit_syscall_entry()
which requires knowledge about audit (the existance of audit_context) in
the arch code. Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.
Signed-off-by: Eric Paris <eparis@redhat.com>
The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.
We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.
The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.
In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].
For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.
Signed-off-by: Eric Paris <eparis@redhat.com>
The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon. Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot. Now lets
say we add a watch with a filter on 8,1. If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.
Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents. We might hope that
this would also be logged, but it isn't. The rules will check the
major,minor of the device containing /dev/sda1. In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.
I believe these rules should trigger on either device. The man page is
devoid of useful information about the intended semantics. It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch does 2 things. First it reduces the number of audit_names
allocated in every audit context from 20 to 5. 5 should be enough for all
'normal' syscalls (rename being the worst). Some syscalls can still touch
more the 5 inodes such as mount. When rpc filesystem is mounted it will
create inodes and those can exceed 5. To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5. This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.
Signed-off-by: Eric Paris <eparis@redhat.com>
Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list. The filetype matching
however had a strange way of doing things. It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel. Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea. As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field. The kernel always checked the first
name collected which for the tested rules was always correct.
This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected. It
also changes the rule validation to reject the old unused rule types.
Noone knew it was there. Noone used it. Why keep around the extra code?
Signed-off-by: Eric Paris <eparis@redhat.com>
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.
Nothing to see here but a whole lot of instances of:
-#include <linux/module.h>
+#include <linux/export.h>
This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c69e8d9c01 ("CRED: Use RCU to access another task's creds and to
release a task's own creds") added calls to get_task_cred and put_cred in
audit_filter_rules. Profiling with a large number of audit rules active
on the exit chain shows that we are spending upto 48% in this routine for
syscall intensive tests, most of which is in the atomic ops.
1. The code should be accessing tsk->cred rather than tsk->real_cred.
2. Since tsk is current (or tsk is being created by copy_process) access to
tsk->cred without rcu read lock is possible. At the request of the audit
maintainer, a new flag has been added to audit_filter_rules in order to make
this explicit and guide future code.
Signed-off-by: Tony Jones <tonyj@suse.de>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Normal syscall audit doesn't catch 5th argument of syscall. It also
doesn't catch the contents of userland structures pointed to be
syscall argument, so for both old and new mmap(2) ABI it doesn't
record the descriptor we are mapping. For old one it also misses
flags.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.
get_fs_root()
get_fs_pwd()
get_fs_root_and_pwd()
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Simply switch audit_trees from using inotify to using fsnotify for it's
inode pinning and disappearing act information.
Signed-off-by: Eric Paris <eparis@redhat.com>
No real changes, just cleanup to the audit_watch split patch which we done
with minimal code changes for easy review. Now fix interfaces to make
things work better.
Signed-off-by: Eric Paris <eparis@redhat.com>
There have been a number of reports of people seeing the message:
"name_count maxed, losing inode data: dev=00:05, inode=3185"
in dmesg. These usually lead to people reporting problems to the filesystem
group who are in turn clueless what they mean.
Eventually someone finds me and I explain what is going on and that
these come from the audit system. The basics of the problem is that the
audit subsystem never expects a single syscall to 'interact' (for some
wish washy meaning of interact) with more than 20 inodes. But in fact
some operations like loading kernel modules can cause changes to lots of
inodes in debugfs.
There are a couple real fixes being bandied about including removing the
fixed compile time limit of 20 or not auditing changes in debugfs (or
both) but neither are small and obvious so I am not sending them for
immediate inclusion (I hope Al forwards a real solution next devel
window).
In the meantime this patch simply adds 'audit' to the beginning of the
crap message so if a user sees it, they come blame me first and we can
talk about what it means and make sure we understand all of the reasons
it can happen and make sure this gets solved correctly in the long run.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* pull ACC_MODE to fs.h; we have several copies all over the place
* nightmarish expression calculating f_mode by f_flags deserves a helper
too (OPEN_FMODE(flags))
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
pahole pointed out that on x86_64 struct audit_context can be rearrainged
to save 16 bytes per struct. Since we have an audit_context per task this
can acually be a pretty significant gain.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If syscall removes the root of subtree being watched, we
definitely do not want the rules refering that subtree
to be destroyed without the syscall in question having
a chance to match them.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A number of places in the audit system we send an op= followed by a string
that includes spaces. Somehow this works but it's just wrong. This patch
moves all of those that I could find to be quoted.
Example:
Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule
key="number2" list=4 res=0
Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule"
key="number2" list=4 res=0
Signed-off-by: Eric Paris <eparis@redhat.com>
In preparation for converting audit to use fsnotify instead of inotify we
seperate the inode watching code into it's own file. This is similar to
how the audit tree watching code is already seperated into audit_tree.c
Signed-off-by: Eric Paris <eparis@redhat.com>
The audit execve record splitting code estimates the length of the message
generated. But it forgot to include the "" that wrap each string in its
estimation. This means that execve messages with lots of tiny (1-2 byte)
arguments could still cause records greater than 8k to be emitted. Simply
fix the estimate.
Signed-off-by: Eric Paris <eparis@redhat.com>
audit_log_d_path had spaces in the strings which would be emitted on the
error paths. This patch simply replaces those spaces with an _ or removes
the needless spaces entirely.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
after 0590b9335a audit_set_auditable() is now only
used by the audit tree code. If CONFIG_AUDIT_TREE is unset it will be defined
but unused. This patch simply moves the function inside a CONFIG_AUDIT_TREE
block.
cc1: warnings being treated as errors
/home/acme_unencrypted/git/linux-2.6-tip/kernel/auditsc.c:745: error: ‘audit_set_auditable’ defined but not used
make[2]: *** [kernel/auditsc.o] Error 1
make[1]: *** [kernel] Error 2
make[1]: *** Waiting for unfinished jobs....
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The audit subsystem treats syscall return codes as type long, unfortunately
the audit_get_context() function mistakenly converts the return code to an
int type in the parameters which could cause problems on systems where the
sizeof(int) != sizeof(long).
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix auditsc kernel-doc notation:
Warning(linux-2.6.28-git7//kernel/auditsc.c:2156): No description found for parameter 'attr'
Warning(linux-2.6.28-git7//kernel/auditsc.c:2156): Excess function parameter 'u_attr' description in '__audit_mq_open'
Warning(linux-2.6.28-git7//kernel/auditsc.c:2204): No description found for parameter 'notification'
Warning(linux-2.6.28-git7//kernel/auditsc.c:2204): Excess function parameter 'u_notification' description in '__audit_mq_notify'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(updated)
Added hunk that changes the comment, the rest is the same.
EXECVE records contain a newline after every argument. auditd converts
"\n" to " " so you cannot see newlines even in raw logs, but they're
there nevertheless. If you're not using auditd, you need to work round
them. These '\n' chars are can be easily replaced by spaces when
creating record in kernel. Note there is no need for trailing '\n' in
an audit record.
record before this patch:
"type=EXECVE msg=audit(1231421801.566:31): argc=4 a0=\"./test\"\na1=\"a\"\na2=\"b\"\na3=\"c\"\n"
record after this patch:
"type=EXECVE msg=audit(1231421801.566:31): argc=4 a0=\"./test\" a1=\"a\" a2=\"b\" a3=\"c\""
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't pull it in sched.h; very few files actually need it and those
can include directly. sched.h itself only needs forward declaration
of struct fs_struct;
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Problem: ordering between the rules on exit chain is currently lost;
all watch and inode rules are listed after everything else _and_
exit,never on one kind doesn't stop exit,always on another from
being matched.
Solution: assign priorities to rules, keep track of the current
highest-priority matching rule and its result (always/never).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't bother with allocations
* don't do double copy_from_user()
* don't duplicate parts of check for audit_dummy_context()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* logging the original value of *msg_prio in mq_timedreceive(2)
is insane - the argument is write-only (i.e. syscall always
ignores the original value and only overwrites it).
* merge __audit_mq_timed{send,receive}
* don't do copy_from_user() twice
* don't mess with allocations in auditsc part
* ... and don't bother checking !audit_enabled and !context in there -
we'd already checked for audit_dummy_context().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't copy_from_user() twice
* don't bother with allocations
* don't duplicate parts of audit_dummy_context()
* make it return void
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
No need to do that more than once per process lifetime; allocating/freeing
on each sendto/accept/etc. is bloody pointless.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Delete excess kernel-doc notation in kernel/auditsc.c:
Warning(linux-2.6.27-git10//kernel/auditsc.c:1481): Excess function parameter or struct member 'tsk' description in 'audit_syscall_entry'
Warning(linux-2.6.27-git10//kernel/auditsc.c:1564): Excess function parameter or struct member 'tsk' description in 'audit_syscall_exit'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Inaugurate copy-on-write credentials management. This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.
A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().
With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:
struct cred *new = prepare_creds();
int ret = blah(new);
if (ret < 0) {
abort_creds(new);
return ret;
}
return commit_creds(new);
There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.
To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const. The purpose of this is compile-time
discouragement of altering credentials through those pointers. Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:
(1) Its reference count may incremented and decremented.
(2) The keyrings to which it points may be modified, but not replaced.
The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).
This patch and the preceding patches have been tested with the LTP SELinux
testsuite.
This patch makes several logical sets of alteration:
(1) execve().
This now prepares and commits credentials in various places in the
security code rather than altering the current creds directly.
(2) Temporary credential overrides.
do_coredump() and sys_faccessat() now prepare their own credentials and
temporarily override the ones currently on the acting thread, whilst
preventing interference from other threads by holding cred_replace_mutex
on the thread being dumped.
This will be replaced in a future patch by something that hands down the
credentials directly to the functions being called, rather than altering
the task's objective credentials.
(3) LSM interface.
A number of functions have been changed, added or removed:
(*) security_capset_check(), ->capset_check()
(*) security_capset_set(), ->capset_set()
Removed in favour of security_capset().
(*) security_capset(), ->capset()
New. This is passed a pointer to the new creds, a pointer to the old
creds and the proposed capability sets. It should fill in the new
creds or return an error. All pointers, barring the pointer to the
new creds, are now const.
(*) security_bprm_apply_creds(), ->bprm_apply_creds()
Changed; now returns a value, which will cause the process to be
killed if it's an error.
(*) security_task_alloc(), ->task_alloc_security()
Removed in favour of security_prepare_creds().
(*) security_cred_free(), ->cred_free()
New. Free security data attached to cred->security.
(*) security_prepare_creds(), ->cred_prepare()
New. Duplicate any security data attached to cred->security.
(*) security_commit_creds(), ->cred_commit()
New. Apply any security effects for the upcoming installation of new
security by commit_creds().
(*) security_task_post_setuid(), ->task_post_setuid()
Removed in favour of security_task_fix_setuid().
(*) security_task_fix_setuid(), ->task_fix_setuid()
Fix up the proposed new credentials for setuid(). This is used by
cap_set_fix_setuid() to implicitly adjust capabilities in line with
setuid() changes. Changes are made to the new credentials, rather
than the task itself as in security_task_post_setuid().
(*) security_task_reparent_to_init(), ->task_reparent_to_init()
Removed. Instead the task being reparented to init is referred
directly to init's credentials.
NOTE! This results in the loss of some state: SELinux's osid no
longer records the sid of the thread that forked it.
(*) security_key_alloc(), ->key_alloc()
(*) security_key_permission(), ->key_permission()
Changed. These now take cred pointers rather than task pointers to
refer to the security context.
(4) sys_capset().
This has been simplified and uses less locking. The LSM functions it
calls have been merged.
(5) reparent_to_kthreadd().
This gives the current thread the same credentials as init by simply using
commit_thread() to point that way.
(6) __sigqueue_alloc() and switch_uid()
__sigqueue_alloc() can't stop the target task from changing its creds
beneath it, so this function gets a reference to the currently applicable
user_struct which it then passes into the sigqueue struct it returns if
successful.
switch_uid() is now called from commit_creds(), and possibly should be
folded into that. commit_creds() should take care of protecting
__sigqueue_alloc().
(7) [sg]et[ug]id() and co and [sg]et_current_groups.
The set functions now all use prepare_creds(), commit_creds() and
abort_creds() to build and check a new set of credentials before applying
it.
security_task_set[ug]id() is called inside the prepared section. This
guarantees that nothing else will affect the creds until we've finished.
The calling of set_dumpable() has been moved into commit_creds().
Much of the functionality of set_user() has been moved into
commit_creds().
The get functions all simply access the data directly.
(8) security_task_prctl() and cap_task_prctl().
security_task_prctl() has been modified to return -ENOSYS if it doesn't
want to handle a function, or otherwise return the return value directly
rather than through an argument.
Additionally, cap_task_prctl() now prepares a new set of credentials, even
if it doesn't end up using it.
(9) Keyrings.
A number of changes have been made to the keyrings code:
(a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
all been dropped and built in to the credentials functions directly.
They may want separating out again later.
(b) key_alloc() and search_process_keyrings() now take a cred pointer
rather than a task pointer to specify the security context.
(c) copy_creds() gives a new thread within the same thread group a new
thread keyring if its parent had one, otherwise it discards the thread
keyring.
(d) The authorisation key now points directly to the credentials to extend
the search into rather pointing to the task that carries them.
(e) Installing thread, process or session keyrings causes a new set of
credentials to be created, even though it's not strictly necessary for
process or session keyrings (they're shared).
(10) Usermode helper.
The usermode helper code now carries a cred struct pointer in its
subprocess_info struct instead of a new session keyring pointer. This set
of credentials is derived from init_cred and installed on the new process
after it has been cloned.
call_usermodehelper_setup() allocates the new credentials and
call_usermodehelper_freeinfo() discards them if they haven't been used. A
special cred function (prepare_usermodeinfo_creds()) is provided
specifically for call_usermodehelper_setup() to call.
call_usermodehelper_setkeys() adjusts the credentials to sport the
supplied keyring as the new session keyring.
(11) SELinux.
SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:
(a) selinux_setprocattr() no longer does its check for whether the
current ptracer can access processes with the new SID inside the lock
that covers getting the ptracer's SID. Whilst this lock ensures that
the check is done with the ptracer pinned, the result is only valid
until the lock is released, so there's no point doing it inside the
lock.
(12) is_single_threaded().
This function has been extracted from selinux_setprocattr() and put into
a file of its own in the lib/ directory as join_session_keyring() now
wants to use it too.
The code in SELinux just checked to see whether a task shared mm_structs
with other tasks (CLONE_VM), but that isn't good enough. We really want
to know if they're part of the same thread group (CLONE_THREAD).
(13) nfsd.
The NFS server daemon now has to use the COW credentials to set the
credentials it is going to use. It really needs to pass the credentials
down to the functions it calls, but it can't do that until other patches
in this series have been applied.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.
Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.
With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-audit@redhat.com
Cc: containers@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Signed-off-by: James Morris <jmorris@namei.org>
actual capbilities being added/removed. This patch adds a new record type
which emits the target pid and the eff, inh, and perm cap sets.
example output if you audit capset syscalls would be:
type=SYSCALL msg=audit(1225743140.465:76): arch=c000003e syscall=126 success=yes exit=0 a0=17f2014 a1=17f201c a2=80000000 a3=7fff2ab7f060 items=0 ppid=2160 pid=2223 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="setcap" exe="/usr/sbin/setcap" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1322] msg=audit(1225743140.465:76): pid=0 cap_pi=ffffffffffffffff cap_pp=ffffffffffffffff cap_pe=ffffffffffffffff
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
non-zero pE we will crate a new audit record which contains the entire set
of known information about the executable in question, fP, fI, fE, fversion
and includes the process's pE, pI, pP. Before and after the bprm capability
are applied. This record type will only be emitted from execve syscalls.
an example of making ping use fcaps instead of setuid:
setcap "cat_net_raw+pe" /bin/ping
type=SYSCALL msg=audit(1225742021.015:236): arch=c000003e syscall=59 success=yes exit=0 a0=1457f30 a1=14606b0 a2=1463940 a3=321b770a70 items=2 ppid=2929 pid=2963 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1321] msg=audit(1225742021.015:236): fver=2 fp=0000000000002000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 new_pp=0000000000002000 new_pi=0000000000000000 new_pe=0000000000002000
type=EXECVE msg=audit(1225742021.015:236): argc=2 a0="ping" a1="127.0.0.1"
type=CWD msg=audit(1225742021.015:236): cwd="/home/test"
type=PATH msg=audit(1225742021.015:236): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fe=1 cap_fver=2
type=PATH msg=audit(1225742021.015:236): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
records of any file that has file capabilities set. Files which do not
have fcaps set will not have different PATH records.
An example audit record if you run:
setcap "cap_net_admin+pie" /bin/bash
/bin/bash
type=SYSCALL msg=audit(1225741937.363:230): arch=c000003e syscall=59 success=yes exit=0 a0=2119230 a1=210da30 a2=20ee290 a3=8 items=2 ppid=2149 pid=2923 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=EXECVE msg=audit(1225741937.363:230): argc=2 a0="ping" a1="www.google.com"
type=CWD msg=audit(1225741937.363:230): cwd="/root"
type=PATH msg=audit(1225741937.363:230): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0104755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fi=0000000000002000 cap_fe=1 cap_fver=2
type=PATH msg=audit(1225741937.363:230): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Various people outside the tty layer still stick their noses in behind the
scenes. We need to make sure they also obey the locking and referencing rules.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
got rid of compilation warning:
ISO C90 forbids mixed declarations and code
Signed-off-by: Cordelia Sam <cordesam@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sorry, I miss a blank between if and "(".
And I add "unlikely" to check "ctx" in audit_match_perm() and audit_match_filetype().
This is a new patch for it.
Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When calling audit_filter_task(), it calls audit_filter_rules() with audit_context is NULL.
If the key field is set, the result in audit_filter_rules() will be set to 1 and
ctx->filterkey will be set to key.
But the ctx is NULL in this condition, so kernel will panic.
Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Makes the kernel audit subsystem collect information about the sending
process when that process sends SIGUSR2 to the userspace audit daemon.
SIGUSR2 is a new interesting signal to auditd telling auditd that it
should try to start logging to disk again and the error condition which
caused it to stop logging to disk (usually out of space) has been
rectified.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This adds a fast path for 64-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
This path does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Signed-off-by: Roland McGrath <roland@redhat.com>
Argument is S_IF... | <index>, where index is normally 0 or 1.
Triggers if chosen element of ctx->names[] is present and the
mode of object in question matches the upper bits of argument.
I.e. for things like "is the argument of that chmod a directory",
etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use msglen as the identifier.
kernel/audit.c:724:10: warning: symbol 'len' shadows an earlier one
kernel/audit.c:575:8: originally declared here
Don't use ino_f to check the inode field at the end of the functions.
kernel/auditfilter.c:429:22: warning: symbol 'f' shadows an earlier one
kernel/auditfilter.c:420:21: originally declared here
kernel/auditfilter.c:542:22: warning: symbol 'f' shadows an earlier one
kernel/auditfilter.c:529:21: originally declared here
i always used as a counter for a for loop and initialized to zero before
use. Eliminate the inner i variables.
kernel/auditsc.c:1295:8: warning: symbol 'i' shadows an earlier one
kernel/auditsc.c:1152:6: originally declared here
kernel/auditsc.c:1320:7: warning: symbol 'i' shadows an earlier one
kernel/auditsc.c:1152:6: originally declared here
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Leave audit_sig_{uid|pid|sid} protected by #ifdef CONFIG_AUDITSYSCALL.
Noticed by sparse:
kernel/audit.c:73:6: warning: symbol 'audit_ever_enabled' was not declared. Should it be static?
kernel/audit.c💯8: warning: symbol 'audit_sig_uid' was not declared. Should it be static?
kernel/audit.c:101:8: warning: symbol 'audit_sig_pid' was not declared. Should it be static?
kernel/audit.c:102:6: warning: symbol 'audit_sig_sid' was not declared. Should it be static?
kernel/audit.c:117:23: warning: symbol 'audit_ih' was not declared. Should it be static?
kernel/auditfilter.c:78:18: warning: symbol 'audit_filter_list' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch standardized the string auditing interfaces. No userspace
changes will be visible and this is all just cleanup and consistancy
work. We have the following string audit interfaces to use:
void audit_log_n_hex(struct audit_buffer *ab, const unsigned char *buf, size_t len);
void audit_log_n_string(struct audit_buffer *ab, const char *buf, size_t n);
void audit_log_string(struct audit_buffer *ab, const char *buf);
void audit_log_n_untrustedstring(struct audit_buffer *ab, const char *string, size_t n);
void audit_log_untrustedstring(struct audit_buffer *ab, const char *string);
This may be the first step to possibly fixing some of the issues that
people have with the string output from the kernel audit system. But we
still don't have an agreed upon solution to that problem.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A couple of audit printk statements did not have a newline.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rename the se_str and se_rule audit fields elements to
lsm_str and lsm_rule to avoid confusion.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Convert Audit to use the new LSM Audit hooks instead of
the exported SELinux interface.
Basically, use:
security_audit_rule_init
secuirty_audit_rule_free
security_audit_rule_known
security_audit_rule_match
instad of (respectively) :
selinux_audit_rule_init
selinux_audit_rule_free
audit_rule_has_selinux
selinux_audit_rule_match
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Stop using the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
kfree(ctx)
and use following generic LSM equivalents respectively:
security_inode_getsecid(inode, secid)
security_ipc_getsecid*(ipcp, secid)
security_task_getsecid(tsk, secid)
security_sid_to_secctx(sid, ctx, len)
security_release_secctx(ctx, len)
Call security_release_secctx only if security_secid_to_secctx
succeeded.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Fix the following compiler warning by using "%zu" as defined in C99.
CC kernel/auditsc.o
kernel/auditsc.c: In function 'audit_log_single_execve_arg':
kernel/auditsc.c:1074: warning: format '%ld' expects type 'long int', but
argument 4 has type 'size_t'
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Clearly this was supposed to be an == not an = in the if statement.
This patch also causes us to stop processing execve args once we have
failed rather than continuing to loop on failure over and over and over.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
audit_log_d_path() is a d_path() wrapper that is used by the audit code. To
use a struct path in audit_log_d_path() I need to embed it into struct
avc_audit_data.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use struct path in fs_struct.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Disabling audit at runtime by auditctl doesn't mean that we can
stop allocating contexts for new processes; we don't want to miss them
when that sucker is reenabled.
(based on work from Al Viro in the RHEL kernel series)
Signed-off-by: Eric Paris <eparis@redhat.com>
execve arguments can be quite large. There is no limit on the number of
arguments and a 4G limit on the size of an argument.
this patch prints those aruguments in bite sized pieces. a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k
single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=
Signed-off-by: Eric Paris <eparis@redhat.com>
If we fail to get an ab in audit_log_pid_context this may be due to an exclude
rule rather than a memory allocation failure. If it was due to a memory
allocation failue we would have already paniced and no need to do it again.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch adds an end of event record type. It will be sent by the kernel as
the last record when a multi-record event is triggered. This will aid realtime
analysis programs since they will now reliably know they have the last record
to complete an event. The audit daemon filters this and will not write it to
disk.
Signed-off-by: Steve Grubb <sgrubb redhat com>
Signed-off-by: Eric Paris <eparis@redhat.com>
In order to correlate audit records to an individual login add a session
id. This is incremented every time a user logs in and is included in
almost all messages which currently output the auid. The field is
labeled ses= or oses=
Signed-off-by: Eric Paris <eparis@redhat.com>
Add uid, loginuid, and comm collection to OBJ_PID records. This just
gives users a little more information about the task that received a
signal. pid is rather meaningless after the fact, and even though comm
isn't great we can't collect exe reasonably on this code path for
performance reasons.
Signed-off-by: Eric Paris <eparis@redhat.com>
The syscall exit code will change ERESTART* kernel internal return codes
to EINTR if it does not restart the syscall. Since we collect the audit
info before that point we should fix those in the audit log as well.
Signed-off-by: Eric Paris <eparis@redhat.com>
Keeping loginuid in audit_context is racy and results in messier
code. Taken to task_struct, out of the way of ->audit_context
changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix kernel-doc for auditsc parameter changes.
Warning(linux-2.6.23-git17//kernel/auditsc.c:1623): No description found for parameter 'dentry'
Warning(linux-2.6.23-git17//kernel/auditsc.c:1666): No description found for parameter 'dentry'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New kind of audit rule predicates: "object is visible in given subtree".
The part that can be sanely implemented, that is. Limitations:
* if you have hardlink from outside of tree, you'd better watch
it too (or just watch the object itself, obviously)
* if you mount something under a watched tree, tell audit
that new chunk should be added to watched subtrees
* if you umount something in a watched tree and it's still mounted
elsewhere, you will get matches on events happening there. New command
tells audit to recalculate the trees, trimming such sources of false
positives.
Note that it's _not_ about path - if something mounted in several places
(multiple mount, bindings, different namespaces, etc.), the match does
_not_ depend on which one we are using for access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>