Commit Graph

3415 Commits

Author SHA1 Message Date
Linus Torvalds 11fbf53d66 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted bits and pieces from various people. No common topic in this
  pile, sorry"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/affs: add rename exchange
  fs/affs: add rename2 to prepare multiple methods
  Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
  fs: don't set *REFERENCED on single use objects
  fs: compat: Remove warning from COMPATIBLE_IOCTL
  remove pointless extern of atime_need_update_rcu()
  fs: completely ignore unknown open flags
  fs: add a VALID_OPEN_FLAGS
  fs: remove _submit_bh()
  fs: constify tree_descr arrays passed to simple_fill_super()
  fs: drop duplicate header percpu-rwsem.h
  fs/affs: bugfix: Write files greater than page size on OFS
  fs/affs: bugfix: enable writes on OFS disks
  fs/affs: remove node generation check
  fs/affs: import amigaffs.h
  fs/affs: bugfix: make symbolic links work again
2017-05-09 09:12:53 -07:00
Deepa Dinamani 24d0d03c2e apparmorfs: replace CURRENT_TIME with current_time()
CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time().

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Michal Hocko 752ade68cb treewide: use kv[mz]alloc* rather than opencoded variants
There are many code paths opencoding kvmalloc.  Let's use the helper
instead.  The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator.  E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation.  This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fallback to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously.  There is no guarantee something like that happens
though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.

Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Michal Hocko a7c3e901a4 mm: introduce kv[mz]alloc helpers
Patch series "kvmalloc", v5.

There are many open coded kmalloc with vmalloc fallback instances in the
tree.  Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.

As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.

Most callers, I could find, have been converted to use the helper
instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.

[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

This patch (of 9):

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code.  Yet we do not have any common helper
for that and so users have invented their own helpers.  Some of them are
really creative when doing so.  Let's just add kv[mz]alloc and make sure
it is implemented properly.  This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures.  This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them.  In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL).  Those need to be
fixed separately.

While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers.  Existing ones would have to die
slowly.

[sfr@canb.auug.org.au: f2fs fixup]
  Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:12 -07:00
Linus Torvalds 0302e28dee Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

  IMA:
   - provide ">" and "<" operators for fowner/uid/euid rules

  KEYS:
   - add a system blacklist keyring

   - add KEYCTL_RESTRICT_KEYRING, exposes keyring link restriction
     functionality to userland via keyctl()

  LSM:
   - harden LSM API with __ro_after_init

   - add prlmit security hook, implement for SELinux

   - revive security_task_alloc hook

  TPM:
   - implement contextual TPM command 'spaces'"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits)
  tpm: Fix reference count to main device
  tpm_tis: convert to using locality callbacks
  tpm: fix handling of the TPM 2.0 event logs
  tpm_crb: remove a cruft constant
  keys: select CONFIG_CRYPTO when selecting DH / KDF
  apparmor: Make path_max parameter readonly
  apparmor: fix parameters so that the permission test is bypassed at boot
  apparmor: fix invalid reference to index variable of iterator line 836
  apparmor: use SHASH_DESC_ON_STACK
  security/apparmor/lsm.c: set debug messages
  apparmor: fix boolreturn.cocci warnings
  Smack: Use GFP_KERNEL for smk_netlbl_mls().
  smack: fix double free in smack_parse_opts_str()
  KEYS: add SP800-56A KDF support for DH
  KEYS: Keyring asymmetric key restrict method with chaining
  KEYS: Restrict asymmetric key linkage using a specific keychain
  KEYS: Add a lookup_restriction function for the asymmetric key type
  KEYS: Add KEYCTL_RESTRICT_KEYRING
  KEYS: Consistent ordering for __key_link_begin and restrict check
  KEYS: Add an optional lookup_restriction hook to key_type
  ...
2017-05-03 08:50:52 -07:00
Linus Torvalds 8d65b08deb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Millar:
 "Here are some highlights from the 2065 networking commits that
  happened this development cycle:

   1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

   2) Add a generic XDP driver, so that anyone can test XDP even if they
      lack a networking device whose driver has explicit XDP support
      (me).

   3) Sparc64 now has an eBPF JIT too (me)

   4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
      Starovoitov)

   5) Make netfitler network namespace teardown less expensive (Florian
      Westphal)

   6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

   7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

   8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

   9) Multiqueue support in stmmac driver (Joao Pinto)

  10) Remove TCP timewait recycling, it never really could possibly work
      well in the real world and timestamp randomization really zaps any
      hint of usability this feature had (Soheil Hassas Yeganeh)

  11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
      Aleksandrov)

  12) Add socket busy poll support to epoll (Sridhar Samudrala)

  13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
      and several others)

  14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
  tipc: refactor function tipc_sk_recv_stream()
  tipc: refactor function tipc_sk_recvmsg()
  net: thunderx: Optimize page recycling for XDP
  net: thunderx: Support for XDP header adjustment
  net: thunderx: Add support for XDP_TX
  net: thunderx: Add support for XDP_DROP
  net: thunderx: Add basic XDP support
  net: thunderx: Cleanup receive buffer allocation
  net: thunderx: Optimize CQE_TX handling
  net: thunderx: Optimize RBDR descriptor handling
  net: thunderx: Support for page recycling
  ipx: call ipxitf_put() in ioctl error path
  net: sched: add helpers to handle extended actions
  qed*: Fix issues in the ptp filter config implementation.
  qede: Fix concurrency issue in PTP Tx path processing.
  stmmac: Add support for SIMATIC IOT2000 platform
  net: hns: fix ethtool_get_strings overflow in hns driver
  tcp: fix wraparound issue in tcp_lp
  bpf, arm64: fix jit branch offset related to ldimm64
  bpf, arm64: implement jiting of BPF_XADD
  ...
2017-05-02 16:40:27 -07:00
Linus Torvalds c58d4055c0 A reasonably busy cycle for documentation this time around. There is a new
guide for user-space API documents, rather sparsely populated at the
 moment, but it's a start.  Markus improved the infrastructure for
 converting diagrams.  Mauro has converted much of the USB documentation
 over to RST.  Plus the usual set of fixes, improvements, and tweaks.
 
 There's a bit more than the usual amount of reaching out of Documentation/
 to fix comments elsewhere in the tree; I have acks for those where I could
 get them.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZB1elAAoJEI3ONVYwIuV6wUIQAJSM/4rNdj6z+GXeWhRfbsOo
 vqqVYluvXQIJaaqdsy9dgcfThhOXWYsPyVF6Xd+bDJpwF3BMZYbX1CI1Mo3kRD+7
 9+Pf68cYSHRoU3l/sFI8q0zfKbHtmFteIvnRQoFtRaExqgTR8glUfxNDyN9XuNAZ
 3naS4qMZivM4gjMcSpIB/wFOQpV+6qVIs6VTFLdCC8wodT3W/Wmb+bqrCVJ0twbB
 t8jJeYHt2wsiTdqrKU+VilAUAZ1Lby+DNfeWrO18rC1ohktPyUzOGg8JmTKUBpVO
 qj1OJwD6abuaNh/J9bXsh8u0OrVrBKWjVrhq9IFYDlm92fu3Bgr6YeoaVPEpcklt
 jdlgZnWs9/oXa6d32aMc9F7mP9a0Q1qikFTYINhaHQZCb4VDRuQ9hCSuqWm5jlVy
 lmVAoxLa0zSdOoXaYuO3HC99ku1cIn814CXMDz/IwKXkqUCV+zl+H3AGkvxGyQ5M
 eblw2TnQnc6e1LRcxt5bgpFR1JYMbCJhu0U5XrNFueQV8ReB15dvL7h4y21dWJKF
 2Sr83rwfG1rpZQiSqCjOXxIzuXbEGH3+a+zCDV5IHhQRt/VNDOt2hgmcyucSSJ5h
 5GRFYgTlGvoT/6LdIT39QooHB+4tSDRtEQ6lh0q2ZtVV2rfG/I6/PR5sUbWM65SN
 vAfctRm2afHLhdonSX5O
 =41m+
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.12' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "A reasonably busy cycle for documentation this time around. There is a
  new guide for user-space API documents, rather sparsely populated at
  the moment, but it's a start. Markus improved the infrastructure for
  converting diagrams. Mauro has converted much of the USB documentation
  over to RST. Plus the usual set of fixes, improvements, and tweaks.

  There's a bit more than the usual amount of reaching out of
  Documentation/ to fix comments elsewhere in the tree; I have acks for
  those where I could get them"

* tag 'docs-4.12' of git://git.lwn.net/linux: (74 commits)
  docs: Fix a couple typos
  docs: Fix a spelling error in vfio-mediated-device.txt
  docs: Fix a spelling error in ioctl-number.txt
  MAINTAINERS: update file entry for HSI subsystem
  Documentation: allow installing man pages to a user defined directory
  Doc/PM: Sync with intel_powerclamp code behavior
  zr364xx.rst: usb/devices is now at /sys/kernel/debug/
  usb.rst: move documentation from proc_usb_info.txt to USB ReST book
  convert philips.txt to ReST and add to media docs
  docs-rst: usb: update old usbfs-related documentation
  arm: Documentation: update a path name
  docs: process/4.Coding.rst: Fix a couple of document refs
  docs-rst: fix usb cross-references
  usb: gadget.h: be consistent at kernel doc macros
  usb: composite.h: fix two warnings when building docs
  usb: get rid of some ReST doc build errors
  usb.rst: get rid of some Sphinx errors
  usb/URB.txt: convert to ReST and update it
  usb/persist.txt: convert to ReST and add to driver-api book
  usb/hotplug.txt: convert to ReST and add to driver-api book
  ...
2017-05-02 10:21:17 -07:00
Linus Torvalds 5db6db0d40 Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess unification updates from Al Viro:
 "This is the uaccess unification pile. It's _not_ the end of uaccess
  work, but the next batch of that will go into the next cycle. This one
  mostly takes copy_from_user() and friends out of arch/* and gets the
  zero-padding behaviour in sync for all architectures.

  Dealing with the nocache/writethrough mess is for the next cycle;
  fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
  sold on access_ok() in there, BTW; just not in this pile), same for
  reducing __copy_... callsites, strn*... stuff, etc. - there will be a
  pile about as large as this one in the next merge window.

  This one sat in -next for weeks. -3KLoC"

* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
  HAVE_ARCH_HARDENED_USERCOPY is unconditional now
  CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
  m32r: switch to RAW_COPY_USER
  hexagon: switch to RAW_COPY_USER
  microblaze: switch to RAW_COPY_USER
  get rid of padding, switch to RAW_COPY_USER
  ia64: get rid of copy_in_user()
  ia64: sanitize __access_ok()
  ia64: get rid of 'segment' argument of __do_{get,put}_user()
  ia64: get rid of 'segment' argument of __{get,put}_user_check()
  ia64: add extable.h
  powerpc: get rid of zeroing, switch to RAW_COPY_USER
  esas2r: don't open-code memdup_user()
  alpha: fix stack smashing in old_adjtimex(2)
  don't open-code kernel_setsockopt()
  mips: switch to RAW_COPY_USER
  mips: get rid of tail-zeroing in primitives
  mips: make copy_from_user() zero tail explicitly
  mips: clean and reorder the forest of macros...
  mips: consolidate __invoke_... wrappers
  ...
2017-05-01 14:41:04 -07:00
Eric Biggers cda37124f4 fs: constify tree_descr arrays passed to simple_fill_super()
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory.  Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 23:54:06 -04:00
Al Viro 2fefc97b21 HAVE_ARCH_HARDENED_USERCOPY is unconditional now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 12:11:06 -04:00
David S. Miller fb796707d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 20:23:53 -07:00
James Morris f65cc104c4 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next 2017-04-19 22:00:15 +10:00
James Morris 6859e21e81 Merge branch 'smack-for-4.12' of git://github.com/cschaufler/smack-next into next 2017-04-19 08:35:01 +10:00
James Morris fa5b5b26e2 Merge branch 'stable-4.12' of git://git.infradead.org/users/pcmoore/selinux into next 2017-04-19 08:30:08 +10:00
Eric Biggers c9f838d104 KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
This fixes CVE-2017-7472.

Running the following program as an unprivileged user exhausts kernel
memory by leaking thread keyrings:

	#include <keyutils.h>

	int main()
	{
		for (;;)
			keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
	}

Fix it by only creating a new thread keyring if there wasn't one before.
To make things more consistent, make install_thread_keyring_to_cred()
and install_process_keyring_to_cred() both return 0 if the corresponding
keyring is already present.

Fixes: d84f4f992c ("CRED: Inaugurate COW credentials")
Cc: stable@vger.kernel.org # 2.6.29+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-04-18 15:31:49 +01:00
David Howells c1644fe041 KEYS: Change the name of the dead type to ".dead" to prevent user access
This fixes CVE-2017-6951.

Userspace should not be able to do things with the "dead" key type as it
doesn't have some of the helper functions set upon it that the kernel
needs.  Attempting to use it may cause the kernel to crash.

Fix this by changing the name of the type to ".dead" so that it's rejected
up front on userspace syscalls by key_get_type_from_user().

Though this doesn't seem to affect recent kernels, it does affect older
ones, certainly those prior to:

	commit c06cfb08b8
	Author: David Howells <dhowells@redhat.com>
	Date:   Tue Sep 16 17:36:06 2014 +0100
	KEYS: Remove key_type::match in favour of overriding default by match_preparse

which went in before 3.18-rc1.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
2017-04-18 15:31:39 +01:00
David Howells ee8f844e3c KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
This fixes CVE-2016-9604.

Keyrings whose name begin with a '.' are special internal keyrings and so
userspace isn't allowed to create keyrings by this name to prevent
shadowing.  However, the patch that added the guard didn't fix
KEYCTL_JOIN_SESSION_KEYRING.  Not only can that create dot-named keyrings,
it can also subscribe to them as a session keyring if they grant SEARCH
permission to the user.

This, for example, allows a root process to set .builtin_trusted_keys as
its session keyring, at which point it has full access because now the
possessor permissions are added.  This permits root to add extra public
keys, thereby bypassing module verification.

This also affects kexec and IMA.

This can be tested by (as root):

	keyctl session .builtin_trusted_keys
	keyctl add user a a @s
	keyctl list @s

which on my test box gives me:

	2 keys in keyring:
	180010936: ---lswrv     0     0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05
	801382539: --alswrv     0     0 user: a


Fix this by rejecting names beginning with a '.' in the keyctl.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
cc: linux-ima-devel@lists.sourceforge.net
cc: stable@vger.kernel.org
2017-04-18 15:31:35 +01:00
James Morris 30a83251dd Merge tag 'keys-next-20170412' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next 2017-04-18 07:37:51 +10:00
Stephan Müller 4cd4ca7cc8 keys: select CONFIG_CRYPTO when selecting DH / KDF
Select CONFIG_CRYPTO in addition to CONFIG_HASH to ensure that
also CONFIG_HASH2 is selected. Both are needed for the shash
cipher support required for the KDF operation.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-04-11 23:18:09 +01:00
John Johansen 622f6e3265 apparmor: Make path_max parameter readonly
The path_max parameter determines the max size of buffers allocated
but it should  not be setable at run time. If can be used to cause an
oops

root@ubuntu:~# echo 16777216 > /sys/module/apparmor/parameters/path_max
root@ubuntu:~# cat /sys/module/apparmor/parameters/path_max
Killed

[  122.141911] BUG: unable to handle kernel paging request at ffff880080945fff
[  122.143497] IP: [<ffffffff81228844>] d_absolute_path+0x44/0xa0
[  122.144742] PGD 220c067 PUD 0
[  122.145453] Oops: 0002 [#1] SMP
[  122.146204] Modules linked in: vmw_vsock_vmci_transport vsock ppdev vmw_balloon snd_ens1371 btusb snd_ac97_codec gameport snd_rawmidi btrtl snd_seq_device ac97_bus btbcm btintel snd_pcm input_leds bluetooth snd_timer snd joydev soundcore serio_raw coretemp shpchp nfit parport_pc i2c_piix4 8250_fintek vmw_vmci parport mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd vmwgfx psmouse mptspi ttm mptscsih drm_kms_helper mptbase syscopyarea scsi_transport_spi sysfillrect
[  122.163365]  ahci sysimgblt e1000 fb_sys_fops libahci drm pata_acpi fjes
[  122.164747] CPU: 3 PID: 1501 Comm: bash Not tainted 4.4.0-59-generic #80-Ubuntu
[  122.166250] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  122.168611] task: ffff88003496aa00 ti: ffff880076474000 task.ti: ffff880076474000
[  122.170018] RIP: 0010:[<ffffffff81228844>]  [<ffffffff81228844>] d_absolute_path+0x44/0xa0
[  122.171525] RSP: 0018:ffff880076477b90  EFLAGS: 00010206
[  122.172462] RAX: ffff880080945fff RBX: 0000000000000000 RCX: 0000000001000000
[  122.173709] RDX: 0000000000ffffff RSI: ffff880080946000 RDI: ffff8800348a1010
[  122.174978] RBP: ffff880076477bb8 R08: ffff880076477c80 R09: 0000000000000000
[  122.176227] R10: 00007ffffffff000 R11: ffff88007f946000 R12: ffff88007f946000
[  122.177496] R13: ffff880076477c80 R14: ffff8800348a1010 R15: ffff8800348a2400
[  122.178745] FS:  00007fd459eb4700(0000) GS:ffff88007b6c0000(0000) knlGS:0000000000000000
[  122.180176] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  122.181186] CR2: ffff880080945fff CR3: 0000000073422000 CR4: 00000000001406e0
[  122.182469] Stack:
[  122.182843]  00ffffff00000001 ffff880080946000 0000000000000000 0000000000000000
[  122.184409]  00000000570f789c ffff880076477c30 ffffffff81385671 ffff88007a2e7a58
[  122.185810]  0000000000000000 ffff880076477c88 01000000008a1000 0000000000000000
[  122.187231] Call Trace:
[  122.187680]  [<ffffffff81385671>] aa_path_name+0x81/0x370
[  122.188637]  [<ffffffff813875dd>] profile_transition+0xbd/0xb80
[  122.190181]  [<ffffffff811af9bc>] ? zone_statistics+0x7c/0xa0
[  122.191674]  [<ffffffff81389b20>] apparmor_bprm_set_creds+0x9b0/0xac0
[  122.193288]  [<ffffffff812e1971>] ? ext4_xattr_get+0x81/0x220
[  122.194793]  [<ffffffff812e800c>] ? ext4_xattr_security_get+0x1c/0x30
[  122.196392]  [<ffffffff813449b9>] ? get_vfs_caps_from_disk+0x69/0x110
[  122.198004]  [<ffffffff81232d4f>] ? mnt_may_suid+0x3f/0x50
[  122.199737]  [<ffffffff81344b03>] ? cap_bprm_set_creds+0xa3/0x600
[  122.201377]  [<ffffffff81346e53>] security_bprm_set_creds+0x33/0x50
[  122.203024]  [<ffffffff81214ce5>] prepare_binprm+0x85/0x190
[  122.204515]  [<ffffffff81216545>] do_execveat_common.isra.33+0x485/0x710
[  122.206200]  [<ffffffff81216a6a>] SyS_execve+0x3a/0x50
[  122.207615]  [<ffffffff81838795>] stub_execve+0x5/0x5
[  122.208978]  [<ffffffff818384f2>] ? entry_SYSCALL_64_fastpath+0x16/0x71
[  122.210615] Code: f8 31 c0 48 63 c2 83 ea 01 48 c7 45 e8 00 00 00 00 48 01 c6 85 d2 48 c7 45 f0 00 00 00 00 48 89 75 e0 89 55 dc 78 0c 48 8d 46 ff <c6> 46 ff 00 48 89 45 e0 48 8d 55 e0 48 8d 4d dc 48 8d 75 e8 e8
[  122.217320] RIP  [<ffffffff81228844>] d_absolute_path+0x44/0xa0
[  122.218860]  RSP <ffff880076477b90>
[  122.219919] CR2: ffff880080945fff
[  122.220936] ---[ end trace 506cdbd85eb6c55e ]---

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-07 08:58:36 +10:00
John Johansen 545de8fe0f apparmor: fix parameters so that the permission test is bypassed at boot
Boot parameters are written before apparmor is ready to answer whether
the user is policy_view_capable(). Setting the parameters at boot results
in an oops and failure to boot. Setting the parameters at boot is
obviously allowed so skip the permission check when apparmor is not
initialized.

While we are at it move the more complicated check to last.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-07 08:58:36 +10:00
John Johansen b9b144bcaf apparmor: fix invalid reference to index variable of iterator line 836
Once the loop on lines 836-853 is complete and exits normally, ent is a
pointer to the dummy list head value.  The derefernces accessible from eg
the goto fail on line 860 or the various goto fail_lock's afterwards thus
seem incorrect.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-07 08:58:36 +10:00
Nicolas Iooss 9814448da7 apparmor: use SHASH_DESC_ON_STACK
When building the kernel with clang, the compiler fails to build
security/apparmor/crypto.c with the following error:

    security/apparmor/crypto.c:36:8: error: fields must have a constant
    size: 'variable length array in structure' extension will never be
    supported
                    char ctx[crypto_shash_descsize(apparmor_tfm)];
                         ^

Since commit a0a77af141 ("crypto: LLVMLinux: Add macro to remove use
of VLAIS in crypto code"), include/crypto/hash.h defines
SHASH_DESC_ON_STACK to work around this issue. Use it in aa_calc_hash()
and aa_calc_profile_hash().

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-07 08:58:35 +10:00
Valentin Rothberg eea7a05f19 security/apparmor/lsm.c: set debug messages
Add the _APPARMOR substring to reference the intended Kconfig option.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-07 08:58:35 +10:00
kbuild test robot b9c42ac76e apparmor: fix boolreturn.cocci warnings
security/apparmor/lib.c:132:9-10: WARNING: return of 0/1 in function 'aa_policy_init' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-07 08:58:35 +10:00
Tetsuo Handa af96f0d639 Smack: Use GFP_KERNEL for smk_netlbl_mls().
Since all callers of smk_netlbl_mls() are GFP_KERNEL context
(smk_set_cipso() calls memdup_user_nul(), init_smk_fs() calls
__kernfs_new_node(), smk_import_entry() calls kzalloc(GFP_KERNEL)),
it is safe to use GFP_KERNEL from netlbl_catmap_setbit().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-04-04 15:41:15 -07:00
Tetsuo Handa c3c8dc9f13 smack: fix double free in smack_parse_opts_str()
smack_parse_opts_str() calls kfree(opts->mnt_opts) when kcalloc() for
opts->mnt_opts_flags failed. But it should not have called it because
security_free_mnt_opts() will call kfree(opts->mnt_opts).

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
fixes: 3bf2789cad ("smack: allow mount opts setting over filesystems with binary mount data")
Cc: Vivek Trivedi <t.vivek@samsung.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
2017-04-04 15:41:15 -07:00
Stephan Mueller f1c316a3ab KEYS: add SP800-56A KDF support for DH
SP800-56A defines the use of DH with key derivation function based on a
counter. The input to the KDF is defined as (DH shared secret || other
information). The value for the "other information" is to be provided by
the caller.

The KDF is implemented using the hash support from the kernel crypto API.
The implementation uses the symmetric hash support as the input to the
hash operation is usually very small. The caller is allowed to specify
the hash name that he wants to use to derive the key material allowing
the use of all supported hashes provided with the kernel crypto API.

As the KDF implements the proper truncation of the DH shared secret to
the requested size, this patch fills the caller buffer up to its size.

The patch is tested with a new test added to the keyutils user space
code which uses a CAVS test vector testing the compliance with
SP800-56A.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-04-04 22:33:38 +01:00
Mat Martineau 6563c91fd6 KEYS: Add KEYCTL_RESTRICT_KEYRING
Keyrings recently gained restrict_link capabilities that allow
individual keys to be validated prior to linking.  This functionality
was only available using internal kernel APIs.

With the KEYCTL_RESTRICT_KEYRING command existing keyrings can be
configured to check the content of keys before they are linked, and
then allow or disallow linkage of that key to the keyring.

To restrict a keyring, call:

  keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring, const char *type,
         const char *restriction)

where 'type' is the name of a registered key type and 'restriction' is a
string describing how key linkage is to be restricted. The restriction
option syntax is specific to each key type.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
2017-04-04 14:10:12 -07:00
Mat Martineau 4a420896f1 KEYS: Consistent ordering for __key_link_begin and restrict check
The keyring restrict callback was sometimes called before
__key_link_begin and sometimes after, which meant that the keyring
semaphores were not always held during the restrict callback.

If the semaphores are consistently acquired before checking link
restrictions, keyring contents cannot be changed after the restrict
check is complete but before the evaluated key is linked to the keyring.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
2017-04-04 14:10:11 -07:00
Mat Martineau 2b6aa412ff KEYS: Use structure to capture key restriction function and data
Replace struct key's restrict_link function pointer with a pointer to
the new struct key_restriction. The structure contains pointers to the
restriction function as well as relevant data for evaluating the
restriction.

The garbage collector checks restrict_link->keytype when key types are
unregistered. Restrictions involving a removed key type are converted
to use restrict_link_reject so that restrictions cannot be removed by
unregistering key types.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
2017-04-04 14:10:10 -07:00
Mat Martineau aaf66c8838 KEYS: Split role of the keyring pointer for keyring restrict functions
The first argument to the restrict_link_func_t functions was a keyring
pointer. These functions are called by the key subsystem with this
argument set to the destination keyring, but restrict_link_by_signature
expects a pointer to the relevant trusted keyring.

Restrict functions may need something other than a single struct key
pointer to allow or reject key linkage, so the data used to make that
decision (such as the trust keyring) is moved to a new, fourth
argument. The first argument is now always the destination keyring.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
2017-04-03 10:24:56 -07:00
Mat Martineau 469ff8f7d4 KEYS: Use a typedef for restrict_link function pointers
This pointer type needs to be returned from a lookup function, and
without a typedef the syntax gets cumbersome.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
2017-04-03 10:24:55 -07:00
Elena Reshetova ddb99e118e security, keys: convert key_user.usage from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-03 10:49:06 +10:00
Elena Reshetova fff292914d security, keys: convert key.usage from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-04-03 10:49:05 +10:00
mchehab@s-opensource.com 0e056eb553 kernel-api.rst: fix a series of errors when parsing C files
./lib/string.c:134: WARNING: Inline emphasis start-string without end-string.
./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string.
./mm/filemap.c:1283: ERROR: Unexpected indentation.
./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string.
./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string.
./mm/page_alloc.c:4245: ERROR: Unexpected indentation.
./ipc/util.c:676: ERROR: Unexpected indentation.
./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent.
./security/security.c:109: ERROR: Unexpected indentation.
./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent.
./block/genhd.c:275: WARNING: Inline strong start-string without end-string.
./block/genhd.c:283: WARNING: Inline strong start-string without end-string.
./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
./ipc/util.c:477: ERROR: Unknown target name: "s".

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-04-02 14:31:49 -06:00
Dan Carpenter cae303df3f selinux: Fix an uninitialized variable bug
We removed this initialization as a cleanup but it is probably required.

The concern is that "nel" can be zero.  I'm not an expert on SELinux
code but I think it looks possible to write an SELinux policy which
triggers this bug.  GCC doesn't catch this, but my static checker does.

Fixes: 9c312e79d6 ("selinux: Delete an unnecessary variable initialisation in range_read()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-31 15:16:18 -04:00
Kees Cook 8291798dcf TOMOYO: Use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-30 17:37:45 +11:00
Matthias Kaehlcke 342e91578e selinux: Remove unnecessary check of array base in selinux_set_mapping()
'perms' will never be NULL since it isn't a plain pointer but an array
of u32 values.

This fixes the following warning when building with clang:

security/selinux/ss/services.c:158:16: error: address of array
'p_in->perms' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
                while (p_in->perms && p_in->perms[k]) {

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 18:57:25 -04:00
Markus Elfring 710a0647ba selinuxfs: Use seq_puts() in sel_avc_stats_seq_show()
A string which did not contain data format specifications should be put
into a sequence. Thus use the corresponding function "seq_puts".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:52:30 -04:00
Markus Elfring 8ee4586ca5 selinux: Adjust two checks for null pointers
The script "checkpatch.pl" pointed information out like the following.

Comparison to NULL could be written !…

Thus fix affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:46:10 -04:00
Markus Elfring b380f78377 selinux: Use kmalloc_array() in sidtab_init()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:44:10 -04:00
Markus Elfring ebd2b47ba5 selinux: Return directly after a failed kzalloc() in roles_init()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:39:17 -04:00
Markus Elfring 7befb7514e selinux: Return directly after a failed kzalloc() in perm_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:30:51 -04:00
Markus Elfring 442ca4d656 selinux: Return directly after a failed kzalloc() in common_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:29:11 -04:00
Markus Elfring df4a14dfb4 selinux: Return directly after a failed kzalloc() in class_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:24:58 -04:00
Markus Elfring ea6e2f7d12 selinux: Return directly after a failed kzalloc() in role_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:22:12 -04:00
Markus Elfring 549fe69ee5 selinux: Return directly after a failed kzalloc() in type_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:20:07 -04:00
Markus Elfring 4bd9f07b89 selinux: Return directly after a failed kzalloc() in user_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 11:15:17 -04:00
Markus Elfring b592119100 selinux: Improve another size determination in sens_read()
Replace the specification of a data type by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 10:51:09 -04:00
Markus Elfring 3c354d7d7b selinux: Return directly after a failed kzalloc() in sens_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 09:56:48 -04:00
Markus Elfring 7f6d0ad8b7 selinux: Return directly after a failed kzalloc() in cat_read()
Return directly after a call of the function "kzalloc" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-29 09:54:48 -04:00
David Ahern 983701eb06 rtnetlink: Add RTM_DELNETCONF
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 22:32:42 -07:00
Al Viro db68ce10c4 new helper: uaccess_kernel()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 16:43:25 -04:00
Tetsuo Handa e4e55b47ed LSM: Revive security_task_alloc() hook and per "struct task_struct" security blob.
We switched from "struct task_struct"->security to "struct cred"->security
in Linux 2.6.29. But not all LSM modules were happy with that change.
TOMOYO LSM module is an example which want to use per "struct task_struct"
security blob, for TOMOYO's security context is defined based on "struct
task_struct" rather than "struct cred". AppArmor LSM module is another
example which want to use it, for AppArmor is currently abusing the cred
a little bit to store the change_hat and setexeccon info. Although
security_task_free() hook was revived in Linux 3.4 because Yama LSM module
wanted to release per "struct task_struct" security blob,
security_task_alloc() hook and "struct task_struct"->security field were
not revived. Nowadays, we are getting proposals of lightweight LSM modules
which want to use per "struct task_struct" security blob.

We are already allowing multiple concurrent LSM modules (up to one fully
armored module which uses "struct cred"->security field or exclusive hooks
like security_xfrm_state_pol_flow_match(), plus unlimited number of
lightweight modules which do not use "struct cred"->security nor exclusive
hooks) as long as they are built into the kernel. But this patch does not
implement variable length "struct task_struct"->security field which will
become needed when multiple LSM modules want to use "struct task_struct"->
security field. Although it won't be difficult to implement variable length
"struct task_struct"->security field, let's think about it after we merged
this patch.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Djalal Harouni <tixxdz@gmail.com>
Acked-by: José Bollo <jobol@nonadev.net>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: José Bollo <jobol@nonadev.net>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-28 11:05:14 +11:00
Tetsuo Handa 3dfc9b0286 LSM: Initialize security_hook_heads upon registration.
"struct security_hook_heads" is an array of "struct list_head"
where elements can be initialized just before registration.

There is no need to waste 350+ lines for initialization. Let's
initialize "struct security_hook_heads" just before registration.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-24 14:24:41 +11:00
Markus Elfring 9c312e79d6 selinux: Delete an unnecessary variable initialisation in range_read()
The local variable "rt" will be set to an appropriate pointer a bit later.
Thus omit the explicit initialisation at the beginning which became
unnecessary with a previous update step.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 18:16:55 -04:00
Markus Elfring 57152a5be0 selinux: Return directly after a failed next_entry() in range_read()
Return directly after a call of the function "next_entry" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 18:11:33 -04:00
Markus Elfring 02fcef27cc selinux: Delete an unnecessary variable assignment in filename_trans_read()
The local variable "ft" was set to a null pointer despite of an
immediate reassignment.
Thus remove this statement from the beginning of a loop.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 18:08:05 -04:00
Markus Elfring 315e01ada8 selinux: One function call less in genfs_read() after null pointer detection
Call the function "kfree" at the end only after it was determined
that the local variable "newgenfs" contained a non-null pointer.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 17:53:29 -04:00
Markus Elfring 3a0aa56518 selinux: Return directly after a failed next_entry() in genfs_read()
Return directly after a call of the function "next_entry" failed
at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 17:45:29 -04:00
Markus Elfring b4e4686f65 selinux: Delete an unnecessary return statement in policydb_destroy()
The script "checkpatch.pl" pointed information out like the following.

WARNING: void function return statements are not generally useful

Thus remove such a statement in the affected function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 17:21:26 -04:00
Markus Elfring ad10a10567 selinux: Use kcalloc() in policydb_index()
Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 17:14:16 -04:00
Markus Elfring cb8d21e364 selinux: Adjust four checks for null pointers
The script "checkpatch.pl" pointed information out like the following.

Comparison to NULL could be written !…

Thus fix affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 16:36:38 -04:00
Markus Elfring 2f00e680fe selinux: Use kmalloc_array() in hashtab_create()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 16:31:20 -04:00
Markus Elfring fb13a312da selinux: Improve size determinations in four functions
Replace the specification of data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 16:29:02 -04:00
Markus Elfring e34cfef901 selinux: Delete an unnecessary return statement in cond_compute_av()
The script "checkpatch.pl" pointed information out like the following.

WARNING: void function return statements are not generally useful

Thus remove such a statement in the affected function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 16:25:42 -04:00
Markus Elfring f6076f704a selinux: Use kmalloc_array() in cond_init_bool_indexes()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-23 16:21:41 -04:00
Mikhail Kurinnoi 3dd0c8d065 ima: provide ">" and "<" operators for fowner/uid/euid rules.
For now we have only "=" operator for fowner/uid/euid rules. This
patch provide two more operators - ">" and "<" in order to make
fowner/uid/euid rules more flexible.

Examples of usage.

 Appraise all files owned by special and system users (SYS_UID_MAX 999):
    appraise fowner<1000
 Don't appraise files owned by normal users (UID_MIN 1000):
    dont_appraise fowner>999
 Appraise all files owned by users with UID 1000-1010:
    dont_appraise fowner>1010
    appraise fowner>999

Changelog v3:
- Removed code duplication in ima_parse_rule().
- Fix ima_policy_show() - (Mimi)

Changelog v2:
- Fixed default policy rules.

Signed-off-by: Mikhail Kurinnoi <viewizard@viewizard.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>

 security/integrity/ima/ima_policy.c | 115 +++++++++++++++++++++++++++---------
 1 file changed, 87 insertions(+), 28 deletions(-)
2017-03-13 07:01:24 -04:00
Alexander Potapenko e2f586bd83 selinux: check for address length in selinux_socket_bind()
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in selinux_socket_bind():

==================================================================
BUG: KMSAN: use of unitialized memory
inter: 0
CPU: 3 PID: 1074 Comm: packet2 Tainted: G    B           4.8.0-rc6+ #1916
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 0000000000000000 ffff8800882ffb08 ffffffff825759c8 ffff8800882ffa48
 ffffffff818bf551 ffffffff85bab870 0000000000000092 ffffffff85bab550
 0000000000000000 0000000000000092 00000000bb0009bb 0000000000000002
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff825759c8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
 [<ffffffff818bdee6>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1008
 [<ffffffff818bf0fb>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424
 [<ffffffff822dae71>] selinux_socket_bind+0xf41/0x1080 security/selinux/hooks.c:4288
 [<ffffffff8229357c>] security_socket_bind+0x1ec/0x240 security/security.c:1240
 [<ffffffff84265d98>] SYSC_bind+0x358/0x5f0 net/socket.c:1366
 [<ffffffff84265a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
 [<ffffffff81005678>] do_syscall_64+0x58/0x70 arch/x86/entry/common.c:292
 [<ffffffff8518217c>] entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.o:?
chained origin: 00000000ba6009bb
 [<ffffffff810bb7a7>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
 [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
 [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:337
 [<ffffffff818bd2b8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:530
 [<ffffffff818bf033>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380
 [<ffffffff84265b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
 [<ffffffff84265a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
 [<ffffffff81005678>] do_syscall_64+0x58/0x70 arch/x86/entry/common.c:292
 [<ffffffff8518217c>] return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.o:?
origin description: ----address@SYSC_bind (origin=00000000b8c00900)
==================================================================

(the line numbers are relative to 4.8-rc6, but the bug persists upstream)

, when I run the following program as root:

=======================================================
  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  int main(int argc, char *argv[]) {
    struct sockaddr addr;
    int size = 0;
    if (argc > 1) {
      size = atoi(argv[1]);
    }
    memset(&addr, 0, sizeof(addr));
    int fd = socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP);
    bind(fd, &addr, size);
    return 0;
  }
=======================================================

(for different values of |size| other error reports are printed).

This happens because bind() unconditionally copies |size| bytes of
|addr| to the kernel, leaving the rest uninitialized. Then
security_socket_bind() reads the IP address bytes, including the
uninitialized ones, to determine the port, or e.g. pass them further to
sel_netnode_find(), which uses them to calculate a hash.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
[PM: fixed some whitespace damage]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-10 15:22:16 -05:00
Daniel Glöckner 1ac202e978 ima: accept previously set IMA_NEW_FILE
Modifying the attributes of a file makes ima_inode_post_setattr reset
the IMA cache flags. So if the file, which has just been created,
is opened a second time before the first file descriptor is closed,
verification fails since the security.ima xattr has not been written
yet. We therefore have to look at the IMA_NEW_FILE even if the file
already existed.

With this patch there should no longer be an error when cat tries to
open testfile:

$ rm -f testfile
$ ( echo test >&3 ; touch testfile ; cat testfile ) 3>testfile

A file being new is no reason to accept that it is missing a digital
signature demanded by the policy.

Signed-off-by: Daniel Glöckner <dg@emlix.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2017-03-07 07:06:10 -05:00
James Morris bad4417b69 integrity: mark default IMA rules as __ro_after_init
The default IMA rules are loaded during init and then do not
change, so mark them as __ro_after_init.

Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2017-03-06 19:08:57 -05:00
James Morris 579fc0dc09 selinux: constify nlmsg permission tables
Constify nlmsg permission tables, which are initialized once
and then do not change.

Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-03-06 11:58:08 -05:00
James Morris ca97d939db security: mark LSM hooks as __ro_after_init
Mark all of the registration hooks as __ro_after_init (via the
__lsm_ro_after_init macro).

Signed-off-by: James Morris <james.l.morris@oracle.com>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
2017-03-06 11:00:15 +11:00
James Morris dd0859dccb security: introduce CONFIG_SECURITY_WRITABLE_HOOKS
Subsequent patches will add RO hardening to LSM hooks, however, SELinux
still needs to be able to perform runtime disablement after init to handle
architectures where init-time disablement via boot parameters is not feasible.

Introduce a new kernel configuration parameter CONFIG_SECURITY_WRITABLE_HOOKS,
and a helper macro __lsm_ro_after_init, to handle this case.

Signed-off-by: James Morris <james.l.morris@oracle.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Kees Cook <keescook@chromium.org>
2017-03-06 11:00:12 +11:00
Stephen Smalley 84e6885e9e selinux: fix kernel BUG on prlimit(..., NULL, NULL)
commit 79bcf325e6b32b3c ("prlimit,security,selinux: add a security hook
for prlimit") introduced a security hook for prlimit() and implemented it
for SELinux.  However, if prlimit() is called with NULL arguments for both
the new limit and the old limit, then the hook is called with 0 for the
read/write flags, since the prlimit() will neither read nor write the
process' limits.  This would in turn lead to calling avc_has_perm() with 0
for the requested permissions, which triggers a BUG_ON() in
avc_has_perm_noaudit() since the kernel should never be invoking
avc_has_perm() with no permissions.  Fix this in the SELinux hook by
returning immediately if the flags are 0.  Arguably prlimit64() itself
ought to return immediately if both old_rlim and new_rlim are NULL since
it is effectively a no-op in that case.

Reported by the lkp-robot based on trinity testing.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-06 10:44:16 +11:00
Stephen Smalley 791ec491c3 prlimit,security,selinux: add a security hook for prlimit
When SELinux was first added to the kernel, a process could only get
and set its own resource limits via getrlimit(2) and setrlimit(2), so no
MAC checks were required for those operations, and thus no security hooks
were defined for them. Later, SELinux introduced a hook for setlimit(2)
with a check if the hard limit was being changed in order to be able to
rely on the hard limit value as a safe reset point upon context
transitions.

Later on, when prlimit(2) was added to the kernel with the ability to get
or set resource limits (hard or soft) of another process, LSM/SELinux was
not updated other than to pass the target process to the setrlimit hook.
This resulted in incomplete control over both getting and setting the
resource limits of another process.

Add a new security_task_prlimit() hook to the check_prlimit_permission()
function to provide complete mediation.  The hook is only called when
acting on another task, and only if the existing DAC/capability checks
would allow access.  Pass flags down to the hook to indicate whether the
prlimit(2) call will read, write, or both read and write the resource
limits of the target process.

The existing security_task_setrlimit() hook is left alone; it continues
to serve a purpose in supporting the ability to make decisions based on
the old and/or new resource limit values when setting limits.  This
is consistent with the DAC/capability logic, where
check_prlimit_permission() performs generic DAC/capability checks for
acting on another task, while do_prlimit() performs a capability check
based on a comparison of the old and new resource limits.  Fix the
inline documentation for the hook to match the code.

Implement the new hook for SELinux.  For setting resource limits, we
reuse the existing setrlimit permission.  Note that this does overload
the setrlimit permission to mean the ability to set the resource limit
(soft or hard) of another process or the ability to change one's own
hard limit.  For getting resource limits, a new getrlimit permission
is defined.  This was not originally defined since getrlimit(2) could
only be used to obtain a process' own limits.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-06 10:43:47 +11:00
Linus Torvalds 1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
Ingo Molnar 50d34394ce sched/headers: Prepare to remove the <linux/magic.h> include from <linux/sched/task_stack.h>
Update files that depend on the magic.h inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar b2d0910310 sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h>
We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.

But first update code that relied on the implicit header inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:38 +01:00
Ingo Molnar 299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar 5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Ingo Molnar 8703e8a465 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h>
We are going to split <linux/sched/user.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/user.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar 3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Stephen Smalley 2651225b5e selinux: wrap cgroup seclabel support with its own policy capability
commit 1ea0ce4069 ("selinux: allow
changing labels for cgroupfs") broke the Android init program,
which looks up security contexts whenever creating directories
and attempts to assign them via setfscreatecon().
When creating subdirectories in cgroup mounts, this would previously
be ignored since cgroup did not support userspace setting of security
contexts.  However, after the commit, SELinux would attempt to honor
the requested context on cgroup directories and fail due to permission
denial.  Avoid breaking existing userspace/policy by wrapping this change
with a conditional on a new cgroup_seclabel policy capability.  This
preserves existing behavior until/unless a new policy explicitly enables
this capability.

Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-02 10:27:40 +11:00
David Howells 0837e49ab3 KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()
rcu_dereference_key() and user_key_payload() are currently being used in
two different, incompatible ways:

 (1) As a wrapper to rcu_dereference() - when only the RCU read lock used
     to protect the key.

 (2) As a wrapper to rcu_dereference_protected() - when the key semaphor is
     used to protect the key and the may be being modified.

Fix this by splitting both of the key wrappers to produce:

 (1) RCU accessors for keys when caller has the key semaphore locked:

	dereference_key_locked()
	user_key_payload_locked()

 (2) RCU accessors for keys when caller holds the RCU read lock:

	dereference_key_rcu()
	user_key_payload_rcu()

This should fix following warning in the NFS idmapper

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.10.0 #1 Tainted: G        W
  -------------------------------
  ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
  other info that might help us debug this:
  rcu_scheduler_active = 2, debug_locks = 0
  1 lock held by mount.nfs/5987:
    #0:  (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
  stack backtrace:
  CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G        W       4.10.0 #1
  Call Trace:
    dump_stack+0xe8/0x154 (unreliable)
    lockdep_rcu_suspicious+0x140/0x190
    nfs_idmap_get_key+0x380/0x420 [nfsv4]
    nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
    decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
    decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
    nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
    rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
    call_decode+0x29c/0x910 [sunrpc]
    __rpc_execute+0x140/0x8f0 [sunrpc]
    rpc_run_task+0x170/0x200 [sunrpc]
    nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
    _nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
    nfs4_lookup_root+0xe0/0x350 [nfsv4]
    nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
    nfs4_find_root_sec+0xc4/0x100 [nfsv4]
    nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
    nfs4_get_rootfh+0x6c/0x190 [nfsv4]
    nfs4_server_common_setup+0xc4/0x260 [nfsv4]
    nfs4_create_server+0x278/0x3c0 [nfsv4]
    nfs4_remote_mount+0x50/0xb0 [nfsv4]
    mount_fs+0x74/0x210
    vfs_kern_mount+0x78/0x220
    nfs_do_root_mount+0xb0/0x140 [nfsv4]
    nfs4_try_mount+0x60/0x100 [nfsv4]
    nfs_fs_mount+0x5ec/0xda0 [nfs]
    mount_fs+0x74/0x210
    vfs_kern_mount+0x78/0x220
    do_mount+0x254/0xf70
    SyS_mount+0x94/0x100
    system_call+0x38/0xe0

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-02 10:09:00 +11:00
Alexey Dobriyan 5b5e0928f7 lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Dave Jiang 11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Linus Torvalds f1ef09fde1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
2017-02-23 20:33:51 -08:00
Linus Torvalds b2064617c7 driver core patches for 4.11-rc1
Here is the "small" driver core patches for 4.11-rc1.
 
 Not much here, some firmware documentation and self-test updates, a
 debugfs code formatting issue, and a new feature for call_usermodehelper
 to make it more robust on systems that want to lock it down in a more
 secure way.
 
 All of these have been linux-next for a while now with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2jKg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymCEACgozYuqZZ/TUGW0P3xVNi7fbfUWCEAn3nYExrc
 XgevqeYOSKp2We6X/2JX
 =aZ+5
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "small" driver core patches for 4.11-rc1.

  Not much here, some firmware documentation and self-test updates, a
  debugfs code formatting issue, and a new feature for call_usermodehelper
  to make it more robust on systems that want to lock it down in a more
  secure way.

  All of these have been linux-next for a while now with no reported
  issues"

* tag 'driver-core-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: handle null pointers while printing node name and path
  Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()
  Make static usermode helper binaries constant
  kmod: make usermodehelper path a const string
  firmware: revamp firmware documentation
  selftests: firmware: send expected errors to /dev/null
  selftests: firmware: only modprobe if driver is missing
  platform: Print the resource range if device failed to claim
  kref: prefer atomic_inc_not_zero to atomic_add_unless
  debugfs: improve formatting of debugfs_real_fops()
2017-02-22 11:44:32 -08:00
Linus Torvalds 3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds c9341ee0af Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "Highlights:

   - major AppArmor update: policy namespaces & lots of fixes

   - add /sys/kernel/security/lsm node for easy detection of loaded LSMs

   - SELinux cgroupfs labeling support

   - SELinux context mounts on tmpfs, ramfs, devpts within user
     namespaces

   - improved TPM 2.0 support"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (117 commits)
  tpm: declare tpm2_get_pcr_allocation() as static
  tpm: Fix expected number of response bytes of TPM1.2 PCR Extend
  tpm xen: drop unneeded chip variable
  tpm: fix misspelled "facilitate" in module parameter description
  tpm_tis: fix the error handling of init_tis()
  KEYS: Use memzero_explicit() for secret data
  KEYS: Fix an error code in request_master_key()
  sign-file: fix build error in sign-file.c with libressl
  selinux: allow changing labels for cgroupfs
  selinux: fix off-by-one in setprocattr
  tpm: silence an array overflow warning
  tpm: fix the type of owned field in cap_t
  tpm: add securityfs support for TPM 2.0 firmware event log
  tpm: enhance read_log_of() to support Physical TPM event log
  tpm: enhance TPM 2.0 PCR extend to support multiple banks
  tpm: implement TPM 2.0 capability to get active PCR banks
  tpm: fix RC value check in tpm2_seal_trusted
  tpm_tis: fix iTPM probe via probe_itpm() function
  tpm: Begin the process to deprecate user_read_timer
  tpm: remove tpm_read_index and tpm_write_index from tpm.h
  ...
2017-02-21 12:49:56 -08:00
Linus Torvalds 42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
David S. Miller 35eeacf182 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
Dan Carpenter 5217660379 KEYS: Use memzero_explicit() for secret data
I don't think GCC has figured out how to optimize the memset() away, but
they might eventually so let's future proof this code a bit.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-02-10 12:43:51 +11:00
Dan Carpenter 57cb17e764 KEYS: Fix an error code in request_master_key()
This function has two callers and neither are able to handle a NULL
return.  Really, -EINVAL is the correct thing return here anyway.  This
fixes some static checker warnings like:

	security/keys/encrypted-keys/encrypted.c:709 encrypted_key_decrypt()
	error: uninitialized symbol 'master_key'.

Fixes: 7e70cb4978 ("keys: add new key-type encrypted")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-02-10 12:43:49 +11:00
James Morris a2a15479d6 Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/selinux into next 2017-02-10 10:28:49 +11:00
Stephen Smalley 0c461cb727 selinux: fix off-by-one in setprocattr
SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute.  However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb12 ("proc_pid_attr_write():
switch to memdup_user()").  Fix the off-by-one error.

Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate

Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.

There are no users of this facility to my knowledge; possibly we
should just get rid of it.

UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug.  This patch fixes CVE-2017-2618.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Cc: stable@vger.kernel.org # 3.5: d6ea83ec68
Signed-off-by: Paul Moore <paul@paul-moore.com>

Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-02-08 19:09:53 +11:00
James Morris e2241be62d Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/selinux into next 2017-02-08 19:01:07 +11:00
Antonio Murdaca 1ea0ce4069 selinux: allow changing labels for cgroupfs
This patch allows changing labels for cgroup mounts. Previously, running
chcon on cgroupfs would throw an "Operation not supported". This patch
specifically whitelist cgroupfs.

The patch could also allow containers to write only to the systemd cgroup
for instance, while the other cgroups are kept with cgroup_t label.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-02-07 22:17:47 -05:00
Stephen Smalley a050a570db selinux: fix off-by-one in setprocattr
SELinux tries to support setting/clearing of /proc/pid/attr attributes
from the shell by ignoring terminating newlines and treating an
attribute value that begins with a NUL or newline as an attempt to
clear the attribute.  However, the test for clearing attributes has
always been wrong; it has an off-by-one error, and this could further
lead to reading past the end of the allocated buffer since commit
bb646cdb12 ("proc_pid_attr_write():
switch to memdup_user()").  Fix the off-by-one error.

Even with this fix, setting and clearing /proc/pid/attr attributes
from the shell is not straightforward since the interface does not
support multiple write() calls (so shells that write the value and
newline separately will set and then immediately clear the attribute,
requiring use of echo -n to set the attribute), whereas trying to use
echo -n "" to clear the attribute causes the shell to skip the
write() call altogether since POSIX says that a zero-length write
causes no side effects. Thus, one must use echo -n to set and echo
without -n to clear, as in the following example:
$ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate
unconfined_u:object_r:user_home_t:s0
$ echo "" > /proc/$$/attr/fscreate
$ cat /proc/$$/attr/fscreate

Note the use of /proc/$$ rather than /proc/self, as otherwise
the cat command will read its own attribute value, not that of the shell.

There are no users of this facility to my knowledge; possibly we
should just get rid of it.

UPDATE: Upon further investigation it appears that a local process
with the process:setfscreate permission can cause a kernel panic as a
result of this bug.  This patch fixes CVE-2017-2618.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the update about CVE-2017-2618 to the commit description]
Cc: stable@vger.kernel.org # 3.5: d6ea83ec68
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-02-07 17:31:38 -05:00
Lans Zhang 20f482ab9e ima: allow to check MAY_APPEND
Otherwise some mask and inmask tokens with MAY_APPEND flag may not work
as expected.

Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2017-01-27 14:17:21 -05:00
Mimi Zohar bc15ed663e ima: fix ima_d_path() possible race with rename
On failure to return a pathname from ima_d_path(), a pointer to
dname is returned, which is subsequently used in the IMA measurement
list, the IMA audit records, and other audit logging.  Saving the
pointer to dname for later use has the potential to race with rename.

Intead of returning a pointer to dname on failure, this patch returns
a pointer to a copy of the filename.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
2017-01-27 14:16:02 -05:00
James Morris 710584b9da Merge branch 'smack-for-4.11' of git://github.com/cschaufler/smack-next into next 2017-01-27 09:23:21 +11:00
Krister Johansen 4548b683b7 Introduce a sysctl that modifies the value of PROT_SOCK.
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace.  To
disable all privileged ports set this to zero.  It also checks for
overlap with the local port range.  The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to priviliged ports, but prevent them from ever being allowed to modify
their container's network configuration.  The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace.  This modification was needed to allow the container manager
to disable a namespace's priviliged port restrictions without exposing
control of the network namespace to processes in the user namespace.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 12:10:51 -05:00
Eric W. Biederman 9227dd2a84 exec: Remove LSM_UNSAFE_PTRACE_CAP
With previous changes every location that tests for
LSM_UNSAFE_PTRACE_CAP also tests for LSM_UNSAFE_PTRACE making the
LSM_UNSAFE_PTRACE_CAP redundant, so remove it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-01-24 12:03:08 +13:00
Eric W. Biederman 20523132ec exec: Test the ptracer's saved cred to see if the tracee can gain caps
Now that we have user namespaces and non-global capabilities verify
the tracer has capabilities in the relevant user namespace instead
of in the current_user_ns().

As the test for setting LSM_UNSAFE_PTRACE_CAP is currently
ptracer_capable(p, current_user_ns()) and the new task credentials are
in current_user_ns() this change does not have any user visible change
and simply moves the test to where it is used, making the code easier
to read.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-01-24 12:03:08 +13:00
Eric W. Biederman 70169420f5 exec: Don't reset euid and egid when the tracee has CAP_SETUID
Don't reset euid and egid when the tracee has CAP_SETUID in
it's user namespace.  I punted on relaxing this permission check
long ago but now that I have read this code closely it is clear
it is safe to test against CAP_SETUID in the user namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-01-24 12:03:07 +13:00
Greg Kroah-Hartman 64e90a8acb Introduce STATIC_USERMODEHELPER to mediate call_usermodehelper()
Some usermode helper applications are defined at kernel build time, while
others can be changed at runtime.  To provide a sane way to filter these, add a
new kernel option "STATIC_USERMODEHELPER".  This option routes all
call_usermodehelper() calls through this binary, no matter what the caller
wishes to have called.

The new binary (by default set to /sbin/usermode-helper, but can be changed
through the STATIC_USERMODEHELPER_PATH option) can properly filter the
requested programs to be run by the kernel by looking at the first argument
that is passed to it.  All other options should then be passed onto the proper
program if so desired.

To disable all call_usermodehelper() calls by the kernel, set
STATIC_USERMODEHELPER_PATH to an empty string.

Thanks to Neil Brown for the idea of this feature.

Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 12:59:45 +01:00
Greg Kroah-Hartman 377e7a27c0 Make static usermode helper binaries constant
There are a number of usermode helper binaries that are "hard coded" in
the kernel today, so mark them as "const" to make it harder for someone
to change where the variables point to.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alex Elder <elder@kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 12:59:45 +01:00
Casey Schaufler d69dece5f5 LSM: Add /sys/kernel/security/lsm
I am still tired of having to find indirect ways to determine
what security modules are active on a system. I have added
/sys/kernel/security/lsm, which contains a comma separated
list of the active security modules. No more groping around
in /proc/filesystems or other clever hacks.

Unchanged from previous versions except for being updated
to the latest security next branch.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-01-19 13:18:29 +11:00
John Johansen 3ccb76c5df apparmor: fix undefined reference to `aa_g_hash_policy'
The kernel build bot turned up a bad config combination when
CONFIG_SECURITY_APPARMOR is y and CONFIG_SECURITY_APPARMOR_HASH is n,
resulting in the build error
   security/built-in.o: In function `aa_unpack':
   (.text+0x841e2): undefined reference to `aa_g_hash_policy'

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 13:21:27 -08:00
John Johansen e6bfa25deb apparmor: replace remaining BUG_ON() asserts with AA_BUG()
AA_BUG() uses WARN and won't break the kernel like BUG_ON().

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:56 -08:00
John Johansen 2c17cd3681 apparmor: fix restricted endian type warnings for policy unpack
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:55 -08:00
John Johansen e6e8bf4188 apparmor: fix restricted endian type warnings for dfa unpack
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:54 -08:00
John Johansen ca4bd5ae0a apparmor: add check for apparmor enabled in module parameters missing it
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:53 -08:00
John Johansen d4669f0b03 apparmor: add per cpu work buffers to avoid allocating buffers at every hook
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:53 -08:00
Tyler Hicks e3ea1ca59a apparmor: sysctl to enable unprivileged user ns AppArmor policy loading
If this sysctl is set to non-zero and a process with CAP_MAC_ADMIN in
the root namespace has created an AppArmor policy namespace,
unprivileged processes will be able to change to a profile in the
newly created AppArmor policy namespace and, if the profile allows
CAP_MAC_ADMIN and appropriate file permissions, will be able to load
policy in the respective policy namespace.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:52 -08:00
William Hua e025be0f26 apparmor: support querying extended trusted helper extra data
Allow a profile to carry extra data that can be queried via userspace.
This provides a means to store extra data in a profile that a trusted
helper can extract and use from live policy.

Signed-off-by: William Hua <william.hua@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:51 -08:00
John Johansen 12eb87d50b apparmor: update cap audit to check SECURITY_CAP_NOAUDIT
apparmor should be checking the SECURITY_CAP_NOAUDIT constant. Also
in complain mode make it so apparmor can elect to log a message,
informing of the check.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:50 -08:00
John Johansen 31f75bfecd apparmor: make computing policy hashes conditional on kernel parameter
Allow turning off the computation of the policy hashes via the
apparmor.hash_policy kernel parameter.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:50 -08:00
John Johansen aa9a39ad8f apparmor: convert change_profile to use fqname later to give better control
Moving the use of fqname to later allows learning profiles to be based
on the fqname request instead of just the hname. It also allows cleaning
up some of the name parsing and lookup by allowing the use of
the fqlookupn_profile() lib fn.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:49 -08:00
John Johansen c3e1e584ad apparmor: fix change_hat debug output
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:48 -08:00
John Johansen 5ef50d014c apparmor: remove unused op parameter from simple_write_to_buffer()
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:48 -08:00
John Johansen ef88a7ac55 apparmor: change aad apparmor_audit_data macro to a fn macro
The aad macro can replace aad strings when it is not intended to. Switch
to a fn macro so it is only applied when intended.

Also at the same time cleanup audit_data initialization by putting
common boiler plate behind a macro, and dropping the gfp_t parameter
which will become useless.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:47 -08:00
John Johansen 47f6e5cc73 apparmor: change op from int to const char *
Having ops be an integer that is an index into an op name table is
awkward and brittle. Every op change requires an edit for both the
op constant and a string in the table. Instead switch to using const
strings directly, eliminating the need for the table that needs to
be kept in sync.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:46 -08:00
John Johansen 55a26ebf63 apparmor: rename context abreviation cxt to the more standard ctx
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:45 -08:00
John Johansen a20aa95fbe apparmor: fail task profile update if current_cred isn't real_cred
Trying to update the task cred while the task current cred is not the
real cred will result in an error at the cred layer. Avoid this by
failing early and delaying the update.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:45 -08:00
John Johansen b7fd2c0340 apparmor: add per policy ns .load, .replace, .remove interface files
Having per policy ns interface files helps with containers restoring
policy.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:44 -08:00
John Johansen 12dd7171d6 apparmor: pass the subject profile into profile replace/remove
This is just setup for new ns specific .load, .replace, .remove interface
files.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:43 -08:00
John Johansen 04dc715e24 apparmor: audit policy ns specified in policy load
Verify that profiles in a load set specify the same policy ns and
audit the name of the policy ns that policy is being loaded for.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:43 -08:00
John Johansen 5ac8c355ae apparmor: allow introspecting the loaded policy pre internal transform
Store loaded policy and allow introspecting it through apparmorfs. This
has several uses from debugging, policy validation, and policy checkpoint
and restore for containers.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:42 -08:00
John Johansen fc1c9fd10a apparmor: add ns name to the audit data for policy loads
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:41 -08:00
John Johansen 078c73c63f apparmor: add profile and ns params to aa_may_manage_policy()
Policy management will be expanded beyond traditional unconfined root.
This will require knowning the profile of the task doing the management
and the ns view.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:40 -08:00
John Johansen fd2a80438d apparmor: add ns being viewed as a param to policy_admin_capable()
Prepare for a tighter pairing of user namespaces and apparmor policy
namespaces, by making the ns to be viewed available.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:40 -08:00
John Johansen 2bd8dbbf22 apparmor: add ns being viewed as a param to policy_view_capable()
Prepare for a tighter pairing of user namespaces and apparmor policy
namespaces, by making the ns to be viewed available and checking
that the user namespace level is the same as the policy ns level.

This strict pairing will be relaxed once true support of user namespaces
lands.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:39 -08:00
John Johansen a6f233003b apparmor: allow specifying the profile doing the management
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:38 -08:00
John Johansen 3e3e569539 apparmor: allow introspecting the policy namespace name
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:37 -08:00
John Johansen b79473f2de apparmor: Make aa_remove_profile() callable from a different view
This is prep work for fs operations being able to remove namespaces.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:37 -08:00
John Johansen ee2351e4b0 apparmor: track ns level so it can be used to help in view checks
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:36 -08:00
John Johansen a71ada3058 apparmor: add special .null file used to "close" fds at exec
Borrow the special null device file from selinux to "close" fds that
don't have sufficient permissions at exec time.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:35 -08:00
John Johansen 34c426acb7 apparmor: provide userspace flag indicating binfmt_elf_mmap change
Commit 9f834ec18d ("binfmt_elf: switch to new creds when switching to new mm")
changed when the creds are installed by the binfmt_elf handler. This
affects which creds are used to mmap the executable into the address
space. Which can have an affect on apparmor policy.

Add a flag to apparmor at
/sys/kernel/security/apparmor/features/domain/fix_binfmt_elf_mmap

to make it possible to detect this semantic change so that the userspace
tools and the regression test suite can correctly deal with the change.

BugLink: http://bugs.launchpad.net/bugs/1630069
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:35 -08:00
John Johansen 11c236b89d apparmor: add a default null dfa
Instead of testing whether a given dfa exists in every code path, have
a default null dfa that is used when loaded policy doesn't provide a
dfa.

This will let us get rid of special casing and avoid dereference bugs
when special casing is missed.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:34 -08:00
John Johansen 6604d4c1c1 apparmor: allow policydb to be used as the file dfa
Newer policy will combine the file and policydb dfas, allowing for
better optimizations. However to support older policy we need to
keep the ability to address the "file" dfa separately. So dup
the policydb as if it is the file dfa and set the appropriate start
state.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:33 -08:00
John Johansen 293a4886f9 apparmor: add get_dfa() fn
The dfa is currently setup to be shared (has the basis of refcounting)
but currently can't be because the count can't be increased.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:32 -08:00
John Johansen 474d6b7510 apparmor: prepare to support newer versions of policy
Newer policy encodes more than just version in the version tag,
so add masking to make sure the comparison remains correct.

Note: this is fully compatible with older policy as it will never set
the bits being masked out.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:32 -08:00
John Johansen 5ebfb12822 apparmor: add support for force complain flag to support learning mode
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:31 -08:00
John Johansen abbf873403 apparmor: remove paranoid load switch
Policy should always under go a full paranoid verification.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:30 -08:00
John Johansen 181f7c9776 apparmor: name null-XXX profiles after the executable
When possible its better to name a learning profile after the missing
profile in question. This allows for both more informative names and
for profile reuse.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:30 -08:00
John Johansen 30b026a8d1 apparmor: pass gfp_t parameter into profile allocation
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:29 -08:00
John Johansen 73688d1ed0 apparmor: refactor prepare_ns() and make usable from different views
prepare_ns() will need to be called from alternate views, and namespaces
will need to be created via different interfaces. So refactor and
allow specifying the view ns.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:28 -08:00
John Johansen 5fd1b95fc9 apparmor: update policy_destroy to use new debug asserts
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:27 -08:00
John Johansen d102d89571 apparmor: pass gfp param into aa_policy_init()
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:27 -08:00
John Johansen bbe4a7c873 apparmor: constify policy name and hname
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:26 -08:00
John Johansen 6e474e3063 apparmor: rename hname_tail to basename
Rename to the shorter and more familiar shell cmd name

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:25 -08:00
John Johansen efeee83a70 apparmor: rename mediated_filesystem() to path_mediated_fs()
Rename to indicate the test is only about whether path mediation is used,
not whether other types of mediation might be used.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:24 -08:00
John Johansen 680cd62e91 apparmor: add debug assert AA_BUG and Kconfig to control debug info
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:24 -08:00
John Johansen 57e36bbd67 apparmor: add macro for bug asserts to check that a lock is held
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:23 -08:00
John Johansen 92b6d8eff5 apparmor: allow ns visibility question to consider subnses
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:22 -08:00
John Johansen 31617ddfdd apparmor: add fn to lookup profiles by fqname
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:22 -08:00
John Johansen 3b0aaf5866 apparmor: add lib fn to find the "split" for fqnames
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:21 -08:00
John Johansen 9a2d40c12d apparmor: add strn version of aa_find_ns
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:20 -08:00
John Johansen 1741e9eb8c apparmor: add strn version of lookup_profile fn
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:19 -08:00
John Johansen 8399588a7f apparmor: rename replacedby to proxy
Proxy is shorter and a better fit than replaceby, so rename it.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:18:19 -08:00
John Johansen d97d51d253 apparmor: rename PFLAG_INVALID to PFLAG_STALE
Invalid does not convey the meaning of the flag anymore so rename it.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 01:16:37 -08:00
John Johansen 121d4a91e3 apparmor: rename sid to secid
Move to common terminology with other LSMs and kernel infrastucture

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 00:42:17 -08:00
John Johansen 98849dff90 apparmor: rename namespace to ns to improve code line lengths
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 00:42:16 -08:00
John Johansen cff281f686 apparmor: split apparmor policy namespaces code into its own file
Policy namespaces will be diverging from profile management and
expanding so put it in its own file.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 00:42:15 -08:00
John Johansen fe6bb31f59 apparmor: split out shared policy_XXX fns to lib
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 00:42:14 -08:00
John Johansen 12557dcba2 apparmor: move lib definitions into separate lib include
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-16 00:42:13 -08:00
Kees Cook 8486adf0d7 apparmor: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-15 20:00:32 -08:00
Tetsuo Handa a7f6c1b63b AppArmor: Use GFP_KERNEL for __aa_kvmalloc().
Calling kmalloc(GFP_NOIO) with order == PAGE_ALLOC_COSTLY_ORDER is not
recommended because it might fall into infinite retry loop without
invoking the OOM killer.

Since aa_dfa_unpack() is the only caller of kvzalloc() and
aa_dfa_unpack() which is calling kvzalloc() via unpack_table() is
doing kzalloc(GFP_KERNEL), it is safe to use GFP_KERNEL from
__aa_kvmalloc().

Since aa_simple_write_to_buffer() is the only caller of kvmalloc()
and aa_simple_write_to_buffer() is calling copy_from_user() which
is GFP_KERNEL context (see memdup_user_nul()), it is safe to use
GFP_KERNEL from __aa_kvmalloc().

Therefore, replace GFP_NOIO with GFP_KERNEL. Also, since we have
vmalloc() fallback, add __GFP_NORETRY so that we don't invoke the OOM
killer by kmalloc(GFP_KERNEL) with order == PAGE_ALLOC_COSTLY_ORDER.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2017-01-15 13:41:09 -08:00
Peter Zijlstra 6b1ffa06e5 locking/atomic, kref: Use kref_get_unless_zero() more
For some obscure reason apparmor thinks its needs to locally implement
kref primitives that already exist. Stop doing this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:37:20 +01:00
Stephen Smalley 3a2f5a59a6 security,selinux,smack: kill security_task_wait hook
As reported by yangshukui, a permission denial from security_task_wait()
can lead to a soft lockup in zap_pid_ns_processes() since it only expects
sys_wait4() to return 0 or -ECHILD. Further, security_task_wait() can
in general lead to zombies; in the absence of some way to automatically
reparent a child process upon a denial, the hook is not useful.  Remove
the security hook and its implementations in SELinux and Smack.  Smack
already removed its check from its hook.

Reported-by: yangshukui <yangshukui@huawei.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-12 11:10:57 -05:00
Stephen Smalley b4ba35c75a selinux: drop unused socket security classes
Several of the extended socket classes introduced by
commit da69a5306a ("selinux: support distinctions
among all network address families") are never used because
sockets can never be created with the associated address family.
Remove these unused socket security classes.  The removed classes
are bridge_socket for PF_BRIDGE, ib_socket for PF_IB, and mpls_socket
for PF_MPLS.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-12 11:10:24 -05:00
Seung-Woo Kim 83a1e53f39 Smack: ignore private inode for file functions
The access to fd from anon_inode is always failed because there is
no set xattr operations. So this patch fixes to ignore private
inode including anon_inode for file functions.

It was only ignored for smack_file_receive() to share dma-buf fd,
but dma-buf has other functions like ioctl and mmap.

Reference: https://lkml.org/lkml/2015/4/17/16

Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Rafal Krypa 805b65a80b Smack: fix d_instantiate logic for sockfs and pipefs
Since 4b936885a (v2.6.32) all inodes on sockfs and pipefs are disconnected.
It caused filesystem specific code in smack_d_instantiate to be skipped,
because all inodes on those pseudo filesystems were treated as root inodes.
As a result all sockfs inodes had the Smack label set to floor.

In most cases access checks for sockets use socket_smack data so the inode
label is not important. But there are special cases that were broken.
One example would be calling fcntl with F_SETOWN command on a socket fd.

Now smack_d_instantiate expects all pipefs and sockfs inodes to be
disconnected and has the logic in appropriate place.

Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Himanshu Shukla c9d238a18b SMACK: Use smk_tskacc() instead of smk_access() for proper logging
smack_file_open() is first checking the capability of calling subject,
this check will skip the SMACK logging for success case. Use smk_tskacc()
for proper logging and SMACK access check.

Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Vishal Goel 348dc288d4 Smack: Traverse the smack_known_list using list_for_each_entry_rcu macro
In smack_from_secattr function,"smack_known_list" is being traversed
using list_for_each_entry macro, although it is a rcu protected
structure. So it should be traversed using "list_for_each_entry_rcu"
macro to fetch the rcu protected entry.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Himanshu Shukla 3d4f673a69 SMACK: Free the i_security blob in inode using RCU
There is race condition issue while freeing the i_security blob in SMACK
module. There is existing condition where i_security can be freed while
inode_permission is called from path lookup on second CPU. There has been
observed the page fault with such condition. VFS code and Selinux module
takes care of this condition by freeing the inode and i_security field
using RCU via call_rcu(). But in SMACK directly the i_secuirty blob is
being freed. Use call_rcu() to fix this race condition issue.

Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Himanshu Shukla d54a197964 SMACK: Delete list_head repeated initialization
smk_copy_rules() and smk_copy_relabel() are initializing list_head though
they have been initialized already in new_task_smack() function. Delete
repeated initialization.

Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Vishal Goel 2e962e2fec SMACK: Add new lock for adding entry in smack master list
"smk_set_access()" function adds a new rule entry in subject label specific
list(rule_list) and in global rule list(smack_rule_list) both. Mutex lock
(rule_lock) is used to avoid simultaneous updates. But this lock is subject
label specific lock. If 2 processes tries to add different rules(i.e with
different subject labels) simultaneously, then both the processes can take
the "rule_lock" respectively. So it will cause a problem while adding
entries in master rule list.
Now a new mutex lock(smack_master_list_lock) has been taken to add entry in
smack_rule_list to avoid simultaneous updates of different rules.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Vishal Goel 0c96d1f532 Smack: Fix the issue of wrong SMACK label update in socket bind fail case
Fix the issue of wrong SMACK label (SMACK64IPIN) update when a second bind
call is made to same IP address & port, but with different SMACK label
(SMACK64IPIN) by second instance of server. In this case server returns
with "Bind:Address already in use" error but before returning, SMACK label
is updated in SMACK port-label mapping list inside smack_socket_bind() hook

To fix this issue a new check has been added in smk_ipv6_port_label()
function before updating the existing port entry. It checks whether the
socket for matching port entry is closed or not. If it is closed then it
means port is not bound and it is safe to update the existing port entry
else return if port is still getting used. For checking whether socket is
closed or not, one more field "smk_can_reuse" has been added in the
"smk_port_label" structure. This field will be set to '1' in
"smack_sk_free_security()" function which is called to free the socket
security blob when the socket is being closed. In this function, port entry
is searched in the SMACK port-label mapping list for the closing socket.
If entry is found then "smk_can_reuse" field is set to '1'.Initially
"smk_can_reuse" field is set to '0' in smk_ipv6_port_label() function after
creating a new entry in the list which indicates that socket is in use.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Vishal Goel 9d44c97384 Smack: Fix the issue of permission denied error in ipv6 hook
Permission denied error comes when 2 IPv6 servers are running and client
tries to connect one of them. Scenario is that both servers are using same
IP and port but different protocols(Udp and tcp). They are using different
SMACK64IPIN labels.Tcp server is using "test" and udp server is using
"test-in". When we try to run tcp client with SMACK64IPOUT label as "test",
then connection denied error comes. It should not happen since both tcp
server and client labels are same.This happens because there is no check
for protocol in smk_ipv6_port_label() function while searching for the
earlier port entry. It checks whether there is an existing port entry on
the basis of port only. So it updates the earlier port entry in the list.
Due to which smack label gets changed for earlier entry in the
"smk_ipv6_port_list" list and permission denied error comes.

Now a check is added for socket type also.Now if 2 processes use same
port  but different protocols (tcp or udp), then 2 different port entries
will be  added in the list. Similarly while checking smack access in
smk_ipv6_port_check() function,  port entry is searched on the basis of
both port and protocol.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <Himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Vishal Goel 3c7ce34271 SMACK: Add the rcu synchronization mechanism in ipv6 hooks
Add the rcu synchronization mechanism for accessing smk_ipv6_port_list
in smack IPv6 hooks. Access to the port list is vulnerable to a race
condition issue,it does not apply proper synchronization methods while
working on critical section. It is possible that when one thread is
reading the list, at the same time another thread is modifying the
same port list, which can cause the major problems.

To ensure proper synchronization between two threads, rcu mechanism
has been applied while accessing and modifying the port list. RCU will
also not affect the performance, as there are more accesses than
modification where RCU is most effective synchronization mechanism.

Signed-off-by: Vishal Goel <vishal.goel@samsung.com>
Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2017-01-10 09:47:20 -08:00
Gary Tierney 900fde06cb selinux: default to security isid in sel_make_bools() if no sid is found
Use SECINITSID_SECURITY as the default SID for booleans which don't have
a matching SID returned from security_genfs_sid(), also update the
error message to a warning which matches this.

This prevents the policy failing to load (and consequently the system
failing to boot) when there is no default genfscon statement matched for
the selinuxfs in the new policy.

Signed-off-by: Gary Tierney <gary.tierney@gmx.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:32 -05:00
Gary Tierney 4262fb51c9 selinux: log errors when loading new policy
Adds error logging to the code paths which can fail when loading a new
policy in sel_write_load().  If the policy fails to be loaded from
userspace then a warning message is printed, whereas if a failure occurs
after loading policy from userspace an error message will be printed
with details on where policy loading failed (recreating one of /classes/,
/policy_capabilities/, /booleans/ in the SELinux fs).

Also, if sel_make_bools() fails to obtain an SID for an entry in
/booleans/* an error will be printed indicating the path of the
boolean.

Signed-off-by: Gary Tierney <gary.tierney@gmx.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:31 -05:00
Stephen Smalley b21507e272 proc,security: move restriction on writing /proc/pid/attr nodes to proc
Processes can only alter their own security attributes via
/proc/pid/attr nodes.  This is presently enforced by each individual
security module and is also imposed by the Linux credentials
implementation, which only allows a task to alter its own credentials.
Move the check enforcing this restriction from the individual
security modules to proc_pid_attr_write() before calling the security hook,
and drop the unnecessary task argument to the security hook since it can
only ever be the current task.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:31 -05:00
Stephen Smalley be0554c9bf selinux: clean up cred usage and simplify
SELinux was sometimes using the task "objective" credentials when
it could/should use the "subjective" credentials.  This was sometimes
hidden by the fact that we were unnecessarily passing around pointers
to the current task, making it appear as if the task could be something
other than current, so eliminate all such passing of current.  Inline
various permission checking helper functions that can be reduced to a
single avc_has_perm() call.

Since the credentials infrastructure only allows a task to alter
its own credentials, we can always assume that current must be the same
as the target task in selinux_setprocattr after the check. We likely
should move this check from selinux_setprocattr() to proc_pid_attr_write()
and drop the task argument to the security hook altogether; it can only
serve to confuse things.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:31 -05:00
Stephen Smalley 01593d3299 selinux: allow context mounts on tmpfs, ramfs, devpts within user namespaces
commit aad82892af ("selinux: Add support for
unprivileged mounts from user namespaces") prohibited any use of context
mount options within non-init user namespaces.  However, this breaks
use of context mount options for tmpfs mounts within user namespaces,
which are being used by Docker/runc.  There is no reason to block such
usage for tmpfs, ramfs or devpts.  Exempt these filesystem types
from this restriction.

Before:
sh$ userns_child_exec  -p -m -U -M '0 1000 1' -G '0 1000 1' bash
sh# mount -t tmpfs -o context=system_u:object_r:user_tmp_t:s0:c13 none /tmp
mount: tmpfs is write-protected, mounting read-only
mount: cannot mount tmpfs read-only

After:
sh$ userns_child_exec  -p -m -U -M '0 1000 1' -G '0 1000 1' bash
sh# mount -t tmpfs -o context=system_u:object_r:user_tmp_t:s0:c13 none /tmp
sh# ls -Zd /tmp
unconfined_u:object_r:user_tmp_t:s0:c13 /tmp

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:31 -05:00
Stephen Smalley ef37979a2c selinux: handle ICMPv6 consistently with ICMP
commit 79c8b348f215 ("selinux: support distinctions among all network
address families") mapped datagram ICMP sockets to the new icmp_socket
security class, but left ICMPv6 sockets unchanged.  This change fixes
that oversight to handle both kinds of sockets consistently.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:31 -05:00
Yongqin Liu a2c7c6fbe5 selinux: add security in-core xattr support for tracefs
Since kernel 4.1 ftrace is supported as a new separate filesystem. It
gets automatically mounted by the kernel under the old path
/sys/kernel/debug/tracing. Because it lives now on a separate filesystem
SELinux needs to be updated to also support setting SELinux labels
on tracefs inodes.  This is required for compatibility in Android
when moving to Linux 4.1 or newer.

Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: William Roberts <william.c.roberts@intel.com>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:30 -05:00
Stephen Smalley da69a5306a selinux: support distinctions among all network address families
Extend SELinux to support distinctions among all network address families
implemented by the kernel by defining new socket security classes
and mapping to them. Otherwise, many sockets are mapped to the generic
socket class and are indistinguishable in policy.  This has come up
previously with regard to selectively allowing access to bluetooth sockets,
and more recently with regard to selectively allowing access to AF_ALG
sockets.  Guido Trentalancia submitted a patch that took a similar approach
to add only support for distinguishing AF_ALG sockets, but this generalizes
his approach to handle all address families implemented by the kernel.
Socket security classes are also added for ICMP and SCTP sockets.
Socket security classes were not defined for AF_* values that are reserved
but unimplemented in the kernel, e.g. AF_NETBEUI, AF_SECURITY, AF_ASH,
AF_ECONET, AF_SNA, AF_WANPIPE.

Backward compatibility is provided by only enabling the finer-grained
socket classes if a new policy capability is set in the policy; older
policies will behave as before.  The legacy redhat1 policy capability
that was only ever used in testing within Fedora for ptrace_child
is reclaimed for this purpose; as far as I can tell, this policy
capability is not enabled in any supported distro policy.

Add a pair of conditional compilation guards to detect when new AF_* values
are added so that we can update SELinux accordingly rather than having to
belatedly update it long after new address families are introduced.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-09 10:07:30 -05:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds 67327145c4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull SElinux fix from James Morris:
 "From Paul:
   'A small SELinux patch to fix some clang/llvm compiler warnings and
    ensure the tools under scripts work well in the face of kernel
    changes'"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  selinux: use the kernel headers when building scripts/selinux
2016-12-22 10:03:52 -08:00
Paul Moore bfc5e3a6af selinux: use the kernel headers when building scripts/selinux
Commit 3322d0d64f ("selinux: keep SELinux in sync with new capability
definitions") added a check on the defined capabilities without
explicitly including the capability header file which caused problems
when building genheaders for users of clang/llvm.  Resolve this by
using the kernel headers when building genheaders, which is arguably
the right thing to do regardless, and explicitly including the
kernel's capability.h header file in classmap.h.  We also update the
mdp build, even though it wasn't causing an error we really should
be using the headers from the kernel we are building.

Reported-by: Nicolas Iooss <nicolas.iooss@m4x.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-21 10:39:25 -05:00
Andreas Steffen 98e1d55d03 ima: platform-independent hash value
For remote attestion it is important for the ima measurement values to
be platform-independent.  Therefore integer fields to be hashed must be
converted to canonical format.

Link: http://lkml.kernel.org/r/1480554346-29071-11-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Andreas Steffen <andreas.steffen@strongswan.org>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:46 -08:00
Mimi Zohar d68a6fe9fc ima: define a canonical binary_runtime_measurements list format
The IMA binary_runtime_measurements list is currently in platform native
format.

To allow restoring a measurement list carried across kexec with a
different endianness than the targeted kernel, this patch defines
little-endian as the canonical format.  For big endian systems wanting
to save/restore the measurement list from a system with a different
endianness, a new boot command line parameter named "ima_canonical_fmt"
is defined.

Considerations: use of the "ima_canonical_fmt" boot command line option
will break existing userspace applications on big endian systems
expecting the binary_runtime_measurements list to be in platform native
format.

Link: http://lkml.kernel.org/r/1480554346-29071-10-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:45 -08:00
Mimi Zohar c7d0936770 ima: support restoring multiple template formats
The configured IMA measurement list template format can be replaced at
runtime on the boot command line, including a custom template format.
This patch adds support for restoring a measuremement list containing
multiple builtin/custom template formats.

Link: http://lkml.kernel.org/r/1480554346-29071-9-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:45 -08:00
Mimi Zohar 3f23d624de ima: store the builtin/custom template definitions in a list
The builtin and single custom templates are currently stored in an
array.  In preparation for being able to restore a measurement list
containing multiple builtin/custom templates, this patch stores the
builtin and custom templates as a linked list.  This will permit
defining more than one custom template per boot.

Link: http://lkml.kernel.org/r/1480554346-29071-8-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:45 -08:00
Mimi Zohar 7b8589cc29 ima: on soft reboot, save the measurement list
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot.

This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.

Link: http://lkml.kernel.org/r/1480554346-29071-7-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:44 -08:00
Mimi Zohar d158847ae8 ima: maintain memory size needed for serializing the measurement list
In preparation for serializing the binary_runtime_measurements, this
patch maintains the amount of memory required.

Link: http://lkml.kernel.org/r/1480554346-29071-5-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:44 -08:00
Mimi Zohar dcfc56937b ima: permit duplicate measurement list entries
Measurements carried across kexec need to be added to the IMA
measurement list, but should not prevent measurements of the newly
booted kernel from being added to the measurement list.  This patch adds
support for allowing duplicate measurements.

The "boot_aggregate" measurement entry is the delimiter between soft
boots.

Link: http://lkml.kernel.org/r/1480554346-29071-4-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:43 -08:00
Mimi Zohar 94c3aac567 ima: on soft reboot, restore the measurement list
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
list of the running kernel must be saved and restored on boot.  This
patch restores the measurement list.

Link: http://lkml.kernel.org/r/1480554346-29071-3-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:43 -08:00
Linus Torvalds 9a19a6db37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
2016-12-16 10:24:44 -08:00
Al Viro c4364f837c Merge branches 'work.namei', 'work.dcache' and 'work.iov_iter' into for-linus 2016-12-15 01:07:29 -05:00
Linus Torvalds a57cb1c1d7 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a few misc things

 - kexec updates

 - DMA-mapping updates to better support networking DMA operations

 - IPC updates

 - various MM changes to improve DAX fault handling

 - lots of radix-tree changes, mainly to the test suite. All leading up
   to reimplementing the IDA/IDR code to be a wrapper layer over the
   radix-tree. However the final trigger-pulling patch is held off for
   4.11.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  radix tree test suite: delete unused rcupdate.c
  radix tree test suite: add new tag check
  radix-tree: ensure counts are initialised
  radix tree test suite: cache recently freed objects
  radix tree test suite: add some more functionality
  idr: reduce the number of bits per level from 8 to 6
  rxrpc: abstract away knowledge of IDR internals
  tpm: use idr_find(), not idr_find_slowpath()
  idr: add ida_is_empty
  radix tree test suite: check multiorder iteration
  radix-tree: fix replacement for multiorder entries
  radix-tree: add radix_tree_split_preload()
  radix-tree: add radix_tree_split
  radix-tree: add radix_tree_join
  radix-tree: delete radix_tree_range_tag_if_tagged()
  radix-tree: delete radix_tree_locate_item()
  radix-tree: improve multiorder iterators
  btrfs: fix race in btrfs_free_dummy_fs_info()
  radix-tree: improve dump output
  radix-tree: make radix_tree_find_next_bit more useful
  ...
2016-12-14 17:25:18 -08:00
Lorenzo Stoakes 5b56d49fc3 mm: add locked parameter to get_user_pages_remote()
Patch series "mm: unexport __get_user_pages_unlocked()".

This patch series continues the cleanup of get_user_pages*() functions
taking advantage of the fact we can now pass gup_flags as we please.

It firstly adds an additional 'locked' parameter to
get_user_pages_remote() to allow for its callers to utilise
VM_FAULT_RETRY functionality.  This is necessary as the invocation of
__get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
this and no other existing higher level function would allow it to do
so.

Secondly existing callers of __get_user_pages_unlocked() are replaced
with the appropriate higher-level replacement -
get_user_pages_unlocked() if the current task and memory descriptor are
referenced, or get_user_pages_remote() if other task/memory descriptors
are referenced (having acquiring mmap_sem.)

This patch (of 2):

Add a int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

Taking into account the previous adjustments to get_user_pages*()
functions allowing for the passing of gup_flags, we are now in a
position where __get_user_pages_unlocked() need only be exported for his
ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
subsequently unexport __get_user_pages_unlocked() as well as allowing
for future flexibility in the use of get_user_pages_remote().

[sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
  Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Linus Torvalds 412ac77a9d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "After a lot of discussion and work we have finally reachanged a basic
  understanding of what is necessary to make unprivileged mounts safe in
  the presence of EVM and IMA xattrs which the last commit in this
  series reflects. While technically it is a revert the comments it adds
  are important for people not getting confused in the future. Clearing
  up that confusion allows us to seriously work on unprivileged mounts
  of fuse in the next development cycle.

  The rest of the fixes in this set are in the intersection of user
  namespaces, ptrace, and exec. I started with the first fix which
  started a feedback cycle of finding additional issues during review
  and fixing them. Culiminating in a fix for a bug that has been present
  since at least Linux v1.0.

  Potentially these fixes were candidates for being merged during the rc
  cycle, and are certainly backport candidates but enough little things
  turned up during review and testing that I decided they should be
  handled as part of the normal development process just to be certain
  there were not any great surprises when it came time to backport some
  of these fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Revert "evm: Translate user/group ids relative to s_user_ns when computing HMAC"
  exec: Ensure mm->user_ns contains the execed files
  ptrace: Don't allow accessing an undumpable mm
  ptrace: Capture the ptracer's creds not PT_PTRACE_CAP
  mm: Add a user_ns owner to mm_struct and fix ptrace permission checks
2016-12-14 14:09:48 -08:00
Linus Torvalds 683b96f4d1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Generally pretty quiet for this release. Highlights:

  Yama:
   - allow ptrace access for original parent after re-parenting

  TPM:
   - add documentation
   - many bugfixes & cleanups
   - define a generic open() method for ascii & bios measurements

  Integrity:
   - Harden against malformed xattrs

  SELinux:
   - bugfixes & cleanups

  Smack:
   - Remove unnecessary smack_known_invalid label
   - Do not apply star label in smack_setprocattr hook
   - parse mnt opts after privileges check (fixes unpriv DoS vuln)"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits)
  Yama: allow access for the current ptrace parent
  tpm: adjust return value of tpm_read_log
  tpm: vtpm_proxy: conditionally call tpm_chip_unregister
  tpm: Fix handling of missing event log
  tpm: Check the bios_dir entry for NULL before accessing it
  tpm: return -ENODEV if np is not set
  tpm: cleanup of printk error messages
  tpm: replace of_find_node_by_name() with dev of_node property
  tpm: redefine read_log() to handle ACPI/OF at runtime
  tpm: fix the missing .owner in tpm_bios_measurements_ops
  tpm: have event log use the tpm_chip
  tpm: drop tpm1_chip_register(/unregister)
  tpm: replace dynamically allocated bios_dir with a static array
  tpm: replace symbolic permission with octal for securityfs files
  char: tpm: fix kerneldoc tpm2_unseal_trusted name typo
  tpm_tis: Allow tpm_tis to be bound using DT
  tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV
  tpm: Only call pm_runtime_get_sync if device has a parent
  tpm: define a generic open() method for ascii & bios measurements
  Documentation: tpm: add the Physical TPM device tree binding documentation
  ...
2016-12-14 13:57:44 -08:00
Linus Torvalds 9465d9cc31 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The time/timekeeping/timer folks deliver with this update:

   - Fix a reintroduced signed/unsigned issue and cleanup the whole
     signed/unsigned mess in the timekeeping core so this wont happen
     accidentaly again.

   - Add a new trace clock based on boot time

   - Prevent injection of random sleep times when PM tracing abuses the
     RTC for storage

   - Make posix timers configurable for real tiny systems

   - Add tracepoints for the alarm timer subsystem so timer based
     suspend wakeups can be instrumented

   - The usual pile of fixes and updates to core and drivers"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  timekeeping: Use mul_u64_u32_shr() instead of open coding it
  timekeeping: Get rid of pointless typecasts
  timekeeping: Make the conversion call chain consistently unsigned
  timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
  alarmtimer: Add tracepoints for alarm timers
  trace: Update documentation for mono, mono_raw and boot clock
  trace: Add an option for boot clock as trace clock
  timekeeping: Add a fast and NMI safe boot clock
  timekeeping/clocksource_cyc2ns: Document intended range limitation
  timekeeping: Ignore the bogus sleep time if pm_trace is enabled
  selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
  clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
  clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
  arm64: dts: rockchip: Arch counter doesn't tick in system suspend
  clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
  posix-timers: Make them configurable
  posix_cpu_timers: Move the add_device_randomness() call to a proper place
  timer: Move sys_alarm from timer.c to itimer.c
  ptp_clock: Allow for it to be optional
  Kconfig: Regenerate *.c_shipped files after previous changes
  ...
2016-12-12 19:56:15 -08:00
Al Viro cbbd26b8b1 [iov_iter] new primitives - copy_from_iter_full() and friends
copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users.  *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case.  Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-05 14:33:36 -05:00
Josh Stone 50523a29d9 Yama: allow access for the current ptrace parent
Under ptrace_scope=1, it's possible to have a tracee that is already
ptrace-attached, but is no longer a direct descendant.  For instance, a
forking daemon will be re-parented to init, losing its ancestry to the
tracer that launched it.

The tracer can continue using ptrace in that state, but it will be
denied other accesses that check PTRACE_MODE_ATTACH, like process_vm_rw
and various procfs files.  There's no reason to prevent such access for
a tracer that already has ptrace control anyway.

This patch adds a case to ptracer_exception_found to allow access for
any task in the same thread group as the current ptrace parent.

Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-12-05 11:48:01 +11:00
Al Viro 450630975d don't open-code file_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-04 18:29:28 -05:00
Eric W. Biederman 19339c2516 Revert "evm: Translate user/group ids relative to s_user_ns when computing HMAC"
This reverts commit 0b3c9761d1.

Seth Forshee <seth.forshee@canonical.com> writes:
> All right, I think 0b3c9761d1 should be
> reverted then. EVM is a machine-local integrity mechanism, and so it
> makes sense that the signature would be based on the kernel's notion of
> the uid and not the filesystem's.

I added a commment explaining why the EVM hmac needs to be in the
kernel's notion of uid and gid, not the filesystems to prevent
remounting the filesystem and gaining unwaranted trust in files.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-12-02 20:58:41 -06:00
James Morris 0821e30cd2 Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/selinux into next 2016-11-24 11:21:25 +11:00
James Morris b075361e91 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next 2016-11-23 09:52:11 +11:00
Andreas Gruenbacher 9287aed2ad selinux: Convert isec->lock into a spinlock
Convert isec->lock from a mutex into a spinlock.  Instead of holding
the lock while sleeping in inode_doinit_with_dentry, set
isec->initialized to LABEL_PENDING and release the lock.  Then, when
the sid has been determined, re-acquire the lock.  If isec->initialized
is still set to LABEL_PENDING, set isec->sid; otherwise, the sid has
been set by another task (LABEL_INITIALIZED) or invalidated
(LABEL_INVALID) in the meantime.

This fixes a deadlock on gfs2 where

 * one task is in inode_doinit_with_dentry -> gfs2_getxattr, holds
   isec->lock, and tries to acquire the inode's glock, and

 * another task is in do_xmote -> inode_go_inval ->
   selinux_inode_invalidate_secctx, holds the inode's glock, and
   tries to acquire isec->lock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[PM: minor tweaks to keep checkpatch.pl happy]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-22 17:44:02 -05:00
James Morris 636e4625ad Merge remote branch 'smack/smack-for-4.10' into next 2016-11-22 22:15:30 +11:00
Stephen Smalley 3322d0d64f selinux: keep SELinux in sync with new capability definitions
When a new capability is defined, SELinux needs to be updated.
Trigger a build error if a new capability is defined without
corresponding update to security/selinux/include/classmap.h's
COMMON_CAP2_PERMS.  This is similar to BUILD_BUG_ON() guards
in the SELinux nlmsgtab code to ensure that SELinux tracks
new netlink message types as needed.

Note that there is already a similar build guard in
security/selinux/hooks.c to detect when more than 64
capabilities are defined, since that will require adding
a third capability class to SELinux.

A nicer way to do this would be to extend scripts/selinux/genheaders
or a similar tool to auto-generate the necessary definitions and code
for SELinux capability checking from include/uapi/linux/capability.h.
AppArmor does something similar in its Makefile, although it only
needs to generate a single table of names.  That is left as future
work.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: reformat the description to keep checkpatch.pl happy]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-21 15:37:24 -05:00
John Johansen 3d40658c97 apparmor: fix change_hat not finding hat after policy replacement
After a policy replacement, the task cred may be out of date and need
to be updated. However change_hat is using the stale profiles from
the out of date cred resulting in either: a stale profile being applied
or, incorrect failure when searching for a hat profile as it has been
migrated to the new parent profile.

Fixes: 01e2b670aa (failure to find hat)
Fixes: 898127c34e (stale policy being applied)
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000287
Cc: stable@vger.kernel.org
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-11-21 18:01:28 +11:00
Stephen Smalley ea49d10eee selinux: normalize input to /sys/fs/selinux/enforce
At present, one can write any signed integer value to
/sys/fs/selinux/enforce and it will be stored,
e.g. echo -1 > /sys/fs/selinux/enforce or echo 2 >
/sys/fs/selinux/enforce. This makes no real difference
to the kernel, since it only ever cares if it is zero or non-zero,
but some userspace code compares it with 1 to decide if SELinux
is enforcing, and this could confuse it. Only a process that is
already root and is allowed the setenforce permission in SELinux
policy can write to /sys/fs/selinux/enforce, so this is not considered
to be a security issue, but it should be fixed.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-20 17:13:19 -05:00
Nicolas Pitre baa73d9e47 posix-timers: Make them configurable
Some embedded systems have no use for them.  This removes about
25KB from the kernel binary size when configured out.

Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime: timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:26:35 +01:00
Casey Schaufler 152f91d4d1 Smack: Remove unnecessary smack_known_invalid
The invalid Smack label ("") and the Huh ("?") Smack label
serve the same purpose and having both is unnecessary.
While pulling out the invalid label it became clear that
the use of smack_from_secid() was inconsistent, so that
is repaired. The setting of inode labels to the invalid
label could never happen in a functional system, has
never been observed in the wild and is not what you'd
really want for a failure behavior in any case. That is
removed.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-15 09:34:39 -08:00
Tetsuo Handa 8c15d66e42 Smack: Use GFP_KERNEL for smack_parse_opts_str().
Since smack_parse_opts_str() is calling match_strdup() which uses
GFP_KERNEL, it is safe to use GFP_KERNEL from kcalloc() which is
called by smack_parse_opts_str().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-14 12:55:11 -08:00
Andreas Gruenbacher 13457d073c selinux: Clean up initialization of isec->sclass
Now that isec->initialized == LABEL_INITIALIZED implies that
isec->sclass is valid, skip such inodes immediately in
inode_doinit_with_dentry.

For the remaining inodes, initialize isec->sclass at the beginning of
inode_doinit_with_dentry to simplify the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-14 15:53:04 -05:00
Andreas Gruenbacher db978da8fa proc: Pass file mode to proc_pid_make_inode
Pass the file mode of the proc inode to be created to
proc_pid_make_inode.  In proc_pid_make_inode, initialize inode->i_mode
before calling security_task_to_inode.  This allows selinux to set
isec->sclass right away without introducing "half-initialized" inode
security structs.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-14 15:39:48 -05:00
Andreas Gruenbacher 420591128c selinux: Minor cleanups
Fix the comment for function __inode_security_revalidate, which returns
an integer.

Use the LABEL_* constants consistently for isec->initialized.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-14 15:25:07 -05:00
Tetsuo Handa 8931c3bdb3 SELinux: Use GFP_KERNEL for selinux_parse_opts_str().
Since selinux_parse_opts_str() is calling match_strdup() which uses
GFP_KERNEL, it is safe to use GFP_KERNEL from kcalloc() which is
called by selinux_parse_opts_str().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-11-14 15:03:38 -05:00
Seth Forshee b4bfec7f4a security/integrity: Harden against malformed xattrs
In general the handling of IMA/EVM xattrs is good, but I found
a few locations where either the xattr size or the value of the
type field in the xattr are not checked. Add a few simple checks
to these locations to prevent malformed or malicious xattrs from
causing problems.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-11-13 22:50:11 -05:00
Mimi Zohar 064be15c52 ima: include the reason for TPM-bypass mode
This patch includes the reason for going into TPM-bypass mode
and not using the TPM.

Signed-off-by: Mimi Zohar (zohar@linux.vnet.ibm>
2016-11-13 22:50:09 -05:00
Mimi Zohar f5acb3dcba Revert "ima: limit file hash setting by user to fix and log modes"
Userspace applications have been modified to write security xattrs,
but they are not context aware.  In the case of security.ima, the
security xattr can be either a file hash or a file signature.
Permitting writing one, but not the other requires the application to
be context aware.

In addition, userspace applications might write files to a staging
area, which might not be in policy, and then change some file metadata
(eg. owner) making it in policy.  As a result, these files are not
labeled properly.

This reverts commit c68ed80c97, which
prevents writing file hashes as security.ima xattrs.

Requested-by: Patrick Ohly <patrick.ohly@intel.com>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-11-13 22:50:09 -05:00
Eric Richter 9a11a18902 ima: fix memory leak in ima_release_policy
When the "policy" securityfs file is opened for read, it is opened as a
sequential file. However, when it is eventually released, there is no
cleanup for the sequential file, therefore some memory is leaked.

This patch adds a call to seq_release() in ima_release_policy() to clean up
the memory when the file is opened for read.

Fixes: 80eae209d6 IMA: allow reading back the current policy
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Eric Richter <erichte@linux.vnet.ibm.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2016-11-13 22:50:08 -05:00
Casey Schaufler 2e4939f702 Smack: ipv6 label match fix
The check for a deleted entry in the list of IPv6 host
addresses was being performed in the wrong place, leading
to most peculiar results in some cases. This puts the
check into the right place.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-10 11:22:18 -08:00
Himanshu Shukla b437aba85b SMACK: Fix the memory leak in smack_cred_prepare() hook
Memory leak in smack_cred_prepare()function.
smack_cred_prepare() hook returns error if there is error in allocating
memory in smk_copy_rules() or smk_copy_relabel() function.
If smack_cred_prepare() function returns error then the calling
function should call smack_cred_free() function for cleanup.
In smack_cred_free() function first credential is  extracted and
then all rules are deleted. In smack_cred_prepare() function security
field is assigned in the end when all function return success. But this
function may return before and memory will not be freed.

Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-10 11:22:06 -08:00
Himanshu Shukla 7128ea159d SMACK: Do not apply star label in smack_setprocattr hook
Smack prohibits processes from using the star ("*") and web ("@") labels.
Checks have been added in other functions. In smack_setprocattr()
hook, only check for web ("@") label has been added and restricted
from applying web ("@") label.
Check for star ("*") label should also be added in smack_setprocattr()
hook. Return error should be "-EINVAL" not "-EPERM" as permission
is there for setting label but not the label value as star ("*") or
web ("@").

Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-10 11:21:52 -08:00
Himanshu Shukla 2097f59920 smack: parse mnt opts after privileges check
In smack_set_mnt_opts()first the SMACK mount options are being
parsed and later it is being checked whether the user calling
mount has CAP_MAC_ADMIN capability.
This sequence of operationis will allow unauthorized user to add
SMACK labels in label list and may cause denial of security attack
by adding many labels by allocating kernel memory by unauthorized user.
Superblock smack flag is also being set as initialized though function
may return with EPERM error.
First check the capability of calling user then set the SMACK attributes
and smk_flags.

Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-10 11:21:32 -08:00
jooseong lee 08382c9f6e Smack: Assign smack_known_web label for kernel thread's
Assign smack_known_web label for kernel thread's socket

Creating struct sock by sk_alloc function in various kernel subsystems
like bluetooth doesn't call smack_socket_post_create(). In such case,
received sock label is the floor('_') label and makes access deny.

Signed-off-by: jooseong lee <jooseong.lee@samsung.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
2016-11-04 17:42:57 -07:00
Artem Savkov 31e6ec4519 security/keys: make BIG_KEYS dependent on stdrng.
Since BIG_KEYS can't be compiled as module it requires one of the "stdrng"
providers to be compiled into kernel. Otherwise big_key_crypto_init() fails
on crypto_alloc_rng step and next dereference of big_key_skcipher (e.g. in
big_key_preparse()) results in a NULL pointer dereference.

Fixes: 13100a72f4 ('Security: Keys: Big keys stored encrypted')
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Stephan Mueller <smueller@chronox.de>
cc: Kirill Marinushkin <k.marinushkin@gmail.com>
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-10-27 16:03:33 +11:00
David Howells 7df3e59c3d KEYS: Sort out big_key initialisation
big_key has two separate initialisation functions, one that registers the
key type and one that registers the crypto.  If the key type fails to
register, there's no problem if the crypto registers successfully because
there's no way to reach the crypto except through the key type.

However, if the key type registers successfully but the crypto does not,
big_key_rng and big_key_blkcipher may end up set to NULL - but the code
neither checks for this nor unregisters the big key key type.

Furthermore, since the key type is registered before the crypto, it is
theoretically possible for the kernel to try adding a big_key before the
crypto is set up, leading to the same effect.

Fix this by merging big_key_crypto_init() and big_key_init() and calling
the resulting function late.  If they're going to be encrypted, we
shouldn't be creating big_keys before we have the facilities to do the
encryption available.  The key type registration is also moved after the
crypto initialisation.

The fix also includes message printing on failure.

If the big_key type isn't correctly set up, simply doing:

	dd if=/dev/zero bs=4096 count=1 | keyctl padd big_key a @s

ought to cause an oops.

Fixes: 13100a72f4 ('Security: Keys: Big keys stored encrypted')
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Peter Hlavaty <zer0mem@yahoo.com>
cc: Kirill Marinushkin <k.marinushkin@gmail.com>
cc: Artem Savkov <asavkov@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-10-27 16:03:27 +11:00
David Howells 03dab869b7 KEYS: Fix short sprintf buffer in /proc/keys show function
This fixes CVE-2016-7042.

Fix a short sprintf buffer in proc_keys_show().  If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.

The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:

	(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
	$2 = 30500568904943

That's 14 chars plus NUL, not 11 chars plus NUL.

Expand the buffer to 16 chars.

I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.

The panic incurred looks something like:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
 ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
 ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
Call Trace:
 [<ffffffff813d941f>] dump_stack+0x63/0x84
 [<ffffffff811b2cb6>] panic+0xde/0x22a
 [<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
 [<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
 [<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
 [<ffffffff81350410>] ? key_validate+0x50/0x50
 [<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
 [<ffffffff8126b31c>] seq_read+0x2cc/0x390
 [<ffffffff812b6b12>] proc_reg_read+0x42/0x70
 [<ffffffff81244fc7>] __vfs_read+0x37/0x150
 [<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
 [<ffffffff81246156>] vfs_read+0x96/0x130
 [<ffffffff81247635>] SyS_read+0x55/0xc0
 [<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Reported-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-10-27 16:03:24 +11:00
Linus Torvalds 86c5bf7101 Merge branch 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vmap stack fixes from Ingo Molnar:
 "This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
  accesses that used to be just somewhat questionable are now totally
  buggy.

  These changes try to do it without breaking the ABI: the fields are
  left there, they are just reporting zero, or reporting narrower
  information (the maps file change)"

* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
  fs/proc: Stop trying to report thread stacks
  fs/proc: Stop reporting eip and esp in /proc/PID/stat
  mm/numa: Remove duplicated include from mprotect.c
2016-10-22 09:39:10 -07:00
Andy Lutomirski d17af5056c mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
Asking for a non-current task's stack can't be done without races
unless the task is frozen in kernel mode.  As far as I know,
vm_is_stack_for_task() never had a safe non-current use case.

The __unused annotation is because some KSTK_ESP implementations
ignore their parameter, which IMO is further justification for this
patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Link: http://lkml.kernel.org/r/4c3f68f426e6c061ca98b4fc7ef85ffbb0a25b0c.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:21:41 +02:00
Lorenzo Stoakes 9beae1ea89 mm: replace get_user_pages_remote() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:02 -07:00
Linus Torvalds 101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Al Viro 3873691e5a Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
Linus Torvalds 97d2116708 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "xattr stuff from Andreas

  This completes the switch to xattr_handler ->get()/->set() from
  ->getxattr/->setxattr/->removexattr"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Remove {get,set,remove}xattr inode operations
  xattr: Stop calling {get,set,remove}xattr inode operations
  vfs: Check for the IOP_XATTR flag in listxattr
  xattr: Add __vfs_{get,set,remove}xattr helpers
  libfs: Use IOP_XATTR flag for empty directory handling
  vfs: Use IOP_XATTR flag for bad-inode handling
  vfs: Add IOP_XATTR inode operations flag
  vfs: Move xattr_resolve_name to the front of fs/xattr.c
  ecryptfs: Switch to generic xattr handlers
  sockfs: Get rid of getxattr iop
  sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
  kernfs: Switch to generic xattr handlers
  hfs: Switch to generic xattr handlers
  jffs2: Remove jffs2_{get,set,remove}xattr macros
  xattr: Remove unnecessary NULL attribute name check
2016-10-10 17:11:50 -07:00
Linus Torvalds abb5a14fa2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted misc bits and pieces.

  There are several single-topic branches left after this (rename2
  series from Miklos, current_time series from Deepa Dinamani, xattr
  series from Andreas, uaccess stuff from from me) and I'd prefer to
  send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
  proc: switch auxv to use of __mem_open()
  hpfs: support FIEMAP
  cifs: get rid of unused arguments of CIFSSMBWrite()
  posix_acl: uapi header split
  posix_acl: xattr representation cleanups
  fs/aio.c: eliminate redundant loads in put_aio_ring_file
  fs/internal.h: add const to ns_dentry_operations declaration
  compat: remove compat_printk()
  fs/buffer.c: make __getblk_slow() static
  proc: unsigned file descriptors
  fs/file: more unsigned file descriptors
  fs: compat: remove redundant check of nr_segs
  cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
  cifs: don't use memcpy() to copy struct iov_iter
  get rid of separate multipage fault-in primitives
  fs: Avoid premature clearing of capabilities
  fs: Give dentry to inode_change_ok() instead of inode
  fuse: Propagate dentry down to inode_change_ok()
  ceph: Propagate dentry down to inode_change_ok()
  xfs: Propagate dentry down to inode_change_ok()
  ...
2016-10-10 13:04:49 -07:00
Linus Torvalds 563873318d Merge branch 'printk-cleanups'
Merge my system logging cleanups, triggered by the broken '\n' patches.

The line continuation handling has been broken basically forever, and
the code to handle the system log records was both confusing and
dubious.  And it would do entirely the wrong thing unless you always had
a terminating newline, partly because it couldn't actually see whether a
message was marked KERN_CONT or not (but partly because the LOG_CONT
handling in the recording code was rather confusing too).

This re-introduces a real semantically meaningful KERN_CONT, and fixes
the few places I noticed where it was missing.  There are probably more
missing cases, since KERN_CONT hasn't actually had any semantic meaning
for at least four years (other than the checkpatch meaning of "no log
level necessary, this is a continuation line").

This also allows the combination of KERN_CONT and a log level.  In that
case the log level will be ignored if the merging with a previous line
is successful, but if a new record is needed, that new record will now
get the right log level.

That also means that you can at least in theory combine KERN_CONT with
the "pr_info()" style helpers, although any use of pr_fmt() prefixing
would make that just result in a mess, of course (the prefix would end
up in the middle of a continuing line).

* printk-cleanups:
  printk: make reading the kernel log flush pending lines
  printk: re-organize log_output() to be more legible
  printk: split out core logging code into helper function
  printk: reinstate KERN_CONT for printing continuation lines
2016-10-10 09:29:50 -07:00
Linus Torvalds 4bcc595ccd printk: reinstate KERN_CONT for printing continuation lines
Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space.  It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.

Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other.  Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.

To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:

  4749252776 ("printk: add KERN_CONT annotation").

That continuation marker was initially an empty string, and didn't
actuall make any semantic difference.  But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.

To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.

  5fd29d6ccb ("printk: clean up handling of log-levels and newlines")

and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.

Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.

You can see the beginning of that conversion in commits

  e11fea92e1 ("kmsg: export printk records to the /dev/kmsg interface")
  7ff9554bb5 ("printk: convert byte-buffer to variable-length record buffer")

with a number of follow-up commits to fix some painful fallout from that
conversion.  Over all, it took a couple of months to sort out most of
it.  But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.

And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks.  In
order to still log them as one recrod, the continuation lines need to be
attached to the previous record properly.

However the explicit continuation record marker that is actually useful
for this exact case was actually removed in aroundm the same time by commit

  61e99ab8e3 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

due to the incorrect belief that KERN_CONT wasn't meaningful.  The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.

This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.

But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.

For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.

Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel).  So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.

Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.

So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.

There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.

That said, we should strive to avoid the need for KERN_CONT.  It does
result in real problems for logging, and should generally not be seen as
a good feature.  If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.

But until that point, let's at mark the code that relies on the hacky
multi-fragment kernel printk's.  Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".

(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited.  Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09 12:23:38 -07:00