linux/fs
Vladimir Davydov b2a209ffa6 Revert "kernfs: do not account ino_ida allocations to memcg"
Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
alloc_kmem_pages call) are accounted to memory cgroup automatically.
Callers have to explicitly opt out if they don't want/need accounting
for some reason.  Such a design decision leads to several problems:

 - kmalloc users are highly sensitive to failures, many of them
   implicitly rely on the fact that kmalloc never fails, while memcg
   makes failures quite plausible.

 - A lot of objects are shared among different containers by design.
   Accounting such objects to one of containers is just unfair.
   Moreover, it might lead to pinning a dead memcg along with its kmem
   caches, which aren't tiny, which might result in noticeable increase
   in memory consumption for no apparent reason in the long run.

 - There are tons of short-lived objects. Accounting them to memcg will
   only result in slight noise and won't change the overall picture, but
   we still have to pay accounting overhead.

For more info, see

 - http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
 - http://lkml.kernel.org/r/20151106090555.GK29259@esperanza

Therefore this patchset switches to the white list policy.  Now kmalloc
users have to explicitly opt in by passing __GFP_ACCOUNT flag.

Currently, the list of accounted objects is quite limited and only
includes those allocations that (1) are known to be easily triggered
from userspace and (2) can fail gracefully (for the full list see patch
no.  6) and it still misses many object types.  However, accounting only
those objects should be a satisfactory approximation of the behavior we
used to have for most sane workloads.

This patch (of 6):

Revert 499611ed45 ("kernfs: do not account ino_ida allocations
to memcg").

Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
fragile and difficult to maintain, because there seem to be many more
allocations that should not be accounted than those that should be.
Besides, false accounting an allocation might result in much worse
consequences than not accounting at all, namely increased memory
consumption due to pinned dead kmem caches.

So it was decided to switch to the white-list policy.  This patch reverts
bits introducing the black-list policy.  The white-list policy will be
introduced later in the series.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
..
9p Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
adfs adfs: constify adfs_dir_ops structures 2015-12-06 21:17:16 -05:00
affs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
afs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
autofs4 switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
befs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
bfs
btrfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
cachefiles convert a bunch of open-coded instances of memdup_user_nul() 2016-01-04 10:26:58 -05:00
ceph Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-11 13:32:10 -08:00
cifs Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 16:30:34 -08:00
coda switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
configfs Configfs changes for the 4.5 merge window: 2016-01-12 18:15:34 -08:00
cramfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
debugfs debugfs: fix refcount imbalance in start_creating 2015-11-11 02:04:44 -05:00
devpts devpts: if initialization failed, don't crash when opening /dev/ptmx 2015-06-30 19:44:58 -07:00
dlm convert a bunch of open-coded instances of memdup_user_nul() 2016-01-04 10:26:58 -05:00
ecryptfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
efivarfs
efs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
exofs Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-11 13:13:23 -08:00
exportfs
ext2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
ext4 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
f2fs Merge tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs 2016-01-13 21:01:44 -08:00
fat fat: fix fake_offset handling on error path 2015-11-20 16:17:32 -08:00
freevxfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
fscache FS-Cache: Handle a write to the page immediately beyond the EOF marker 2015-11-11 02:11:02 -05:00
fuse Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-11 13:13:23 -08:00
gfs2 GFS2: merge window 2016-01-12 18:09:35 -08:00
hfs HFS wants 8Kb per-superblock allocation; just use kmalloc() 2016-01-04 10:29:34 -05:00
hfsplus Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-11 13:32:10 -08:00
hostfs Merge branch 'for-linus-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2016-01-12 13:27:18 -08:00
hpfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
hugetlbfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
isofs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
jbd2 fs: use block_device name vsprintf helper 2016-01-06 13:03:18 -05:00
jffs2 MTD updates for v4.5: 2016-01-13 11:25:54 -08:00
jfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
kernfs Revert "kernfs: do not account ino_ida allocations to memcg" 2016-01-14 16:00:49 -08:00
lockd Mainly smaller bugfixes and cleanup. We're still finding some bugs from 2015-11-11 20:11:28 -08:00
logfs logfs: fix logfs build errors and dependencies 2016-01-14 16:00:49 -08:00
minix Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
ncpfs switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
nfs Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 16:30:34 -08:00
nfs_common lockd: NLM grace period shouldn't block NFSv4 opens 2015-08-13 10:22:06 -04:00
nfsd Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
nilfs2 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
nls
notify fsnotify: destroy marks with call_srcu instead of dedicated thread 2016-01-14 16:00:49 -08:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 ocfs2/dlm: cleanup redunant lksb flags in dlmcommon.h 2016-01-14 16:00:49 -08:00
omfs
openpromfs
overlayfs switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
proc Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
pstore pstore: fix code comment to match code 2015-11-02 13:41:52 -08:00
qnx4 don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
qnx6 don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
quota Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-09-05 20:34:28 -07:00
ramfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
reiserfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
romfs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
squashfs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
sysfs platform/chrome: Branch for v4.4 2015-11-13 21:53:18 -08:00
sysv switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs This pull request contains three changes, two cleanups and one UBI 2016-01-12 18:20:20 -08:00
udf don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
ufs don't put symlink bodies in pagecache into highmem 2015-12-08 22:41:36 -05:00
xfs xfs: updates for 4.5-rc1 2016-01-13 21:15:18 -08:00
Kconfig File locking related changes for v4.5 (pile #1) 2016-01-12 15:46:17 -08:00
Kconfig.binfmt
Makefile ext4: promote ext4 over ext2 in the default probe order 2015-10-15 10:33:21 -04:00
aio.c mm: move ->mremap() from file_operations to vm_operations_struct 2015-09-04 16:54:41 -07:00
anon_inodes.c
attr.c
bad_inode.c fs/bad_inode.c: is_bad_inode can be boolean 2015-12-06 21:17:14 -05:00
binfmt_aout.c
binfmt_elf.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-11 09:45:24 -08:00
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c libnvdimm for 4.5 2016-01-13 19:15:14 -08:00
buffer.c fs: use block_device name vsprintf helper 2016-01-06 13:03:18 -05:00
char_dev.c fs/char_dev.c: fix incorrect documentation for unregister_chrdev_region 2015-08-05 13:49:35 -07:00
compat.c saner calling conventions for copy_mount_options() 2016-01-04 10:28:32 -05:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
coredump.c coredump: Use 64bit time for unix time of coredump 2015-12-06 21:17:17 -05:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
dcookies.c
direct-io.c fix the regression from "direct-io: Fix negative return from dio read beyond eof" 2015-12-08 15:02:42 -05:00
drop_caches.c inode: convert inode_sb_list_lock to per-sb 2015-08-17 18:39:46 -04:00
eventfd.c
eventpoll.c
exec.c don't carry MAY_OPEN in op->acc_mode 2016-01-04 10:28:40 -05:00
fcntl.c fcntl: allow to set O_DIRECT flag on pipe 2016-01-09 02:55:37 -05:00
fhandle.c
file.c fs/file.c: __const_max is actually __const_min :-) 2015-12-06 21:17:11 -05:00
file_table.c fs, file table: reinit files_stat.max_files after deferred memory initialisation 2015-08-07 04:39:40 +03:00
filesystems.c
fs-writeback.c fs: fix writeback.c kernel-doc warnings 2015-11-11 02:18:27 -05:00
fs_pin.c
fs_struct.c
inode.c File locking related changes for v4.5 (pile #1) 2016-01-12 15:46:17 -08:00
internal.h Merge branch 'for-linus' into work.misc 2016-01-08 21:20:11 -05:00
ioctl.c Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 16:30:34 -08:00
libfs.c switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
locks.c Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 16:30:34 -08:00
mbcache.c
mount.h fs: use seq_open_private() for proc_mounts 2015-06-30 19:44:56 -07:00
mpage.c mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
namei.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
namespace.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
no-block.c
nsfs.c fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void 2015-09-11 15:21:34 -07:00
open.c don't carry MAY_OPEN in op->acc_mode 2016-01-04 10:28:40 -05:00
pipe.c fs/pipe.c: return error code rather than 0 in pipe_write() 2015-11-11 02:18:26 -05:00
pnode.c
pnode.h mnt: Clarify and correct the disconnect logic in umount_tree 2015-07-22 20:33:27 -05:00
posix_acl.c xattr handlers: Simplify list operation 2015-12-13 19:46:12 -05:00
proc_namespace.c vfs: show_vfsstat: remove redundant initialization and check of error code 2015-12-06 21:17:16 -05:00
read_write.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
readdir.c
select.c poll: plug an unused argument to do_poll 2016-01-06 08:26:52 -05:00
seq_file.c fs, seqfile: always allow oom killer 2015-11-06 17:50:42 -08:00
signalfd.c signalfd: fix information leak in signalfd_copyinfo 2015-08-07 04:39:40 +03:00
splice.c fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE 2016-01-09 02:55:35 -05:00
stack.c
stat.c fs/stat.c: remove unnecessary new_valid_dev() check 2015-11-09 15:11:24 -08:00
statfs.c
super.c fs: use block_device name vsprintf helper 2016-01-06 13:03:18 -05:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c
userfaultfd.c userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" 2015-09-22 15:09:53 -07:00
utimes.c
xattr.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00