CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.
CURRENT_TIME is also not y2038 safe.
This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.
Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The logfs_block_ops structures are never modified, so declare them as
const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page. As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently. At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.
Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate. Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
There are few important bug fixes for LogFS
9f0bbd8 logfs: query block device for number of pages to send with bio
This BUG was found when LogFS was used on KVM. The patch fixes
the problem by asking for underlaying block device the number
of pages to send with each BIO.
41b93bc logfs: maintain the ordering of meta-inode destruction
LogFS maintains file system meta-data in special inodes. These
inodes are releated to each other, therefore they must be
destroyed in a proper order.
ddb24bb logfs: create a pagecache page if it is not present
cd8bfa9 logfs: initialize the number of iovecs in bio
LogFS used to panic when it was created on an encrypted LVM
volume. The patch fixes the problem by properly initializing
the BIO.
d2dcd90 logfs: destroy the reserved inodes while unmounting
Diffstat:
fs/logfs/dev_bdev.c | 15 ++++++++-------
fs/logfs/inode.c | 18 +-----------------
fs/logfs/journal.c | 2 +-
fs/logfs/readwrite.c | 1 +
fs/logfs/segment.c | 2 +-
5 files changed, 12 insertions(+), 26 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQNmnnAAoJEDFA/f+3K+ZNhc0QAJWWtcvqCIU2UHKw0DaQieH8
X+YwoqdH5UBR0vylyoHCgCRSoRuttYEy47DYDhkpUdqad60atDhwIwhFaknlNZbt
+hv2ve6hZTzcs42HZhu+0WUzMkUs0s6Xjuu9JlZND59zk1wWBfttDlKI2/SdjMlm
GnW0nx5gLFZ1rmpaL8bdRpyLXxUvb9FmUc+YWsuinj8A2Lnqg5bNOmJZ3CiMYNVk
UvbHDYJmNaMZndbYeXZtxXtUo9Uk4HsU+7whpmgD+OPz7h+VMOgfXnwkpsgmtbY2
qTXAjyFVpN23zhBTbpCMXvpfbrffdBJBfkEW+2sXqpr9IHbMX0Y9cb/XJ3Ub1Qz+
HRLdBh0iAZewWVRsGOKG2OU1WUrTNzKUAJ795QFL0c0bZsODzQ+5OlcD7rBzEMpq
ioN49UtJWlmckWaott6PinSl/OKWlvz771Zayh9+ttuL3Dvt6coV7K3ns5MI6glN
M9DTMd8GAW2Kdz80EyAjZYWw2M3lbs0/GghJB4ozYg3mrDpBq1w5ouEvSNw8PuJw
yoUEe2xLGpLZdzQDouKMlrai8kohCyMoHKH1Kimx+iG4LZkYLy8VJepY1mfzQbEY
1zSB1nCV/c67qI3fxRdPNmJ7YtjTC9ero0qOouXwSJWwZ+OEikGXOKWnjRutcY1C
Jfbfg6gNqOrZZbYWp1ke
=eS/c
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream
Pull LogFS bugfixes from Prasad Joshi:
- "logfs: query block device for number of pages to send with bio"
This BUG was found when LogFS was used on KVM. The patch fixes
the problem by asking for underlaying block device the number
of pages to send with each BIO.
- "logfs: maintain the ordering of meta-inode destruction"
LogFS maintains file system meta-data in special inodes. These
inodes are releated to each other, therefore they must be
destroyed in a proper order.
- "logfs: initialize the number of iovecs in bio"
LogFS used to panic when it was created on an encrypted LVM
volume. The patch fixes the problem by properly initializing
the BIO.
Plus a couple more:
- logfs: create a pagecache page if it is not present
- logfs: destroy the reserved inodes while unmounting
* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
logfs: query block device for number of pages to send with bio
logfs: maintain the ordering of meta-inode destruction
logfs: create a pagecache page if it is not present
logfs: initialize the number of iovecs in bio
logfs: destroy the reserved inodes while unmounting
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
We were assuming that the evict_inode() would never be called on
reserved inodes. However, (after the commit 8e22c1a4e logfs: get rid
of magical inodes) while unmounting the file system, in put_super, we
call iput() on all of the reserved inodes.
The following simple test used to cause a kernel panic on LogFS:
1. Mount a LogFS file system on /mnt
2. Create a file
$ touch /mnt/a
3. Try to unmount the FS
$ umount /mnt
The simple fix would be to drop the assumption and properly destroy
the reserved inodes.
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Can be necessary if an inode gets deleted (through -ENOSPC) before being
written. Might be better to move this into logfs_write_rec(), but for
now go with the stupid&safe patch.
Signed-off-by: Joern Engel <joern@logfs.org>
LogFS sets PG_private flag to indicate a pined page. We assumed that
marking a page as private is enough to ensure its existence. But
instead it is necessary to hold a reference count to the page.
The change resolves the following BUG
BUG: Bad page state in process flush-253:16 pfn:6a6d0
page flags: 0x100000000000808(uptodate|private)
Suggested-and-Acked-by: Joern Engel <joern@logfs.org>
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This happens when __logfs_create() tries to write a new inode to the disk
which is full.
__logfs_create() associates the transaction pointer with inode. During
the logfs_write_inode() function call chain this transaction pointer is
moved from inode to page->private using function move_inode_to_page
(do_write_inode() -> inode_to_page() -> move_inode_to_page)
When the write inode fails, the transaction is aborted and iput is called
on the failed inode. During delete_inode the same transaction pointer
associated with the page is getting used. Thus causing kernel BUG.
The patch checks for error in write_inode() and restores the page->private
to NULL.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Truncate would do an almost limitless amount of work without invoking
the garbage collector in between. Split it up into more manageable,
though still large, chunks.
Signed-off-by: Joern Engel <joern@logfs.org>
It would probably be better to just accept NULL pointers in
mempool_destroy(). But for the current -rc series let's keep things
simple.
This patch was lost in the cracks for a while.
Kevin Cernekee <cernekee@gmail.com> had to rediscover the problem and
send a similar patch because of it. :(
Signed-off-by: Joern Engel <joern@logfs.org>
Removing sufficiently large files would create aliases for a large
number of segments. This in turn results in a large number of journal
entries and an overflow of s_je_array.
Cheap fix is to add a BUG_ON, turning memory corruption into something
annoying, but less dangerous. Real fix is to count the number of
affected segments and prevent the problem completely.
Signed-off-by: Joern Engel <joern@logfs.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
The comment was correct, so make the code match the comment. As the
new comment indicates, we might be able to do a little less work. But
for the current -rc series let's keep it simple and just fix the bug.
Signed-off-by: Joern Engel <joern@logfs.org>