Orangefs doesn't do buffered writes yet, so there's no point in
initiating and waiting for writeback.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
remove unused get_fsid_from_ino
fix bounds check for listxattr
clean up oversize xattr validation
do not set getattr_time on orangefs_lookup
return from orangefs_devreq_read quickly if possible
do not wait for timeout if umounting
handle zero size write in debugfs
Bug fixes:
do not check possibly stale size on truncate
ensure the userspace component is unmounted if mount fails
total reimplementation of dir.c
New feature:
implement statx
The new implementation of dir.c is kind of a big deal, all new
code. It has been posted to fs-devel during the previous rc period,
we didn't get much review or feedback from there, but it has been reviewed
very heavily here, so much so that we have two entire versions of the
reimplementation. Not only does the new implementation fix some
xfstests, but it passes all the new tests we made here that involve
seeking and rewinding and giant directories and long file names.
The new dir code has three patches itself:
skip forward to the next directory entry if seek is short
invalidate stored directory on seek
count directory pieces correctly
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZDLasAAoJEM9EDqnrzg2+yR4P/jNsryNfQush5V/6EO+wpQ7p
O0epuLG42QMN67wdsQDVOOzcRQq2IAoYrgupZfEvCVsoBiYxdCTTwhN/55UctMBA
xnakv8BarrLd6pqSJOlQviP7ByXdEvy7dtYYuAEtdRnPtTZEmjDH0k9ME759+DVm
pPQ6fanPzSZuG8fdjI4QrKiFfpE5slMeyMV9SmzIq81S1i+t2b9sDYKTiP3Jt14y
KTweGdJXTRT+Piy27d80HN9ExlFXlcyru9GDWNhZi4EHlax7bq76Qwu1XKyaOg0h
MN40+18k+Zqrpj1/tq4aj3YM0P3HjpRhtb5TqOC+QhZDIL1gJ8bv8rv61snWTak+
6cXtwvIh7r4aEU+gkMLP29HXCVlGg3V4up+DdbHJVbIEXV8C5csJBP+sQUlU7A5D
WoPmheV7CJ8nicwkxYm31dhdnW7mOwW/J4uUlM9w/yU/dVfoz1SK8AtKjy0xX87c
Jpo7nuJEDprI+9neT0y5U+RHVqH08+cA5DCrdk0x8JaJIrjOZpvTROIPrtzlS7QL
aTu+W/ISXtFwnM+ERmw8TKPD7TTUXypydYhzXe8V6itDpiNp1kQFGmLGzLhAMElH
iGQkFatR6LSKh+DxUD3PREQGNyQCKpgPiqLoGYprzQ829tqLpThumfZic9lX1C/+
we5VEpRbiz6BjN110DBJ
=NGTt
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
"Orangefs cleanups, fixes and statx support.
Some cleanups:
- remove unused get_fsid_from_ino
- fix bounds check for listxattr
- clean up oversize xattr validation
- do not set getattr_time on orangefs_lookup
- return from orangefs_devreq_read quickly if possible
- do not wait for timeout if umounting
- handle zero size write in debugfs
Bug fixes:
- do not check possibly stale size on truncate
- ensure the userspace component is unmounted if mount fails
- total reimplementation of dir.c
New feature:
- implement statx
The new implementation of dir.c is kind of a big deal, all new code.
It has been posted to fs-devel during the previous rc period, we
didn't get much review or feedback from there, but it has been
reviewed very heavily here, so much so that we have two entire
versions of the reimplementation.
Not only does the new implementation fix some xfstests, but it passes
all the new tests we made here that involve seeking and rewinding and
giant directories and long file names. The new dir code has three
patches itself:
- skip forward to the next directory entry if seek is short
- invalidate stored directory on seek
- count directory pieces correctly"
* tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: count directory pieces correctly
orangefs: invalidate stored directory on seek
orangefs: skip forward to the next directory entry if seek is short
orangefs: handle zero size write in debugfs
orangefs: do not wait for timeout if umounting
orangefs: return from orangefs_devreq_read quickly if possible
orangefs: ensure the userspace component is unmounted if mount fails
orangefs: do not check possibly stale size on truncate
orangefs: implement statx
orangefs: remove ORANGEFS_READDIR macros
orangefs: support very large directories
orangefs: support llseek on directories
orangefs: rewrite readdir to fix several bugs
orangefs: do not set getattr_time on orangefs_lookup
orangefs: clean up oversize xattr validation
orangefs: fix bounds check for listxattr
orangefs: remove unused get_fsid_from_ino
Fortunately OrangeFS has had a getattr request mask for a long time.
The server basically has two difficulty levels for attributes. Fetching
any attribute except size requires communicating with the metadata
server for that handle. Since all the attributes are right there, it
makes sense to return them all. Fetching the size requires
communicating with every I/O server (that the file is distributed
across). Therefore if asked for anything except size, get everything
except size, and if asked for size, get everything.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Replace wrong use of file->f_path.dentry->d_inode with file_inode(file).
In case orangefs ever finds itself as an overelayfs layer, it would want
to get its own inode and not overlayfs's inode.
DISCLAIMER: I did not test this patch because I do not know how to setup
an orangefs mount
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.
There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from from me) and I'd prefer to
send those separately"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...
Pull in an OrangeFS branch containing miscellaneous improvements.
- clean up debugfs globals
- remove dead code in sysfs
- reorganize duplicated sysfs attribute structs
- consolidate sysfs show and store functions
- remove duplicated sysfs_ops structures
- describe organization of sysfs
- make devreq_mutex static
- g_orangefs_stats -> orangefs_stats for consistency
- rename most remaining global variables
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.
CURRENT_TIME is also not y2038 safe.
This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.
Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This has been dormant code for many years. Parts of it were removed from
the OrangeFS kernel code when it went into mainline. These bits were missed.
Now the readahead cache has been resurrected in the OrangeFS userspace
portions. It was renamed there, since it doesn't really have anything to do
with mmap specifically, so it will be renamed here.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Mike,
On Fri, Jun 3, 2016 at 9:44 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> We use the return value in this one line you changed, our userspace code gets
> ill when we send it (-ENOMEM +1) as a key length...
ah, my mistake. Here's a fixed version.
Thanks,
Andreas
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
The ORANGEFS_XATTR_INDEX_ defines are unused; the ORANGEFS_XATTR_NAME_
defines only obfuscate the code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Error should only be returned if nothing had been read/written.
Otherwise we need to report a short read/write instead.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
a) open files can't have NULL inodes
b) it's SEEK_END, not ORANGEFS_SEEK_END; no need to get cute.
c) make_bad_inode() on lseek()?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
just have it return the slot number or -E... - the caller checks
the sign anyway
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
This is motivated by orangefs_inode_old_getattr's habit of writing over
live inodes.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
I have verified that there is nothing in the userspace daemon version we
are implementing this protocol against that ever looks at this field.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Make cancels reuse the aborted read/write op, to make sure they do not
fail on lack of memory.
Don't issue a cancel unless the daemon has seen our read/write, has not
replied and isn't being shut down.
If cancel *is* issued, don't wait for it to complete; stash the slot
in there and just have it freed when cancel is finally replied to or
purged (and delay dropping the reference until then, obviously).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
it's always equal to __orangefs_bufmap and the latter can't change
until we are done
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Previously, it would update a live inode. This was fixed, but it did not
ever check that the inode attributes in the dcache are correct. This
checks all inode attributes and rejects any that are not correct, which
causes a lookup and thus a new getattr.
Perhaps inode_operations->permission should replace or augment some of
this.
There is no actual caching, and this does a rather excessive amount of
network operations back to the filesystem server.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
* create with refcount 1
* make op_release() decrement and free if zero (i.e. old put_op()
has become that).
* mark when submitter has given up waiting; from that point nobody
else can move between the lists, change state, etc.
* have daemon read/write_iter grab a reference when picking op
and *always* give it up in the end
* don't put into hash until we know it's been successfully passed to
daemon
* move op->lock _lower_ than htab_in_progress_lock (and make sure
to take it in purge_inprogress_ops())
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
OrangeFS was formerly known as PVFS2 and retains the name in many places.
I leave the device /dev/pvfs2-req since this affects userspace.
I leave the filesystem type pvfs2 since this affects userspace. Further
the OrangeFS sysint library reads fstab for an entry of type pvfs2
independently of kernel mounts.
I leave extended attribute keys user.pvfs2 and system.pvfs2 as the
sysint library understands these.
I leave references to userspace binaries still named pvfs2.
I leave the filenames.
Signed-off-by: Yi Liu <yi9@clemson.edu>
[martin@omnibond.com: clairify above constraints and merge]
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
On Wed, Nov 11, 2015 at 10:19:48AM +0000, Al Viro wrote:
> I'll cook the minimal fixup for API change after I get some sleep and
> send it your way, unless somebody gets there first...
This should do it - switches ->ioctl() to pvfs2_inode_[gs]etxattr() and
converts xattr_handler ->[gs]et() to new API.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
... and make the only caller use page-backed iov_iter,
getting rid of kmap/kunmap *and* of the bug with
attempted use of iovec-backed copy_page_to_iter()
on a kernel pointer.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
no need to build a copy of what the caller already has;
what's more, we want the one given to caller properly
advanced *and* we shouldn't depend upon it being an
iovec-backed one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
copy_page_{to,from}_iter() advances it just fine *and* it has no
problem with partially consumed segments.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
incidentally, insane or compromised server returning *more* than
requested on read should not oops the kernel - initialize the
iov_iter for read according to the iovec we've got. That's why
pvfs_bufmap_copy_to_iovec() needed a separate size argument - we
shouldn't abuse iov_iter_count(iter) for passing that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>