This patch adds a new mount parameter 'ecryptfs_mount_auth_tok_only' to
force ecryptfs to use only authentication tokens whose signature has
been specified at mount time with parameters 'ecryptfs_sig' and
'ecryptfs_fnek_sig'. In this way, after disabling the passthrough and
the encrypted view modes, it's possible to make available to users only
files encrypted with the specified authentication token.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: James Morris <jmorris@namei.org>
[Tyler: Clean up coding style errors found by checkpatch]
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
This patch replaces the check of the 'matching_auth_tok' pointer with
the exit status of ecryptfs_find_auth_tok_for_sig().
This avoids using authentication tokens obtained through the function
ecryptfs_keyring_auth_tok_for_sig() which are not valid.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
This patch allows keys requested in the function
ecryptfs_keyring_auth_tok_for_sig() to be released when they are no
longer required. In particular keys are directly released in the same
function if the obtained authentication token is not valid.
Further, a new function parameter 'auth_tok_key' has been added to
ecryptfs_find_auth_tok_for_sig() in order to provide callers the key
pointer to be passed to key_put().
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: James Morris <jmorris@namei.org>
[Tyler: Initialize auth_tok_key to NULL in ecryptfs_parse_packet_set]
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
eCryptfs was passing the LOOKUP_OPEN flag through to the lower file
system, even though ecryptfs_create() doesn't support the flag. A valid
filp for the lower filesystem could be returned in the nameidata if the
lower file system's create() function supported LOOKUP_OPEN, possibly
resulting in unencrypted writes to the lower file.
However, this is only a potential problem in filesystems (FUSE, NFS,
CIFS, CEPH, 9p) that eCryptfs isn't known to support today.
https://bugs.launchpad.net/ecryptfs/+bug/641703
Reported-by: Kevin Buhr
Cc: stable <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
eCryptfs is a stackable filesystem which relies on the ability of lower
filesystems to set and get extended attributes.
If there is a security module enabled on the system it updates the
'security' field of inodes according to the owned extended attribute set
with the function vfs_setxattr(). When this function is performed on an
ecryptfs filesystem the 'security' field is not updated for the lower
filesystem since the call security_inode_post_setxattr() is missing for
the lower inode.
Further, the call security_inode_setxattr() is missing for the lower inode,
leading to policy violations in the security module because specific
checks for this hook are not performed (i.e. the filesystem
'associate' permission on SELinux is not checked for the lower filesystem).
This patch replaces the call of the setxattr() method of the lower inode
in the function ecryptfs_setxattr() with vfs_setxattr().
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: stable <stable@kernel.org>
Cc: Dustin Kirkland <kirkland@canonical.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
When btrfs is mounted in degraded mode, it has some internal structures
to track the missing devices. This missing device is set up as readonly,
but the mapping code can get upset when we try to write to it.
This changes the mapping code to return -EIO instead of oops when we try
to write to the readonly device.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch reduces the CPU time spent in the extent buffer search by using the
radix tree instead of the rbtree and using the rcu lock instead of the spin
lock.
I did a quick test with the benchmark tool[1] and found that the patch improves the
file creation/deletion performance problem that I have reported[2].
Before applying this patch:
Create files:
Total files: 50000
Total time: 0.971531
Average time: 0.000019
Delete files:
Total files: 50000
Total time: 1.366761
Average time: 0.000027
After applying this patch:
Create files:
Total files: 50000
Total time: 0.927455
Average time: 0.000019
Delete files:
Total files: 50000
Total time: 1.292280
Average time: 0.000026
[1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
[2] http://marc.info/?l=linux-btrfs&m=128212635122920&w=2
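For reference, the relative gain can be worked out from the totals quoted
above. This is only a small arithmetic sketch; the helper name
improvement_percent is made up for illustration and is not part of the
benchmark tool.

```python
# Total times in seconds, copied from the benchmark output quoted above.
before = {"create": 0.971531, "delete": 1.366761}
after = {"create": 0.927455, "delete": 1.292280}


def improvement_percent(old, new):
    """Relative reduction in total time, as a percentage of the old time."""
    return (old - new) / old * 100.0


for op in ("create", "delete"):
    pct = improvement_percent(before[op], after[op])
    print(f"{op}: {pct:.1f}% less total time for 50000 files")
```

Both operations come out roughly 5% faster in this single run.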
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
restructure try_release_extent_buffer() and write a function to release the
extent buffer. It will be used later.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We have a fairly complex set of loops around walking our list of
delalloc inodes when we find metadata delalloc space running low.
It doesn't work very well, can use large amounts of CPU and doesn't
do very efficient writeback.
This switches us to kick the bdi flusher threads instead. All dirty
data in btrfs is accounted as delalloc data, so this is very similar
in terms of what it writes, but we're able to just kick off the IO
and wait for progress.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
An earlier commit tried to keep us from allocating too many
empty metadata chunks. It was somewhat too restrictive and could
lead to ENOSPC errors on empty filesystems.
This increases the limits to about 5% of the FS size, allowing more
metadata chunks to be preallocated.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When btrfs is running low on metadata space, it needs to force delayed
allocation pages to disk. It currently does this with a suboptimal walk
of a private list of inodes with delayed allocation, and it would be
much better if we used the generic flusher threads.
writeback_inodes_sb_if_idle would be ideal, but it waits for the flusher
thread to start IO on all the dirty pages in the FS before it returns.
This adds variants of writeback_inodes_sb* that allow the caller to
control how many pages get sent down.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When btrfs discovers the generation number in a btree block is
incorrect, it can loop forever without forcing the RAID
code to try a valid mirror, and without returning EIO.
This changes things to properly kick out the EIO.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If you mount -o space_cache, the option will be persistent across mounts, but to
make sure the user knows that they did this, emit a message telling them if they
didn't mount with -o space_cache but the feature is still used.
Signed-off-by: Josef Bacik <josef@redhat.com>
If something goes wrong with the free space cache we need a way to make sure
it's not loaded on mount and that it's cleared for everybody. When you pass the
clear_cache option it will make it so all block groups are set up to be cleared,
which keeps them from being loaded and then they will be truncated when the
transaction is committed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
There are just a few things that need to be fixed in the kernel to support mixed
data+metadata block groups. Mostly we just need to make sure that if we are
using mixed block groups that we continue to allocate mixed block groups as we
need them. Also we need to make sure __find_space_info will find our space info
if we search for DATA or METADATA only. Tested this with xfstests and it works
nicely. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
With the free space disk caching we can mark the block group as started with the
caching, but we don't have a caching ctl. This can race with anybody else who
tries to get the caching ctl before we cache (this is very hard to do btw). So
instead check to see if cache->caching_ctl is set, and if not return NULL.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
This patch actually loads the free space cache if it exists. The only thing
that really changes here is that we need to cache the block group if we're going
to remove an extent from it. Previously we did not do this since the caching
kthread would pick it up. With the on disk cache we don't have this luxury so
we need to make sure we read the on disk cache in first, and then remove the
extent, that way when the extent is unpinned the free space is added to the
block group. This has been tested with all sorts of things.
Signed-off-by: Josef Bacik <josef@redhat.com>
This is a simple bit, just dump the free space cache out to our preallocated
inode when we're writing out dirty block groups. There are a bunch of changes
in inode.c in order to account for special cases. Mostly when we're doing the
writeout we're holding trans_mutex, so we need to use the nolock transaction
functions. Also we can't do asynchronous completions since the async thread
could be blocked on already completed IO waiting for the transaction lock. This
has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC
tests. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
a) switch ->put_device() to logfs_super *
b) actually call it on early failures in logfs_get_sb_device()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
take setting s_bdev/s_mtd/s_devops to callers of logfs_get_sb_device(),
don't bother passing them separately
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
eventual replacement for ->get_sb() - does *not* get vfsmount,
return ERR_PTR(error) or root of subtree to be mounted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nameidata_to_filp() drops nd->path or transfers it to opened
file. In the former case it's a Bad Idea(tm) to do mnt_drop_write()
on nd->path.mnt, since we might race with umount and vfsmount in
question might be gone already.
Fix: don't drop it, then... IOW, have nameidata_to_filp() grab nd->path
in case it transfers it to file and do path_drop() in callers. After
they are through with accessing nd->path...
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Removed following fields from smb session structure
cryptkey, ntlmv2_hash, tilen, tiblob
and ntlmssp_auth structure is allocated dynamically only if the auth mech
is NTLMSSP.
response field within a session_key structure is used to initially store the
target info (either plucked from type 2 challenge packet in case of NTLMSSP
or fabricated in case of NTLMv2 without extended security) and then to store
Message Authentication Key (mak) (session key + client response).
Server challenge or cryptkey needed during a NTLMSSP authentication
is now part of ntlmssp_auth structure which gets allocated and freed
once the authentication process is done.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Need to have cryptkey or server challenge in smb connection
(struct TCP_Server_Info) for ntlm and ntlmv2 auth types for which
cryptkey (Encryption Key) is supplied just once in Negotiate Protocol
response during an smb connection setup for all the smb sessions over
that smb connection.
For ntlmssp, cryptkey or server challenge is provided for every
smb session in type 2 packet of ntlmssp negotiation, the cryptkey
provided during Negotiation Protocol response before smb connection
does not count.
Rename cryptKey to cryptkey and related changes.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs4: The difference of 2 pointers is ptrdiff_t
nfs: testing the wrong variable
nfs: handle lock context allocation failures in nfs_create_request
Fixed Regression in NFS Direct I/O path
We need to check that a page has buffers by calling
page_has_buffers(page) before calling page_buffers(page) in
ext4_writepage(). Otherwise page_buffers() could trigger a BUG_ON.
Thanks also to Markus Trippelsdorf and Avinash Kurup who also reported
the problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Sedat Dilek <sedat.dilek@googlemail.com>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
fs/notify/fanotify/fanotify_user.c: In function 'fanotify_release':
fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 'lre'
fs/notify/fanotify/fanotify_user.c:375: warning: unused variable 're'
this is really ugly.
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
If fanotify sets a new bit in the ignored mask it will cause the generic
fsnotify layer to recalculate the real mask. This is stupid since we
didn't change that part.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify has a very limited number of events it sends on directories. The
usefulness of these events is yet to be seen and still we send them. This
is particularly painful for mount marks where one might receive many of
these useless events. As such this patch will drop events on IS_DIR()
inodes unless they were explicitly requested with FAN_ON_DIR.
This means that a mark on a directory without FAN_EVENT_ON_CHILD or
FAN_ON_DIR is meaningless and will result in no events ever (although it
will still be allowed since detecting it is hard).
Signed-off-by: Eric Paris <eparis@redhat.com>
The _IN_ in the naming is reserved for flags only used by inotify. Since I
am about to use this flag for fanotify rename it to be generic like the
rest.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify_should_send_event has a test to see if an object is a file or
directory and does not send an event otherwise. The problem is that the
test is actually checking if the object with a mark is a file or directory,
not if the object the event happened on is a file or directory. We should
check the latter.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify currently has no limit on the number of listeners a given user can
have open. This patch limits the total number of listeners per user to
128. This is the same as the inotify default limit.
Signed-off-by: Eric Paris <eparis@redhat.com>
Some fanotify groups, especially those like AV scanners, will need to place
lots of marks, particularly ignore marks. Since ignore marks do not pin
inodes in cache and are cleared if the inode is removed from core (usually
under memory pressure) we expose an interface for listeners, with
CAP_SYS_ADMIN, to override the maximum number of marks and be allowed to
set an 'unlimited' number of marks. Programs which make use of this
feature will be able to OOM a machine.
Signed-off-by: Eric Paris <eparis@redhat.com>
There is currently no limit on the number of marks a given fanotify group
can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as
a serious DoS threat. This patch implements a default of 8192, the same as
inotify to work towards removing the CAP_SYS_ADMIN gating and eliminating
the default DoS'able status.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify has a default max queue depth. This patch allows processes which
explicitly request it to have an 'unlimited' queue depth. These processes
need to be very careful to make sure they cannot fall far enough behind
that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently fanotify has no maximum queue depth. Since fanotify is
CAP_SYS_ADMIN only this does not pose a normal user DoS issue, but it
certainly is possible that an fanotify listener which can't keep up could
OOM the box. This patch implements a default 16k depth. This is the same
default depth used by inotify, but given fanotify's better queue merging in
many situations this queue will contain many additional useful events by
comparison.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify will clear ignore marks if a task changes the contents of an
inode. The problem is with the races around when userspace finishes
checking a file and when that result is actually attached to the inode.
This race was described as such:
Consider the following scenario with hostile processes A and B, and
victim process C:
1. Process A opens new file for writing. File check request is generated.
2. File check is performed in userspace. Check result is "file has no malware".
3. The "permit" response is delivered to kernel space.
4. File ignored mark set.
5. Process A writes dummy bytes to the file. File ignored flags are cleared.
6. Process B opens the same file for reading. File check request is generated.
7. File check is performed in userspace. Check result is "file has no malware".
8. Process A writes malware bytes to the file. There is no cached response yet.
9. The "permit" response is delivered to kernel space and is cached in fanotify.
10. File ignored mark set.
11. Now any process C will be permitted to open the malware file.
There is a race between steps 8 and 10.
While fanotify makes no strong guarantees about systems with hostile
processes there is no reason we cannot harden against this race. We do
that by simply ignoring any ignore marks if the inode has open writers (aka
i_writecount > 0). (We actually do not ignore ignore marks if the
FAN_MARK_SURV_MODIFY flag is set)
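The hardening rule can be modelled in a few lines. This is a userspace
sketch of the decision only, not the kernel code; IgnoreMark and
ignore_mark_applies are hypothetical names, and surv_modify stands for the
flag this message calls FAN_MARK_SURV_MODIFY.

```python
from dataclasses import dataclass


@dataclass
class IgnoreMark:
    # True if the mark was set with the survive-modify flag
    # (called FAN_MARK_SURV_MODIFY in the message above).
    surv_modify: bool


def ignore_mark_applies(mark, i_writecount):
    """Return True if the ignore mark may suppress an event on the inode.

    A mark that survives modification is always honoured. Any other
    ignore mark is disregarded while the inode has open writers,
    i.e. while i_writecount > 0.
    """
    if mark.surv_modify:
        return True
    return i_writecount <= 0


# Step 8 of the scenario: process A still holds the file open for writing,
# so the cached "permit" result must not be trusted.
print(ignore_mark_applies(IgnoreMark(surv_modify=False), i_writecount=1))
```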
Reported-by: Vasily Novikov <vasily.novikov@kaspersky.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify perm events do not call fsnotify parent. That means you cannot
register a perm event on a directory and enforce permissions on all inodes in
that directory. This patch fixes that situation.
Signed-off-by: Eric Paris <eparis@redhat.com>
When fsnotify groups return errors they are ignored. For permissions
events these should be passed back up the stack, but for most events these
should continue to be ignored.
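A rough model of the intended behaviour, written as a userspace sketch
rather than the actual fsnotify code. The function name dispatch and the
list-of-return-codes interface are assumptions made for illustration.

```python
import errno


def dispatch(is_permission_event, group_results):
    """Combine per-group return codes (0 or a negative errno).

    For permission events the first error is passed back up the stack.
    For every other event errors are ignored and 0 is returned.
    """
    for ret in group_results:
        if ret != 0 and is_permission_event:
            return ret
    return 0


# One group denies access with -EPERM.
results = [0, -errno.EPERM, 0]
print(dispatch(True, results))   # error propagated for a permission event
print(dispatch(False, results))  # error ignored for an ordinary event
```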
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify listeners need to be able to specify what types of operations
they are going to perform so they can be ordered appropriately between other
listeners doing other types of operations. They need this to be able to make
sure that things like hierarchical storage managers will get access to inodes
before processes which need the data. This patch defines 3 possible uses
which groups must indicate in the fanotify_init() flags.
FAN_CLASS_PRE_CONTENT
FAN_CLASS_CONTENT
FAN_CLASS_NOTIF
Groups will receive notification in that order. The order between 2 groups in
the same class is nondeterministic.
FAN_CLASS_PRE_CONTENT is intended to be used by listeners which need access to
the inode before they are certain that the inode contains its final data. A
hierarchical storage manager should choose to use this class.
FAN_CLASS_CONTENT is intended to be used by listeners which need access to the
inode after it contains its intended contents. This would be the appropriate
level for an AV solution or document control system.
FAN_CLASS_NOTIF is intended for normal async notification about access, much the
same as inotify and dnotify. Synchronous permissions events are not permitted
at this class.
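The ordering between classes can be sketched as below. This is an
illustration only: the numeric values of FanClass are chosen to express the
ordering and are not the kernel's flag values, and the group names are
invented.

```python
from enum import IntEnum


class FanClass(IntEnum):
    # Lower value means earlier notification. Values are illustrative.
    PRE_CONTENT = 0
    CONTENT = 1
    NOTIF = 2


def delivery_order(groups):
    """groups is a list of (name, FanClass) pairs; return names in the
    order in which the classes are notified."""
    return [name for name, cls in sorted(groups, key=lambda g: g[1])]


def permission_events_allowed(cls):
    """Synchronous permission events are not permitted for FAN_CLASS_NOTIF."""
    return cls is not FanClass.NOTIF


groups = [
    ("logger", FanClass.NOTIF),
    ("av-scanner", FanClass.CONTENT),
    ("hsm", FanClass.PRE_CONTENT),
]
print(delivery_order(groups))
```

Python's sort is stable, so two groups of the same class keep their input
order in this sketch; real listeners must not rely on any order within a
class.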
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify needs to be able to specify that some groups get events before
others. They use this idea to make sure that a hierarchical storage
manager gets access to files before programs which actually use them. This
is purely infrastructure. Everything will have a priority of 0, but the
infrastructure will exist for it to be non-zero.
Signed-off-by: Eric Paris <eparis@redhat.com>
We disabled the ability to build fanotify in commit 7c5347733d.
This reverts that commit and allows people to build fanotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
In order to save free space cache, we need an inode to hold the data, and we
need a special item to point at the right inode for the right block group. So
first, create a special item that will point to the right inode, and the number
of extent entries we will have and the number of bitmaps we will have. We
truncate and pre-allocate space every time to make sure it's up to date.
This feature will be turned on as soon as you mount with -o space_cache, however
it is safe to boot into old kernels, they will just generate the cache the old
fashioned way. When you boot back into a newer kernel we will notice that the
filesystem was modified without the cache being updated and automatically
discard the cache.
Signed-off-by: Josef Bacik <josef@redhat.com>
On m68k, which is 32-bit:
fs/nfs/nfs4proc.c: In function ‘nfs41_sequence_done’:
fs/nfs/nfs4proc.c:432: warning: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘int’
fs/nfs/nfs4proc.c: In function ‘nfs4_setup_sequence’:
fs/nfs/nfs4proc.c:576: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘int’
On 32-bit, ptrdiff_t is int; on 64-bit, ptrdiff_t is long.
Introduced by commit dfb4f30983 ("NFSv4.1: keep
seq_res.sr_slot as pointer rather than an index")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This merges the staging-next tree to Linus's tree and resolves
some conflicts that were present due to changes in other trees that were
affected by files here.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The fourth argument should be unsigned. Also add missing include
so that the function prototype is defined in xattr_id.c
This fixes a couple of sparse warnings.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
hfsplus: free space correcly for files unlinked while open
hfsplus: fix double lock typo in ioctl
Commit 5dabfc78dc ("ext4: rename {exit,init}_ext4_*() to
ext4_{exit,init}_*()") causes
fs/ext4/super.c:4776: error: implicit declaration of function ‘ext4_init_xattr’
when CONFIG_EXT4_FS_XATTR is disabled.
It renamed init_ext4_xattr to ext4_init_xattr but forgot to update the
dummy definition in fs/ext4/xattr.h.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The intent was to test "*desc" for allocation failures, but it tests
"desc" which is always a valid pointer here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_get_lock_context can return NULL on an allocation failure.
Regression introduced by commit f11ac8db.
Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A typo, introduced by commit f11ac8db, in the nfs_direct_write()
routine causes writes with O_DIRECT set to fail with a ENOMEM error.
Found-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SYNOPSIS
size[4] Tfsync tag[2] fid[4] datasync[4]
size[4] Rfsync tag[2]
DESCRIPTION
The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other permanent storage
device) where that file resides.
If the datasync flag is specified, data will be flushed but modified metadata
will not be, unless that metadata is needed in order to allow a
subsequent data retrieval to be correctly handled.
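As an illustration of the wire layout in the synopsis, a Tfsync request can
be packed as follows. This is a sketch, not the client code: 9P fields are
little-endian and the message type occupies one byte after size[4]; the
type code 50 for Tfsync and the function name encode_tfsync are assumptions
to be checked against the protocol headers.

```python
import struct

P9_TFSYNC = 50  # assumed message type code, verify against the 9P headers


def encode_tfsync(tag, fid, datasync):
    """Pack size[4] Tfsync tag[2] fid[4] datasync[4] (little-endian)."""
    body = struct.pack("<BHII", P9_TFSYNC, tag, fid, datasync)
    # size[4] counts the whole message, including the size field itself.
    return struct.pack("<I", len(body) + 4) + body


msg = encode_tfsync(tag=1, fid=42, datasync=1)
print(len(msg), msg.hex())
```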
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We need to do O_LARGEFILE check even in case of 9p. Use the
generic_file_open helper
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Make sure we drop inode reference in the error path
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
A create without LOOKUP_OPEN flag set is due to mknod of regular
files. Use mknod 9P operation for the same
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] TReadlink tag[2] fid[4]
size[4] RReadlink tag[2] target[s]
Description
Readlink is used to return the contents of the symbolic link
referred to by fid. The contents of the symbolic link are returned in the
response.
target[s] - Contents of the symbolic link referred by fid.
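A sketch of how the target[s] field could be decoded, assuming the usual 9P
string encoding (a two-byte little-endian length followed by that many
UTF-8 bytes, with no terminating NUL). The function name
decode_target_string is hypothetical.

```python
import struct


def decode_target_string(payload):
    """Decode a 9P string field: len[2] followed by len bytes of UTF-8."""
    (length,) = struct.unpack_from("<H", payload, 0)
    data = payload[2:2 + length]
    if len(data) != length:
        raise ValueError("truncated string field")
    return data.decode("utf-8")


# Payload of an RReadlink whose link target is "../lib".
payload = struct.pack("<H", 6) + b"../lib"
print(decode_target_string(payload))
```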
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Use V9FS_MAGIC as the file system type while filling kernel statfs
structure instead of using host file system magic number. Also move
the definition of V9FS_MAGIC from v9fs.h to standard magic.h file.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] TGetlock tag[2] fid[4] getlock[n]
size[4] RGetlock tag[2] getlock[n]
Description
TGetlock is used to test for the existence of byte range posix locks on a file
identified by given fid. The reply contains getlock structure. If the lock could
be placed it returns F_UNLCK in type field of getlock structure. Otherwise it
returns the details of the conflicting locks in the getlock structure
getlock structure:
type[1] - Type of lock: F_RDLCK, F_WRLCK
start[8] - Starting offset for lock
length[8] - Number of bytes to check for the lock
If length is 0, check for lock in all bytes starting at the location
'start' through to the end of file
pid[4] - PID of the process that wants to take lock/owns the task
in case of reply
client[4] - Client id of the system that owns the process which
has the conflicting lock
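The getlock structure as listed above can be packed and inspected like
this. It is a sketch that follows the field widths given in this message
exactly; the numeric lock type values and the helper names are assumptions
made for illustration.

```python
import struct

# Assumed numeric values for the lock types, for illustration only.
F_RDLCK, F_WRLCK, F_UNLCK = 0, 1, 2

# type[1] start[8] length[8] pid[4] client[4], little-endian, no padding.
GETLOCK = struct.Struct("<BQQII")


def pack_getlock(lock_type, start, length, pid, client):
    """Build the getlock[n] field of a TGetlock request or RGetlock reply."""
    return GETLOCK.pack(lock_type, start, length, pid, client)


def lock_could_be_placed(reply):
    """A reply with F_UNLCK in the type field means no conflicting lock."""
    lock_type = GETLOCK.unpack(reply)[0]
    return lock_type == F_UNLCK


# Ask about a write lock from offset 0 to end of file (length 0).
request = pack_getlock(F_WRLCK, start=0, length=0, pid=1234, client=1)
print(GETLOCK.size, request.hex())
```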
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Synopsis
size[4] TLock tag[2] fid[4] flock[n]
size[4] RLock tag[2] status[1]
Description
Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request
flock structure:
type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
flags[4] - Flags could be either of
P9_LOCK_FLAGS_BLOCK - Blocking lock request; if a conflicting lock
exists, wait for that lock to be released.
P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is
trying to reclaim a lock after a server restart (due to crash)
start[8] - Starting offset for lock
length[8] - Number of bytes to lock
If length is 0, lock all bytes starting at the location 'start'
through to the end of file
pid[4] - PID of the process that wants to take lock
client_id[4] - Unique client id
status[1] - Status of the lock request, can be
P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
P9_LOCK_GRACE(3)
P9_LOCK_SUCCESS - Request was successful
P9_LOCK_BLOCKED - A conflicting lock is held by another process
P9_LOCK_ERROR - Error while processing the lock request
P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
requests in this period (except locks with
P9_LOCK_FLAGS_RECLAIM flag set)
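The status values above map naturally onto an enumeration. The sketch
below parses an RLock reply laid out as size[4] type[1] tag[2] status[1];
parse_rlock and should_retry are hypothetical helpers, and the retry policy
is one possible client choice rather than something this patch prescribes.

```python
import struct
from enum import IntEnum


class LockStatus(IntEnum):
    # Values as listed in the description above.
    SUCCESS = 0
    BLOCKED = 1
    ERROR = 2
    GRACE = 3


def parse_rlock(msg):
    """Return (tag, LockStatus) from a complete RLock message."""
    size, _msg_type, tag, status = struct.unpack("<IBHB", msg)
    if size != len(msg):
        raise ValueError("size field does not match message length")
    return tag, LockStatus(status)


def should_retry(status):
    """Assumed policy: retry later when blocked or during the grace period."""
    return status in (LockStatus.BLOCKED, LockStatus.GRACE)


# The type byte is not checked in this sketch, so a placeholder 0 is used.
reply = struct.pack("<IBHB", 8, 0, 7, LockStatus.GRACE)
print(parse_rlock(reply), should_retry(LockStatus.GRACE))
```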
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
SYNOPSIS
size[4] Tfsync tag[2] fid[4]
size[4] Rfsync tag[2]
DESCRIPTION
The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other permanent storage
device) where that file resides.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We need to update the acl value on chmod
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch also updates the mode bits, as a normal file system does.
I am not sure whether we should do that, considering that
a setxattr on the server will again update the ACL/mode value
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch implements fetching POSIX ACL from the server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The ACL value is fetched as a part of inode initialization
from the server and the permission checking function uses the
cached value of the ACL
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
the same calculation is done in p9_client_write
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The presence of v9fs_direct_IO() in the address space ops vector
allows open() with the O_DIRECT flag, which would have failed otherwise.
In the non-cached mode, we shunt off direct read and write requests before
the VFS gets them, so this method should never be called.
Direct IO is not 'yet' supported in the cached mode. Hence when
this routine is called through generic_file_aio_read(), the read/write fails
with an error.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The current implementation of 9p client mkdir function does not
set the S_ISGID mode bit for the directory being created if the
parent directory has this bit set. This patch fixes this problem
so that the newly created directory inherits the gid from parent
directory and not from the process creating this directory, when
the S_ISGID bit is set in parent directory.
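The inheritance rule described here can be sketched in userspace terms.
This is a model of the expected outcome, not the v9fs mkdir code; the
function name new_dir_attrs and its parameters are invented for the
example.

```python
import stat


def new_dir_attrs(parent_mode, parent_gid, caller_gid, requested_mode):
    """Return (mode, gid) for a directory created under the given parent.

    If the parent has S_ISGID set, the new directory takes the parent's
    gid and has S_ISGID set as well. Otherwise it takes the gid of the
    creating process.
    """
    mode = stat.S_IFDIR | requested_mode
    if parent_mode & stat.S_ISGID:
        return mode | stat.S_ISGID, parent_gid
    return mode, caller_gid


# Parent is a setgid directory owned by group 100, caller is in group 500.
mode, gid = new_dir_attrs(stat.S_IFDIR | stat.S_ISGID | 0o775, 100, 500, 0o755)
print(oct(mode), gid)
```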
Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
A patch was accepted recently for sending correct buffer size to p9stat_read.
We need a similar patch in v9fs_dir_readdir_dotl to send correct end of buffer
to p9dirent_read.
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>