Commit Graph

248 Commits

Author SHA1 Message Date
Yan, Zheng e536030934 ceph: fix wake_up_session_cb()
We should reset i_requested_max_size before waking the waiters.
(zero i_requested_max_size make waiter re-request the max size)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:42 +02:00
Yan, Zheng f3c4ebe65e ceph: using hash value to compose dentry offset
If MDS sorts dentries in dirfrag in hash order, we use hash value to
compose dentry offset. dentry offset is:

  (0xff << 52) | ((24 bits hash) << 28) |
  (the nth entry hash hash collision)

This offset is stable across directory fragmentation. This alos means
there is no need to reset readdir offset if directory get fragmented
in the middle of readdir.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:36 +02:00
Yan, Zheng 8974eebd38 ceph: record 'offset' for each entry of readdir result
This is preparation for using hash value as dentry 'offset'

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:35 +02:00
Yan, Zheng 956d39d631 ceph: define 'end/complete' in readdir reply as bit flags
Set a flag in readdir request, which indicates that client interprets
'end/complete' as bit flags. So that mds can reply additional flags in
readdir reply.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:35 +02:00
Yan, Zheng 2a5beea3f1 ceph: define struct for dir entry in readdir reply
This avoids defining multiple arrays for entries in readdir reply

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:34 +02:00
Yan, Zheng 3f38495409 ceph: report mount root in session metadata
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:33 +02:00
Yan, Zheng 6c93df5db6 ceph: don't call truncate_pagecache in ceph_writepages_start
truncate_pagecache() may decrease inode's reference. This can cause
deadlock if inode's last reference is dropped and iput_final() wants
to evict the inode. (evict() calls inode_wait_for_writeback(), which
waits for ceph_writepages_start() to return).

The fix is use work thead to truncate dirty pages. Also add 'forced
umount' check to ceph_update_writeable_page(), which prevents new
pages getting dirty.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:32 +02:00
Yan, Zheng 77310320c2 ceph: renew caps for read/write if mds session got killed.
When mds session gets killed, read/write operation may hang.
Client waits for Frw caps, but mds does not know what caps client
wants. To recover this, client sends an open request to mds. The
request will tell mds what caps client wants.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:31 +02:00
Ilya Dryomov fcd00b68bb libceph: DEFINE_RB_FUNCS macro
Given

    struct foo {
        u64 id;
        struct rb_node bar_node;
    };

generate insert_bar(), erase_bar() and lookup_bar() functions with

    DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)

The key is assumed to be an integer (u64, int, etc), compared with
< and >.  nodefld has to be initialized with RB_CLEAR_NODE().

Start using it for MDS, MON and OSD requests and OSD sessions.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:23 +02:00
Ilya Dryomov 6c1ea260f8 libceph: make authorizer destruction independent of ceph_auth_client
Starting the kernel client with cephx disabled and then enabling cephx
and restarting userspace daemons can result in a crash:

    [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000
    [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130
    [262671.584334] PGD 0
    [262671.635847] Oops: 0000 [#1] SMP
    [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu
    [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014
    [262672.268937] Workqueue: ceph-msgr con_work [libceph]
    [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000
    [262672.428330] RIP: 0010:[<ffffffff811cd04a>]  [<ffffffff811cd04a>] kfree+0x5a/0x130
    [262672.535880] RSP: 0018:ffff880149aeba58  EFLAGS: 00010286
    [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018
    [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012
    [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000
    [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40
    [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840
    [262673.131722] FS:  0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000
    [262673.245377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0
    [262673.417556] Stack:
    [262673.472943]  ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04
    [262673.583767]  ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00
    [262673.694546]  ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800
    [262673.805230] Call Trace:
    [262673.859116]  [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph]
    [262673.968705]  [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph]
    [262674.078852]  [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph]
    [262674.134249]  [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph]
    [262674.189124]  [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph]
    [262674.243749]  [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph]
    [262674.297485]  [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph]
    [262674.350813]  [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph]
    [262674.403312]  [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph]
    [262674.454712]  [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690
    [262674.505096]  [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph]
    [262674.555104]  [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0
    [262674.604072]  [<ffffffff810901ea>] worker_thread+0x11a/0x470
    [262674.652187]  [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310
    [262674.699022]  [<ffffffff810957a2>] kthread+0xd2/0xf0
    [262674.744494]  [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0
    [262674.789543]  [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70
    [262674.834094]  [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0

What happens is the following:

    (1) new MON session is established
    (2) old "none" ac is destroyed
    (3) new "cephx" ac is constructed
    ...
    (4) old OSD session (w/ "none" authorizer) is put
          ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer)

osd->o_auth.authorizer in the "none" case is just a bare pointer into
ac, which contains a single static copy for all services.  By the time
we get to (4), "none" ac, freed in (2), is long gone.  On top of that,
a new vtable installed in (3) points us at ceph_x_destroy_authorizer(),
so we end up trying to destroy a "none" authorizer with a "cephx"
destructor operating on invalid memory!

To fix this, decouple authorizer destruction from ac and do away with
a single static "none" authorizer by making a copy for each OSD or MDS
session.  Authorizers themselves are independent of ac and so there is
no reason for destroy_authorizer() to be an ac op.  Make it an op on
the authorizer itself by turning ceph_authorizer into a real struct.

Fixes: http://tracker.ceph.com/issues/15447

Reported-by: Alan Zhang <alan.zhang@linux.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-25 20:54:13 +02:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Yan, Zheng 315f240880 ceph: fix security xattr deadlock
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.

The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)

Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Deepa Dinamani 8bbd47140c ceph: replace CURRENT_TIME by current_fs_time()
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Ilya Dryomov 82dcabad75 libceph: revamp subs code, switch to SUBSCRIBE2 protocol
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime".  To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs.  Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Yan, Zheng 5ea5c5e0a7 ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support
Add support for the format change of MClientReply/MclientCaps.
Also add code that denies access to inodes with pool_ns layouts.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2016-03-04 21:00:37 +01:00
Ilya Dryomov 79dbd1baa6 libceph: msg signing callouts don't need con argument
We can use msg->con instead - at the point we sign an outgoing message
or check the signature on the incoming one, msg->con is always set.  We
wouldn't know how to sign a message without an associated session (i.e.
msg->con == NULL) and being able to sign a message using an explicitly
provided authorizer is of no use.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:45 +01:00
Yan, Zheng 68cd5b4b76 ceph: make fsync() wait unsafe requests that created/modified inode
If we get a unsafe reply for request that created/modified inode,
add the unsafe request to a list in the newly created/modified
inode. So we can make fsync() wait these unsafe requests.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Yan, Zheng 4c06ace81a ceph: add request to i_unsafe_dirops when getting unsafe reply
Previously we add request to i_unsafe_dirops when registering
request. So ceph_fsync() also waits for imcomplete requests.
This is unnecessary, ceph_fsync() only needs to wait unsafe
requests.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Yan, Zheng 5e804ac482 ceph: don't invalidate page cache when inode is no longer used
ceph_check_caps() invalidate page cache when inode is not used
by any open file. This behaviour is not friendly for workload
that repeatly read files.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Arnd Bergmann 777d738a5e ceph: fix message length computation
create_request_message() computes the maximum length of a message,
but uses the wrong type for the time stamp: sizeof(struct timespec)
may be 8 or 16 depending on the architecture, while sizeof(struct
ceph_timespec) is always 8, and that is what gets put into the
message.

Found while auditing the uses of timespec for y2038 problems.

Fixes: b8e69066d8 ("ceph: include time stamp in every MDS request")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:47 +01:00
Jianpeng Ma 5fdb1389e1 ceph: cleanup use of ceph_msg_get
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Brad Hubbard 1550d34e56 ceph: remove redundant test of head->safe and silence static analysis warnings
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Yan, Zheng 48fec5d0a5 ceph: EIO all operations after forced umount
This patch makes try_get_cap_refs() and __do_request() check
if the file system was forced umount, and return -EIO if it was.
This patch also adds a helper function to drops dirty caps and
wakes up blocking operation.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:28 +03:00
Yan, Zheng 687265e5a8 ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
path

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng f66fd9f095 ceph: pre-allocate data structure that tracks caps flushing
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng e548e9b93d ceph: re-send flushing caps (which are revoked) in reconnect stage
if flushing caps were revoked, we should re-send the cap flush in
client reconnect stage. This guarantees that MDS processes the cap
flush message before issuing the flushing caps to other client.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng 8310b08913 ceph: track pending caps flushing globally
So we know TID of the oldest pending caps flushing. Later patch will
send this information to MDS, so that MDS can trim its completed caps
flush list.

Tracking pending caps flushing globally also simplifies syncfs code.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng 553adfd941 ceph: track pending caps flushing accurately
Previously we do not trace accurate TID for flushing caps. when
MDS failovers, we have no choice but to re-send all flushing caps
with a new TID. This can cause problem because MDS can has already
flushed some caps and has issued the same caps to other client.
The re-sent cap flush has a new TID, which makes MDS unable to
detect if it has already processed the cap flush.

This patch adds code to track pending caps flushing accurately.
When re-sending cap flush is needed, we use its original flush
TID.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Yan, Zheng 3e0708b990 ceph: ratelimit warn messages for MDS closes session
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:30 +03:00
Ilya Dryomov 5be7303477 ceph: simplify two mount_timeout sites
No need to bifurcate wait now that we've got ceph_timeout_jiffies().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Ilya Dryomov a319bf56a6 libceph: store timeouts in jiffies, verify user input
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive.  All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.

There is also a bug in the way mount_timeout=0 is handled.  It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.

Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25 11:49:29 +03:00
Yan, Zheng e8a7b8b12b ceph: exclude setfilelock requests when calculating oldest tid
setfilelock requests can block for a long time, which can prevent
client from advancing its oldest tid.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng 745a8e3bcc ceph: don't pre-allocate space for cap release messages
Previously we pre-allocate cap release messages for each caps. This
wastes lots of memory when there are large amount of caps. This patch
make the code not pre-allocate the cap release messages. Instead,
we add the corresponding ceph_cap struct to a list when releasing a
cap. Later when flush cap releases is needed, we allocate the cap
release messages dynamically.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng affbc19a68 ceph: make sure syncfs flushes all cap snaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:29 +03:00
Yan, Zheng 622f3e250f ceph: don't trim auth cap when there are cap snaps
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Yan, Zheng 10183a6955 ceph: check OSD caps before read/write
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:28 +03:00
Linus Torvalds 9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Yan, Zheng c0bd50e2ee ceph: fix null pointer dereference in send_mds_reconnect()
sb->s_root can be null when umounting

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-22 18:33:31 +03:00
Yan, Zheng 1c841a96b5 ceph: cleanup unsafe requests when reconnecting is denied
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:37 +03:00
Yan, Zheng a9f6eb6185 ceph: don't zero i_wrbuffer_ref when reconnecting is denied
remove_session_caps_cb() does not truncate dirty data in page
cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will
result negtive i_wrbuffer_ref/i_wrbuffer_ref_head

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:36 +03:00
Yan, Zheng 571ade336a ceph: don't mark dirty caps when there is no auth cap
No i_auth_cap means reconnecting to MDS was denied. So don't
add new dirty caps.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:36 +03:00
Nicholas Mc Guire 3563dbdd99 ceph: use msecs_to_jiffies for time conversion
This is only an API consolidation and should make things more readable
it replaces var * HZ / 1000 by msecs_to_jiffies(var).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Yan, Zheng 6e6f09231a ceph: drop cap releases in requests composed before cap reconnect
These cap releases are stale because MDS will re-establish client
caps according to the cap reconnect messages.

Note: MDS can detect stale cap messages, so these stale cap
releases are harmless even we don't drop them.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
David Howells 2b0143b5c9 VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:57 -04:00
Linus Torvalds 4533f6e27a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph changes from Sage Weil:
 "On the RBD side, there is a conversion to blk-mq from Christoph,
  several long-standing bug fixes from Ilya, and some cleanup from
  Rickard Strandqvist.

  On the CephFS side there is a long list of fixes from Zheng, including
  improved session handling, a few IO path fixes, some dcache management
  correctness fixes, and several blocking while !TASK_RUNNING fixes.

  The core code gets a few cleanups and Chaitanya has added support for
  TCP_NODELAY (which has been used on the server side for ages but we
  somehow missed on the kernel client).

  There is also an update to MAINTAINERS to fix up some email addresses
  and reflect that Ilya and Zheng are doing most of the maintenance for
  RBD and CephFS these days.  Do not be surprised to see a pull request
  come from one of them in the future if I am unavailable for some
  reason"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
  MAINTAINERS: update Ceph and RBD maintainers
  libceph: kfree() in put_osd() shouldn't depend on authorizer
  libceph: fix double __remove_osd() problem
  rbd: convert to blk-mq
  ceph: return error for traceless reply race
  ceph: fix dentry leaks
  ceph: re-send requests when MDS enters reconnecting stage
  ceph: show nocephx_require_signatures and notcp_nodelay options
  libceph: tcp_nodelay support
  rbd: do not treat standalone as flatten
  ceph: fix atomic_open snapdir
  ceph: properly mark empty directory as complete
  client: include kernel version in client metadata
  ceph: provide seperate {inode,file}_operations for snapdir
  ceph: fix request time stamp encoding
  ceph: fix reading inline data when i_size > PAGE_SIZE
  ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
  ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
  ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
  rbd: fix error paths in rbd_dev_refresh()
  ...
2015-02-19 14:14:42 -08:00
Yan, Zheng 3de22be677 ceph: re-send requests when MDS enters reconnecting stage
So that MDS can check if any request is already completed and process
completed requests in clientreplay stage. When completed requests are
processed in clientreplay stage, MDS can avoid sending traceless
replies.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:40 +03:00
Yan, Zheng a6a5ce4f0d client: include kernel version in client metadata
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:39 +03:00
Yan, Zheng 1f041a89b4 ceph: fix request time stamp encoding
struct timespec uses 'long' to present second and nanosecond. 'long'
is 64 bits on 64bits machine. ceph MDS expects time stamp to be
encoded as struct ceph_timespec, which uses 'u32' to present second
and nanosecond.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:39 +03:00
Yan, Zheng 86d8f67b26 ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
use an atomic variable to track number of sessions, this can avoid block
operation inside wait loops.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:38 +03:00
Yan, Zheng d3383a8e37 ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
check_cap_flush() calls mutex_lock(), which may block. So we can't
use it as condition check function for wait_event();

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19 13:31:38 +03:00