Commit Graph

123 Commits

Author SHA1 Message Date
Stephen Rothwell f1a3d57213 ceph: update for write_inode API change
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-05 14:49:41 -08:00
Yehuda Sadeh 422d2cb8f9 ceph: reset osd after relevant messages timed out
This simplifies the process of timing out messages. We
keep lru of current messages that are in flight. If a
timeout has passed, we reset the osd connection, so that
messages will be retransmitted.  This is a failsafe in case
we hit some sort of problem sending out message to the OSD.
Normally, we'll get notification via an updated osdmap if
there are problems.

If a request is older than the keepalive timeout, send a
keepalive to ensure we detect any breaks in the TCP connection.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-04 11:26:35 -08:00
Sage Weil e9964c1023 ceph: fix flush_dirty_caps race with caps migration
The flush_dirty_caps() used to loop over the first entry of the cap_dirty
dirty list on the assumption that after calling ceph_check_caps() it would
be removed from the list.  This isn't true for caps that are being
migrated between MDSs, where we've received the EXPORT but not the IMPORT.

Instead, do a safe list iteration, and pin the next inode on the list via
the CEPH_I_NOFLUSH flag.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-01 15:28:02 -08:00
Sage Weil 2600d2dd50 ceph: drop messages on unregistered mds sessions; cleanup
Verify the mds session is currently registered before handling
incoming messages.  Clean up message handlers to pull mds out
of session->s_mds instead of less trustworthy src field.

Clean up con_{get,put} debug output.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 14:26:35 -08:00
Sage Weil 7c1332b8cb ceph: fix iterate_caps removal race
We need to be able to iterate over all caps on a session with a
possibly slow callback on each cap.  To allow this, we used to
prevent cap reordering while we were iterating.  However, we were
not safe from races with removal: removing the 'next' cap would
make the next pointer from list_for_each_entry_safe be invalid,
and cause a lock up or similar badness.

Instead, we keep an iterator pointer in the session pointing to
the current cap.  As before, we avoid reordering.  For removal,
if the cap isn't the current cap we are iterating over, we are
fine.  If it is, we clear cap->ci (to mark the cap as pending
removal) but leave it in the session list.  In iterate_caps, we
can safely finish removal and get the next cap pointer.

While we're at it, clean up put_cap to not take a cap reservation
context, as it was never used.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:47 -08:00
Sage Weil 85ccce43a3 ceph: clean up readdir caps reservation
Use a global counter for the minimum number of allocated caps instead of
hard coding a check against readdir_max.  This takes into account multiple
client instances, and avoids examining the superblock mount options when a
cap is dropped.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17 10:02:43 -08:00
Sage Weil a105f00cf1 ceph: use rbtree for snap_realms
Switch from radix tree to rbtree for snap realms.  This is much more
appropriate given that realm keys are few and far between.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-16 22:01:09 -08:00
Sage Weil 3c6f6b79a6 ceph: cleanup async writeback, truncation, invalidate helpers
Grab inode ref in helper.  Make work functions static, with consistent
naming.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:54 -08:00
Yehuda Sadeh f5a2041bd9 ceph: put unused osd connections on lru
Instead of removing osd connection immediately when the
requests list is empty, put the osd connection on an lru.
Only if that osd has not been used for more than a specified
time, will it be removed.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 11:48:48 -08:00
Sage Weil 9bd2e6f8ba ceph: allow renewal of auth credentials
Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10 15:04:47 -08:00
Yehuda Sadeh 2baba25019 ceph: writeback congestion control
Set bdi congestion bit when amount of write data in flight exceeds adjustable
threshold.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:56 -08:00
Sage Weil 06edf046dd ceph: include link to bdi in debugfs
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 16:39:54 -08:00
Sage Weil 0743304d87 ceph: fix debugfs entry, simplify fsid checks
We may first learn our fsid from any of the mon, osd, or mds maps
(whichever the monitor sends first).  Consolidate checks in a single
helper.  Initialize the client debugfs entry then, since we need the
fsid (and global_id) for the directory name.

Also remove dead mount code.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-20 14:24:27 -08:00
Sage Weil 4e7a5dcd1b ceph: negotiate authentication protocol; implement AUTH_NONE protocol
When we open a monitor session, we send an initial AUTH message listing
the auth protocols we support, our entity name, and (possibly) a previously
assigned global_id.  The monitor chooses a protocol and responds with an
initial message.

Initially implement AUTH_NONE, a dummy protocol that provides no security,
but works within the new framework.  It generates 'authorizers' that are
used when connecting to (mds, osd) services that simply state our entity
name and global_id.

This is a wire protocol change.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18 16:19:57 -08:00
Sage Weil 039934b895 ceph: build cleanly without CONFIG_DEBUG_FS
Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-12 15:56:51 -08:00
Sage Weil cdac830313 ceph: remove recon_gen logic
We don't get an explicit affirmative confirmation that our caps reconnect,
nor do we necessarily want to pay that cost.  So, take all this code out
for now.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-10 16:03:53 -08:00
Sage Weil 685f9a5d14 ceph: do not confuse stale and dead (unreconnected) caps
We were using the cap_gen to track both stale caps (caps that timed out
due to temporarily losing touch with the mds) and dead caps that did not
reconnect after an MDS failure.  Introduce a recon_gen counter to track
reconnections to restarted MDSs and kill dead caps based on that instead.

Rename gen to cap_gen while we're at it to make it more clear which is
which.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-09 12:06:07 -08:00
Noah Watkins fbbccec9c6 ceph: replace list_entry with container_of
Usage of non-list.h list_entry function for container_of
functionality replaced with direct use of container_of.

Signed-off-by: Noah Watkins <noah@noahdesu.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-28 17:44:22 -07:00
Sage Weil 6b8051855d ceph: allocate and parse mount args before client instance
This simplifies much of the error handling during mount.  It also means
that we have the mount args before client creation, and we can initialize
based on those options.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-27 11:57:03 -07:00
Sage Weil ecb19c4649 ceph: remove small mon addr limit; use CEPH_MAX_MON where appropriate
Get rid of separate max mon limit; use the system limit instead.  This
allows mounts when there are lots of mon addrs provided by mount.ceph (as
with a host with lots of A/AAAA records).

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-22 10:53:17 -07:00
Sage Weil 8fa9765576 ceph: enable readahead
Initialized bdi->ra_pages to enable readahead.  Use 512KB default.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-16 14:44:43 -07:00
Sage Weil afcdaea3f2 ceph: flush dirty caps via the cap_dirty list
Previously we were flushing dirty caps by passing an extra flag
when traversing the delayed caps list.  Besides being a bit ugly,
that can also miss caps that are dirty but didn't result in a
cap requeue: notably, mark_caps_dirty().

Separate the flushing into a separate helper, and traverse the
cap_dirty list.

This also brings i_dirty_item in line with i_dirty_caps: we are
on the list IFF caps != 0.  We carry an inode ref IFF
dirty_caps|flushing_caps != 0.

Lose the unused return value from __ceph_mark_caps_dirty().

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-15 18:14:35 -07:00
Sage Weil de57606c23 ceph: client types
We first define constants, types, and prototypes for the kernel client
proper.

A few subsystems are defined separately later: the MDS, OSD, and
monitor clients, and the messaging layer.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-06 11:31:07 -07:00