linux

Commit Graph

Author	SHA1	Message	Date
Xi Wang	6448669777	libceph: fix overflow check in crush_decode() The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Jim Schutt	182fac2689	net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer The Ceph messenger would sometimes queue multiple work items to write data to a socket when the socket buffer was full. Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the same way that net/core/stream.c:sk_stream_write_space() does, i.e., clearing it only when sufficient space is available in the socket buffer. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:45 -05:00
Linus Torvalds	6c073a7ee2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"	2012-02-02 15:47:33 -08:00
Sage Weil	ab434b60ab	ceph: initialize client debugfs outside of monc->mutex Initializing debugfs under monc->mutex introduces a lock dependency for sb->s_type->i_mutex_key, which (combined with several other dependencies) leads to an annoying lockdep warning. There's no particular reason to do the debugfs setup under this lock, so move it out. It used to be the case that our first monmap could come from the OSD; that is no longer the case with recent servers, so we will reliably set up the client entry during the initial authentication. We don't have to worry about racing with debugfs teardown by ceph_debugfs_client_cleanup() because ceph_destroy_client() calls ceph_msgr_flush() first, which will wait for the message dispatch work to complete (and the debugfs init to complete). Fixes: #1940 Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:01 -08:00
Sage Weil	56e925b677	libceph: remove useless return value for osd_client __send_request() Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-10 08:57:03 -08:00
Sage Weil	e11b05d31f	crush: fix force for non-root TAKE Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-10 08:56:57 -08:00
Thomas Meyer	186482560f	ceph: Use kmemdup rather than duplicating its implementation Use kmemdup rather than duplicating its implementation The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-10 08:56:54 -08:00
Linus Torvalds	653f42f6b6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: add missing spin_unlock at ceph_mdsc_build_path() ceph: fix SEEK_CUR, SEEK_SET regression crush: fix mapping calculation when force argument doesn't exist ceph: use i_ceph_lock instead of i_lock rbd: remove buggy rollback functionality rbd: return an error when an invalid header is read ceph: fix rasize reporting by ceph_show_options	2011-12-13 14:59:42 -08:00
Sage Weil	f1932fc1a6	crush: fix mapping calculation when force argument doesn't exist If the force argument isn't valid, we should continue calculating a mapping as if it weren't specified. Signed-off-by: Sage Weil <sage@newdream.net>	2011-12-12 09:09:45 -08:00
Linus Torvalds	c292fe4aae	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: Allocate larger oid buffer in request msgs ceph: initialize root dentry ceph: fix iput race when queueing inode work	2011-11-21 12:11:13 -08:00
Stratos Psomadakis	224736d911	libceph: Allocate larger oid buffer in request msgs ceph_osd_request struct allocates a 40-byte buffer for object names. RBD image names can be up to 96 chars long (100 with the .rbd suffix), which results in the object name for the image being truncated, and a subsequent map failure. Increase the oid buffer in request messages, in order to avoid the truncation. Signed-off-by: Stratos Psomadakis <psomas@grnet.gr> Signed-off-by: Sage Weil <sage@newdream.net>	2011-11-11 09:50:19 -08:00
Paul Gortmaker	bc3b2d7fb9	net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:30 -04:00
Sage Weil	38d6453ca3	libceph: force resend of osd requests if we skip an osdmap If we skip over one or more map epochs, we need to resend all osd requests because it is possible they remapped to other servers and then back. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:17 -07:00
Noah Watkins	ee3b56f265	ceph: use kernel DNS resolver Change ceph_parse_ips to take either names given as IP addresses or standard hostnames (e.g. localhost). The DNS lookup is done using the dns_resolver facility similar to its use in AFS, NFS, and CIFS. This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER that controls if this feature is on or off. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:16 -07:00
Noah Watkins	49d9224c04	ceph: fix ceph_monc_init memory leak failure clean up does not consider ceph_auth_init. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:16 -07:00
Sage Weil	f0ed1b7cef	libceph: warn on msg allocation failures Any non-masked msg allocation failure should generate a warning and stack trace to the console. All of these need to eventually be replaced by safe preallocation or msgpools. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:16 -07:00
Sage Weil	b61c27636f	libceph: don't complain on msgpool alloc failures The pool allocation failures are masked by the pool; there is no need to spam the console about them. (That's the whole point of having the pool in the first place.) Mark msg allocations whose failure is safely handled as such. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Sage Weil	f6a2f5be07	libceph: always preallocate mon connection Allocate the mon connection on init. We already reuse it across reconnects. Remove now unnecessary (and incomplete) NULL checks. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Sage Weil	6ab00d465a	libceph: create messenger with client This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Linus Torvalds	92bb062fe3	Merge branch 'for-linus' of git://github.com/NewDreamNetwork/ceph-client * 'for-linus' of git://github.com/NewDreamNetwork/ceph-client: libceph: fix pg_temp mapping update libceph: fix pg_temp mapping calculation libceph: fix linger request requeuing libceph: fix parse options memory leak libceph: initialize ack_stamp to avoid unnecessary connection reset	2011-09-29 19:58:58 -07:00
Sage Weil	8adc8b3d78	libceph: fix pg_temp mapping update The incremental map updates have a record for each pg_temp mapping that is to be added/updated (len > 0) or removed (len == 0). The old code was written as if the updates were a complete enumeration; that was just wrong. Update the code to remove 0-length entries and drop the rbtree traversal. This avoids misdirected (and hung) requests that manifest as server errors like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-28 10:13:35 -07:00
Sage Weil	782e182e91	libceph: fix pg_temp mapping calculation We need to apply the modulo pg_num calculation before looking up a pgid in the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes (some) misdirected requests that result in messages like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 on the server and make the client block without getting a reply (at least until the pg_temp mapping goes away, but that can take a long long time). Reorder calc_pg_raw() a bit to make more sense. Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-28 10:13:31 -07:00
Sage Weil	935b639a04	libceph: fix linger request requeuing The r_req_lru_item list node moves between several lists, and that cycle is not directly related (and does not begin) with __register_request(). Initialize it in the request constructor, not __register_request(). This fixes later badness (below) when OSDs restart underneath an rbd mount. Crashes we've seen due to this include: [ 213.974288] kernel BUG at net/ceph/messenger.c:2193! and [ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 144.035278] IP: [<ffffffffa036c053>] con_work+0x1463/0x2ce0 [libceph] Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-16 11:13:17 -07:00
Noah Watkins	1cad78932a	libceph: fix parse options memory leak ceph_destroy_options does not free opt->mon_addr that is allocated in ceph_parse_options. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-16 09:19:53 -07:00
Jim Schutt	c0d5f9db1c	libceph: initialize ack_stamp to avoid unnecessary connection reset Commit `4cf9d54463` recorded when an outgoing ceph message was ACKed, in order to avoid unnecessary connection resets when an OSD is busy. However, ack_stamp is uninitialized, so there is a window between when the message is sent and when it is ACKed in which handle_timeout() interprets the uninitialized value as an expired timeout, and resets the connection unnecessarily. Close the window by initializing ack_stamp. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-16 09:16:22 -07:00
Linus Torvalds	0d20fbbe82	Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client * 'for-linus' of git://ceph.newdream.net/git/ceph-client: libceph: fix leak of osd structs during shutdown ceph: fix memory leak ceph: fix encoding of ino only (not relative) paths libceph: fix msgpool	2011-09-09 15:48:34 -07:00
Sage Weil	aca420bc51	libceph: fix leak of osd structs during shutdown We want to remove all OSDs, not just those on the idle LRU. Signed-off-by: Sage Weil <sage@newdream.net>	2011-08-31 15:22:46 -07:00
Sage Weil	5185352c16	libceph: fix msgpool There were several problems here: 1- we weren't tagging allocations with the pool, so they were never returned to the pool. 2- msgpool_put didn't add back to the mempool, even if it were called. 3- msgpool_release didn't clear the pool pointer, so it would have looped had #1 not been broken. These may or may not have been responsible for #1136 or #1381 (BUG due to non-empty mempool on umount). I can't seem to trigger the crash now using the method I was using before. Signed-off-by: Sage Weil <sage@newdream.net>	2011-08-09 15:26:17 -07:00
Linus Torvalds	ba5b56cb3e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: document unlocked d_parent accesses ceph: explicitly reference rename old_dentry parent dir in request ceph: document locking for ceph_set_dentry_offset ceph: avoid d_parent in ceph_dentry_hash; fix ceph_encode_fh() hashing bug ceph: protect d_parent access in ceph_d_revalidate ceph: protect access to d_parent ceph: handle racing calls to ceph_init_dentry ceph: set dir complete frag after adding capability rbd: set blk_queue request sizes to object size ceph: set up readahead size when rsize is not passed rbd: cancel watch request when releasing the device ceph: ignore lease mask ceph: fix ceph_lookup_open intent usage ceph: only link open operations to directory unsafe list if O_CREAT|O_TRUNC ceph: fix bad parent_inode calc in ceph_lookup_open ceph: avoid carrying Fw cap during write into page cache libceph: don't time out osd requests that haven't been received ceph: report f_bfree based on kb_avail rather than diffing. ceph: only queue capsnap if caps are dirty ceph: fix snap writeback when racing with writes ...	2011-07-26 13:38:50 -07:00
Sage Weil	4cf9d54463	libceph: don't time out osd requests that haven't been received Keep track of when an outgoing message is ACKed (i.e., the server fully received it and, presumably, queued it for processing). Time out OSD requests only if it's been too long since they've been received. This prevents timeouts and connection thrashing when the OSDs are simply busy and are throttling the requests they read off the network. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-26 11:27:24 -07:00
David S. Miller	033b1142f4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-21 13:38:42 -07:00
Sage Weil	38be7a79f7	ceph: fix file mode calculation open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR. No need for any O_APPEND special case. Passing O_WRONLY|O_RDWR is undefined according to the man page, but the Linux VFS interprets this as O_RDWR, so we'll do the same. This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being translated to readonly. Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Sage Weil <sage@newdream.net>	2011-07-19 11:25:04 -07:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
David S. Miller	9f6ec8d697	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c	2011-06-20 22:29:08 -07:00
Joe Perches	ea11073387	net: Remove casts of void * Unnecessary casts of void * clutter the code. These are the remainder casts after several specific patches to remove netdev_priv and dev_priv. Done via coccinelle script: $ cat cast_void_pointer.cocci @@ type T; T *pt; void *pv; @@ - pt = (T *)pv; + pt = pv; Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>	2011-06-16 23:19:27 -04:00
Sage Weil	9bb0ce2b0b	libceph: fix page calculation for non-page-aligned io Set the page count correctly for non-page-aligned IO. We were already doing this correctly for alignment, but not the page count. Fixes DIRECT_IO writes from unaligned pages. Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-13 16:26:17 -07:00
Sage Weil	2584547230	ceph: fix sync vs canceled write If we cancel a write, trigger the safe completions to prevent a sync from blocking indefinitely in ceph_osdc_sync(). Signed-off-by: Sage Weil <sage@newdream.net>	2011-06-07 21:34:13 -07:00
Sage Weil	cd634fb6ee	libceph: subscribe to osdmap when cluster is full When the cluster is marked full, subscribe to subsequent map updates to ensure we find out promptly when it is no longer full. This will prevent us from spewing ENOSPC for (much) longer than necessary. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:11 -07:00
Sage Weil	7662d8ff57	libceph: handle new osdmap down/state change encoding Old incrementals encode a 0 value (nearly always) when an osd goes down. Change that to allow any state bit(s) to be flipped. Special case 0 to mean flip the CEPH_OSD_UP bit to mimic the old behavior. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:09 -07:00
Sage Weil	9d6fcb081a	ceph: check return value for start_request in writepages Since we pass the nofail arg, we should never get an error; BUG if we do. (And fix the function to not return an error if __map_request fails.) Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:05 -07:00
Sage Weil	a2a79609c0	libceph: add missing breaks in addr_set_port Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:05 -07:00
Sage Weil	0417788226	libceph: fix TAG_WAIT case If we get a WAIT as a client something went wrong; error out. And don't fall through to an unrelated case. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:04 -07:00
Sage Weil	31456665a0	libceph: fix osdmap timestamp assignment Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:03 -07:00
Sage Weil	12a2f643b0	libceph: use snprintf for unknown addrs Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:03 -07:00
Sage Weil	2dab036b8c	libceph: use snprintf for formatting object name Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:02 -07:00
Sage Weil	e8f54ce169	libceph: fix uninitialized value when no get_authorizer method is set If there is no get_authorizer method we set the out_kvec to a bogus pointer. The length is also zero in that case, so it doesn't much matter, but it's better not to add the empty item in the first place. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:02 -07:00
Sage Weil	0da5d70369	libceph: handle connection reopen race with callbacks If a connection is closed and/or reopened (ceph_con_close, ceph_con_open) it can race with a callback. con_work does various state checks for closed or reopened sockets at the beginning, but drops con->mutex before making callbacks. We need to check for state bit changes after retaking the lock to ensure we restart con_work and execute those CLOSED/OPENING tests or else we may end up operating under stale assumptions. In Jim's case, this was causing 'bad tag' errors. There are four cases where we re-take the con->mutex inside con_work: catch them all and return EAGAIN from try_{read,write} so that we can restart con_work. Reported-by: Jim Schutt <jaschut@sandia.gov> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:21:05 -07:00
Sage Weil	4ad12621e4	libceph: fix ceph_osdc_alloc_request error checks ceph_osdc_alloc_request returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-03 09:28:13 -07:00
Henry C Chang	ca20892db7	libceph: fix ceph_msg_new error path If memory allocation failed, calling ceph_msg_put() will cause GPF since some of ceph_msg variables are not initialized first. Fix Bug #970. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-03 09:28:11 -07:00
Linus Torvalds	e6d2831834	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix linger request requeueing	2011-04-14 19:02:55 -07:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Sage Weil	77f38e0eea	libceph: fix linger request requeueing Fix the request transition from linger -> normal request. The key is to preserve r_osd and requeue on the same OSD. Reregister as a normal request, add the request to the proper queues, then unregister the linger. Fix the unregister helper to avoid clearing r_osd (and also simplify the parallel check in __unregister_request()). Reported-by: Henry Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-04-06 09:09:16 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Tommi Virtanen	4b2a58abd1	libceph: Create a new key type "ceph". This allows us to use existence of the key type as a feature test, from userspace. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-29 12:11:24 -07:00
Tommi Virtanen	e2c3d29b42	libceph: Get secret from the kernel keys api when mounting with key=NAME. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-29 12:11:19 -07:00
Tommi Virtanen	8323c3aa74	ceph: Move secret key parsing earlier. This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebrew key management with the kernel key retention service. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-29 12:11:16 -07:00
Sage Weil	fbdb919048	libceph: fix null dereference when unregistering linger requests We should only clear r_osd if we are neither registered as a linger or a regular request. We may unregister as a linger while still registered as a regular request (e.g., in reset_osd). Incorrectly clearing r_osd there leads to a null pointer dereference in __send_request. Also simplify the parallel check in __unregister_request() where we just removed r_osd_item and know it's empty. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-29 12:11:06 -07:00
Dan Carpenter	234af26ff1	ceph: unlock on error in ceph_osdc_start_request() There was a missing unlock on the error path if __map_request() failed. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-29 08:59:54 -07:00
Mariusz Kozlowski	6b0ae4097c	ceph: fix possible NULL pointer dereference This patch fixes 'event_work' dereference before it is checked for NULL. Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-26 13:41:20 -07:00
Sage Weil	ef550f6f4f	ceph: flush msgr_wq during mds_client shutdown The release method for mds connections uses a backpointer to the mds_client, so we need to flush the workqueue of any pending work (and ceph_connection references) prior to freeing the mds_client. This fixes an oops easily triggered under UML by while true ; do mount ... ; umount ... ; done Also fix an outdated comment: the flush in ceph_destroy_client only flushes OSD connections out. This bug is basically an artifact of the ceph -> ceph+libceph conversion. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-25 13:27:48 -07:00
Yehuda Sadeh	a40c4f10e3	libceph: add lingering request and watch/notify event framework Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-22 11:33:55 -07:00
Sage Weil	6f6c700675	libceph: fix osd request queuing on osdmap updates If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:19 -07:00
Tommi Virtanen	b09734b1f4	libceph: Fix base64-decoding when input ends in newline. It used to return -EINVAL because it thought the end was not aligned to 4 bytes. Clean up superfluous src < end test in if, the while itself guarantees that. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-15 09:14:02 -07:00
Sage Weil	e00de341fd	libceph: fix msgr standby handling The standby logic used to be pretty dependent on the work requeueing behavior that changed when we switched to WQ_NON_REENTRANT. It was also very fragile. Restructure things so that: - We clear WRITE_PENDING when we set STANDBY. This ensures we will requeue work when we wake up later. - con_work backs off if STANDBY is set. There is nothing to do if we are in standby. - clear_standby() helper is called by both con_send() and con_keepalive(), the two actions that can wake us up again. Move the connect_seq++ logic here. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-04 12:25:05 -08:00
Sage Weil	e76661d0a5	libceph: fix msgr keepalive flag There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-04 12:24:31 -08:00
Sage Weil	60bf8bf881	libceph: fix msgr backoff With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is that the exponential backoff breaks in certain cases: * con_work attempts to connect. * we get an immediate failure, and the socket state change handler queues immediate work. * con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-04 12:24:28 -08:00
Sage Weil	692d20f576	libceph: retry after authorization failure If we mark the connection CLOSED we will give up trying to reconnect to this server instance. That is appropriate for things like a protocol version mismatch that won't change until the server is restarted, at which point we'll get a new addr and reconnect. An authorization failure like this is probably due to the server not properly rotating its secret keys, however, and should be treated as transient so that the normal backoff and retry behavior kicks in. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 13:47:40 -08:00
Sage Weil	38815b7802	libceph: fix handling of short returns from get_user_pages get_user_pages() can return fewer pages than we ask for. We were returning a bogus pointer/error code in that case. Instead, loop until we get all the pages we want or get an error we can return to the caller. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 13:47:39 -08:00
Linus Torvalds	8bd89ca220	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling	2011-02-21 15:01:38 -08:00
Sage Weil	42961d2333	libceph: fix socket write error handling Pass errors from writing to the socket up the stack. If we get -EAGAIN, return 0 from the helper to simplify the callers' checks. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-25 08:19:34 -08:00
Sage Weil	98bdb0aa00	libceph: fix socket read error handling If we get EAGAIN when trying to read from the socket, it is not an error. Return 0 from the helper in this case to simplify the error handling cases in the caller (indirectly, try_read). Fix try_read to pass any error to its caller (con_work) instead of almost always returning 0. This lets us respond to things like socket disconnects. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-25 08:17:48 -08:00
Linus Torvalds	a170315420	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode	2011-01-13 10:25:24 -08:00
Tejun Heo	f363e45fd1	net/ceph: make ceph_msgr_wq non-reentrant ceph messenger code does a rather complex dance around the multithread workqueue to make sure the same work item isn't executed concurrently on different CPUs. This restriction can be provided by workqueue with WQ_NON_REENTRANT. Make ceph_msgr_wq non-reentrant workqueue with the default concurrency level and remove the QUEUED/BUSY logic. * This removes backoff handling in con_work() but it couldn't reliably block execution of con_work() to begin with - queue_con() can be called after the work started but before BUSY is set. It seems that it was an optimization for a rather cold path and can be safely removed. * The number of concurrent work items is bound by the number of connections and connections are independent from each other. With the default concurrency level, different connections will be executed independently. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:14 -08:00
Jesper Juhl	b0aee3516d	ceph: Always free allocated memory in osdmap_decode() Always free memory allocated to 'pi' in net/ceph/osdmap.c::osdmap_decode(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:14 -08:00
Sage Weil	6c0f3af72c	ceph: add dir_layout to inode Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely weak and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:12 -08:00
David S. Miller	17f7f4d9fc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c	2010-12-26 22:37:05 -08:00
Linus Torvalds	9d5004fcf6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path	2010-12-20 21:32:20 -08:00
Henry C Chang	361cf40519	ceph: handle partial result from get_user_pages The get_user_pages() helper can return fewer than the requested pages. Error out in that case, and clean up the partial result. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 09:55:59 -08:00
Henry C Chang	b6aa5901c7	ceph: mark user pages dirty on direct-io reads For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 09:54:40 -08:00
Sage Weil	d96c9043d1	ceph: fix msgr_init error path create_workqueue() returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-13 20:30:28 -08:00
David S. Miller	fe6c791570	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c net/llc/af_llc.c	2010-12-08 13:47:38 -08:00
Linus Torvalds	a01af8e4a4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) af_unix: limit recursion level pch_gbe driver: The wrong of initializer entry pch_gbe dreiver: chang author ucc_geth: fix ucc halt problem in half duplex mode inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners ehea: Add some info messages and fix an issue hso: fix disable_net NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty cxgb4vf: fix setting unicast/multicast addresses ... net, ppp: Report correct error code if unit allocation failed DECnet: don't leak uninitialized stack byte au1000_eth: fix invalid address accessing the MAC enable register dccp: fix error in updating the GAR tcp: restrict net.ipv4.tcp_adv_win_scale (#20312) netns: Don't leak others' openreq-s in proc Net: ceph: Makefile: Remove unnessary code vhost/net: fix rcu check usage econet: fix CVE-2010-3848 econet: fix CVE-2010-3850 econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849 ...	2010-11-29 14:36:33 -08:00
Tracey Dent	4cb6a614ba	Net: ceph: Makefile: Remove unnessary code Remove the if and else conditional because the code is in mainline and there is no need in it being there. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-27 17:39:29 -08:00
Linus Torvalds	3cbaa0f7a7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: of/phylib: Use device tree properties to initialize Marvell PHYs. phylib: Add support for Marvell 88E1149R devices. phylib: Use common page register definition for Marvell PHYs. qlge: Fix incorrect usage of module parameters and netdev msg level ipv6: fix missing in6_ifa_put in addrconf SuperH IrDA: correct Baud rate error correction atl1c: Fix hardware type check for enabling OTP CLK net: allow GFP_HIGHMEM in __vmalloc() bonding: change list contact to netdev@vger.kernel.org e1000: fix screaming IRQ	2010-11-24 08:22:34 +09:00
Tracey Dent	fa13bc3daa	Net: ceph: Makefile: remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:10 -08:00
Eric Dumazet	7a1c8e5ab1	net: allow GFP_HIGHMEM in __vmalloc() We forgot to use __GFP_HIGHMEM in several __vmalloc() calls. In ceph, add the missing flag. In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is cleaner and allows using HIGHMEM pages as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-21 10:04:04 -08:00
Sage Weil	c5c6b19d4b	ceph: explicitly specify page alignment in network messages The alignment used for reading data into or out of pages used to be taken from the data_off field in the message header. This only worked as long as the page alignment matched the object offset, breaking direct io to non-page aligned offsets. Instead, explicitly specify the page alignment next to the page vector in the ceph_msg struct, and use that instead of the message header (which probably shouldn't be trusted). The alloc_msg callback is responsible for filling in this field properly when it sets up the page vector. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:43:17 -08:00
Sage Weil	b7495fc2ff	ceph: make page alignment explicit in osd interface We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:43:12 -08:00
Sage Weil	e98b6fed84	ceph: fix comment, remove extraneous args The offset/length arguments aren't used. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:24:53 -08:00
Sage Weil	df9f86faf3	ceph: fix small seq message skipping If the client gets out of sync with the server message sequence number, we normally skip low seq messages (ones we already received). The skip code was also incrementing the expected seq, such that all subsequent messages also appeared old and got skipped, and an eventual timeout on the osd connection. This resulted in some lagging requests and console messages like [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017 [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018 [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019 [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020 [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021 [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022 [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023 [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024 [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025 [233500.534614] ceph: tid 6023602 timed out on osd22, will reset osd Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-01 15:49:23 -07:00
Sage Weil	240634e9b3	ceph: fix num_pages_free accounting in pagelist Decrement the free page counter when removing a page from the free_list. Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:23 -07:00
Yehuda Sadeh	010e3b48fc	ceph: don't crash when passed bad mount options This only happened when parse_extra_token was not passed to ceph_parse_option() (hence, only happened in rbd). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:22 -07:00
Greg Farnum	ac0b74d8a1	ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor These facilitate preallocation of pages so that we can encode into the pagelist in an atomic context. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:16 -07:00
Yehuda Sadeh	602adf4002	rbd: introduce rados block device (rbd), based on libceph The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:13 -07:00
Yehuda Sadeh	3d14c5d2b6	ceph: factor out libceph from Ceph file system This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:37:28 -07:00

245 Commits