linux

Commit Graph

Author	SHA1	Message	Date
Alex Elder	8e94af8e2b	rbd: encapsulate header validity test If an rbd image header is read and it doesn't begin with the expected magic information, a warning is displayed. This is a fairly simple test, but it could be extended at some point. Fix the comparison so it actually looks at the "text" field rather than the front of the structure. In any case, encapsulate the validity test in its own function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:48 -07:00
Alex Elder	bd919d45aa	rbd: clean up a few dout() calls There was a dout() call in rbd_do_request() that was reporting the reporting the offset as the length and vice versa. While fixing that I did a quick scan of other dout() calls and fixed a couple of other minor things. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:46 -07:00
Alex Elder	a059329056	rbd: simplify __rbd_remove_all_snaps() This just replaces a while loop with list_for_each_entry_safe() in __rbd_remove_all_snaps(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:45 -07:00
Alex Elder	a66f8c97a3	rbd: drop extra header_rwsem init In commit `c666601a` there was inadvertently added an extra initialization of rbd_dev->header_rwsem. This gets rid of the duplicate. Reported-by: Guangliang Zhao <gzhao@suse.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:45 -07:00
Alex Elder	9e15dc735a	rbd: kill rbd_image_header->snap_seq The snap_seq field in an rbd_image_header structure held the value from the rbd image header when it was last refreshed. We now maintain this value in the snapc->seq field. So get rid of the other one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:44 -07:00
Alex Elder	505cbb9bed	rbd: set snapc->seq only when refreshing header In rbd_header_add_snap() there is code to set snapc->seq to the just-added snapshot id. This is the only remnant left of the use of that field for recording which snapshot an rbd_dev was associated with. That functionality is no longer supported, so get rid of that final bit of code. Doing so means we never actually set snapc->seq any more. On the server, the snapshot context's sequence value represents the highest snapshot id ever issued for a particular rbd image. So we'll make it have that meaning here as well. To do so, set this value whenever the rbd header is (re-)read. That way it will always be consistent with the rest of the snapshot context we maintain. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:43 -07:00
Alex Elder	78dc447d3c	rbd: preserve snapc->seq in rbd_header_set_snap() In rbd_header_set_snap(), there is logic to make the snap context's seq field get set to a particular snapshot id, or 0 if there is no snapshot for the rbd image. This seems to be an artifact of how the current snapshot id for an rbd_dev was recorded before the rbd_dev->snap_id field began to be used for that purpose. There's no need to update the value of snapc->seq here any more, so stop doing it. Tidy up a few local variables in that function while we're at it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:42 -07:00
Alex Elder	75fe9e1981	rbd: don't use snapc->seq that way In what appears to be an artifact of a different way of encoding whether an rbd image maps a snapshot, __rbd_refresh_header() has code that arranges to update the seq value in an rbd image's snapshot context to point to the first entry in its snapshot array if that's where it was pointing initially. We now use rbd_dev->snap_id to record the snapshot id--using the special value CEPH_NOSNAP to indicate the rbd_dev is not mapping a snapshot at all. There is therefore no need to check for this case, nor to update the seq value, in __rbd_refresh_header(). Just preserve the seq value that rbd_read_header() provides (which, at the moment, is nothing). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 18:15:41 -07:00
Josh Durgin	a71b891bc7	rbd: send header version when notifying Previously the original header version was sent. Now, we update it when the header changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:40 -07:00
Josh Durgin	d1d2564654	rbd: use reference counting for the snap context This prevents a race between requests with a given snap context and header updates that free it. The osd client was already expecting the snap context to be reference counted, since it get()s it in ceph_osdc_build_request and put()s it when the request completes. Also remove the second down_read()/up_read() on header_rwsem in rbd_do_request, which wasn't actually preventing this race or protecting any other data. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:40 -07:00
Josh Durgin	93a24e084d	rbd: set image size when header is updated The image may have been resized. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:39 -07:00
Josh Durgin	a51aa0c042	rbd: expose the correct size of the device in sysfs If an image was mapped to a snapshot, the size of the head version would be shown. Protect capacity with header_rwsem, since it may change. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:38 -07:00
Josh Durgin	474ef7ce83	rbd: only reset capacity when pointing to head Snapshots cannot be resized, and the new capacity of head should not be reflected by the snapshot. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:37 -07:00
Josh Durgin	e88a36ec96	rbd: return errors for mapped but deleted snapshot When a snapshot is deleted, the OSD will return ENOENT when reading from it. This is normally interpreted as a hole by rbd, which will return zeroes. To minimize the time in which this can happen, stop requests early when we are notified that our snapshot no longer exists. [elder@inktank.com: updated __rbd_init_snaps_header() logic] Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:36 -07:00
Alex Elder	d1f57ea663	rbd: kill num_reply parameters Several functions include a num_reply parameter, but it is never used. Just get rid of it everywhere--it seems to be something that never got fully implemented. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:09 -07:00
Alex Elder	43ae470112	rbd: option symbol renames Use the name "ceph_opts" consistently (rather than just "opt") for pointers to a ceph_options structure. Change the few spots that don't use "rbd_opts" for a rbd_options pointer to match the rest. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:08 -07:00
Alex Elder	aded07ea9f	rbd: more symbol renames Rename variables named "obj" which represent object names so they're consistently named "object_name". Rename the "cls" and "method" parameters in rbd_req_sync_exec() to be "class_name" and "method_name", and make similar changes to the names of local variables in that function representing the lengths of those names. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:07 -07:00
Alex Elder	0bed54dc9a	rbd: rename some fields in struct rbd_dev An rbd image is not a single object, but a logical construct made up of an aggregation of objects. Rename some fields in struct rbd_dev, in hopes of reinforcing this. obj --> image_name obj_len --> image_name_len obj_md_name --> header_name Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:06 -07:00
Alex Elder	0ce1a79413	rbd: use rbd_dev consistently Most variables that represent a struct rbd_device are named "rbd_dev", but in some cases "dev" is used instead. Change all the "dev" references so they use "rbd_dev" consistently, to make it clear from the name that we're working with an RBD device (as opposed to, for example, a struct device). Similarly, change the name of the "dev" field in struct rbd_notify_info to be "rbd_dev". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:05 -07:00
Alex Elder	820a5f3e94	rbd: dynamically allocate snapshot name There is no need to impose a small limit the length of the snapshot name recorded for an rbd image in a struct rbd_dev. Remove the limitation by allocating space for the snapshot name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:04 -07:00
Alex Elder	bf3e5ae112	rbd: dynamically allocate image name There is no need to impose a small limit the length of the rbd image name recorded in a struct rbd_dev. Remove the limitation by allocating space for the image name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:04 -07:00
Alex Elder	cb8627c76d	rbd: dynamically allocate image header name There is no need to impose a small limit the length of the header name recorded for an rbd image in a struct rbd_dev. Remove the limitation by allocating space for the header name dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:03 -07:00
Alex Elder	849b4260d4	rbd: dynamically allocate object prefix There is no need to impose a small limit the length of the object prefix recorded for an rbd image in a struct rbd_image_header. Remove the limitation by allocating space for the object prefix dynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:02 -07:00
Alex Elder	d22f76e703	rbd: dynamically allocate pool name There is no need to impose a small limit the length of the pool name recorded for an rbd image in a struct rbd_device. Remove the limitation by allocating space for the pool name ynamically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:01 -07:00
Alex Elder	9bb2f334b9	rbd: create pool_id device attribute Add an entry under /sys/bus/rbd/devices/<N>/ named "pool_id" that provides the id for the pool the rbd image is assocatied with. This is in addition to the pool name already provided. Rename the "poolid" field in struct rbd_device to be "pool_id". Update the documentation to reflect the addition of this new entry. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:30:00 -07:00
Alex Elder	ca1e49a6af	rbd: rename rbd_dev->block_name Each rbd image has a name that forms the basis of all data objects backing the device. Old (format 1) images refer to this name as the "block name," while new (format 2) images use the term "object prefix" for this. Change the field name in the in-core rbd image header structure to reflect the more modern usage. We intentionally keep the the name "block_name" in the on-disk definition for format 1 image headers. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:59 -07:00
Alex Elder	ea3352f4aa	rbd: define dup_token() Define a new function dup_token(), to be used during argument parsing for making dynamically-allocated copies of tokens being parsed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:58 -07:00
Alex Elder	ad4f232f28	rbd: drop a useless local variable In rbd_req_sync_notify_ack(), a local variable was needlessly being used to hold a null pointer. Just pass NULL instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-07-30 09:29:56 -07:00
Dan Carpenter	895cfcc810	rbd: endian bug in rbd_req_cb() Sparse complains about this because: drivers/block/rbd.c:996:20: warning: cast to restricted __le32 drivers/block/rbd.c:996:20: warning: cast from restricted __le16 These are set in osd_req_encode_op() and they are le16. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-06-06 09:23:54 -05:00
Yan, Zheng	f9f9a19044	rbd: Fix ceph_snap_context size calculation ceph_snap_context->snaps is an u64 array Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-06-06 09:23:53 -05:00
Josh Durgin	263c6ca007	rbd: rename __rbd_update_snaps to __rbd_refresh_header This function rereads the entire header and handles any changes in it, not just changes in snapshots. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:09 -05:00
Josh Durgin	3591538fb2	rbd: fix snapshot size type Snapshot sizes should be the same type as regular image sizes. This only affects their displayed size in sysfs, not the reported size of an actual block device sizes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:03 -05:00
Josh Durgin	b06e6a6be7	rbd: remove conditional snapid parameters The snapid parameters passed to rbd_do_op() and rbd_req_sync_op() are now always either a valid snapid or an explicit CEPH_NOSNAP. [elder@dreamhost.com: Rephrased the description] Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:58 -05:00
Josh Durgin	77dfe99fe3	rbd: store snapshot id instead of index When a device was open at a snapshot, and snapshots were deleted or added, data from the wrong snapshot could be read. Instead of assuming the snap context is constant, store the actual snap id when the device is initialized, and rely on the OSDs to signal an error if we try reading from a snapshot that was deleted. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:52 -05:00
Josh Durgin	403f24d3d5	rbd: protect read of snapshot sequence number This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:46 -05:00
Xi Wang	50f7c4c967	rbd: fix integer overflow in rbd_header_from_disk() ondisk->snap_count is read from disk via rbd_req_sync_read() and thus needs validation. Otherwise, a bogus `snap_count' could overflow the kmalloc() size, leading to memory corruption. Also use `u32' consistently for `snap_count'. [elder@dreamhost.com: changed to use UINT_MAX rather than ULONG_MAX] Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:41 -05:00
Dan Carpenter	f8ad495a8a	rbd: use gfp_flags parameter in rbd_header_from_disk() We should use the gfp_flags that the caller specified instead of GFP_KERNEL here. There is only one caller and it uses GFP_KERNEL, so this change is just a cleanup and doesn't change how the code works. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:35 -05:00
Sage Weil	3469ac1aa3	ceph: drop support for preferred_osd pgs This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>	2012-05-07 15:33:36 -07:00
Alex Elder	cd9d9f5df6	rbd: don't hold spinlock during messenger flush A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-04-05 15:43:58 -05:00
Josh Durgin	c666601a93	rbd: move snap_rwsem to the device, rename to header_rwsem A new temporary header is allocated each time the header changes, but only the changed properties are copied over. We don't need a new semaphore for each header update. This addresses http://tracker.newdream.net/issues/2174 Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:52 -05:00
Alex Elder	32eec68d2f	rbd: don't drop the rbd_id too early Currently an rbd device's id is released when it is removed, but it is done before the code is run to clean up sysfs-related files (such as /sys/bus/rbd/devices/1). It's possible that an rbd is still in use after the rbd_remove() call has been made. It's essentially the same as an active inode that stays around after it has been removed--until its final close operation. This means that the id shows up as free for reuse at a time it should not be. The effect of this was seen by Jens Rehpoehler, who: - had a filesystem mounted on an rbd device - unmapped that filesystem (without unmounting) - found that the mount still worked properly - but hit a panic when he attempted to re-map a new rbd device This re-map attempt found the previously-unmapped id available. The subsequent attempt to reuse it was met with a panic while attempting to (re-)install the sysfs entry for the new mapped device. Fix this by holding off "putting" the rbd id, until the rbd_device release function is called--when the last reference is finally dropped. Note: This fixes: http://tracker.newdream.net/issues/1907 Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	593a9e7b34	rbd: small changes Here is another set of small code tidy-ups: - Define SECTOR_SHIFT and SECTOR_SIZE, and use these symbolic names throughout. Tell the blk_queue system our physical block size, in the (unlikely) event we want to use something other than the default. - Delete the definition of struct rbd_info, which is never used. - Move the definition of dev_to_rbd() down in its source file, just above where it gets first used, and change its name to dev_to_rbd_dev(). - Replace an open-coded operation in rbd_dev_release() to use dev_to_rbd_dev() instead. - Calculate the segment size for a given rbd_device just once in rbd_init_disk(). - Use the '%zd' conversion specifier in rbd_snap_size_show(), since the value formatted is a size_t. - Switch to the '%llu' conversion specifier in rbd_snap_id_show(). since the value formatted is unsigned. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	00f1f36ffa	rbd: do some refactoring A few blocks of code are rearranged a bit here: - In rbd_header_from_disk(): - Don't bother computing snap_count until we're sure the on-disk header starts with a good signature. - Move a few independent lines of code so they are after a check for a failed memory allocation. - Get rid of unnecessary local variable "ret". - Make a few other changes in rbd_read_header(), similar to the above--just moving things around a bit while preserving the functionality. - In rbd_rq_fn(), just assign rq in the while loop's controlling expression rather than duplicating it before and at the end of the loop body. This allows the use of "continue" rather than "goto next" in a number of spots. - Rearrange the logic in snap_by_name(). End result is the same. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	fed4c143ba	rbd: fix module sysfs setup/teardown code Once rbd_bus_type is registered, it allows an "add" operation via the /sys/bus/rbd/add bus attribute, and adding a new rbd device that way establishes a connection between the device and rbd_root_dev. But rbd_root_dev is not registered until after the rbd_bus_type registration is complete. This could (in principle anyway) result in an invalid state. Since rbd_root_dev has no tie to rbd_bus_type we can reorder these two initializations and never be faced with this scenario. In addition, unregister the device in the event the bus registration fails at module init time. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	7ef3214af2	rbd: don't allocate mon_addrs buffer in rbd_add() The mon_addrs buffer in rbd_add is used to hold a copy of the monitor IP addresses supplied via /sys/bus/rbd/add. That is passed to rbd_get_client(), which never modifies it (nor do any of the functions it gets passed to thereafter)--the mon_addr parameter to rbd_get_client() is a pointer to constant data, so it can't be modifed. Furthermore, rbd_get_client() has the length of the mon_addrs buffer and that is used to ensure nothing goes beyond its end. Based on all this, there is no reason that a buffer needs to be used to hold a copy of the mon_addrs provided via /sys/bus/rbd/add. Instead, the location within that passed-in buffer can be provided, along with the length of the "token" therein which represents the monitor IP's. A small change to rbd_add_parse_args() allows the address within the buffer to be passed back, and the length is already returned. This now means that, at least from the perspective of this interface, there is no such thing as a list of monitor addresses that is too long. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	5214ecc45c	rbd: have rbd_parse_args() report found mon_addrs size The argument parsing routine already computes the size of the mon_addrs buffer it extracts from the "command." Pass it to the caller so it can use it to provide the length to rbd_get_client(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	81a8979378	rbd: do a few checks at build time This is a bit gratuitous, but there are a few things that can be verified at build time rather than run time, so do that. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	e28fff268e	rbd: don't use sscanf() in rbd_add_parse_args() Make use of a few simple helper routines to parse the arguments rather than sscanf(). This will treat both missing and too-long arguments as invalid input (rather than silently truncating the input in the too-long case). In time this can also be used by rbd_add() to use the passed-in buffer in place, rather than copying its contents into new buffers. It appears to me that the sscanf() previously used would not correctly handle a supplied snapshot--the two final "%s" conversion specifications were not separated by a space, and I'm not sure how sscanf() handles that situation. It may not be well-defined. So that may be a bug this change fixes (but I didn't verify that). The sizes of the mon_addrs and options buffers are now passed to rbd_add_parse_args(), so they can be supplied to copy_token(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	a725f65e52	rbd: encapsulate argument parsing for rbd_add() Move the code that parses the arguments provided to rbd_add() (which are supplied via /sys/bus/rbd/add) into a separate function. Also rename the "mon_dev_name" variable in rbd_add() to be "mon_addrs". The variable represents a list of one or more comma-separated monitor IP addresses, each with an optional port number. I think "mon_addrs" captures that notion a little better. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:48 -05:00
Alex Elder	27cc25943f	rbd: simplify error handling in rbd_add() If a couple pointers are initialized to NULL then a single "out_nomem" label can be used for all of the memory allocation failure cases in rbd_add(). Also, get rid of the "irc" local variable there. There is no real need for "rc" to be type ssize_t, and it can be used in the spot "irc" was. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	60571c7d55	rbd: reduce memory used for rbd_dev fields The length of the string containing the monitor address specification(s) will never exceed the length of the string passed in to rbd_add(). The same holds true for the ceph + rbd options string. So reduce the amount of memory allocated for these to that length rather than the maximum (1024 bytes). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	d720bcb0a8	rbd: have rbd_get_client() return a rbd_client Since rbd_get_client() currently returns an error code. It assigns the rbd_client field of the rbd_device structure it is passed if successful. Instead, have it return the created rbd_client structure and return a pointer-coded error if there is an error. This makes the assignment of the client pointer more obvious at the call site. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	f0f8cef5a3	rbd: a few simple changes Here are a few very simple cleanups: - Add a "RBD_" prefix to the two driver name string definitions. - Move the definition of struct rbd_request below struct rbd_req_coll to avoid the need for an empty declaration of the latter. - Move and group the definitions of rbd_root_dev_release() and rbd_root_dev, as well as rbd_bus_type and rbd_bus_attrs[], close to the top of the file. Arrange the latter so rbd_bus_type.bus_attrs can be initialized statically. - Get rid of an unnecessary local variable in rbd_open(). - Rework some hokey logic in rbd_bus_add_dev(), so the value of "ret" at the end is either 0 or -ENOENT to avoid the need for the code duplication that was there. - Rename a goto target in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	432b858749	rbd: rename "node_lock" The spinlock used to protect rbd_client_list is named "node_lock". Rename it to "rbd_client_list_lock" to make it more obvious what it's for. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	bc534d86be	rbd: move ctl_mutex lock inside rbd_client_create() Since rbd_client_create() is only called in one place, move the acquisition of the mutex around that call inside that function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d97081b0c7	rbd: move ctl_mutex lock inside rbd_get_client() Since rbd_get_client() is only called in one place, move the acquisition of the mutex around that call inside that function. Furthermore, within rbd_get_client(), it appears the mutex only needs to be held while calling rbd_client_create(). (Moving the lock inside that function will wait for the next patch.) Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e6994d3dde	rbd: release client list lock sooner In rbd_get_client(), if a client is reused, a number of things get done while still holding the list lock unnecessarily. This just moves a few things that need no lock protection outside the lock. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d184f6bfde	rbd: restore previous rbd id sequence behavior It used to be that selecting a new unique identifier for an added rbd device required searching all existing ones to find the highest id is used. A recent change made that unnecessary, but made it so that id's used were monotonically non-decreasing. It's a bit more pleasant to have smaller rbd id's though, and this change makes ids get allocated as they were before--each new id is one more than the maximum currently in use. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	499afd5b8e	rbd: tie rbd_dev_list changes to rbd_id operations The only time entries are added to or removed from the global rbd_dev_list is exactly when a "put" or "get" operation is being performed on a rbd_dev's id. So just move the list management code into get/put routines. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e124a82f3c	rbd: protect the rbd_dev_list with a spinlock The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	1ddbe94eda	rbd: rework calculation of new rbd id's In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for id's we can limit the boundless growth of id numbers a bit by arranging to reuse the current highest id once it gets released. Add these calls to "put" the id when an rbd is getting removed. Note that this changes the pattern of device id's used--new values will never be below the highest one seen so far (even if there exists an unused lower one). I assert this is OK because the key property of an rbd id is its uniqueness, not its magnitude. Regardless, a follow-on patch will restore the old way of doing things, I just think this commit just makes the incremental change to atomics a little easier to understand. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	b7f23c361b	rbd: encapsulate new rbd id selection Move the loop that finds a new unique rbd id to use into its own helper function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Josh Durgin	cc9d734c3d	rbd: use a single value of snap_name to mean no snap There's already a constant for this anyway. Since rbd_header_set_snap() is only used to set the rbd device snap_name field, just do that within that function rather than having it take the snap_name as an argument. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net> v2: Changed interface rbd_header_set_snap() so it explicitly updates the snap_name in the rbd_device. Also added a BUILD_BUG_ON() to verify the size of the snap_name field is sufficient for SNAP_HEAD_NAME.	2012-03-22 10:47:47 -05:00
Alex Elder	1dbb439913	rbd: do not duplicate ceph_client pointer in rbd_device The rbd_device structure maintains a duplicate copy of the ceph_client pointer maintained in its rbd_client structure. There appears to be no good reason for this, and its presence presents a risk of them getting out of synch or otherwise misused. So kill it off, and use the rbd_client copy only. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	ee57741c52	rbd: make ceph_parse_options() return a pointer ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	2107978668	rbd: a few small cleanups Some minor cleanups in "drivers/block/rbd.c: - Use the more meaningful "RBD_MAX_OBJ_NAME_LEN" in place if "96" in the definition of RBD_MAX_MD_NAME_LEN. - Use DEFINE_SPINLOCK() to define and initialize node_lock. - Drop a needless (char *) cast in parse_rbd_opts_token(). - Make a few minor formatting changes. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:46 -05:00
Linus Torvalds	6c073a7ee2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"	2012-02-02 15:47:33 -08:00
Alex Elder	d23a4b3fd6	rbd: fix safety of rbd_put_client() The rbd_client structure uses a kref to arrange for cleaning up and freeing an instance when its last reference is dropped. The cleanup routine is rbd_client_release(), and one of the things it does is delete the rbd_client from rbd_client_list. It acquires node_lock to do so, but the way it is done is still not safe. The problem is that when attempting to reuse an existing rbd_client, the structure found might already be in the process of getting destroyed and cleaned up. Here's the scenario, with "CLIENT" representing an existing rbd_client that's involved in the race: Thread on CPU A \| Thread on CPU B --------------- \| --------------- rbd_put_client(CLIENT) \| rbd_get_client() kref_put() \| (acquires node_lock) kref->refcount becomes 0 \| __rbd_client_find() returns CLIENT calls rbd_client_release() \| kref_get(&CLIENT->kref); \| (releases node_lock) (acquires node_lock) \| deletes CLIENT from list \| ...and starts using CLIENT... (releases node_lock) \| and frees CLIENT \| <-- but CLIENT gets freed here Fix this by having rbd_put_client() acquire node_lock. The result could still be improved, but at least it avoids this problem. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:56:59 -08:00
Alex Elder	97bb59a03d	rbd: fix a memory leak in rbd_get_client() If an existing rbd client is found to be suitable for use in rbd_get_client(), the rbd_options structure is not being freed as it should. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:27 -08:00
Alex Elder	0e805a1d85	rbd: initialize snap_rwsem in rbd_add() New rbd device structures get initialized in rbd_add(). Many of the fields rely on being initially zero-filled. However we lockdep was noticing that the rw_semaphore embedded in the header field was not getting properly initialized. Fix that. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-12 11:00:50 -08:00
Josh Durgin	51703306b3	rbd: remove buggy rollback functionality This doesn't interact with resizing well, since it doesn't set the size of the device to the size at the snapshot. It's also an expensive operation to be synchronous. Rollback can still be done with the userspace rbd tool. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:19 -08:00
Josh Durgin	81e759fbf7	rbd: return an error when an invalid header is read This protects against opening future rbd images that have incompatible format changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-12-07 10:46:10 -08:00
Linus Torvalds	97d2eb13a0	Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client * 'for-linus' of git://ceph.newdream.net/git/ceph-client: libceph: fix double-free of page vector ceph: fix 32-bit ino numbers libceph: force resend of osd requests if we skip an osdmap ceph: use kernel DNS resolver ceph: fix ceph_monc_init memory leak ceph: let the set_layout ioctl set single traits Revert "ceph: don't truncate dirty pages in invalidate work thread" ceph: replace leading spaces with tabs libceph: warn on msg allocation failures libceph: don't complain on msgpool alloc failures libceph: always preallocate mon connection libceph: create messenger with client ceph: document ioctls ceph: implement (optional) max read size ceph: rename rsize -> rasize ceph: make readpages fully async	2011-10-28 16:42:18 -07:00
Sage Weil	6ab00d465a	libceph: create messenger with client This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Jiri Kosina	e060c38434	Merge branch 'master' into for-next Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.	2011-09-15 15:08:18 +02:00
Justin P. Mattock	699324871f	treewide: remove extra semicolons from various parts of the kernel This is a resend from the original, changing the title from PATCH to RFC(since this is a review for commit, and I should have put that the first go around). and also removing some of the commit's with ia64 and bash since it is significant. let me know if I might have missed anything etc.. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:50:49 +02:00
Josh Durgin	029bcbd8b0	rbd: set blk_queue request sizes to object size This improves performance since more requests can be merged. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>	2011-07-26 11:29:35 -07:00
Yehuda Sadeh	79e3057c4c	rbd: cancel watch request when releasing the device We were missing this cleanup, so when a device was released the osd didn't clean up its watchers list, so following notifications could be slow as osd needed to timeout on the client. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2011-07-26 11:29:04 -07:00
Sage Weil	9db4b3e327	rbd: handle online resize of underlying rbd image If we get a notification that the image header has changed, check for a change in the image size. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:08 -07:00
Sage Weil	aedfec59ee	rbd: use snprintf for disk->disk_name Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:03 -07:00
Sage Weil	916d4d6727	rbd: cleanup: make kfree match kmalloc Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:01 -07:00
Sage Weil	13143d2d1c	rbd: warn on update_snaps failure on notify Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-19 11:25:05 -07:00
Yehuda Sadeh	1fec70932d	rbd: fix split bio handling The rbd driver currently splits bios when they span an object boundary. However, the blk_end_request expects the completions to roll up the results in block device order, and the split rbd/ceph ops can complete in any order. This patch adds a struct rbd_req_coll to track completion of split requests and ensures that the results are passed back up to the block layer in order. This fixes errors where the file system gets completion of a read operation that spans an object boundary before the data has actually arrived. The bug is easily reproduced with iozone with a working set larger than available RAM. Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-13 13:52:57 -07:00
Sage Weil	11f770027b	rbd: fix leak of ops struct The ops vector must be freed by the rbd_do_request caller. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-12 20:59:14 -07:00
Sage Weil	4ad12621e4	libceph: fix ceph_osdc_alloc_request error checks ceph_osdc_alloc_request returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-03 09:28:13 -07:00
Yehuda Sadeh	59c2be1e4d	rbd: use watch/notify for changes in rbd header Send notifications when we change the rbd header (e.g. create a snapshot) and wait for such notifications. This allows synchronizing the snapshot creation between different rbd clients/rools. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-22 11:33:56 -07:00
Yehuda Sadeh	766fc43973	rbd: fix cleanup when trying to mount inexistent image Previously we didn't clean up the sysfs entry that was just created. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:18 -08:00
Yehuda Sadeh	dfc5606dc5	rbd: replace the rbd sysfs interface The new interface creates directories per mapped image and under each it creates a subdir per available snapshot. This allows keeping a cleaner interface within the sysfs guidelines. The ABI documentation was updated too. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 15:53:22 -08:00
Dan Carpenter	85b5aaa624	rbd: passing wrong variable to bvec_kunmap_irq() We should be passing "buf" here insead of "bv". This is tricky because it's not the same as kmap() and kunmap(). GCC does warn about it if you compile on i386 with CONFIG_HIGHMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:25 -07:00
Dan Carpenter	b8d0638a98	rbd: null vs ERR_PTR ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:24 -07:00
Yehuda Sadeh	f4cf3deef4	block: rbd: removing unnecessary test rbd_get_segment() can't return a negative value, we don't need to check the return output. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:20 -07:00
Vasiliy Kulikov	28f259b7cd	block: rbd: fixed may leaks rbd_client_create() doesn't free rbdc, this leads to many leaks. seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense. Also if fixed check fails then seg_name is leaked. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:19 -07:00
Yehuda Sadeh	602adf4002	rbd: introduce rados block device (rbd), based on libceph The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:13 -07:00

... 7 8 9 10 11

543 Commits