linux_old1

Commit Graph

Author	SHA1	Message	Date
Lars Ellenberg	81f448629a	drbd: Fix a potential race that could case data inconsistency When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE at the same time, and it happens that the line remote = remote && drbd_should_do_remote(s); stills sees C_WF_BITMAP_S, and send_oos = rw == WRITE && drbd_should_send_oos(s); already sees C_SYNC_SOURCE both are 0. This causes the write to not be mirrored, but marked as out-of-sync on the Sync_Source node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	38a05c16b8	drbd: Consider that bio->bi_bdev might be modified below DRBD Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	72585d2428	drbd: add missing part_round_stats to _drbd_start_io_acct Without this, iostat frequently sees bogus svctime and >= 100% "utilization". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	dd9b360475	drbd: Fix module refcount leak in drbd_accept() drbd_accept was modelled after kernel_accept with drbd commit 53eb779 in July 2008. Only, kernel_accept was then broken, and only fixed later with kernel commit `1b08534e` in Dec 2008: net: Fix module refcount leak in kernel_accept() Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp, would soon have their reference count become negative, preventing them from being unloaded (likely), or worse, hit zero without actually being unused, allowing them to be unloaded while still in use (unlikely, but if triggered, causing a kernel crash). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:16 +01:00
Philipp Reisner	93f5afe956	drbd: If disk timeout expires fail only the affected volume ...and not all volumes of the resource Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:16 +01:00
Philipp Reisner	32db80f6f6	drbd: Consider the disk-timeout also for meta-data IO operations If the backing device is already frozen during attach, we failed to recognize that. The current disk-timeout code works on top of the drbd_request objects. During attach we do not allow IO and therefore never generate a drbd_request object but block before that in drbd_make_request(). This patch adds the timeout to all drbd_md_sync_page_io(). Before this patch we used to go from D_ATTACHING directly to D_DISKLESS if IO failed during attach. We can no longer do this since we have to stay in D_FAILED until all IO ops issued to the backing device returned. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Philipp Reisner	25b0d6c8c1	drbd: Reinstate disabling AL updates with invalidate-remote Commit d0ef827e (drbd: switch configuration interface from connector to genetlink) introduced a regression by removing the ability to set all bits in the out of sync bitmap and to suspend updates to the activity log of a disconnected device via the invalidate-remote management call. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Lars Ellenberg	b17f33cb0a	drbd: explicitly clear unused dp_flags in drbd_send_block We send left-over garbage from the previous packet in P_DATA_REPLY and P_RS_DATA_REPLY packets. That's bad behaviour. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Philipp Reisner	4d0fc3fdc3	drbd: Fixed compat issue with disconnecting 8.4 from a primary 8.3 For compatibility reasons 8.4 has to send P_STATE_CHG_REQ (instead of P_CONN_ST_CHG_REQ) when disconnecting. In the receiving code path we missed to convert the old answer (P_STATE_CHG_REPLY) back to 8.4 logic. Therefore the CL_ST_CHG_SUCCESS or CL_ST_CHG_FAIL bit in the flags word of mdev got set, while the state code was waiting for the CONN_WD_ST_CHG_OKAY or CONN_WD_ST_CHG_FAIL bits in tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:14 +01:00
Andreas Gruenbacher	1a3cde4406	drbd: drbd_bm_ALe_set_all(): Remove unused function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:14 +01:00
Philipp Reisner	69b6a3b159	drbd: restart loop in drbd_make_request() [prepare for Linux-3.2] With Linux-3.2 generic_make_request() will no longer loop over the request function until it finally returns 0. Move this loop into our drbd_make_request() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	7da358625c	drbd: Restore late assigning of tconn->data.sock and meta.sock With commit from Mon Mar 28 16:33:12 2011 +0200 "drbd: drbd_connect(): Initialize struct drbd_socket before sending anything" tconn->data.sock and tconn->meta.sock get assigned early, in conn_connect. The early assigning can trigger an OOPS, because it may released the socket without acquiring the mutex protecting the socket. An other thread (worker) might use setsockopt() on the socket while it gets free()ed. Restored the (proven) 8.3 behavior of assigning these sockets after the two connections are established. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	a01842ebee	drbd: Log failures of connection state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	e8cdc34335	drbd: Consider that read requests could be NEG_ACKEDed ap_in_flight only counts writes. NEG_ACKED is an action on a request that might be called for reads and writes. This bug was there forever, but it becomes much more relevant with the read balincing code. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	6ab9b1b60b	drbd: Do not send state packets while lower than C_CONNECTED cstate I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION. Sending may already work in these cstates, but the peer still expects the HandShake / ConnectionFeatures packet. Actually triggered by the Testuite on kugel. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	b8853dbd8c	drbd: fix race between disconnect and receive_state If the asender thread, or request_timer_fn(), or some other part of the code, decided to drop the connection (because of timeout or other), but the receiver just now was processing a P_STATE packet, there was a chance that receive_state() would do a hard state change "re-establishing" an already failed connection without additional handshake. Log excerpt: Remote failed to finish a request within ko-count * timeout peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) asender terminated ... peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) ... Connection closed peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 ) receiver terminated Impact: while the connection state is erroneously "Connected", requests may be queued and even sent, which would never be acknowledged, and may have been missed by the cleanup. These requests would never be completed. The next drbd_suspend_io() will then lock up, waiting forever for these requests to complete. Fixed in several code paths: Make sure the connection state is NetworkFailure or worse before starting the cleanup in drbd_disconnect(). This should make sure the cleanup won't miss any requests. Disallow receive_state() to "upgrade" the connection state from an error state. This will make sure the "illegal" state transition won't happen. For all connection failure states, relax the safe-guard in sanitize_state() again to silently mask out those state changes (e.g. Timeout -> Connected becomes Timeout -> Timeout). Note by Philipp Reisner: The 3rd chunk described as "relax the safe-guard..." is not there in 8.4 as it is relaxed to the maximum in 8.4 already Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	57bcb6cf1d	drbd: Do not call generic_make_request() while holding req_lock Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:11 +01:00
Philipp Reisner	d60de03a66	drbd: Load balancing method: striping Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:11 +01:00
Philipp Reisner	380207d08e	drbd: Load balancing of read requests New config option for the disk secition "read-balancing", with the values: prefer-local, prefer-remote, round-robin, when-congested-remote. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Philipp Reisner	d10b4ea32b	drbd: Get rid of "ASSERTION FAILED: tconn->current_epoch->list not empty" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Lars Ellenberg	615e087fbd	drbd: add missing rcu locks around recently introduced idr_for_each Recent commit drbd: Move write_ordering from mdev to tconn introduced a new idr_for_each loop over all volumes, but did not take necessary rcu locks or krefs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Andreas Gruenbacher	03d63e1d1e	drbd: Remove leftover prototype Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:09 +01:00
Philipp Reisner	975b297947	drbd: fix potential spinlock deadlock drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks left to be resynced (rs_left) in the current resync extent. If it detects a mismatch, it complains, and forces a disconnect using drbd_force_state(mdev, NS(conn, C_DISCONNECTING)); Unfortunately, this may be called while holding the req_lock, and drbd_force_state() want's to aquire that lock itself. Deadlock. Don't force a disconnect, but fix up rs_left by recounting and reassigning the number of dirty blocks in that extent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:09 +01:00
Philipp Reisner	77fede5137	drbd: Fix the WO=drain implementation for multiple volumes Wait until IO is drained in all volumes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	1e9dd2912e	drbd: Switch drbd_may_finish_epoch() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	12038a3a71	drbd: Move list of epochs from mdev to tconn This is necessary since the transfer_log on the sending is also per tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	1d2783d532	drbd: Prepare epochs per connection An epoch object needs a pointer to the mdev it was received for. This is necessary to be able to send the barrier ack packet for the same volume as the original barrier packet was assigned to. This prepares the next step, in which the (receiver side) epoch list is moved from the device (mdev) to the connection (tconn) object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:07 +01:00
Philipp Reisner	4b0007c0e8	drbd: Move write_ordering from mdev to tconn This is necessary in order to prepare the move of the (receiver side) epoch list from the device (mdev) to the connection (tconn) objects. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:07 +01:00
Philipp Reisner	6936fcb49a	drbd: Move the CREATE_BARRIER flag from connection to device That is necessary since the whole transfer log is per connection(tconn) and not per device(mdev). This bug caused list corruption on the worker list. When a barrier is queued for sending in the context of one device, another device did not see the CREATE_BARRIER bit, and queued the same object again -> list corruption. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	36baf6117b	drbd: Fixed an obvious copy-n-paste mistake Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	43de7c852b	drbd: Fixes from the drbd-8.3 branch * drbd-8.3: drbd: O_SYNC gives EIO on ramdisks for some kernels (eg. RHEL6). drbd: send intermediate state change results to the peer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	0cfac5dd90	drbd: Fixes from the drbd-8.3 branch * drbd-8.3: drbd: fix spurious meta data IO "error" drbd: Fixed a race condition between detach and start of resync drbd: fix harmless race to not trigger an ASSERT drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:05 +01:00
Philipp Reisner	376694a054	drbd: Silenced compiler warnings Since version 4.6.1 gcc warns about variables that get a value assigned, but which are never read later on. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:05 +01:00
Philipp Reisner	9bcd252182	drbd: fix "stalled" empty resync With sync-after dependencies, given "lucky" timing of pause/unpause events, and the end of an empty (0 bits set) resync was sometimes not detected on the SyncTarget, leading to a "stalled" SyncSource state. Fixed this by expecting not only "Inconsistent -> UpToDate" but also "Consistent -> UpToDate" transitions for the peer disk state to end a resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:04 +01:00
Lars Ellenberg	22d81140ae	drbd: fix bitmap writeout after aborted resync Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:04 +01:00
Philipp Reisner	08b165ba11	drbd: Consider the discard-my-data flag for all volumes [bugz 359] ...not only for the first volume Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:04 +01:00
Andreas Gruenbacher	935be260c1	drbd: Improve error reporting in drbd_md_sync_page_io() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:03 +01:00
Lars Ellenberg	25e409321a	drbd: fix connect failure with all default net-options If no net-options are configured (all on their default), no DRBD_NLA_NET_CONF will be passed to the kernel. The kernel must not require its presence, there is no required option in there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:03 +01:00
Andreas Gruenbacher	a209b4aec3	drbd: Update some outdated comments to match the code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:02 +01:00
Philipp Reisner	c4e7afdc01	drbd: Remove unused code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:02 +01:00
Philipp Reisner	4276dea70c	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:02 +01:00
Philipp Reisner	85d735138a	drbd: Cleanup all epoch objects upon connection loss Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:01 +01:00
Philipp Reisner	f132f554ce	drbd: Do not display bogus log lines for pdsk in case pdsk < D_UNKNOWN This was a regression recently introduced with commit 7848ddb752c09b6dfd1ddfabb06b69b08aa8f6b9 "drbd: Correctly handle resources without volumes" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:00 +01:00
Lars Ellenberg	97ddb68790	drbd: detach must not try to abort non-local requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:00 +01:00
Andreas Gruenbacher	f497609e4c	drbd: Get rid of MR_{READ,WRITE}_SHIFT Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:00 +01:00
Philipp Reisner	823bd832a6	drbd: Bugfix for the connection behavior If we get into the C_BROKEN_PIPE cstate once, the state engine set the thi->t_state of the receiver thread to restarting. But with the while loop in drbdd_init() a new connection gets established. After the call into drbdd() returns immediately since the thi->t_state is not RUNNING. The restart of drbd_init() then resets thi->t_state to RUNNING. I.e. after entering C_BROKEN_PIPE once, the next successful established connection gets wasted. The two parts of the fix: * Do not cause the thread to restart if we detect the issue with the sockets while we are in C_WF_CONNECTION. * Make sure that all actions that would have set us to C_BROKEN_PIPE happen before the state change to C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:59 +01:00
Andreas Gruenbacher	7d4c782cbd	drbd: Fix the data-integrity-alg setting The last data-integrity-alg fix made data integrity checking work when the algorithm was changed for an established connection, but the common case of configuring the algorithm before connecting was still broken. Fix that. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:59 +01:00
Andreas Gruenbacher	71fc7eedb3	drbd: Turn tl_apply() into tl_abort_disk_io() There is no need to overly generalize this function; it only makes the code harder to understand. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:58 +01:00
Philipp Reisner	1b7ab15b11	drbd: Fixed w_restart_disk_io() to handle non active AL-extents Since we now apply the AL in user space onto the bitmap, the AL is not active for the requests we want to reply. For that a al_write_transaction() that might be called from worker context became necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:58 +01:00
Philipp Reisner	9b743da96c	drbd: Missing assignment of mdev before drbd_queue_work() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:58 +01:00
Philipp Reisner	3b03ad5929	drbd: Do not mod_timer() with a past time In case we can not find out why the request takes too long (happens e.g. when IO got suspended on DRBD level). rearm the timer with a reasonable value. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:57 +01:00
Philipp Reisner	3fb4746d8d	drbd: Consider that the no-data-condition could be in connected state ...when the peer has inconsistent data. In that case we failed to clear the susp_nod flag. When the local disk was attached again Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:57 +01:00
Philipp Reisner	97af09d51e	drbd: Dropped wrong clause to generate new current UUIDs Looks like a remainder from long ago. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:56 +01:00
Andreas Gruenbacher	accdbcc5f9	drbd: receive_protocol(): We cannot change our own data-integrity-alg setting here Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:56 +01:00
Andreas Gruenbacher	d505d9bef2	drbd: Be consistent in reporting incompatibilities in P_PROTOCOL settings Refer to the settings by the names which drbdsetup and drbd.conf are using. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:56 +01:00
Andreas Gruenbacher	fbc12f4514	drbd: receive_protocol(): Make the program flow less confusing Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:55 +01:00
Andreas Gruenbacher	b792c35cfb	drbd: receive_protocol(): Give variables more easily searchable names Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:55 +01:00
Andreas Gruenbacher	5af172ed9e	drbd: Print memory address in hex instead of decimal in error message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:55 +01:00
Andreas Gruenbacher	f2257a56ee	drbd: Allow to create devices with a minor number > minor_count The minor_count module/kernel parameter serves to scale the size of drbd's internal memory pool, but it is no longer a limit for the number of minors or the minor number. (Minor numbers can be arbitrarily high within the allowed limit of 2^20.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:54 +01:00
Lars Ellenberg	367d675da8	drbd: report net config even for resources without a single volume Currently it is legal (though unusual) to create and connect a resource, before adding in all necessary volumes. We should include the network configuration details, even if we don't have a single volume (yet). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:53 +01:00
Philipp Reisner	e0e1665381	drbd: Correctly handle resources without volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:52 +01:00
Philipp Reisner	a67e1d9e8c	drbd: Eliminated the "notified peer" messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:52 +01:00
Philipp Reisner	369bea6371	drbd: Fixed removal of volumes/devices from connected resources When removing a volume/device we need to switch the connection status of the peer back into WFReportParams. Before this fix it was left in Connected state. That means that the peer device continued to inform us about state changes, etc... But we deleted that minor -> protocol error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:51 +01:00
Lars Ellenberg	d5d7ebd422	drbd: on attach, enforce clean meta data Detection of unclean shutdown has moved into user space. The kernel code will, whenever it updates the meta data, mark it as "unclean", and will refuse to attach to such unclean meta data. "drbdadm up" now schedules "drbdmeta apply-al", which will apply the activity log to the bitmap, and/or reinitialize it, if necessary, as well as set a "clean" indicator flag. This moves a bit code out of kernel space. As a side effect, it also prevents some 8.3 module from accidentally ignoring the 8.4 style activity log, if someone should downgrade, whether on purpose, or accidentally because he changed kernel versions without providing an 8.4 for the new kernel, and the new kernel comes with in-tree 8.3. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:51 +01:00
Philipp Reisner	cdfda633d2	drbd: detach from frozen backing device * drbd-8.3: documentation: Documented detach's --force and disk's --disk-timeout drbd: Implemented the disk-timeout option drbd: Force flag for the detach operation drbd: Allow new IOs while the local disk in in FAILED state drbd: Bitmap IO functions can not return prematurely if the disk breaks drbd: Added a kref to bm_aio_ctx drbd: Hold a reference to ldev while doing meta-data IO drbd: Keep a reference to the bio until the completion handler finished drbd: Implemented wait_until_done_or_disk_failure() drbd: Replaced md_io_mutex by an atomic: md_io_in_use drbd: moved md_io into mdev drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk drbd: Keep a reference to barrier acked requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:50 +01:00
Andreas Gruenbacher	2fcb8f307f	drbd: Improve the "unexpected packet" error messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:50 +01:00
Philipp Reisner	9510b2411d	drbd: Fixed state transitions in case reading meta data failes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:50 +01:00
Philipp Reisner	2ffca4f3ee	drbd: Improve compatibility with drbd's older than 8.3.7 Regression introduced with 8.3.11 commit: drbd: Take a more conservative approach when deciding max_bio_size Never ever tell an older drbd, that we support more than 32KiB in a single data request (packet). Never believe an older drbd, that is supports more than 32KiB in a single data request (packet) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:49 +01:00
Philipp Reisner	d942ae4453	drbd: Fixes from the 8.3 development branch * commit 'ae57a0a': drbd: Only print sanitize state's warnings, if the state change happens drbd: we should write meta data updates with FLUSH FUA drbd: fix limit define, we support 1 PiByte now drbd: fix log message argument order drbd: Typo in user-visible message. drbd: Make "(rcv\|snd)buf-size" and "ping-timeout" available for the proxy, too. drbd: Allow keywords to be used in multiple config sections. drbd: fix typos in comments. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:49 +01:00
Lars Ellenberg	4dbdae3ec9	drbd: downgraded error printk to info Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:48 +01:00
David Howells	3b7cd457d0	DRBD: Fix comparison always false warning due to long/long long compare Fix warnings of the following nature in the drbd header: In file included from drivers/block/drbd/drbd_bitmap.c:32: drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress': drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which is always false on a 32-bit machine. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:48 +01:00
Andreas Gruenbacher	6dff290220	drbd: Rename --dry-run to --tentative drbdadm already has a --dry-run option, so this option cannot directly be passed through to drbdsetup. Rename the drbdsetup option to resolve this conflict. For backward compatibility, make --dry-run an alias of --tentative. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:47 +01:00
Andreas Gruenbacher	d0fa7fd680	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:47 +01:00
Andreas Gruenbacher	afbbfa88bc	drbd: Allow to pass resource options to the new-resource command This is equivalent to how the attach and connect commands work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:46 +01:00
Andreas Gruenbacher	089c075d88	drbd: Convert the generic netlink interface to accept connection endpoints Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:46 +01:00
Andreas Gruenbacher	44e52cfaa2	drbd: Rename DRBD_ADM_NEED_{CONN -> RESOURCE} Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:46 +01:00
Andreas Gruenbacher	01b39b50d3	drbd: Split off netlink mandatory attribute handling into separate file Duplicate this file in the kernel module and in user space; both sides need it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:45 +01:00
Andreas Gruenbacher	7c3063cc6f	drbd: Also need to check for DRBD_GENLA_F_MANDATORY flags before nla_find_nested() This is done by introducing drbd_nla_find_nested() which handles the flag before calling nla_find_nested(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:45 +01:00
Andreas Gruenbacher	789c1b626c	drbd: Use the terminology suggested by the command names in the source code and messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:44 +01:00
Lars Ellenberg	67b58bf723	drbd: spelling fix: too small It is not "to small", but "too small". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:44 +01:00
Lars Ellenberg	fc251d5c24	drbd: cosmetic: fix accidental division instead of modulo when pretty printing For large resync rates, seq_printf_with_thousands_grouping() accidentally only produced Y,000,00Y, instead of the real numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:43 +01:00
Andreas Gruenbacher	46530e859c	drbd: Use DRBD_MINOR_COUNT_DEF in one more place Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:43 +01:00
Philipp Reisner	a67b813cfa	drbd: Lower log priority for an event that is definitely not an error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:43 +01:00
Andreas Gruenbacher	c75b9b10e7	drbd: Don't use empty nested netlink attributes Before mainline commit `ea5693cc` (v2.6.29-rc1), empty nested netlink attributes were not allowed. Fix that by leaving out nested attributes if they are empty and by allowing the top-level attributes to be missing. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:57:40 +01:00
Andreas Gruenbacher	1e2a2551ee	drbd: drbd_adm_prepare(): Pass through error codes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:56 +01:00
Philipp Reisner	d659f2aaea	drbd: Send PROTOCOL_UPDATE packets when appropriate Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:54 +01:00
Philipp Reisner	036b17eaab	drbd: Receiving part for the PROTOCOL_UPDATE packet Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:53 +01:00
Philipp Reisner	7aca6c7549	drbd: Allocation of int_dig_in and int_dig_vv was missing Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:53 +01:00
Philipp Reisner	f179d76d76	drbd: Made cmp_after_sb() more generic into convert_after_sb() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:53 +01:00
Philipp Reisner	46e1ce4177	drbd: protect updates to integrits_tfm by tconn->data->mutex Since we need to hold that mutex anyways to make sure the peer gets that change in the right position in the data stream, it makes a lot of sense to use the same mutex to ensure existence of the tfm. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:52 +01:00
Philipp Reisner	dcb20d1a8e	drbd: Refuse to change network options online when... * the peer does not speak protocol_version 100 and the user wants to change one of: - wire_protocol - two_primaries - integrity_alg * the user wants to remove the allow_two_primaries flag when there are two primaries Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:52 +01:00
Andreas Gruenbacher	95f8efd08b	drbd: Fix the upper limit of resync-after The 32-bit resync_after netlink field takes a device minor number as parameter, which is no longer limited to 255. We cannot statically verify which device numbers are valid, so set the ummer limit to the highest possible signed 32-bit integer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:51 +01:00
Andreas Gruenbacher	69ef82dea4	drbd: Refer to connect-int consistently throughout the code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:50 +01:00
Andreas Gruenbacher	6394b9358e	drbd: Refer to resync-rate consistently throughout the code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:50 +01:00
Lars Ellenberg	7dc1d67f7c	drbd: skip spurious wait_event in drbd_al_begin_io Activity log transaction writes are serialized on a bit lock. If several CPUs race to write an AL transaction, those that did not get the lock the first time may continue as soon as there are no more pending transactions. The do not need to all grab the lock in turn, just to realize that the AL is clean already, and they have nothing to do. This also closes a potential deadlock with drbd_adm_disk_opts. Once it got the AL bit lock, it knows there are no pending transactions, the AL is clean, and it should be safe to wait for all element references to drop to zero. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:49 +01:00
Andreas Gruenbacher	6139f60dc1	drbd: Rename the want_lose field/flag to discard_my_data This is what it is called in config files and on the command line as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:49 +01:00
Andreas Gruenbacher	6f9b5f84f5	drbd: Make broadcast events return NO_ERROR Instead of returning a ret_code outside of the range of enum drbd_ret_code, use NO_ERROR to indicate success. This way, ret_code has the same meaning in all packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:48 +01:00
Philipp Reisner	c141ebda03	drbd: Removing drbd_cfg_rwsem * Updates to all configuration items is done under genl_lock(). Including removal of mdevs or tconns. * All read non sleeping read sides are protected by rcu * All sleeping read sides keep reference counts to keep the objects alive Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:48 +01:00
Philipp Reisner	ec0bddbc55	drbd: Use RCU for the drbd_tconns list Preparing removal of drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:47 +01:00
Philipp Reisner	81fa2e675c	drbd: Refcounting for mdev objects Preparing removal of drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:47 +01:00
Andreas Gruenbacher	bb77d34ecc	drbd: Turn no-tcp-cork into tcp-cork={yes\|no} Change the --no-tcp-cork drbdsetup command line option as well as the no_cork netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:46 +01:00
Andreas Gruenbacher	e544046ab8	drbd: Turn no-md-flushes into md-flushes={yes\|no} Change the --no-md-flushes drbdsetup command line option as well as the no_md_flush netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:46 +01:00
Andreas Gruenbacher	d0c980e236	drbd: Turn no-disk-drain into disk-drain={yes\|no} Change the --no-disk-drain drbdsetup command line option as well as the no_disk_drain netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:46 +01:00
Andreas Gruenbacher	66b2f6b9c5	drbd: Turn no-disk-flushes into disk-flushes={yes\|no} Change the --no-disk-flushes drbdsetup command line option as well as the no_disk_flush netlink packet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:45 +01:00
Philipp Reisner	813472ced7	drbd: RCU for rs_plan_s This removes the issue with using peer_seq_lock out of different contexts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:44 +01:00
Philipp Reisner	d589a21e5d	drbd: Enforce limits of disk_conf members; centralized these checks Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:44 +01:00
Philipp Reisner	9958c857c7	drbd: Made the fifo object a self contained object (preparing for RCU) * Moved rs_planed into it, named total * When having a pointer to the object the values can be embedded into the fifo object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:43 +01:00
Philipp Reisner	daeda1cca9	drbd: RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:43 +01:00
Lars Ellenberg	563e4cf25e	drbd: Introduce __s32_field in the genetlink macro magic ...and drop explicit typecasts (int)meta_dev_idx < 0. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:43 +01:00
Philipp Reisner	2ec91e0e29	drbd: Renamed (old\|new)_conf into (old\|new)_net_conf in receive_SyncParam Preparing RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:42 +01:00
Philipp Reisner	dc97b70801	drbd: Split drbd_alter_sa() into drbd_sync_after_valid() and drbd_sync_after_changed() Preparing RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:42 +01:00
Philipp Reisner	ef5e44a672	drbd: drbd_dew_dev_size() gets the user requests disk_size as argument Preparing RCU for disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:41 +01:00
Philipp Reisner	a0095508ca	drbd: Renamed the net_conf_update mutex to conf_update Preparing to use the same mutex for disk_conf updates Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:41 +01:00
Philipp Reisner	934e6138b5	drbd: Removed dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:40 +01:00
Andreas Gruenbacher	b966b5dd8e	drbd: Generate the drbd_set_*_defaults() functions from drbd_genl.h Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:55:38 +01:00
Lars Ellenberg	009ba89db5	drbd: fix schedule in atomic An administrative detach used to request a state change directly to D_DISKLESS, first suspending IO to avoid the last put_ldev() occuring from an endio handler, potentially in irq context. This is not enough on the receiving side (typically secondary), we may miss some peer_req on the way to local disk, which then may do the last put_ldev() from their drbd_peer_request_endio(). This patch makes the detach always go through the intermediate D_FAILED state. We may consider to rename it D_DETACHING. Alternative approach would be to create yet an other work item to be scheduled on the worker, do the destructor work from there, and get the timing right. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:53:01 +01:00
Lars Ellenberg	992d6e91d3	drbd: fix thread stop deadlock There are races where the receiver may be exiting, but still need the worker to process some stuff. Do not wait for the receiver to die from an exiting worker. The receiver must already be dead in case the worker decides to exit. If the receiver was still alive, it may still want to queue work, and do drbd_flush_workqueue() from it's disconnect cleanup code, which would no longer be processed by an exiting worker. This also would deadlock, if the worker was to synchornously wait for the receiver to die. Do not implicitly stop the worker. The worker will only be stopped from configuration context, from conn_reconfig_done(), drbd_adm_down() or drbd_adm_delete_connection(), after making sure the receiver is already stopped. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:53:00 +01:00
Lars Ellenberg	f3dfa40a67	drbd: fix race when forcefully disconnecting If a forced disconnect hits a restarting receiver right after it passed its final "if (C_DISCONNECTING)" test in drbdd_init(), but before it was actually restarted by drbd_thread_setup, we could be left with a connection stuck in C_DISCONNECTING, never reaching C_STANDALONE, which would be necessary to take it down or reconfigure it. Move the last cleanup into w_after_conn_state_ch(), and do an additional state change request in conn_try_disconnect(), just in case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:53:00 +01:00
Andreas Gruenbacher	88104ca458	drbd: Allow to change data-integrity-alg on the fly The main purpose of this is to allow to turn data integrity checking on and off on demand without causing interruptions. Implemented by allocating tconn->peer_integrity_tfm only when receiving a P_PROTOCOL message. l accesses to tconn->peer_integrity_tf happen in worker context, and no further synchronization is necessary. On the sender side, tconn->integrity_tfm is modified under tconn->data.mutex, and a P_PROTOCOL message is sent whenever. All accesses to tconn->integrity_tfm already happen under this mutex. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:59 +01:00
Andreas Gruenbacher	a7eb7bdf58	drbd: Introduce a "lockless" variant of drbd_send_protocoll() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:59 +01:00
Andreas Gruenbacher	4b6ad6d457	drbd: Remove obsolete drbd_crypto_is_hash() We allocate hash transformations with crypto_alloc_hash() which will only return hash algorithms. It is not necessary to reconfirm that we actually got a hash algorithm. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:58 +01:00
Andreas Gruenbacher	5b614abe30	drbd: Rename integrity_r_tfm -> peer_integrity_tfm Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:58 +01:00
Andreas Gruenbacher	8d412fc6d5	drbd: Rename integrity_w_tfm -> integrity_tfm Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:58 +01:00
Andreas Gruenbacher	86db06180a	drbd: Wrong use of RCU in receive_protocol() It is not enough to grab net_conf->integrity_alg under rcu_read_lock() and access it outside of it; the entire net_conf object may be gone by then. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:57 +01:00
Lars Ellenberg	acb104c396	drbd: fix copy/paste error in comment Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:57 +01:00
Lars Ellenberg	b57a1e27ee	drbd: rename variable sc to res_opts sc was short for syncer conf, which does not exist anymore anyways. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:56 +01:00
Lars Ellenberg	5ecc72c3b9	drbd: rename variable ndc to new_disk_conf Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:52:54 +01:00
Lars Ellenberg	5979e36155	drbd: on reconfiguration requests, mind the SET_DEFAULTS flag The DRBD_GENL_F_SET_DEFAULTS flag was ignored for drbd_adm_disk_opts() and drbd_adm_net_opts(). Factor out drbd_set_*_defaults() helper functions, and call them appropriately. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:50:38 +01:00
Philipp Reisner	0fd0ea064c	drbd: Consider all crypto options in connect and in net-options So for this was simply not considered after the options have been re-arranged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:08 +01:00
Lars Ellenberg	d9cc6e2318	drbd: fix various disconnecting races If an admin requests disconnect at a time when the state handling already disconnects/reconnects, there have been some races. Make sure to always really stop the network threads before returning success for disconnect. Do not pretend successfull forced disconnect, if the state handling returned an error. Return success from drbd_adm_down() only after all threads are finished. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:08 +01:00
Lars Ellenberg	5ee743e92d	drbd: remove useless kobject_uevent from drbd_adm_connect Calling kobject_uevent, which may sleep, from within rcu_read_lock() protected regions is not possible. This particular kobject_uevent also is also wrong. It was supposed to trigger a udev run, just in case something relevant to udev symlink magic has changed, when adjusting runtime re-configurable settings while we still had the "syncer conf". It was improperly placed in connect when we dropped the "syncer conf". The right thing to do is probably to call "udevadm trigger" directly in those cases where drbdadm thinks there was a need to trigger extra udev runs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:07 +01:00
Philipp Reisner	a18e9d1eb0	drbd: Removed the OBJECT_DYING and the CONFIG_PENDING bits superseded by refcounting Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:07 +01:00
Philipp Reisner	0ace9dfabe	drbd: Take a reference on tconn when finding a tconn by name Rule #3 of kref.txt Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:06 +01:00
Philipp Reisner	9dc9fbb357	drbd: Basic refcounting for drbd_tconn References hold by: * Each (running) drbd thread has a reference on tconn * Each mdev has a referenc on tconn * Beeing in the all_tconn list counts for one reference * Each after_conn_state_chg_work has a reference to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:06 +01:00
Philipp Reisner	1d04122599	drbd: Eliminated drbd_free_resoruces() it is superseeded by conn_free_crypto() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:05 +01:00
Lars Ellenberg	f5e2b8b3b6	drbd: move comment about stopping the receiver thread to where it belongs When the last volume of a replication group is unconfigured, the worker thread exits. To not interfere with cleanup of other threads, before the the last cleanups run, we need to make sure the receiver has already exited. The commend explaining that clearly belongs above drbd_thread_stop(&tconn->receiver), not in the cleanup loop below. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:05 +01:00
Lars Ellenberg	ae25b336e0	drbd: cmdname() enum to string convertion was missing a few constants Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:05 +01:00
Lars Ellenberg	ed439848ca	drbd: fix setsockopt for user mode linux We use our own copy of kernel_setsockopt, and did not mess around with get_fs/set_fs, since we thought we knew we would always be KERNEL_DS anyways. Apparently not so for at least user mode linux, so put the set_fs(KERNEL_DS) in there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:04 +01:00
Lars Ellenberg	71932efc1c	drbd: allow status dump request all volumes of a specific resource We had drbd_adm_get_status (one single volume), and drbd_adm_get_status_all (dump of all volumes of all resources). This enhances the latter to be able to dump all volumes of just one specific resource. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:04 +01:00
Philipp Reisner	302bdeae49	drbd: Considering that the two_primaries config flag can change Now since it is possible to change the two_primaries config flag while the connection is up, make sure we treat a peer_req in a consistent way if the config flag changes while the peer_req is under IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:03 +01:00
Philipp Reisner	91fd4dad64	drbd: Proper locking for updates to net_conf under RCU Removing the get_net_conf()/put_net_conf() functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:49:03 +01:00
Philipp Reisner	44ed167da7	drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf Removing the get_net_conf()/put_net_conf() calls Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:48:59 +01:00
Philipp Reisner	b032b6fa35	drbd: Allow online change of replication protocol only with agreed_pv >= 100 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:18 +01:00
Philipp Reisner	cd64397c0b	drbd: Check consistency of net options when the get changed online Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:18 +01:00
Philipp Reisner	303d1448a0	drbd: Runtime changeable wire protocol The wire protocol is no longer a property that is negotiated between the two peers. It is now expressed with two bits (DP_SEND_WRITE_ACK and DP_SEND_RECEIVE_ACK) in each data packet. Therefore the primary node is free to change the wire protocol at any time without disconnect/reconnect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:18 +01:00
Philipp Reisner	d3fcb4908d	drbd: protect all idr accesses that might sleep with drbd_cfg_rwsem With this commit the locking for all accesses to IDRs is complete: * Non sleeping read accesses are protected by RCU * sleeping read accesses are protocted by a read lock on drbd_cfg_rwsem * accesses that add anything are protected by a write lock * accesses that remove an object are protoected by a write lock and a call to synchronize_rcu() after it is removed from the IDR and before the object is actually free()ed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:17 +01:00
Philipp Reisner	ef35626284	drbd: Converted drbd_cfg_mutex into drbd_cfg_rwsem Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:17 +01:00
Philipp Reisner	695d08fa94	drbd: rcu_read_[un]lock() for all idr accesses that do not sleep Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:16 +01:00
Philipp Reisner	cd1d9950f6	drbd: Inlined drbd_free_mdev(); it got called only from one place Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:16 +01:00
Philipp Reisner	ff370e5a9e	drbd: drbd_delete_device() takes a struct drbd_conf * now Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:15 +01:00
Andreas Gruenbacher	5cc287e0ae	drbd: Rename drbd_pp_free() to drbd_free_pages() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:15 +01:00
Andreas Gruenbacher	c37c8ecfee	drbd: Rename drbd_pp_alloc() to drbd_alloc_pages() and make it non-static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:15 +01:00
Andreas Gruenbacher	18c2d52249	drbd: Rename drbd_pp_first_pages_or_try_alloc() to __drbd_alloc_pages() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:14 +01:00
Andreas Gruenbacher	d4da15374b	drbd: Make drbd_wait_ee_list_empty() and _drbd_wait_ee_list_empty() static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:14 +01:00
Andreas Gruenbacher	045417f75c	drbd: Rename drbd_{ ee -> peer_req }_has_active_page Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:13 +01:00
Andreas Gruenbacher	a990be4682	drbd: Rename reclaim_net_ee(), drbd_process_done_ee(), drbd_process_done_ee(), tconn_process_done_ee() to *_peer_reqs Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:13 +01:00
Andreas Gruenbacher	7721f5675e	drbd: Rename drbd_release_ee() to drbd_free_peer_reqs() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:13 +01:00
Andreas Gruenbacher	3967deb192	drbd: Rename drbd_free_ee() and variants to *_peer_req() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:12 +01:00
Andreas Gruenbacher	0db55363cb	drbd: Rename drbd_alloc_ee() to drbd_alloc_peer_req() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:12 +01:00
Andreas Gruenbacher	e0ab6ad4bc	drbd: drbd_init_ee() no longer exists Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:11 +01:00
Andreas Gruenbacher	2735a59467	drbd: Make all asynchronous command handlers return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:11 +01:00
Andreas Gruenbacher	859976758d	drbd: validate_req_change_req_state(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:11 +01:00
Andreas Gruenbacher	b55d84ba17	drbd: Removed outdated comments and code that envisioned VNRs in header 95 Since have now header 100, that has space for 16 bit volume numbers, the high byte of the length in header 95 is no longer reserved for 8 bit volume numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:10 +01:00
Andreas Gruenbacher	0c8e36d9b8	drbd: Introduce protocol version 100 headers The 8 byte header finally becomes too small. With the protocol 100 header we have 16 bit for the volume number, proper 32 bit for the data length, and 32 bit for further extensions in the future. Previous versions of drbd are using version 80 headers for all packets short enough for protocol 80. They support both header versions in worker context, but only version 80 headers in asynchronous context. For backwards compatibility, continue to use version 80 headers for short packets before protocol version 100. From protocol version 100 on, use the same header version for all packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:10 +01:00
Andreas Gruenbacher	e658983af6	drbd: Remove headers from on-the-wire data structures (struct p_*) Prepare the introduction of the protocol 100 headers. The actual protocol header is removed for the packet declarations. I.e. allow us to use the packets with different headers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:09 +01:00
Andreas Gruenbacher	50d0b1ad78	drbd: Remove some fixed header size assumptions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:09 +01:00
Andreas Gruenbacher	da39fec492	drbd: Remove now-unused int_dig_out buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:09 +01:00
Andreas Gruenbacher	9f5bdc339e	drbd: Replace and remove old primitives Centralize sock->mutex locking and unlocking in [drbd\|conn]_prepare_command() and [drbd\|conn]_send_comman(). Therefore all _send_ functions are touched to use these primitives instead of drbd_get_data_sock()/drbd_put_data_sock() and former helper functions. That change makes the _send_ functions more standardized. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:08 +01:00
Andreas Gruenbacher	52b061a440	drbd: Introduce drbd_header_size() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:08 +01:00
Andreas Gruenbacher	dba5858750	drbd: Introduce new primitives for sending commands Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:07 +01:00
Andreas Gruenbacher	a17647aae4	drbd: drbd_send_ping(), drbd_send_ping(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:07 +01:00
Philipp Reisner	8b924f1d63	drbd: Use tconn in request_timer_fn() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:07 +01:00
Philipp Reisner	706cb24c23	drbd: Improved logging of state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:06 +01:00
Philipp Reisner	a6d00c8ec3	drbd: Implemented IO thawing for multiple volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:06 +01:00
Philipp Reisner	4669265a7b	drbd: Implemented conn_lowest_disk() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:05 +01:00
Philipp Reisner	19f83c7661	drbd: Implemented conn_lowest_conn() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:05 +01:00
Philipp Reisner	8c7e16c39f	drbd: Calculate and provide ns_min to the w_after_conn_state_ch() work Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:05 +01:00
Philipp Reisner	5f082f98f5	drbd: Renamed nms to ns_max Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:04 +01:00
Philipp Reisner	da9fbc276e	drbd: Introduced a new type union drbd_dev_state Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:04 +01:00
Philipp Reisner	8e0af25fa8	drbd: Moved susp, susp_nod and susp_fen to the connection object Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:03 +01:00
Philipp Reisner	2aebfabb17	drbd: Renamed id_susp(union drbd_state s) to drbd_suspended(struct drbd_conf *) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:03 +01:00
Philipp Reisner	78bae59b1b	drbd: Introduced drbd_read_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:03 +01:00
Lars Ellenberg	e15766e9c9	drbd: improvements to activate/deactivate multiple activity log extents Recent commit drbd: get rid of bio_split, allow bios of "arbitrary" size had a reference count leak: it only deactivated the first of several activity log extents for intervals crossing extent boundaries. This commit generalizes on bios spanning multiple activity log extents in drbd_al_begin_io, and adds the necessary loop around lc_put in drbd_al_complete_io as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:02 +01:00
Lars Ellenberg	23361cf32b	drbd: get rid of bio_split, allow bios of "arbitrary" size Where "arbitrary" size is currently 1 MiB, which is the BIO_MAX_SIZE for architectures with 4k PAGE_CACHE_SIZE (most). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:02 +01:00
Lars Ellenberg	7726547e67	drbd: prepare to activate two activity log extents at once Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:01 +01:00
Lars Ellenberg	181286ad22	drbd: preparation commit, pass drbd_interval to drbd_al_begin/complete_io We want to avoid bio_split for bios crossing activity log boundaries. So we may need to activate two activity log extents "atomically". drbd_al_begin_io() needs to know more than just the start sector. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:01 +01:00
Lars Ellenberg	85f103d88c	drbd: introduce the "initialized" activity log transaction type So we can initialize a clean on disk activity log area, without the module complaining with loud assert messages because of checksum or magic value mismatches. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:01 +01:00
Andreas Gruenbacher	6038178ebe	drbd: Change how the "handshake" packets are called Packets of type P_HAND_SHAKE define which protocol versions and features a node supports. For clarity, call those packets P_CONNECTION_FEATURES instead. (This does not determine the features that a specific drbd device supports, such as drbd protocol A, B, C.) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:00 +01:00
Andreas Gruenbacher	e5d6f33abe	drbd: Change how the initial packets are called The first packets exchanged when a connection is established are referred to as P_HAND_SHAKE_S and P_HAND_SHAKE_M in the code, followed by P_HAND_SHAKE packets. To avoid confusion between these two unrelated things, call the initial packets P_INITIAL_DATA and P_INITIAL_META. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:45:00 +01:00
Andreas Gruenbacher	7c96715aa8	drbd: _conn_send_cmd(), _drbd_send_cmd(): Pass a struct drbd_socket instead of a plain socket Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:59 +01:00
Andreas Gruenbacher	2bf896213d	drbd: drbd_connect(): Initialize struct drbd_socket before sending anything Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:59 +01:00
Andreas Gruenbacher	1952e9166a	drbd: Map from (connection, volume number) to device in the asender handlers Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:59 +01:00
Andreas Gruenbacher	e05e1e59db	drbd: Pass struct packet_info down to the asender receive functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:58 +01:00
Philipp Reisner	438c8374ae	drbd: Do not segfault if a sync dependency reaches a diskless device Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:58 +01:00
Philipp Reisner	778bcf2e29	drbd: Allow to disconnect if one volume is diskless Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:58 +01:00
Philipp Reisner	435693e89b	drbd: Print common state changes of all volumes as connection state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:57 +01:00
Philipp Reisner	88ef594ed7	drbd: Fixed logging of old connection state During a disconnect the oc variable in _conn_request_state() could become outdated. Determin the common old state after sleeping. While at it, I implemented that for all parts of the state Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:57 +01:00
Philipp Reisner	bd0c824a9d	drbd: Use the idr_for_each_entry() iterator instead of idr_for_each() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:56 +01:00
Andreas Gruenbacher	4a76b1612f	drbd: Map from (connection, volume number) to device in the receive handlers The receive handlers do not all handle unknown volume numbers the same way. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:56 +01:00
Andreas Gruenbacher	e28572167c	drbd: Pass struct packet_info down to the receive functions Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:56 +01:00
Andreas Gruenbacher	49ba9b1bb3	drbd: Remove useless error messages These messages can only trigger in case there is a pretty obvious internal programming error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:55 +01:00
Andreas Gruenbacher	deebe19579	drbd: A small cleanup in drbdd() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:55 +01:00
Andreas Gruenbacher	79ed9bd053	drbd: _drbd_send_bitmap(): Use the pre-allocated send buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:54 +01:00
Andreas Gruenbacher	5a87d920f3	drbd: Preallocate one page per drbd_socket as a send buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:54 +01:00
Andreas Gruenbacher	fc56815c81	drbd: receive_bitmap(): Use the pre-allocated receive buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:54 +01:00
Andreas Gruenbacher	e6ef8a5cb3	drbd: Preallocate one page per drbd_socket as a receive buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:53 +01:00
Philipp Reisner	cb703454a2	drbd: Converted drbd_try_outdate_peer() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:53 +01:00
Andreas Gruenbacher	a02d124091	drbd: Rename the DCBP_* functions to dcbp_* and move them to where they are used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:52 +01:00
Andreas Gruenbacher	058820cdd7	drbd: Make _drbd_send_bitmap() static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:52 +01:00
Andreas Gruenbacher	e307f352b4	drbd: Move drbd_send_ping() and drbd_send_ping_ack() to drbd_main.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:52 +01:00
Andreas Gruenbacher	0916e0e308	drbd: Always use the same protocol version for the same peer There is no need to send protocol 80 headers to peers that understand protocol 95 headers. Make sure that we don't send protocol 95 headers until we have agreed upon a protocol version with our peer, though. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:51 +01:00
Andreas Gruenbacher	0829f5edf3	drbd: drbd_connected(): Return an error code upon failure. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:51 +01:00
Andreas Gruenbacher	a5c3190435	drbd: Introduce and use drbd_recv_all_warn() The pattern of receiving a fixed number of bytes and warning if a short packet is received and the receiver has not actively been interruped is repeated many times; clean that up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:50 +01:00
Andreas Gruenbacher	309a834896	drbd: Get rid of typedef drbd_work_cb This type is not used anywhere else. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:50 +01:00
Andreas Gruenbacher	8f7bed7774	drbd: Rename various functions from _oos_ to _out_of_sync_ for clarity Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:50 +01:00
Andreas Gruenbacher	0da34df0d0	drbd: drbd_may_do_local_read(): Use bool/true/false Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:49 +01:00
Andreas Gruenbacher	1097e9a80c	drbd: Remove unnecessary assertion This is also checked further below in the same function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:49 +01:00
Andreas Gruenbacher	69f5ec728c	drbd: Remove duplicate initialization Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:49 +01:00
Andreas Gruenbacher	3fbf4d21ae	drbd: drbd_md_sync_page_io(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:48 +01:00
Andreas Gruenbacher	ac29f4039a	drbd: _drbd_md_sync_page_io(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:48 +01:00
Andreas Gruenbacher	22ab6a30b8	drbd: drbd_bm_read() never returns a positive value through drbd_bitmap_io() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:47 +01:00
Andreas Gruenbacher	82bc01940a	drbd: Make all command handlers return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:47 +01:00
Andreas Gruenbacher	c696774691	drbd: Add drbd_recv_all(): Receive an entire buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:47 +01:00
Andreas Gruenbacher	a982dd579c	drbd: send_bitmap_rle_or_plain(): Error handling cleanup Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:46 +01:00
Andreas Gruenbacher	e1c1b0fc8f	drbd: recv_resync_read(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:46 +01:00
Andreas Gruenbacher	28284ceff0	drbd: recv_dless_read(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:45 +01:00
Andreas Gruenbacher	fc5be8397f	drbd: drbd_drain_block(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:45 +01:00
Andreas Gruenbacher	69bc7bc351	drbd: drbd_recv_header(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:45 +01:00
Andreas Gruenbacher	8172f3e9bf	drbd: decode_header(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:44 +01:00
Andreas Gruenbacher	e2b3032b90	drbd: drbd_process_done_ee(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:44 +01:00
Andreas Gruenbacher	99920dc5c5	drbd: Make all worker callbacks return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:43 +01:00
Andreas Gruenbacher	b2f0ab62ec	drbd: Temporarily change the return type of all worker callbacks This helps to ensure that we don't miss one of them when changing their return value semantics. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:43 +01:00
Andreas Gruenbacher	a896527c06	drbd: drbd_send_short_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:43 +01:00
Andreas Gruenbacher	6bdb9b0e23	drbd: drbd_send_dblock(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:42 +01:00
Andreas Gruenbacher	7fae55da38	drbd: _drbd_send_bio(), _drbd_send_zc_bio(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:42 +01:00
Andreas Gruenbacher	7b57b89d62	drbd: drbd_send_block(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:42 +01:00
Andreas Gruenbacher	9f69230cd6	drbd: _drbd_send_zc_ee(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:41 +01:00
Andreas Gruenbacher	88b390ff63	drbd: _drbd_send_page(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:41 +01:00
Andreas Gruenbacher	b987427b53	drbd: _drbd_no_send_page(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:40 +01:00
Andreas Gruenbacher	73218a3c4c	drbd: drbd_send_oos(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:40 +01:00
Andreas Gruenbacher	db1b0b724e	drbd: drbd_send_drequest_csum(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:40 +01:00
Andreas Gruenbacher	6c1005e74d	drbd: drbd_send_drequest(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:39 +01:00
Andreas Gruenbacher	5b9f499c66	drbd: drbd_send_ov_request(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:39 +01:00
Andreas Gruenbacher	fa79abd893	drbd: drbd_send_ack_ex(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:38 +01:00
Andreas Gruenbacher	a9a9994dc7	drbd: drbd_send_ack_{dp,rp}(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:38 +01:00
Andreas Gruenbacher	dd5161218b	drbd: drbd_send_ack(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:38 +01:00
Andreas Gruenbacher	a8c32aa846	drbd: _drbd_send_ack(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:37 +01:00
Andreas Gruenbacher	d4e67d7c4f	drbd: drbd_send_b_ack(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:37 +01:00
Andreas Gruenbacher	2f4e7abe51	drbd: drbd_send_sr_reply(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:36 +01:00
Andreas Gruenbacher	d24ae219e9	drbd: drbd_send_state_req(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:36 +01:00
Andreas Gruenbacher	caee1c3a92	drbd: conn_send_state_req(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:36 +01:00
Andreas Gruenbacher	758970c832	drbd: _conn_send_state_req(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:35 +01:00
Andreas Gruenbacher	f02d4d0a9c	drbd: drbd_send_sizes(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:35 +01:00
Andreas Gruenbacher	9c1b7f7282	drbd: drbd_gen_and_send_sync_uuid(): Return void: the result is never used Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:34 +01:00
Andreas Gruenbacher	2ae5f95b1a	drbd: drbd_send_uuids() and its variants: Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:34 +01:00
Andreas Gruenbacher	387eb30817	drbd: drbd_send_protocol(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:34 +01:00
Andreas Gruenbacher	e8d17b015e	drbd: drbd_send_handshake(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:33 +01:00
Andreas Gruenbacher	927036f908	drbd: drbd_send_state(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:33 +01:00
Andreas Gruenbacher	103ea27528	drbd: drbd_send_sync_param(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:33 +01:00
Andreas Gruenbacher	f725446353	drbd: drbd_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:32 +01:00
Andreas Gruenbacher	7d168ed30f	drbd: Get rid of USE_DATA_SOCKET and USE_META_SOCKET Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:32 +01:00
Andreas Gruenbacher	596a37f9ef	drbd: conn_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:31 +01:00
Andreas Gruenbacher	04dfa13788	drbd: _drbd_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:31 +01:00
Andreas Gruenbacher	ecf2363cb5	drbd: _conn_send_cmd(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:31 +01:00
Andreas Gruenbacher	ce9879cb1f	drbd: conn_send_cmd2(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:30 +01:00
Andreas Gruenbacher	fb708e408f	drbd: Add drbd_send_all(): Send an entire buffer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:30 +01:00
Andreas Gruenbacher	11b0be28e5	drbd: drbd_get_data_sock(): Return 0 upon success and an error code otherwise Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:29 +01:00
Andreas Gruenbacher	c0d42c8e57	drbd: drbd_send(): Return a "real" error code if we have no socket Q: Can this case even trigger? Is failing this way any better than one that causes a NULL pointer access? Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:29 +01:00
Philipp Reisner	e90285e0ba	drbd: Fixed conn_lowest_minor It actually returned the lowest volume number. While doing that renamed a few wrongly named variables. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:28 +01:00
Lars Ellenberg	f399002e68	drbd: distribute former syncer_conf settings to disk, connection, and resource level This commit breaks the API again. Move per-volume former syncer options into disk_conf. Move per-connection former syncer options into net_conf. Renamed the remainign sync_conf to res_opts Syncer settings have been changeable at runtime, so we need to prepare for these settings to be runtime-changeable in their new home as well. Introduce new configuration operations, and share the netlink attribute between "attach" (create new disk) and "disk-opts" (change options). Same for "connect" and "net-opts". Some fields cannot be changed at runtime, however. Introduce a new flag GENLA_F_INVARIANT to be able to trigger on that in the generated validation and assignment functions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:44:20 +01:00
Roger Pau Monne	cb5bd4d19b	xen/blkback: persistent-grants fixes This patch contains fixes for persistent grants implementation v2: * handle == 0 is a valid handle, so initialize grants in blkback setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported by Konrad Rzeszutek Wilk. * new_map is a boolean, use "true" or "false" instead of 1 and 0. Reported by Konrad Rzeszutek Wilk. * blkfront announces the persistent-grants feature as feature-persistent-grants, use feature-persistent instead which is consistent with blkback and the public Xen headers. * Add a consistency check in blkfront to make sure we don't try to access segments that have not been set. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> [v1: The new_map int->bool had already been changed] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-04 10:35:40 -05:00
Philipp Reisner	6b75dced00	drbd: conn_khelper() for user mode callbacks for connections Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:32 +01:00
Philipp Reisner	047e95e259	drbd: Allow volumes to become primary only on one side Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:31 +01:00
Lars Ellenberg	40cbf085f5	drbd: fix conn_reconfig_start without conn_reconfig_done in drbd_adm_attach If drbd_adm_attach failed early, it left the CONFIG_PENDING bit on, blocking any further conn_reconfig_start on that connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:31 +01:00
Philipp Reisner	e4f78edee1	drbd: Separate connection state changes from minor dev state changes #2 New function got_conn_RqSReply() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:30 +01:00
Philipp Reisner	f19e4f8ba7	drbd: Converted got_Ping() and got_PingAck() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:30 +01:00
Philipp Reisner	a4fbda8eca	drbd: Allow packet handler functions that take a connection (meta connection) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:29 +01:00
Philipp Reisner	dfafcc8a7b	drbd: Separate connection state changes from minor dev state changes #1 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:28 +01:00
Philipp Reisner	7204624c5e	drbd: Converted receive_protocol() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:28 +01:00
Philipp Reisner	d9ae84e790	drbd: Allow packet handler functions that take a connection That is necessary in case a connection does not have a volume 0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:27 +01:00
Philipp Reisner	8169e41b3e	drbd: Moved CONN_DRY_RUN to the per connection (tconn) flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:27 +01:00
Philipp Reisner	38fa9988fa	drbd: Do not modify the connection state with something else that conn_request_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:26 +01:00
Philipp Reisner	34f646bd57	drbd: Allow two diskless minors to be connected In the context of drbd-8.4 it no longer makes sense to dissalow that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:25 +01:00
Philipp Reisner	2325eb661f	drbd: New minors have to intherit the connection state form their connection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:25 +01:00
Philipp Reisner	082a3439a2	drbd: process_done_ee() has to handle unconfigured devices now Took the chance and converted tconn_process_done_ee() to use idr_for_each_entry() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:24 +01:00
Philipp Reisner	2de876efa6	drbd: Ignore packets for non existing volumes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:23 +01:00
Lars Ellenberg	85f75dd763	drbd: introduce in-kernel "down" command This greatly simplifies deconfiguration of whole resources. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:23 +01:00
Lars Ellenberg	3c5e5f6afd	drbd: add forgotten spin_unlock somehow a "goto abort" was introduced with commit drbd: Extracted is_valid_transition() out of sanitize_state() which left drbd_req_state still holding the spin lock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:22 +01:00
Lars Ellenberg	527f4b24e5	drbd: bail out if a config requrest is over-determined, and not matching We have resources resp. connections, volumes, and minor numbers. A config request may specifies all three of them. If it turns out that the minor belongs to a different connection, or a different volume number in the same connection, that configuration request is invalid. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:21 +01:00
Lars Ellenberg	38f19616d2	drbd: new-connection and new-minor succeed, if the object already exists Follow O_CREAT semantics when creating connection or minor device/volume objects. If we need O_CREAT\|O_EXCL semantics some time down the road, we can add NLM_F_EXCL to the netlink message flags. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:21 +01:00
Lars Ellenberg	cffec5b2fe	drbd: Allow a Diskless Secondary volume to be removed Even if the connection is still established. We should be able to reduce a volume from a replication group, without taking the whole group offline. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:20 +01:00
Lars Ellenberg	d0456c72df	drbd: simplify conn_all_vols_unconf, make it bool Get rid of a temporary variable and, funny bitand assignment. Just short circuit, returning false, once we encounter the first still configured volume. FIXME verify call sites for need of rcu_read_lock or stronger. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:19 +01:00
Lars Ellenberg	543cc10b4c	drbd: drbd_adm_get_status needs to show some more detail We want to see existing connection objects, even if they do not currently have volumes attached. Change the .dumpit variant of drbd_adm_get_status to iterate not over minor devices, but over connections + volumes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:19 +01:00
Lars Ellenberg	8432b31457	drbd: allow holes in minor and volume id allocation s/idr_get_new/idr_get_new_above/ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:17 +01:00
Lars Ellenberg	3b98c0c209	drbd: switch configuration interface from connector to genetlink Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-04 00:16:17 +01:00
Alex Elder	9e15b77d9a	rbd: get additional info in parent spec When a layered rbd image has a parent, that parent is identified only by its pool id, image id, and snapshot id. Images that have been mapped also record names for those three id's. Add code to look up these names for parent images so they match mapped images more closely. Skip doing this for an image if it already has its pool name defined (this will be the case for images mapped by the user). It is possible that an the name of a parent image can't be determined, even if the image id is valid. If this occurs it does not preclude correct operation, so don't treat this as an error. On the other hand, defined pools will always have both an id and a name. And any snapshot of an image identified as a parent for a clone image will exist, and will have a name (if not it indicates some other internal error). So treat failure to get these bits of information as errors. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	86b00e0da6	rbd: get parent spec for version 2 images Add support for getting the the information identifying the parent image for rbd images that have them. The child image holds a reference to its parent image specification structure. Create a new entry "parent" in /sys/bus/rbd/image/N/ to report the identifying information for the parent image, if any. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	a92ffdf8a9	rbd: allow null image name Format 2 parent images are partially identified by their image id, but it may not be possible to determine their image name. The name is not strictly needed for correct operation, so we won't be treating it as an error if we don't know it. Handle this case gracefully in rbd_name_show(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	2c0d0a10ea	rbd: allow null image name We will know the image id for format 2 parent images, but won't initially know its image name. Avoid making the query for an image id in rbd_dev_image_id() if it's already known. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	83a0626362	rbd: encapsulate last part of probe Group the activities that now take place after an rbd_dev_probe() call into a single function, and move the call to that function into rbd_dev_probe() itself. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	c53d589337	rbd: define rbd_dev_{create,destroy}() helpers Encapsulate the creation/initialization and destruction of rbd device structures. The rbd_client and the rbd_spec structures provided on creation hold references whose ownership is transferred to the new rbd_device structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	bd4ba6554d	rbd: consolidate rbd_dev init in rbd_add() Group the allocation and initialization of fields of the rbd device structure created in rbd_add(). Move the grouped code down later in the function, just prior to the call to rbd_dev_probe(). This is for the most part simple code movement. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	9d3997fdf4	rbd: don't pass rbd_dev to rbd_get_client() The only reason rbd_dev is passed to rbd_get_client() is so its rbd_client field can get assigned. Instead, just return the rbd_client pointer as a result and have the caller do the assignment. Change rbd_put_client() so it takes an rbd_client structure, so follows the more typical symmetry with rbd_get_client(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	859c31df9c	rbd: fill rbd_spec in rbd_add_parse_args() Pass the address of an rbd_spec structure to rbd_add_parse_args(). Use it to hold the information defining the rbd image to be mapped in an rbd_add() call. Use the result in the caller to initialize the rbd_dev->id field. This means rbd_dev is no longer needed in rbd_add_parse_args(), so get rid of it. Now that this transformation of rbd_add_parse_args() is complete, correct and expand on the its header documentation to reflect the new reality. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:42 -05:00
Alex Elder	8b8fb99c5c	rbd: add reference counting to rbd_spec With layered images we'll share rbd_spec structures, so add a reference count to it. It neatens up some code also. A silly get/put pair is added to the alloc routine just to avoid "defined but not used" warnings. It will go away soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-11-01 07:55:41 -05:00
Roger Pau Monne	0a8704a51f	xen/blkback: Persistent grant maps for xen blk drivers This patch implements persistent grants for the xen-blk{front,back} mechanism. The effect of this change is to reduce the number of unmap operations performed, since they cause a (costly) TLB shootdown. This allows the I/O performance to scale better when a large number of VMs are performing I/O. Previously, the blkfront driver was supplied a bvec[] from the request queue. This was granted to dom0; dom0 performed the I/O and wrote directly into the grant-mapped memory and unmapped it; blkfront then removed foreign access for that grant. The cost of unmapping scales badly with the number of CPUs in Dom0. An experiment showed that when Dom0 has 24 VCPUs, and guests are performing parallel I/O to a ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests (at which point 650,000 IOPS are being performed in total). If more than 5 guests are used, the performance declines. By 10 guests, only 400,000 IOPS are being performed. This patch improves performance by only unmapping when the connection between blkfront and back is broken. On startup blkfront notifies blkback that it is using persistent grants, and blkback will do the same. If blkback is not capable of persistent mapping, blkfront will still use the same grants, since it is compatible with the previous protocol, and simplifies the code complexity in blkfront. To perform a read, in persistent mode, blkfront uses a separate pool of pages that it maps to dom0. When a request comes in, blkfront transmutes the request so that blkback will write into one of these free pages. Blkback keeps note of which grefs it has already mapped. When a new ring request comes to blkback, it looks to see if it has already mapped that page. If so, it will not map it again. If the page hasn't been previously mapped, it is mapped now, and a record is kept of this mapping. Blkback proceeds as usual. When blkfront is notified that blkback has completed a request, it memcpy's from the shared memory, into the bvec supplied. A record that the {gref, page} tuple is mapped, and not inflight is kept. Writes are similar, except that the memcpy is peformed from the supplied bvecs, into the shared pages, before the request is put onto the ring. Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black tree. As the grefs are not known apriori, and provide no guarantees on their ordering, we have to perform a search through this tree to find the page, for every gref we receive. This operation takes O(log n) time in the worst case. In blkfront grants are stored using a single linked list. The maximum number of grants that blkback will persistenly map is currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to prevent a malicios guest from attempting a DoS, by supplying fresh grefs, causing the Dom0 kernel to map excessively. If a guest is using persistent grants and exceeds the maximum number of grants to map persistenly the newly passed grefs will be mapped and unmaped. Using this approach, we can have requests that mix persistent and non-persistent grants, and we need to handle them correctly. This allows us to set the maximum number of persistent grants to a lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although setting it will lead to unpredictable performance. In writing this patch, the question arrises as to if the additional cost of performing memcpys in the guest (to/from the pool of granted pages) outweigh the gains of not performing TLB shootdowns. The answer to that question is `no'. There appears to be very little, if any additional cost to the guest of using persistent grants. There is perhaps a small saving, from the reduced number of hypercalls performed in granting, and ending foreign access. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up the misuse of bool as int]	2012-10-30 09:50:04 -04:00
Alex Elder	0d7dbfce9d	rbd: define image specification structure Group the fields that uniquely specify an rbd image into a new reference-counted rbd_spec structure. This structure will be used to describe the desired image when mapping an image, and when probing parent images in layered rbd devices. Replace the set of fields in the rbd device structure with a pointer to a dynamically allocated rbd_spec. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:30 -05:00
Alex Elder	dc79b113d6	rbd: have rbd_add_parse_args() return error Change the interface to rbd_add_parse_args() so it returns an error code rather than a pointer. Return the ceph_options result via a pointer whose address is passed as an argument. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:30 -05:00
Alex Elder	4e9afeba7a	rbd: pass and populate rbd_options structure Have the caller pass the address of an rbd_options structure to rbd_add_parse_args(), to be initialized with the information gleaned as a result of the parse. I know, this is another near-reversal of a recent change... Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	819d52bf72	rbd: remove snap_name arg from rbd_add_parse_args() The snapshot name returned by rbd_add_parse_args() just gets saved in the rbd_dev eventually. So just do that inside that function and do away with the snap_name argument, both in rbd_add_parse_args() and rbd_dev_set_mapping(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	f28e565a1b	rbd: remove options args from rbd_add_parse_args() They "options" argument to rbd_add_parse_args() (and it's partner options_size) is now only needed within the function, so there's no need to have the caller allocate and pass the options buffer. Just allocate the options buffer within the function using dup_token(). Also distinguish between failures due to failed memory allocation and failing because a required argument was missing. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	e5c3553404	rbd: get rid of snap_name_len The value returned in the "snap_name_len" argument to rbd_add_parse_args() is never actually used, so get rid of it. The snap_name_len recorded in rbd_dev_v2_snap_name() is not useful either, so get rid of that too. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	0ddebc0c6c	rbd: do all argument parsing in one place This patch makes rbd_add_parse_args() be the single place all argument parsing occurs for an image map request: - Move the ceph_parse_options() call into that function - Use local variables rather than parameters to hold the list of monitor addresses supplied - Rather than returning it, pass the snapshot name (and its length) back via parameters - Have the function return a ceph_options structure pointer Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	78cea76e05	rbd: move ceph_parse_options() call up Move option parsing out of rbd_get_client() and into its caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	daba5fdb4c	rbd: rename snap_exists field A Boolean field "snap_exists" in an rbd mapping is used to indicate whether a mapped snapshot has been removed from an image's snapshot context, to stop sending requests for that snapshot as soon as we know it's gone. Generalize the interpretation of this field so it applies to non-snapshot (i.e. "head") mappings. That is, define its value to be false until the mapping has been set, and then define it to be true for both snapshot mappings or head mappings. Rename the field "exists" to reflect the broader interpretation. The rbd_mapping structure is on its way out, so move the field back into the rbd_device structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	971f839a76	rbd: move snap info out of rbd_mapping struct Moving the snap_id and snap_name fields into the separate rbd_mapping structure was misguided. (And in time, perhaps we'll do away with that structure altogether...) Move these fields back into struct rbd_device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	86992098e7	rbd: make pool_id a 64 bit value If a format 2 image has a parent, its pool id will be specified using a 64-bit value. Change the pool id we save for an image to match that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:29 -05:00
Alex Elder	41f38c2b2f	rbd: remove snapshots on error in rbd_add() If rbd_dev_snaps_update() has ever been called for an rbd device structure there could be snapshot structures on its snaps list. In rbd_add(), this function is called but a subsequent error path neglected to clean up any of these snapshots. Add a call to rbd_remove_all_snaps() in the appropriate spot to remedy this. Change a couple of error labels to be a little clearer while there. Drop the leading underscores from the function name; there's nothing special about that function that they might signify. As suggested in review, the leading underscores in __rbd_remove_snap_dev() have been removed as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:28 -05:00
Alex Elder	f7760dad28	rbd: simplify rbd_rq_fn() When processing a request, rbd_rq_fn() makes clones of the bio's in the request's bio chain and submits the results to osd's to be satisfied. If a request bio straddles the boundary between objects backing the rbd image, it must be represented by two cloned bio's, one for the first part (at the end of one object) and one for the second (at the beginning of the next object). This has been handled by a function bio_chain_clone(), which includes an interface only a mother could love, and which has been found to have other problems. This patch defines two new fairly generic bio functions (one which replaces bio_chain_clone()) to help out the situation, and then revises rbd_rq_fn() to make use of them. First, bio_clone_range() clones a portion of a single bio, starting at a given offset within the bio and including only as many bytes as requested. As a convenience, a request to clone the entire bio is passed directly to bio_clone(). Second, bio_chain_clone_range() performs a similar function, producing a chain of cloned bio's covering a sub-range of the source chain. No bio_pair structures are used, and if successful the result will represent exactly the specified range. Using bio_chain_clone_range() makes bio_rq_fn() a little easier to understand, because it avoids the need to pass very much state information between consecutive calls. By avoiding the need to track a bio_pair structure, it also eliminates the problem described here: http://tracker.newdream.net/issues/2933 Note that a block request (and therefore the complete length of a bio chain processed in rbd_rq_fn()) is an unsigned int, while the result of rbd_segment_length() is u64. This change makes this range trunctation explicit, and trips a bug if the the segment boundary is too far off. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-30 08:34:28 -05:00
Lars Ellenberg	ccae7868b0	drbd: log request sector offset and size for IO errors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	a2a3c74f24	drbd: always write bitmap on detach If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	06f10adbdb	drbd: prepare for more than 32 bit flags - struct drbd_conf { ... unsigned long flags; ... } + struct drbd_conf { ... unsigned long drbd_flags[N]; ... } And introduce wrapper functions for test/set/clear bit operations on this member. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	44edfb0d78	drbd: wait for meta data IO completion even with failed disk, unless force-detached The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	8b45a5c8a1	drbd: a few more GFP_KERNEL -> GFP_NOIO This has not yet been observed, but conceivably, when using GFP_KERNEL allocations from drbd_md_sync(), drbd_flush_after_epoch() or receive_SyncParam(), we could trigger additional IO to our own device, or an other device in a criss-cross setup, and end up in a local deadlock, or potentially a distributed deadlock in a criss-cross setup involving the peer blocked in a similar way waiting for us to make progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	0b143d4382	drbd: fix potential deadlock during bitmap (re-)allocation The former comment arguing that GFP_KERNEL was good enough was wrong: it did not take resize into account at all, and assumed the only path leading here was the normal attach on a still secondary device, so no deadlock would be possible. Both resize on a Primary, or attach on a diskless Primary, could potentially deadlock. drbd_bm_resize() is called while IO to the respective device is suspended, so we must use GFP_NOIO to avoid potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	7fb907c15f	drbd: panic on delayed completion of aborted requests "aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which do no longer complete requests at all, not even do error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error-completion, we allow for a more graceful swichover by cleanly migrating services. Still the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If later the local backing device "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. Which means delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed error completion is OK, though we still will complain noisily about it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	dbd0820c6f	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	599377acb7	drbd: Avoid NetworkFailure state during disconnect Disconnecting is a cluster wide state change. In case the peer node agrees to the state transition, it sends back the fact on the meta-data connection and closes both sockets. In case the node node that initiated the state transfer sees the closing action on the data-socket, before the P_STATE_CHG_REPLY packet, it was going into one of the network failure states. At least with the fencing option set to something else thatn "dont-care", the unclean shutdown of the connection causes a short IO freeze or a fence operation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	c12a3d8c84	drbd: Fix a potential issue with the DISCARD_CONCURRENT flag The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Lars Ellenberg	02b91b5526	drbd: introduce stop-sector to online verify We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Philipp Reisner	9f2247bb9b	drbd: Protect accesses to the uuid set with a spinlock There is at least the worker context, the receiver context, the context of receiving netlink packts and processes reading a sysfs attribute that access the uuids. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Dave Chinner	a1ecac3b06	loop: Make explicit loop device destruction lazy xfstests has always had random failures of tests due to loop devices failing to be torn down and hence leaving filesytems that cannot be unmounted. This causes test runs to immediately stop. Over the past 6 or 7 years we've added hacks like explicit unmount -d commands for loop mounts, losetup -d after unmount -d fails, etc, but still the problems persist. Recently, the frequency of loop related failures increased again to the point that xfstests 259 will reliably fail with a stray loop device that was not torn down. That is despite the fact the test is above as simple as it gets - loop 5 or 6 times running mkfs.xfs with different paramters: lofile=$(losetup -f) losetup $lofile "$testfile" "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null \|\| echo "mkfs failed!" sync losetup -d $lofile And losteup -d $lofile is failing with EBUSY on 1-3 of these loops every time the test is run. Turns out that blkid is running simultaneously with losetup -d, and so it sees an elevated reference count and returns EBUSY. But why is blkid running? It's obvious, isn't it? udev has decided to try and find out what is on the block device as a result of a creation notification. And it is racing with mkfs, so might still be scanning the device when mkfs finishes and we try to tear it down. So, make losetup -d force autoremove behaviour. That is, when the last reference goes away, tear down the device. xfstests wants it gone, not causing random teardown failures when we know that all the operations the tests have specifically run on the device have completed and are no longer referencing the loop device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:31 +01:00
Selvan Mani	4453bc88f0	mtip32xx:Added appropriate timeout value for secure erase Added appropriate timeout value for secure erase based on identify device data Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:27 +01:00
Oliver Chick	1f999572f2	xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool Changing the type of bdev parameters to be unsigned int :1, rather than bool. This is more consistent with the types of other features in the block drivers. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:20 +01:00
Akinobu Mita	b7010ede43	cciss: select CONFIG_CHECK_SIGNATURE The patch cciss-use-check_signature.patch in -mm tree introduced a build error: drivers/built-in.o: In function `CISS_signature_present': drivers/block/cciss.c:4270: undefined reference to `check_signature' Add missing CONFIG_CHECK_SIGNATURE to fix this issue. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Fengguang Wu <wfg@linux.intel.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: "Stephen M. Cameron" <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:37:00 +01:00
Wei Yongjun	2541aa799f	cciss: remove unneeded memset() The memory return by kzalloc() or kmem_cache_zalloc() has already be set to zero, so remove useless memset(0). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:36:58 +01:00
Wei Yongjun	654dbef214	xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 08:36:27 +01:00
Herton Ronaldo Krzesinski	1a4ae43e4f	floppy: remove dr, reuse drive on do_floppy_init This is a small cleanup, that also may turn error handling of unitialized disks more readable. We don't need a separate variable to track allocated disks, remove dr and reuse drive variable instead. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:36:07 +01:00
Herton Ronaldo Krzesinski	8d3ab4ebfd	floppy: use common function to check if floppies can be registered The same checks to see if a drive can be or is registered are repeated through the code, factor out the checks in a common function and replace the repeated checks with it. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	d60e7ec18c	floppy: properly handle failure on add_disk loop On floppy initialization, if something failed inside the loop we call add_disk, there was no cleanup of previous iterations in the error handling. Cc: stable@vger.kernel.org Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	238ab78469	floppy: do put_disk on current dr if blk_init_queue fails If blk_init_queue fails, we do not call put_disk on the current dr (dr is decremented first in the error handling loop). Cc: stable@vger.kernel.org Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:25 +01:00
Herton Ronaldo Krzesinski	b54e1f8889	floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop Since commit `070ad7e` ("floppy: convert to delayed work and single-thread wq"), we end up calling alloc_ordered_workqueue multiple times inside the loop, which shouldn't be intended. Besides the leak, other side effect in the current code is if blk_init_queue fails, we would end up calling unregister_blkdev even if we didn't call yet register_blkdev. Just moved the allocation of floppy_wq before the loop, and adjusted the code accordingly. Cc: stable@vger.kernel.org # 3.5+ Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:34:24 +01:00
Konrad Rzeszutek Wilk	2911758f14	xen/blkback: Fix compile warning drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static? drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static? Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-10-30 08:32:43 +01:00
Alex Elder	069a4b5690	rbd: kill rbd_device->rbd_opts The rbd_device structure has an embedded rbd_options structure. Such a structure is needed to work with the generic ceph argument parsing code, but there's no need to keep it around once argument parsing is done. Use a local variable to hold the rbd options used in parsing in rbd_get_client(), and just transfer its content (it's just a read_only flag) into the field in the rbd_mapping sub-structure that requires that information. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	e5cfeed281	rbd: simplify rbd_merge_bvec() The aim of this patch is to make what's going on rbd_merge_bvec() a bit more obvious than it was before. This was an issue when a recent btrfs bug led us to question whether the merge function was working correctly. Use "obj" rather than "chunk" to indicate the units whose boundaries we care about we call (rados) "objects". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	d4b125e9eb	rbd: increase maximum snapshot name length Change RBD_MAX_SNAP_NAME_LEN to be based on NAME_MAX. That is a practical limit for the length of a snapshot name (based on the presence of a directory using the name under /sys/bus/rbd to represent the snapshot). The /sys entry is created by prefixing it with "snap_"; define that prefix symbolically, and take its length into account in defining the snapshot name length limit. Enforce the limit in rbd_add_parse_args(). Also delete a dout() call in that function that was not meant to be committed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	db2388b6ee	rbd: verify rbd image order value This adds a verification that an rbd image's object order is within the upper and lower bounds supported by this implementation. It must be at least 9 (SECTOR_SHIFT), because the Linux bio system assumes that minimum granularity. It also must be less than 32 (at the moment anyway) because there exist spots in the code that store the size of a "segment" (object backing an rbd image) in a signed int variable, which can be 32 bits including the sign. We should be able to relax this limit once we've verified the code uses 64-bit types where needed. Note that the CLI tool already limits the order to the range 12-25. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	4634246db8	rbd: consolidate rbd_do_op() calls The two calls to rbd_do_op() from rbd_rq_fn() differ only in the value passed for the snapshot id and the snapshot context. For reads the snapshot always comes from the mapping, and for writes the snapshot id is always CEPH_NOSNAP. The snapshot context is always null for reads. For writes, the snapshot context always comes from the rbd header, but it is acquired under protection of header semaphore and could change thereafter, so we can't simply use what's available inside rbd_do_op(). Eliminate the snapid parameter from rbd_do_op(), and set it based on the I/O direction inside that function instead. Always pass the snapshot context acquired in the caller, but reset it to a null pointer inside rbd_do_op() if the operation is a read. As a result, there is no difference in the read and write calls to rbd_do_op() made in rbd_rq_fn(), so just call it unconditionally. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	ff2e4bb5b3	rbd: drop rbd_do_op() opcode and flags The only callers of rbd_do_op() are in rbd_rq_fn(), where call one is used for writes and the other used for reads. The request passed to rbd_do_op() already encodes the I/O direction, and that information can be used inside the function to set the opcode and flags value (rather than passing them in as arguments). So get rid of the opcode and flags arguments to rbd_do_op(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	13f4042c05	rbd: kill rbd_req_{read,write}() Both rbd_req_read() and rbd_req_write() are simple wrapper routines for rbd_do_op(), and each is only called once. Replace each wrapper call with a direct call to rbd_do_op(), and get rid of the wrapper functions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	be466c1cc3	rbd: fix read-only option name The name of the "read-only" mapping option was inadvertently changed in this commit: `f84344f3` rbd: separate mapping info in rbd_dev Revert that hunk to return it to what it should be. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	a0ea3a40fd	rbd: zero return code in rbd_dev_image_id() When rbd_dev_probe() calls rbd_dev_image_id() it expects to get a 0 return code if successful, but it is getting a positive value. The reason is that rbd_dev_image_id() returns the value it gets from rbd_req_sync_exec(), which returns the number of bytes read in as a result of the request. (This ultimately comes from ceph_copy_from_page_vector() in rbd_req_sync_op()). Force the return value to 0 when successful in rbd_dev_image_id(). Do the same in rbd_dev_v2_object_prefix(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2012-10-26 17:18:08 -05:00
Alex Elder	b213e0b1a6	rbd: fix bug in rbd_dev_id_put() In rbd_dev_id_put(), there's a loop that's intended to determine the maximum device id in use. But it isn't doing that at all, the effect of how it's written is to simply use the just-put id number, which ignores whole purpose of this function. Fix the bug. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-26 17:18:08 -05:00
Kees Cook	b8977285ec	drivers/block: remove CONFIG_EXPERIMENTAL This config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Asai Thambi S P <asamymuthupa@micron.com> CC: Pete Zaitcev <zaitcev@redhat.com> CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-23 22:30:38 +02:00
Linus Torvalds	ce40be7a82	Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block Pull block IO update from Jens Axboe: "Core block IO bits for 3.7. Not a huge round this time, it contains: - First series from Kent cleaning up and generalizing bio allocation and freeing. - WRITE_SAME support from Martin. - Mikulas patches to prevent O_DIRECT crashes when someone changes the block size of a device. - Make bio_split() work on data-less bio's (like trim/discards). - A few other minor fixups." Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew Morton. It is due to the VM no longer using a prio-tree (see commit 6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree"). So make set_blocksize() use mapping_mapped() instead of open-coding the internal VM knowledge that has changed. * 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits) block: makes bio_split support bio without data scatterlist: refactor the sg_nents scatterlist: add sg_nents fs: fix include/percpu-rwsem.h export error percpu-rw-semaphore: fix documentation typos fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared blockdev: turn a rw semaphore into a percpu rw semaphore Fix a crash when block device is read and block size is changed at the same time block: fix request_queue->flags initialization block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() block: ioctl to zero block ranges block: Make blkdev_issue_zeroout use WRITE SAME block: Implement support for WRITE SAME block: Consolidate command flag and queue limit checks for merges block: Clean up special command handling logic block/blk-tag.c: Remove useless kfree block: remove the duplicated setting for congestion_threshold block: reject invalid queue attribute values block: Add bio_clone_bioset(), bio_clone_kmalloc() block: Consolidate bio_alloc_bioset(), bio_kmalloc() ...	2012-10-11 09:04:23 +09:00
Alex Elder	35152979e6	rbd: activate v2 image support Now that v2 images support is fully implemented, have rbd_dev_v2_probe() return 0 to indicate a successful probe. (Note that an image that implements layering will fail the probe early because of the feature chekc.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:44:01 -07:00
Alex Elder	d889140c4a	rbd: implement feature checks Version 2 images have two sets of feature bit fields. The first indicates features possibly used by the image. The second indicates features that the client must support in order to use the image. When an image (or snapshot) is first examined, we need to make sure that the local implementation supports the image's required features. If not, fail the probe for the image. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:43:51 -07:00
Alex Elder	117973fb4c	rbd: define rbd_dev_v2_refresh() Define a new function rbd_dev_v2_refresh() to update/refresh the snapshot context for a format version 2 rbd image. This function will update anything that is not fixed for the life of an rbd image--at the moment this is mainly the snapshot context and (for a base mapping) the size. Update rbd_refresh_header() so it selects which function to use based on the image format. Rename __rbd_refresh_header() to be rbd_dev_v1_refresh() to be consistent with the naming of its version 2 counterpart. Similarly rename rbd_refresh_header() to be rbd_dev_refresh(). Unrelated--we use rbd_image_format_valid() here. Delete the other use of it, which was primarily put in place to ensure that function was referenced at the time it was defined. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:43:39 -07:00
Alex Elder	9478554ae5	rbd: define rbd_update_mapping_size() Encapsulate the code that handles updating the size of a mapping after an rbd image has been refreshed. This is done in anticipation of the next patch, which will make this common code for format 1 and 2 images. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-10 07:43:28 -07:00
Linus Torvalds	7035cdf36d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The bulk of this pull is a series from Alex that refactors and cleans up the RBD code to lay the groundwork for supporting the new image format and evolving feature set. There are also some cleanups in libceph, and for ceph there's fixed validation of file striping layouts and a bugfix in the code handling a shrinking MDS cluster." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits) ceph: avoid 32-bit page index overflow ceph: return EIO on invalid layout on GET_DATALOC ioctl rbd: BUG on invalid layout ceph: propagate layout error on osd request creation libceph: check for invalid mapping ceph: convert to use le32_add_cpu() ceph: Fix oops when handling mdsmap that decreases max_mds rbd: update remaining header fields for v2 rbd: get snapshot name for a v2 image rbd: get the snapshot context for a v2 image rbd: get image features for a v2 image rbd: get the object prefix for a v2 rbd image rbd: add code to get the size of a v2 rbd image rbd: lay out header probe infrastructure rbd: encapsulate code that gets snapshot info rbd: add an rbd features field rbd: don't use index in __rbd_add_snap_dev() rbd: kill create_snap sysfs entry rbd: define rbd_dev_image_id() rbd: define some new format constants ...	2012-10-08 06:38:18 +09:00
Linus Torvalds	dc92b1f9ab	Merge branch 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio changes from Rusty Russell: "New workflow: same git trees pulled by linux-next get sent straight to Linus. Git is awkward at shuffling patches compared with quilt or mq, but that doesn't happen often once things get into my -next branch." * 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits) lguest: fix occasional crash in example launcher. virtio-blk: Disable callback in virtblk_done() virtio_mmio: Don't attempt to create empty virtqueues virtio_mmio: fix off by one error allocating queue drivers/virtio/virtio_pci.c: fix error return code virtio: don't crash when device is buggy virtio: remove CONFIG_VIRTIO_RING virtio: add help to CONFIG_VIRTIO option. virtio: support reserved vqs virtio: introduce an API to set affinity for a virtqueue virtio-ring: move queue_index to vring_virtqueue virtio_balloon: not EXPERIMENTAL any more. virtio-balloon: dependency fix virtio-blk: fix NULL checking in virtblk_alloc_req() virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path virtio-blk: Add bio-based IO path for virtio-blk virtio: console: fix error handling in init() function tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace tools: Add guest trace agent as a user tool virtio/console: Allocate scatterlist according to the current pipe size ...	2012-10-07 21:04:56 +09:00
Linus Torvalds	f1c6872e49	Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJQbJELAAoJEFjIrFwIi8fJSI4H/32qrQKyF5IIkFKHTN9FYDC1 OxEGc4y47DIQpGUd/PgZ/i6h9Iyhj+I6pb4lCevykwgd0j83noepluZlCIcJnTfL HVXNiRIQKqFhqKdjTANxVM4APup+7Lqrvqj6OZfUuoxaZ3tSTLhabJ/7UXf2+9xy g2RfZtbSdQ1sukQ/A2MeGQNT79rh7v7PrYQUYSrqytjSjSLPTqRf75HWQ+eapIAH X3aVz8Tn6nTixZWvZOK7rAaD4awsFxGP6E46oFekB02f4x9nWHJiCZiXwb35lORb tz9F9td99f6N4fPJ9LgcYTaCPwzVnceZKqE9hGfip4uT+0WrEqDxq8QmBqI5YtI= =gxJD -----END PGP SIGNATURE----- Merge tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...	2012-10-07 07:13:01 +09:00
Ed Cashin	322c9ec009	aoe: update aoe-internal version number to 50 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	1ac9e60262	aoe: remove unused code Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	08b6062351	aoe: make dynamic block minor numbers the default Because udev use is so widespread, making the old static mapping the default is too conservative, given the severe limitations it places on usable AoE addresses. Storage virtualization and larger shelves have made the old limitations too confining. These changes make the dynamic block device minor numbers the default, removing the limitations on usable AoE addresses. The static arrangement is still available with aoe_dyndevs=0, and the aoe-stat tool from the userland aoetools package, the user space counterpart to the aoe driver, recognizes the case where there is a mismatch between the minor number in sysfs and the minor number in a special device file. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	7159e969d1	aoe: update and specify AoE address guards and error messages In general, specific is better when it comes to messages about AoE usage problems. Also, explicit checks for the AoE broadcast addresses are added. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	4bcce1a355	aoe: retain static block device numbers for backwards compatibility The old mapping between AoE target shelf and slot addresses and the block device minor number is retained as a backwards-compatible feature, with a new "aoe_dyndevs" module parameter available for enabling dynamic block device minor numbers. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	0c96621458	aoe: support more AoE addresses with dynamic block device minor numbers The ATA over Ethernet protocol uses a major (shelf) and minor (slot) address to identify a particular storage target. These changes remove an artificial limitation the aoe driver imposes on the use of AoE addresses. For example, without these changes, the slot address has a maximum of 15, but users commonly use slot numbers much greater than that. The AoE shelf and slot address space is often used sparsely. Instead of using a static mapping between AoE addresses and the block device minor number, the block device minor numbers are now allocated on demand. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	fea05a26c3	aoe: update copyright year in touched files Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	7392fbe5ad	aoe: update internal version number to 49 The internal version number of the aoe driver appears in a console message when the driver loads and is usually obtained by the user with the userland aoe-version tool, part of the aoetools.[1] Although this patchset includes bugfixes backported from higher-numbered versions published on the coraid.com website, it is a form of version 49. 1. http://aoetools.sourceforge.net/ Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	b21faa25c6	aoe: remove unused code and add cosmetic improvements This change removes some unused code and attempts to increase code consistency. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	1b86fda9ad	aoe: increase net_device reference count while using it This change eliminates the danger that the user could rmmod the driver for a network interface that is being used for AoE by the aoe driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	64a80f5ac7	aoe: associate frames with the AoE storage target In the driver code, "target" and aoetgt refer to a particular remote interface on the AoE storage target. The latter is identified by its AoE major and minor addresses. Commands that are being sent to an AoE storage target {major, minor} can be sent or retransmitted to any of the remote MAC addresses associated with the AoE storage target. That is, frames are naturally associated with not an aoetgt (AoE major, AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor). Making the code reflect that reality simplifies the driver, especially when the path to a remote MAC address becomes unusable. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	6583303c5e	aoe: disallow unsupported AoE minor addresses A guard is inserted to prevent AoE minor addresses (slot addresses) higher than 15 to be used, as they are not yet supported by the driver. There is a change coming that will allow the aoe driver to overcome this limit by using system device minor numbers dynamically, but until then, this guard prevents unexpected targets from being used by the driver when AoE targets with high minor numbers are on the AoE network. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	25f4d75ea4	aoe: do revalidation steps in order The discovery process begins with an optional AoE config query command and an AoE config query response. Normally when an aoe device is already open, the config query response does not trigger an ATA identify device command to be sent out, since the response contains storage capacity information that, if changed, could surprise the user of the device. The userland "aoe-revalidate" tool uses a character device to trigger an AoE config query for a particular AoE storage target and an ATA device identify command, even when the device is open. This change causes the config query to go out first, reflecting the normal discovery sequence. The responses could come back in any order, so this change is fairly cosmetic. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	d54d35ac66	aoe: failover remote interface based on aoe_deadsecs parameter The aoe_deadsecs module parameter allows the user to specify a hard limit on the number of seconds an AoE command can be retransmitted before the AoE block device is considered to have failed. Using aoe_deadsecs to determine the time we try using a different remote interface helps to ensure that the hard limit is not reached before we've tried to recover by sending to a different remote port. As a data storage target, the AoE target is unambiguously identified by its {major, minor} AoE address tuple, and an AoE target can have multiple MAC addresses. However, note that "target" in the driver code and comments means a {major, minor, MAC address} tuple, as in "somewhere to send packets". Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	3f0f013374	aoe: use packets that work with the smallest-MTU local interface Users with several network interfaces dedicated to AoE generally do not configure them to support different-sized AoE data payloads on purpose. For a given AoE target, there will be a set of local network interfaces that can reach it. Using only the payload that will fit in the smallest-sized MTU of all those local interfaces greatly simplifies the driver, especially in failure scenarios. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	eb086ec596	aoe: use a kernel thread for transmissions The dev_queue_xmit function needs to have interrupts enabled, so the most simple way to get the locking right but still fulfill that requirement is to use a process that can call dev_queue_xmit serially over queued transmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	69cf2d85de	aoe: become I/O request queue handler for increased user control To allow users to choose an elevator algorithm for their particular workloads, change from a make_request-style driver to an I/O-request-queue-handler-style driver. We have to do a couple of things that might be surprising. We manipulate the page _count directly on the assumption that we still have no guarantee that users of the block layer are prohibited from submitting bios containing pages with zero reference counts.[1] If such a prohibition now exists, I can get rid of the _count manipulation. Just as before this patch, we still keep track of the sk_buffs that the network layer still hasn't finished yet and cap the resources we use with a "pool" of skbs.[2] Now that the block layer maintains the disk stats, the aoe driver's diskstats function can go away. 1. https://lkml.org/lkml/2007/3/1/374 2. https://lkml.org/lkml/2007/7/6/241 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	896831f590	aoe: kernel thread handles I/O completions for simple locking Make the frames the aoe driver uses to track the relationship between bios and packets more flexible and detached, so that they can be passed to an "aoe_ktio" thread for completion of I/O. The frames are handled much like skbs, with a capped amount of preallocation so that real-world use cases are likely to run smoothly and degenerate gracefully even under memory pressure. Decoupling I/O completion from the receive path and serializing it in a process makes it easier to think about the correctness of the locking in the driver, especially in the case of a remote MAC address becoming unusable. [dan.carpenter@oracle.com: cleanup an allocation a bit] Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	3d5b06051c	aoe: for performance support larger packet payloads tAdd adds the ability to work with large packets composed of a number of segments, using the scatter gather feature of the block layer (biovecs) and the network layer (skb frag array). The motivation is the performance gained by using a packet data payload greater than a page size and by using the network card's scatter gather feature. Users of the out-of-tree aoe driver already had these changes, but since early 2011, they have complained of increased memory utilization and higher CPU utilization during heavy writes.[1] The commit below appears related, as it disables scatter gather on non-IP protocols inside the harmonize_features function, even when the NIC supports sg. commit `f01a5236bd` Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). With that regression in place, transmits always linearize sg AoE packets, but in-kernel users did not have this patch. Before 2.6.38, though, these changes were working to allow sg to increase performance. 1. http://www.spinics.net/lists/linux-mm/msg15184.html Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	a336d29870	nbd: handle discard requests Add discard support to nbd. If the nbd-server supports discard, it will send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards for the device (QUEUE_FLAG_DISCARD). If discard support is enabled, then when the nbd client system receives a discard request, this will be passed along to the nbd-server. When the discard request is received by the nbd-server, it will perform: fallocate(.. FALLOC_FL_PUNCH_HOLE ..) To punch a hole in the backend storage, which is no longer needed. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Paul Clements	2f01250888	nbd: add set flags ioctl Add a set-flags ioctl, allowing various option flags to be set on an nbd device. This allows the nbd-client to set the device flags (to enable read-only mode, or enable discard support, etc.). Flags are typically specified by the nbd-server. During the negotiation phase of the nbd connection, the server sends its flags to the client. The client then uses NBD_SET_FLAGS to inform the kernel of the options. Also included is a one-line fix to debug output for the set-timeout ioctl. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:23 +09:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Sage Weil	6cae3717cd	rbd: BUG on invalid layout This shouldn't actually be possible because the layout struct is constructed from the RBD header and validated then. [elder@inktank.com: converted BUG() call to equivalent rbd_assert()] Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-10-01 17:20:00 -05:00
Linus Torvalds	d9a807461f	USB merge for 3.7-rc1 Here is the big USB pull request for 3.7-rc1 There are lots of gadget driver changes (including copying a bunch of files into the drivers/staging/ccg/ directory so that the other gadget drivers can be fixed up properly without breaking that driver), and we remove the old obsolete ub.c driver from the tree. There are also the usual XHCI set of updates, and other various driver changes and updates. We also are trying hard to remove the old dbg() macro, but the final bits of that removal will be coming in through the networking tree before we can delete it for good. All of these patches have been in the linux-next tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlBp3+AACgkQMUfUDdst+ym5vwCfe93FyJyXn/RDkGz7iBemvWFd vrwAoIxjaOa4/yWZWcgrWc5bP4aO3ssc =jYDr -----END PGP SIGNATURE----- Merge tag 'usb-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB changes from Greg Kroah-Hartman: "Here is the big USB pull request for 3.7-rc1 There are lots of gadget driver changes (including copying a bunch of files into the drivers/staging/ccg/ directory so that the other gadget drivers can be fixed up properly without breaking that driver), and we remove the old obsolete ub.c driver from the tree. There are also the usual XHCI set of updates, and other various driver changes and updates. We also are trying hard to remove the old dbg() macro, but the final bits of that removal will be coming in through the networking tree before we can delete it for good. All of these patches have been in the linux-next tree. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fix up several annoying - but fairly mindless - conflicts due to the termios structure having moved into the tty device, and often clashing with dbg -> dev_dbg conversion. * tag 'usb-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (339 commits) USB: ezusb: move ezusb.c from drivers/usb/serial to drivers/usb/misc USB: uas: fix gcc warning USB: uas: fix locking USB: Fix race condition when removing host controllers USB: uas: add locking USB: uas: fix abort USB: uas: remove aborted field, replace with status bit. USB: uas: fix task management USB: uas: keep track of command urbs xhci: Intel Panther Point BEI quirk. powerpc/usb: remove checking PHY_CLK_VALID for UTMI PHY USB: ftdi_sio: add TIAO USB Multi-Protocol Adapter (TUMPA) support Revert "usb : Add sysfs files to control port power." USB: serial: remove vizzini driver usb: host: xhci: Fix Null pointer dereferencing with `71c731a` for non-x86 systems Increase XHCI suspend timeout to 16ms USB: ohci-at91: fix null pointer in ohci_hcd_at91_overcurrent_irq USB: sierra_ms: don't keep unused variable fsl/usb: Add support for USB controller version 2.4 USB: qcaux: add Pantech vendor class match ...	2012-10-01 13:23:01 -07:00
Alex Elder	6e14b1a6c3	rbd: update remaining header fields for v2 There are three fields that are not yet updated for format 2 rbd image headers: the version of the header object; the encryption type; and the compression type. There is no interface defined for fetching the latter two, so just initialize them explicitly to 0 for now. Change rbd_dev_v2_snap_context() so the caller can be supplied the version for the header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:54 -05:00
Alex Elder	b8b1e2db52	rbd: get snapshot name for a v2 image Define rbd_dev_v2_snap_name() to fetch the name for a particular snapshot in a format 2 rbd image. Define rbd_dev_v2_snap_info() to to be a wrapper for getting the name, size, and features for a particular snapshot, using an interface that matches the equivalent function for version 1 images. Define rbd_dev_snap_info() wrapper function and use it to call the appropriate function for getting the snapshot name, size, and features, dependent on the rbd image format. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:54 -05:00
Alex Elder	35d489f946	rbd: get the snapshot context for a v2 image Fetch the snapshot context for an rbd format 2 image by calling the "get_snapcontext" method on its header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	b1b5402aa9	rbd: get image features for a v2 image The features values for an rbd format 2 image are fetched from the server using a "get_features" method. The same method is used for getting the features for a snapshot, so structure this addition with a generic helper routine that can get this information for either. The server will provide two 64-bit feature masks, one representing the features potentially in use for this image (or its snapshot), and one representing features that must be supported by the client in order to work with the image. For the time being, neither of these is really used so we keep things simple and just record the first feature vector. Once we start using these feature masks, what we record and what we expose to the user will most likely change. Signed-off-by: Alex Elder <elder@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	1e1301998e	rbd: get the object prefix for a v2 rbd image The object prefix of an rbd format 2 image is fetched from the server using a "get_object_prefix" method. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	9d475de5d1	rbd: add code to get the size of a v2 rbd image The size of an rbd format 2 image is fetched from the server using a "get_size" method. The same method is used for getting the size of a snapshot, so structure this addition with a generic helper routine that we can get this information for either. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	a30b71b999	rbd: lay out header probe infrastructure This defines a new function rbd_dev_probe() as a top-level function for populating detailed information about an rbd device. It first checks for the existence of a format 2 rbd image id object. If it exists, the image is assumed to be a format 2 rbd image, and another function rbd_dev_v2() is called to finish populating header data for that image. If it does not exist, it is assumed to be an old (format 1) rbd image, and calls a similar function rbd_dev_v1() to populate its header information. A new field, rbd_dev->format, is defined to record which version of the rbd image format the device represents. For a valid mapped rbd device it will have one of two values, 1 or 2. So far, the format 2 images are not really supported; this is laying out the infrastructure for fleshing out that support. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	cd892126c6	rbd: encapsulate code that gets snapshot info Create a function that encapsulates looking up the name, size and features related to a given snapshot, which is indicated by its index in an rbd device's snapshot context array of snapshot ids. This interface will be used to hide differences between the format 1 and format 2 images. At the moment this (looking up the name anyway) is slightly less efficient than what's done currently, but we may be able to optimize this a bit later on by cacheing the last lookup if it proves to be a problem. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	34b131849f	rbd: add an rbd features field Record the features values for each rbd image and each of its snapshots. This is really something that only becomes meaningful for version 2 images, so this is just putting in place code that will form common infrastructure. It may be useful to expand the sysfs entries--and therefore the information we maintain--for the image and for each snapshot. But I'm going to hold off doing that until we start making active use of the feature bits. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	c8d184250d	rbd: don't use index in __rbd_add_snap_dev() Pass the snapshot id and snapshot size rather than an index to __rbd_add_snap_dev() to specify values for a new snapshot. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	02cdb02cea	rbd: kill create_snap sysfs entry Josh proposed the following change, and I don't think I could explain it any better than he did: From: Josh Durgin <josh.durgin@inktank.com> Date: Tue, 24 Jul 2012 14:22:11 -0700 To: ceph-devel <ceph-devel@vger.kernel.org> Message-ID: <500F1203.9050605@inktank.com> Right now the kernel still has one piece of rbd management duplicated from the rbd command line tool: snapshot creation. There's nothing special about snapshot creation that makes it advantageous to do from the kernel, so I'd like to remove the create_snap sysfs interface. That is, /sys/bus/rbd/devices/<id>/create_snap would be removed. Does anyone rely on the sysfs interface for creating rbd snapshots? If so, how hard would it be to replace with: rbd snap create pool/image@snap Is there any benefit to the sysfs interface that I'm missing? Josh This patch implements this proposal, removing the code that implements the "snap_create" sysfs interface for rbd images. As a result, quite a lot of other supporting code goes away. Suggested-by: Josh Durgin <josh.durgin@inktank.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	589d30e0b3	rbd: define rbd_dev_image_id() New format 2 rbd images are permanently identified by a unique image id. Each rbd image also has a name, but the name can be changed. A format 2 rbd image will have an object--whose name is based on the image name--which maps an image's name to its image id. Create a new function rbd_dev_image_id() that checks for the existence of the image id object, and if it's found, records the image id in the rbd_device structure. Create a new rbd device attribute (/sys/bus/rbd/<num>/image_id) that makes this information available. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00
Alex Elder	3bb59ad515	rbd: define some new format constants Define constant symbols related to the rbd format 2 object names. This begins to bring this version of the "rbd_types.h" header more in line with the current user-space version of that file. Complete reconciliation of differences will be done at some point later, as a separate task. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2012-10-01 14:30:53 -05:00

... 6 7 8 9 10 ...

3534 Commits