Commit Graph

106 Commits

Author SHA1 Message Date
Philipp Reisner dc66c74de6 drbd: Fixed a race between disk-attach and unexpected state changes
This was a very hard to trigger race condition.

If we got a state packet from the peer, after drbd_nl_disk() has
already changed the disk state to D_NEGOTIATING but
after_state_ch() was not yet run by the worker, then receive_state()
might called drbd_sync_handshake(), which in turn crashed
when accessing p_uuid.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-06-14 12:19:41 +02:00
Philipp Reisner 2a0ab2cd73 drbd: Reduce verbosity
The "Local READ/WRITE failed" messages are too verbose.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Lars Ellenberg 7383506c87 drbd: use drbd specific ratelimit instead of global printk_ratelimit
using the global printk_ratelimit() may mask other messages.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Lars Ellenberg d255e5ff5f drbd: fix hang on local read errors while disconnected
"canceled" w_read_retry_remote never completed, if they have been
canceled after drbd_disconnect connection teardown cleanup has already
run (or we are currently not connected anyways).

Fixed by not queueing a remote retry if we already know it won't work
(pdsk not uptodate), and cleanup ourselves on "cancel", in case we hit a
race with drbd_disconnect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Philipp Reisner 32fa7e91f9 drbd: Removed the now empty w_io_error() function
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Andrea Gelmini 039e1fb654 drbd: removed duplicated #includes
drbd/drbd_receiver.c: linux/mm.h is included more than once.

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Lars Ellenberg ba11ad9a3b drbd: improve usage of MSG_MORE
It seems to improve performance if we allow the "p_data" header in its
own frame (no MSG_MORE), but sendpage all but the last page with MSG_MORE.
This is also in preparation of a later zero copy receive implementation.

Suggested by Eduard.Guzovsky@stratus.com on drbd-dev.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Lars Ellenberg 5dbf167338 drbd: need to set socket bufsize early to take effect
quoting tcp(7):
    On individual connections, the socket buffer size must be set prior to the
    listen(2) or connect(2) calls in order to have it take effect.

This adds a wrapper to do so, and uses it appropriately.
Improves performance in certain situations.

Note that because we cannot easily determine which socket will be
"meta" and wich "data" (bulk) socket, we adjust both sockets.
Previously, DRBD only adjusted the bufsizes of the "data" socket.

Thanks again to Eduard.Guzovsky@stratus.com.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Lars Ellenberg 344fa462e3 drbd: improve network latency, TCP_QUICKACK
On Thu, Apr 29, 2010 at 04:00:50PM -0400, Eduard.Guzovsky@stratus.com
 wrote on drbd-dev@lists.linbit.com
 Subject: [Drbd-dev] DRBD small synchronous writes performance improvements

> 1. TCP_QUICKACK option is set incorrectly. The goal was force TCP to
> send and ACK as a  "one time" event.  Instead the code permanently sets
> connection in the QUICKACK mode.

He is right, we actually want to use an even val with TCP_QUICKACK.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:27 +02:00
Philipp Reisner 2c8d196759 drbd: Revert "drbd: Create new current UUID as late as possible"
The late-UUID writing is delayed until the next release.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01 11:12:26 +02:00
Jens Axboe ee9a3607fb Merge branch 'master' into for-2.6.35
Conflicts:
	fs/ext3/fsync.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:27:26 +02:00
Philipp Reisner 4e23a59ed1 drbd: Do not free p_uuid early, this is done in the exit code of the receiver
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:01 +02:00
Philipp Reisner 23ce422748 drbd: Null pointer deref fix to the large "multi bio rewrite"
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:01 +02:00
Philipp Reisner fc8ce1941d drbd: Fix: Do not detach, if a bio with a barrier fails
Introduced a few days ago:
  commit 45bb912bd5
  Author: Lars Ellenberg <lars.ellenberg@linbit.com>
  Date:   Fri May 14 17:10:48 2010 +0200

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:00 +02:00
Philipp Reisner 4604d63668 drbd: Ensure to not trigger late-new-UUID creation multiple times
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:00 +02:00
Philipp Reisner 31a31dccdd drbd: Do not Oops when C_STANDALONE when uuid gets generated
Got introduces with

commit 0c3f34516e
Author: Philipp Reisner <philipp.reisner@linbit.com>
Date:   Mon May 17 16:10:43 2010 +0200

    drbd: Create new current UUID as late as possible

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:00 +02:00
Julia Lawall 2db4e42eac drivers/block/drbd: Use kzalloc
Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:04:10 +02:00
Philipp Reisner 0c3f34516e drbd: Create new current UUID as late as possible
The choice was to either delay creation of the new UUID until
IO got thawed or to delay it until the first IO request.

Both are correct, the later is more friendly to users of
dual-primary setups, that actually only write on one side.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:03:49 +02:00
Philipp Reisner 9a25a04c80 drbd: If we detect late that IO got frozen, retry after we thawed.
If we detect late (= after grabing mdev->req_lock) that IO got frozen, we
return 1 to generic_make_request(), which simply will retry to make a
request for that bio.

In the subsequent call of generic_make_request() into drbd_make_request_26()
we sleep in inc_ap_bio().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:03:32 +02:00
Lars Ellenberg a1c88d0d7a drbd: always use_bmbv, ignore setting
Now that the peer may handle multi-bio EEs,
we can ignore the peer's limit,
and concentrate on the limits of the local IO stack.

This is safe accross drbd protocol versions,
as our queue_max_sectors() will be adjusted accordingly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:03:05 +02:00
Lars Ellenberg bb3d000cb9 drbd: allow resync requests to be larger than max_segment_size
this should allow for better background resync performance.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:02:36 +02:00
Lars Ellenberg 45bb912bd5 drbd: Allow drbd_epoch_entries to use multiple bios.
This should allow for better performance if the lower level IO stack
of the peers differs in limits exposed either via the queue,
or via some merge_bvec_fn.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:01:23 +02:00
Lars Ellenberg 708d740ed8 drbd: reduce sizeof struct drbd_epoch_entry by 8 byte by aligning members
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:28:35 +02:00
Philipp Reisner 162f3ec7f0 drbd: Fixes to the new delay_probes code
* Only send delay_probes with protocol 93 or newer
* drbd_send_delay_probes() is called only from worker context,
  no atomic_t needed for delay_seq

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:28:08 +02:00
Philipp Reisner a8cdfd8d3b drbd: A fixes to the new resync speed code
* Mention P_DELAY_PROBE in the packet naming array
* Do not corrupt the mdev->data.work list in case the timer goes
  off before delay_probe_work got handled by the worker
* Do not mod_timer() twice for a single delay_probe pair

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:26:51 +02:00
Philipp Reisner eedf386ae9 drbd: Proc bits of new resync speed stuff
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:26:27 +02:00
Philipp Reisner cdd67a7460 drbd: Control the actual resync rate based on the queuing delay of data packets
In a setup with a high bandwidth and high latency network, eventually
involving deep queues in routers, it is beneficial to only fill those
queues up to an limited extend with resync data.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:25:47 +02:00
Philipp Reisner bd26bfc5b4 drbd: Actually send delay probes
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:25:28 +02:00
Philipp Reisner 67c7ddd055 drbd: Four new configuration settings for resync speed control
To reasonably control resync speed over drbd-proxy connections,
drbd has to measure the current delay of packets transmitted over
the (possibly congested) data socket vs the meta-data socket.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:25:00 +02:00
Philipp Reisner 7237bc430f drbd: Sending of delay_probes
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:22:46 +02:00
Philipp Reisner 0ced55a3be drbd: Receiving of delay_probes
Delay_probes are new packets in the DRBD protocol, which allow
DRBD to know the current delay packets have on the data socket.
(relative to the meta data socket)

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:22:11 +02:00
Philipp Reisner 5223671bb0 drbd: Fixed bitmap in case of online-grow without resync
The "surplus" bits of the old (smaller) bitmap must be clean
in case of online-grow without resync.

Note: Reverted 67ae8b80d4a116ab3b7094eb3723506b20c06dff as
well, since the lines added by this patch are redundant. The
bits get set by the bm_set_surplus(b) call before that.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:20:33 +02:00
Philipp Reisner 6b4388ac1f drbd: Added transmission faults to the fault injection code
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:19:51 +02:00
Philipp Reisner 087c24925c drbd: bugfix: Make resize work, if remote's size was limiting and increased in the meantime
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:18:22 +02:00
Philipp Reisner 6495d2c6d0 drbd: Implemented the --assume-clean option for drbdsetup resize
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:17:47 +02:00
Philipp Reisner b4ee79dac3 drbd: Added some missing statics
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:17:11 +02:00
Philipp Reisner fd76438c24 drbd: Make sure to resync all of the new storage upon online resize
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:16:20 +02:00
Philipp Reisner e89b591c3a drbd: Implemented flags for the resize packet
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:15:44 +02:00
Philipp Reisner 02d9a94bbb drbd: Implemented the set_new_bits parameter for drbd_bm_resize()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:14:43 +02:00
Philipp Reisner d845030f21 drbd: made determin_dev_size's parameter an flag enum
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:14:04 +02:00
Adam Gandelman 3a11a48789 drbd: New handler: initial-split-brain
Some wish to be notified of all instances of split brain, not just those that
go unresolved.  The initial-split-brain handler is called to notify someone
upon  detection of all split brain conditions even if auto-recovery policies
are configured.

Signed-off-by: Adam Gandelman <adam.gandelman@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:13:33 +02:00
Lars Ellenberg 979f5c7f1f drbd: fail_requests_early: remove incorrect and unnecessary optimization
The condition does not fit the commend (I may well be Primary,
even if I lost the disk earlier and now the connection).

And this is catched below anyways, where it also gets logged.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:10:31 +02:00
Lars Ellenberg 6666032ade drbd: check for corrupt or malicous sector addresses when receiving data
Even if it should never happen if the peer does behave, we need to
double check, and not even attempt access beyond end of device.
It usually would be caught by lower layers, resulting in "IO error",
but may also end up in the internal meta data area.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:09:57 +02:00
Philipp Reisner c3fe30b0e7 drbd: cleanup: This code path to trigger a resync is no longer needed
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:09:13 +02:00
Lars Ellenberg 8d4ce82b3c drbd: don't start a resync without access to up-to-date Data
In case both nodes are "inconsistent", invalidate would
have started a resync anyways, without a chance to ever
succeed, just filling the logs with warning messages.

Simply disallow that state change,
re-using the SS_NO_UP_TO_DATE_DISK return value.

This also changes the corresponding error string to
"Need access to UpToDate Data" -- I found the
"Refusing to be Primary without at least one UpToDate disk"
answer misleading in some situations anyways.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:08:18 +02:00
Lars Ellenberg c3470cde57 drbd: fix potential protocol error
Don't forget to drain the digest in case we cannot satisfy a
checksum based resync or online-verify request.

It would additionally cause a protocoll error,
dropping the connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:07:38 +02:00
Lars Ellenberg 8d1894ebe4 drbd: remove bogus ASSERT
block_id may be ID_SYNCER,
as well as checksum based resync request magic, or online verify magic.

Let's just drop that ASSERT.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:06:59 +02:00
Lars Ellenberg e0f83012dc drbd: fix regression: attach while connected failed
commit e4f925e12e
Author: Philipp Reisner <philipp.reisner@linbit.com>
Date:   Wed Mar 17 14:18:41 2010 +0100

    drbd: Do not upgrade state to Outdated if already Inconsistent

prevented the necessary state transition for attaching while connected
(Diskless -> Consistent respectively Outdated).
This is the fix for the fix.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:06:07 +02:00
Philipp Reisner e4f925e12e drbd: Do not upgrade state to Outdated if already Inconsistent [Bugz 277]
There was a race condition:
  In a situation with a SyncSource+Primary and a SyncTarget+Secondary node,
  and a resync dependency to some other device. After both nodes decided
  to do the resync, the other device finishes its resync process.
  At that time SyncSource already sent the P_SYNC_UUID packet, and
  already updated its peer disk state to Inconsistent.
  The SyncTarget node waits for the P_SYNC_UUID and sends a state packet
  to report the resync dependency change. That packet still carries
  a disk state of Outdated.

Impact:
  If application writes come in, during that time on the Primary node,
  those do not get replicated, and the out-of-sync counter gets increased.
  => The completion of resync is not detected on the primary node.
  => stalled.
  Those blocks get resync'ed with the next resync, since the are get
  marked as out-of-sync in the bitmap.

In order to fix this, we filter out that wrong state change in the
sanitize_state() function.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:01:05 +02:00
Lars Ellenberg 8c484ee491 drbd: use proc_create_data with explicit NULL argument
To document that we know about deprecation of proc_create,
even though we are not affected, as we don't use the ->data member,
open code proc_create_data(..., NULL);

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 00:59:00 +02:00