linux

Commit Graph

Author	SHA1	Message	Date
Ed Cashin	bbb44e30d0	aoe: improve handling of misbehaving network paths An AoE target can have multiple network ports used for AoE, and in the aoe driver, those are tracked by the aoetgt struct. These changes allow the aoe driver to handle network paths, or aoetgts, that are not working well, compared to the others. Paths that do not get responses despite the retransmission of AoE commands are marked as "tainted", and non-tainted paths are preferred. Meanwhile, the aoe driver attempts to "probe" the tainted path in the background by issuing reads of LBA 0 that are padded out to full (possibly jumbo-frame) size. If the probes get responses, then the path is "redeemed", and its taint is removed. This mechanism has been shown to be helpful in transparently handling and recovering from real-world network "brown outs" in ways that the earlier "shoot the help-needing target in the head" mechanism could not. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	b91316f2b7	aoe: return real minor number for static minors The value returned by the static minor device number number allocator is the real minor number, so it must be multiplied by the supported number of partitions per aoedev. Without this fix the support for systems without udev is incomplete, and the few users of aoe on such systems will have surprising results when device nodes names do not match the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	10935d052e	aoe: initialize sysminor to avoid compiler warning Because the minor_get and related functions use the return values for errors, the compiler doesn't know that sysminor will always either 1) be initialized in aoedev_by_aoeaddr by the call to minor_get, or 2) be unused as the "goto out" is executed. This patch avoids the compiler warning. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	e0b2bbab0b	aoe: make error messages more specific in static minor allocation For some special-purpose systems where udev isn't present, static allocation of minor numbers is desirable. This update distinguishes different failure scenarios, to help the user understand what went wrong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	60116cf773	aoe: remove call to request handler from I/O completion There is no need to call the request handler function in the I/O completion routine. The user impact of not doing it is a more "nice" aoe driver that is less susceptible to causing soft lockups. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	72837600ee	aoe: cleanup: correct comment for aoetgt nout A misplaced comment was attached to the nout member of the aoetgt. This change corrects the comment. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	7b6ccc5f97	aoe: increase default cap on outstanding AoE commands in the network The aoe driver will never be waiting for more than aoe_maxout AoE commands from a given remote network port on an AoE target. Increasing the cap increases performance. Users can tighten the setting to reduce the amount of memory used for handling AoE traffic or the network bandwidth used for AoE. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	0a41409c51	aoe: remove vestigial request queue allocation Before the aoe driver was an I/O request handler, it was a make_request-style block driver. Even so, there was a problem where sysfs expected a request queue to exist, so one was provided in commit `7135a71b19` ("aoe: allocate unused request_queue for sysfs"). During the transition to the request-handler style, a patch was merged that was based on a driver without the noop queue, and the noop queue remained in place after the patch was merged, even though a new functional queue was introduced by the patch, allocated through blk_init_queue. The user impact is a memory leak proportional to the number of AoE targets discovered. This patch removes the memory leak and cleans up vestiges of the old do-nothing queue from the aoeblk_gdalloc function. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:25 -08:00
Ed Cashin	fe7252bf51	aoe: copy fallback timing information on destination failover Commit f3b8e07af774 ("aoe: commands in retransmit queue use new destination on failure") omits the copying of the coarse-grained time when an AoE command was sent during the failover from one destination MAC address on the AoE target to another. The coarse-grained timing is only used when the system time changes or an unlikely length of time has passed since the sending of the AoE command. Users will not be impacted unless their system clock is very inaccurate or something unusual (e.g., 10 GbE link reset) happens during the period when the aoe driver is handling the failure of a port on the AoE target. Being effected will mean that an AoE target could be considered "down" too eagerly. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	519b77b032	aoe: update driver-internal version to 64+ Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	3fc9b03248	aoe: commands in retransmit queue use new destination on failure When one remote MAC address isn't working as a destination for AoE commands, the frames used to track information associated with the AoE commands are moved to a new aoetgt (defined by the tuple of {AoE major, AoE minor, target MAC address}). This patch makes sure that the frames on the queue for retransmits that need to be done are updated to use the new destination, so that retransmits will be sent through a working network path. Without this change, packets on the retransmit queue will be needlessly retransmitted to the unresponsive destination MAC, possibly causing premature target failure before there's time for the retransmit timer to run again, decide to retransmit again, and finally update the destination to a working MAC address on the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	5f0c9c48e7	aoe: use high-resolution RTTs with fallback to low-res These changes improve the accuracy of the decision about whether it's time to retransmit an AoE command by using the microsecond-resolution gettimeofday instead of jiffies. Because the system time can jump suddenly, the decision reverts to using jiffies if the high-resolution time difference is relatively large. Otherwise the AoE targets could be considered failed inappropriately. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	0d555ecfa4	aoe: manipulate aoedev network stats under lock With this bugfix in place the calculation of the criterion for "lateness" is performed under lock. Without the lock, there is a chance that one of the non-atomic operations performed on the round trip time statistics could be incomplete, such that an incorrect lateness criterion would be calculated. Without this change, the effect of the bug would be rare unecessary but benign retransmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	2292a7e109	aoe: err device: include MAC addresses for unexpected responses The /dev/etherd/err character device provides low-level information about normal but sometimes interesting AoE command retransmits and "unexpected responses", i.e., responses for packets that have already been retransmitted. This change adds MAC addresses to the messages about unexpected responses, so that when they occur, it's more easy to determine the network paths to which they belong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	3a0c40d2d2	aoe: improve network congestion handling The aoe driver already had some congestion handling, but it was limited in its ability to cope with the kind of congestion that can arise on more complex networks such as those involving paths through multiple ethernet switches. Some of the lessons from TCP's history of development can be applied to improving the congestion control and avoidance on AoE storage networks. These changes use familar concepts from Van Jacobson's "Congestion Avoidance and Control" paper from '88, without adding significant overhead. This patch depends on an upcoming patch that covers the failover case when AoE commands being retransmitted are transferred from one retransmit queue to another. Another upcoming patch increases the timing accuracy. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	667be1e757	aoe: provide ATA identify device content to user on request Make the aoe driver follow expected behavior when the user uses ioctl to get the ATA device identify information, allowing access to model, serial number, etc. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	cd220bf51f	aoe: update driver-internal version number to 60 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	a04b41cd2c	aoe: whitespace cleanup Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	d437962504	aoe: cleanup: remove unused ata_scnt function Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	90a2508d01	aoe: "payload" sysfs file exports per-AoE-command data transfer size The userland aoetools package includes an "aoe-stat" command that can display a "payload size" column when the aoe driver exports this information. Users can quickly see what amount of user data is transferred inside each AoE command on the network, network headers excluded. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	aa304fdefa	aoe: support larger I/O requests via aoe_maxsectors module param The GPFS filesystem is an example of an aoe user that requires the aoe driver to support I/O request sizes larger than the default. Most users will not need large I/O request sizes, because they would need to be split up into multiple AoE commands anyway. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	4ba9aa7f98	aoe: support the forgetting (flushing) of a user-specified AoE target Users sometimes want to cause the aoe driver to forget a particular previously discovered device when it is no longer online. The aoetools provide an "aoe-flush" command that users run to perform this administrative task. The changes below provide the support needed in the driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	1b8a1636ce	aoe: update cap on outstanding commands based on config query response The ATA over Ethernet config query response contains a "buffer count" field reflecting the AoE target's capacity to buffer incoming AoE commands. By taking the current value of this field into accound, we increase performance throughput or avoid network congestion, when the value has increased or decreased, respectively. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	4e78dd144b	aoe: print warning regarding a common reason for dropped transmits Dropped transmits are not common, but when they do occur, increasing the transmit queue length often helps. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	662a889608	aoe: describe the behavior of the "err" character device Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Linus Torvalds	9228ff9038	Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an even with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partiion UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...	2012-12-17 13:39:11 -08:00
Alex Elder	b8f5c6edca	rbd: don't use ENOTSUPP ENOTSUPP is not a standard errno (it shows up as "Unknown error 524" in an error message). This is what was getting produced when the the local rbd code does not implement features required by a discovered rbd image. Change the error code returned in this case to ENXIO. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-17 12:07:32 -06:00
Alex Elder	2fd82b9e92	rbd: get rid of RBD_MAX_SEG_NAME_LEN RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object name (i.e., one of the objects providing storage backing an rbd image). Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to define the maximum length of any object name in an osd request. Right now they disagree, with RBD_MAX_SEG_NAME_LEN being too big. There's no real benefit at this point to defining the rbd object name length limit separate from any other object name, so just get rid of RBD_MAX_SEG_NAME_LEN and use MAX_OBJ_NAME_SIZE in its place. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-17 08:37:29 -06:00
Alex Elder	42382b709b	rbd: do not allow remove of mounted-on image There is no check in rbd_remove() to see if anybody holds open the image being removed. That's not cool. Add a simple open count that goes up and down with opens and closes (releases) of the device, and don't allow an rbd image to be removed if the count is non-zero. Protect the updates of the open count value with ctl_mutex to ensure the underlying rbd device doesn't get removed while concurrently being opened. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-12-17 08:36:59 -06:00
Roger Pau Monne	7dc341175a	xen-blkback: implement safe iterator for the list of persistent grants Change foreach_grant iterator to a safe version, that allows freeing the element while iterating. Also move the free code in free_persistent_gnts to prevent freeing the element before the rb_next call. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-12-07 15:13:09 -05:00
Lars Ellenberg	d2ec180c23	drbd: update Kconfig to match current dependencies We no longer need the connector. But we need libcrc32c. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:08:29 +01:00
Philipp Reisner	ef86b77957	drbd: Fix drbdsetup wait-connect, wait-sync etc... commands This was introduces when moving the code over from the 8.3 codebase with commit `328e0f125b` Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:04:34 +01:00
Philipp Reisner	13c76aba78	drbd: close race between drbd_set_role and drbd_connect drbd_set_role(, R_PRIMARY, ) does the state change to Primary, some more housekeeping, and possibly generates a new UUID set. All of this holding the "state_mutex". The connection handshake involves sending of various state information, including the current data generation UUID set, and two connection state changes from C_WF_CONNECTION to C_WF_REPORT_PARAMS further to a number of different outcomes, resync being one of them. If the connection handshake happens between the state change to Primary and the generation of the new UUIDs, the resync decision based on the old UUID set may be confused, depending on circumstances. Make sure that, before we do the handshake, any promotion to Primary role will either be complete (including the housekeeping stuff), or can see, and serialize with, the ongoing handshake, based on the "STATE_SENT" bit, which is set when we start the handshake, and cleared only when we leave C_WF_REPORT_PARAMS again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:00:33 +01:00
Lars Ellenberg	691631c065	drbd: respect no-md-barriers setting also when changed online via disk-options We need to propagate the configuration into the flag bits, or it won't be effective. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 13:00:04 +01:00
Philipp Reisner	298307ed1d	drbd: Remove obsolete check Smatch complained about it this redundanct check. The check was introduced in 2006-09-13. On 2007-07-24 the body of the function was enclosed by get_ldev()/put_ldev() reference counting. Since then the check is useless and miss leading. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-12-06 12:09:55 +01:00
Jens Axboe	84ad6845fb	Merge branch 'stable/for-jens-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.8/drivers	2012-12-01 09:42:41 +01:00
Jens Axboe	2cecb73098	drbd: fixup after wait_even_lock_irq() addition to generic code Compiling drbd yields: drivers/block/drbd/drbd_state.c: In function ‘_conn_request_state’: drivers/block/drbd/drbd_state.c:1804:5: error: macro "wait_event_lock_irq" passed 4 arguments, but takes just 3 drivers/block/drbd/drbd_state.c:1801:3: error: ‘wait_event_lock_irq’ undeclared (first use in this function) drivers/block/drbd/drbd_state.c:1801:3: note: each undeclared identifier is reported only once for each function it appears in drivers/block/drbd/drbd_state.c: At top level: drivers/block/drbd/drbd_state.c:1734:1: warning: ‘_conn_rq_cond’ defined but not used [-Wunused-function] Due to drbd having copied the MD definition for wait_event_lock_irq() as well. Kill them. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-30 21:20:15 +01:00
Lukas Czerner	7b5a35225b	loop: Limit the number of requests in the bio list Currently there is not limitation of number of requests in the loop bio list. This can lead into some nasty situations when the caller spawns tons of bio requests taking huge amount of memory. This is even more obvious with discard where blkdev_issue_discard() will submit all bios for the range and wait for them to finish afterwards. On really big loop devices and slow backing file system this can lead to OOM situation as reported by Dave Chinner. With this patch we will wait in loop_make_request() if the number of bios in the loop bio list would exceed 'nr_congestion_on'. We'll wake up the process as we process the bios form the list. Some threshold hysteresis is in place to avoid high frequency oscillation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-30 11:48:05 +01:00
Roger Pau Monne	07c540a0b5	xen-blkfront: free allocated page Free the page allocated for the persistent grant. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-26 14:58:11 -05:00
Roger Pau Monne	4d4f270f18	xen-blkback: move free persistent grants code Move the code that frees persistent grants from the red-black tree to a function. This will make it easier for other consumers to move this to a common place. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-26 14:58:11 -05:00
Selvan Mani	836413e8c7	mtip32xx: Fix padding issue Hi Jens, Another tiny patch. Removed __packed before the struct smart_attr and added __packed at end of the structure to fix padding issue. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Ed Cashin	11cfb6ff73	aoe: avoid running request handler on plugged queue Calling the request handler directly on a plugged queue defeats the performance improvements provided by the plugging mechanism. Use the __blk_run_queue function instead of calling the request handler directly, so that we don't interfere with the block layer's ability to plug the queue. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Wei Yongjun	298d80152c	mtip32xx: fix potential NULL pointer dereference in mtip_timeout_function() The dereference to port should be moved below the NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Jens Axboe	7c5d62388e	mtip32xx: fix shift larger than type warning If we're building a 32-bit kernel and CONFIG_LBADF isn't set, sector_t is 32-bits wide. The shifts by 32 and 40 are thus larger than we support. Cast the sector offset to a u64 to avoid these warnings. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Selvan Mani	4b9e884523	mtip32xx: Fix incorrect mask used for erase mode Previous commit use value 3 for erasemode mask. Changing the mask to correct value to 2 Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Selvan Mani	eda4531492	mtip32xx: Fix to make lba address correct in big-endian systems Earlier lba address was assigned directly to lba_low and lba_low_ex, which would result in a different number (bytes reversed) in big-endian systems. Now assigning lba address byte-by-byte to fis. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Selvan Mani	3208795e61	mtip32xx: fix potential crash on SEC_ERASE_UNIT The mtip driver lifted this code from elsewhere and then added a special handling check for SEC_ERASE_UNIT. If the caller tries to do a security erase but passes no output data for the command then outbuf is not allocated and the driver duly explodes. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:54 +01:00
Jiri Kosina	eac7cc52c6	floppy: destroy floppy workqueue before cleaning up the queue We need to first destroy the floppy_wq workqueue before cleaning up the queue. Otherwise we might race with still pending work with the workqueue, but all the block queue already gone. This might lead to various oopses, such as CPU 0 Pid: 6, comm: kworker/u:0 Not tainted 3.7.0-rc4 #1 Bochs Bochs RIP: 0010:[<ffffffff8134eef5>] [<ffffffff8134eef5>] blk_peek_request+0xd5/0x1c0 RSP: 0000:ffff88000dc7dd88 EFLAGS: 00010092 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88000f602688 RSI: ffffffff81fd95d8 RDI: 6b6b6b6b6b6b6b6b RBP: ffff88000dc7dd98 R08: ffffffff81fd95c8 R09: 0000000000000000 R10: ffffffff81fd9480 R11: 0000000000000001 R12: 6b6b6b6b6b6b6b6b R13: ffff88000dc7dfd8 R14: ffff88000dc7dfd8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000001e11000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/u:0 (pid: 6, threadinfo ffff88000dc7c000, task ffff88000dc5ecc0) Stack: 0000000000000000 0000000000000000 ffff88000dc7ddb8 ffffffff8134efee ffff88000dc7ddb8 0000000000000000 ffff88000dc7dde8 ffffffff814aef3c ffffffff81e75d80 ffff88000dc0c640 ffff88000fbfb000 ffffffff814aed90 Call Trace: [<ffffffff8134efee>] blk_fetch_request+0xe/0x30 [<ffffffff814aef3c>] redo_fd_request+0x1ac/0x400 [<ffffffff814aed90>] ? start_motor+0x130/0x130 [<ffffffff8106b526>] process_one_work+0x136/0x450 [<ffffffff8106af65>] ? manage_workers+0x205/0x2e0 [<ffffffff8106bb6d>] worker_thread+0x14d/0x420 [<ffffffff8106ba20>] ? rescuer_thread+0x1a0/0x1a0 [<ffffffff8107075a>] kthread+0xba/0xc0 [<ffffffff810706a0>] ? __kthread_parkme+0x80/0x80 [<ffffffff818b553a>] ret_from_fork+0x7a/0xb0 [<ffffffff810706a0>] ? __kthread_parkme+0x80/0x80 Code: 0f 84 c0 00 00 00 83 f8 01 0f 85 e2 00 00 00 81 4b 40 00 00 80 00 48 89 df e8 58 f8 ff ff be fb ff ff ff fe ff ff <49> 8b 1c 24 49 39 dc 0f 85 2e ff ff ff 41 0f b6 84 24 28 04 00 RIP [<ffffffff8134eef5>] blk_peek_request+0xd5/0x1c0 RSP <ffff88000dc7dd88> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:54 +01:00
Akinobu Mita	d48c152a41	cciss: use check_signature() Use check_signature() to find a signature in the mmio address. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:28:34 +01:00
Akinobu Mita	1f118bc479	cciss: cleanup bitops usage - Remove unnecessary correction of bit and address - Use BITS_TO_LONGS macro to calculate bitmap size - Use bitmap_zero() Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:27:22 +01:00
Keith Busch	2b19603415	NVMe: Initialize iod nents to 0 For commands that do not map a scatter list, we need to initilaize the iod's number of sg entries (nents) to 0 and not unmap in this case. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:50 -05:00
Keith Busch	6ecec74520	NVMe: Define SMART log This data structure is defined in the NVMe specification. It's not used by the kernel, but is available for use by userspace software. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:50 -05:00
Keith Busch	08df1e0565	NVMe: Add result to nvme_get_features nvme_get_features() was not returning the result. Add a parameter to return the result in (similar to nvme_set_features()) and change all callers. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Keith Busch	f4f117f64b	NVMe: Set result from user admin command The ioctl data structure includes space for the 'result' of the admin command to be returned; it just wasn't filled in. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Keith Busch	3295874b60	NVMe: End queued bio requests when freeing queue If the queue has bios queued on it when it is freed, bio_endio() must be called for them first. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Keith Busch	859361a228	NVMe: Free cmdid on nvme_submit_bio error nvme_map_bio() is called after the cmdid is allocated, so we have to free the cmdid before returning from nvme_submit_bio() if nvme_map_bio() returned an error. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2012-11-13 09:13:49 -05:00
Jens Axboe	8d0ff3924b	Merge branch 'stable/for-jens-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.8/drivers	2012-11-12 09:18:47 -07:00
Akinobu Mita	f1d6a328bb	drbd: use copy_highpage Use copy_highpage() to copy from one page to another. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:22:26 +01:00
Lars Ellenberg	ed635cb067	drbd: if the replication link breaks during handshake, keep retrying The 8.3.12 commit drbd: Bugfix for the connection behavior fixes a "wasted established connection", if a former connection attempt failed during its early stages. However it opened a window for a regression, if a connection attempt fails during its last stages. The result was a terminated receiver thread, that left behind the supposedly transient "C_UNCONNECTED" state. Any later requests to change the connection state fail, as they wait for the connection state to "stabilize". Fix: short circuit and keep retrying to restablish a new connection, if we don't reach C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:22:19 +01:00
Jing Wang	063eacf88c	drbd: check return of kmalloc in receive_uuids Signed-off-by: Jing Wang <windsdaemon@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:22:10 +01:00
Philipp Reisner	986836503e	Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6	2012-11-09 14:20:23 +01:00
Philipp Reisner	328e0f125b	drbd: Broadcast sync progress no more often than once per second Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:43 +01:00
Philipp Reisner	518a4d53b2	drbd: don't try to clear bits once the disk has failed If the disk has failed already, there is no point trying to change the bitmap. drbd_set_out_of_sync() already had this safeguard, time to add it to drbd_set_in_sync() as well. This also prevents some warning messages, like FIXME asender in bm_change_bits_to, bitmap locked for 'detach' by worker if our disk fails during resync, while there are some resync acks queued up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:42 +01:00
Philipp Reisner	fd0017c124	drbd: fix regression: potential NULL pointer dereference recent commit drbd: always write bitmap on detach introduced a bitmap writeout during detach, which obviously needs some meta data device to write to. Unfortunately, that same error path may be taken if we fail to attach, e.g. due to UUID mismatch, after we changed state to D_ATTACHING, but before the lower level device pointer is even assigned. We need to test for presence of mdev->ldev. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:42 +01:00
Philipp Reisner	4035e4c2eb	drbd: Fix clearing of MDF_AL_DISABLED Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:42 +01:00
Lars Ellenberg	42839f6536	drbd: log request sector offset and size for IO errors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:41 +01:00
Lars Ellenberg	edc9f5eb7a	drbd: always write bitmap on detach If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:41 +01:00
Lars Ellenberg	e34b677d09	drbd: wait for meta data IO completion even with failed disk, unless force-detached The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:40 +01:00
Lars Ellenberg	8747d30af9	drbd: a few more GFP_KERNEL -> GFP_NOIO This has not yet been observed, but conceivably, when using GFP_KERNEL allocations from drbd_md_sync(), drbd_flush_after_epoch() or receive_SyncParam(), we could trigger additional IO to our own device, or an other device in a criss-cross setup, and end up in a local deadlock, or potentially a distributed deadlock in a criss-cross setup involving the peer blocked in a similar way waiting for us to make progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:40 +01:00
Lars Ellenberg	bc891c9ae3	drbd: fix potential deadlock during bitmap (re-)allocation The former comment arguing that GFP_KERNEL was good enough was wrong: it did not take resize into account at all, and assumed the only path leading here was the normal attach on a still secondary device, so no deadlock would be possible. Both resize on a Primary, or attach on a diskless Primary, could potentially deadlock. drbd_bm_resize() is called while IO to the respective device is suspended, so we must use GFP_NOIO to avoid potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:39 +01:00
Lars Ellenberg	a506c13a4d	drbd: use list_move_tail instead of list_del/list_add_tail Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:39 +01:00
Philipp Reisner	1b6dd252e6	drbd: panic on delayed completion of aborted requests "aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which do no longer complete requests at all, not even do error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error-completion, we allow for a more graceful swichover by cleanly migrating services. Still the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If later the local backing device "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. Which means delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed error completion is OK, though we still will complain noisily about it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:39 +01:00
Philipp Reisner	a3025a2737	drbd: Fix comparison of is_valid_transition()'s return code is_valid_transition() might return SS_NOTHING_TO_DO. The condition function _req_st_cond() returned SS_NOTHING_TO_DO, which caused the wait_event to abort too early. Therefore drbd_req_state() did not consume the next CL_ST_CHG_SUCCESS or SS_CW_FAILED_BY_PEER causing serve disruption of the state machine logic... Detaching from a single volue was one way to trigger this bug. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:38 +01:00
Philipp Reisner	1393b59f8c	drbd: Remove duplicate code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:38 +01:00
Lars Ellenberg	70f17b6bd1	drbd: differentiate early and later "postponing" of requests We use the RQ_POSTPONED flag to mark a request for several reasons. It may be a conflicting request in a dual-primary setup, where conflict detection and resolution on the peer decided that this request needs to be re-submitted, it needs to re-enter drbd_make_request() to fix the data divergence caused by these conflicting, partially overlapping, quasi-simultaneous requests. In this case we need to mark the corresponding area as out-of-sync, before we call drbd_al_complete_io(). We also use the RQ_POSTPONED flag to just "push back" a request, before even processing it, if IO is suspended for some reason. In this case, as this request was neither submitted nor sent yet, we must not touch the bitmap. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:37 +01:00
Philipp Reisner	76590cd1fc	drbd: Fix postponed requests A postponed request might has RQ_IN_ACT_LOG already set, but is POSTPONED before it gets something in the RQ_LOCAL_MASK set. Up to now this caused a left-over active extent. Fix that by only testing for the RQ_IN_ACT_LOG bit in drbd_req_destroy() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:37 +01:00
Philipp Reisner	19fffd7b03	drbd: Call drbd_md_sync() explicitly after a state change on the connection Without this, the meta-data gets updates after 5 seconds by the md_sync_timer. Better to do it immeditaly after a state change. If the asender detects a network failure, it may take a bit until the worker processes the according after-conn-state-change work item. The worker might be blocked in sending something, i.e. it takes until it gets into its timeout. That is 6 seconds by default which is longer than the 5 seconds of the md_sync_timer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:11:08 +01:00
Philipp Reisner	d76440181d	drbd: Fix postponed requests * Postponed requests should not set or clear out-of-sync marks * When a request gets postponed we need to drop its reference mdev->local_cnt (put_ldev()). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:24 +01:00
Philipp Reisner	4ae98b4db3	drbd: Imporve the error reporting of failed conn state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:24 +01:00
Philipp Reisner	7970201177	drbd: Fix the way the STATE_SENT bit is cleared With merging the commit 'drbd: Delay/reject other state changes while establishing a connection' the condition check for clearing the flag was wrong. Move the bit clearing to the __drbd_set_state() function in order to have it already cleared for the other parts of the function. I.e. clearing the susp_fen in the after_state_ch() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:23 +01:00
Philipp Reisner	07fc96197a	drbd: Do not check aspects that are not subject to change in _conn_requests_state() When _conn_requests_state() is used to change other parts of the state than the connection, do not check for a valid connection transition. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:23 +01:00
Philipp Reisner	892fdd1aee	drbd: Improve readability of IO resuming after freeze due to no data access The previous way of doing the state change was also okay since the state change on the susp flag gets propagated from the mdev to the tconn. Fortunately all this goes away in drbd-9.0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:22 +01:00
Philipp Reisner	88f79ec4ae	drbd: Fix IO resuming after connection was established while executing the fence handler Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:22 +01:00
Lars Ellenberg	b792b655cd	drbd: fix potential list_add corruption If the md_sync_timer triggers a second time, while the work queued during the first time is still pending, this could result in list_add() of an already added item, and corrupt the work item list. This likely only triggered because of the erroneous batch-dequeueing of work items fixed with drbd: dequeue single work items in wait_for_work() Still, skip queueing if md_sync_work is already queued. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:21 +01:00
Lars Ellenberg	bc317a9ecd	drbd: dequeue single work items in wait_for_work() As long as we still use drbd_queue_work_front(), we must only dequeue the single first item during normal operation. The comment in drbd_worker() even says so, but bc8a5a1 drbd: remove struct drbd_tl_epoch objects (barrier works) introduced the batch dequeueing again via list_splice_init() in wait_for_work(). Change back to list_move() of the first item, if any. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:21 +01:00
Lars Ellenberg	c02abda2b2	drbd: mutex_unlock "... must no be used in interrupt context" Documentation of mutex_unlock says we must not use it in interrupt context. So do not call it while holding the spin_lock_irq, but give up the spinlock temporarily. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:21 +01:00
Philipp Reisner	c1fd29a11f	drbd: Fix a race condition that can lead to a BUG() If the preconditions for a state change change after the wait_event() we might hit the BUG() statement in conn_set_state(). With holding the spin_lock while evaluating the condition AND until the actual state change we ensure the the preconditions can not change anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:20 +01:00
Lars Ellenberg	0ee98e2eb0	drbd: temporarily suspend io in drbd_adm_disk_opts drbd_adm_disk_opts() does wait_event(mdev->al_wait, lc_try_lock(mdev->act_log)); drbd_al_shrink(mdev); If the device is very busy, this can take a very long time to succeed. Fix this by temporarily suspending IO, then quickly change the settings, and resume. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:20 +01:00
Lars Ellenberg	4eb9b3cba0	drbd: don't send out P_BARRIER with stale information We must only send P_BARRIER for epochs we actually sent P_DATA in. If we (re-)establish a connection, we reinitialized the send.current_epoch_nr, but forgot to reset send.current_epoch_writes. This could result in a spurious P_BARRIER with stale epoch information, and a disconnect/reconnect cycle once the then "unexpected" P_BARRIER_ACK is received: BAD! BarrierAck #28823 received, expected #28829! Introduce re_init_if_first_write() and maybe_send_barrier() helpers, and call them appropriately for read/write/set-out-of-sync requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:19 +01:00
Lars Ellenberg	08332d7325	drbd: properly call drbd_rs_cancel_all() in drbd_disconnected() drbd_disconnected() is supposed to clear the resync lru cache, by calling drbd_rs_cancel_all(). We must do so before we call drbd_flush_workqueue(), as at least the callback w_restart_disk_io() may wait for resync progres, and would otherwise deadlock. drbd_finish_peer_reqs() may again populate that cache, which will then potentially be stale after the next resync handshake and bitmap exchange, we have to do it again after that. A stale resync lru cache causes no harm but ugly messages like this: BAD! sector=196608s enr=6 rs_left=-256 rs_failed=0 count=256 cstate=SyncTarget Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:19 +01:00
Philipp Reisner	155522df5b	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:19 +01:00
Philipp Reisner	b66623e33e	drbd: Avoid NetworkFailure state during disconnect Disconnecting is a cluster wide state change. In case the peer node agrees to the state transition, it sends back the fact on the meta-data connection and closes both sockets. In case the node node that initiated the state transfer sees the closing action on the data-socket, before the P_STATE_CHG_REPLY packet, it was going into one of the network failure states. At least with the fencing option set to something else thatn "dont-care", the unclean shutdown of the connection causes a short IO freeze or a fence operation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:18 +01:00
Philipp Reisner	39a1aa7f49	drbd: Protect accesses to the uuid set with a spinlock There is at least the worker context, the receiver context, the context of receiving netlink packts. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:08:04 +01:00
Philipp Reisner	fef45d297e	drbd: Write all pages of the bitmap after an online resize We need to write the whole bitmap after we moved the meta data due to an online resize operation. With the support for one peta byte devices bitmap IO was optimized to only write out touched pages. This optimization must be turned off when writing the bitmap after an online resize. This issue was introduced with drbd-8.3.10. The impact of this bug is that after an online resize, the next resync could become larger than expected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:51 +01:00
Philipp Reisner	5af2e8ce2b	drbd: Fix completion of requests while the device is suspended In various places (E.g. CONNECTION_LOST_WHILE_PENDING) the RQ_COMPLETION_SUSP mask is passed in the clear set to mod_rq_state(). The issue was that it tried to clear the RQ_COMPLETION_SUSP bit out of the state mask first, and eventuelly set it afterwards, in the drbd_req_put_completion_ref() function. Fixed that by moving the reference getting out of drbd_req_put_completion_ref() into the mod_rq_state(), before the place where the extra reference might be put. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:50 +01:00
Andreas Gruenbacher	715306f69d	drbd: Don't unregister socket state_change callback from within the callback Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:50 +01:00
Lars Ellenberg	eb12010e9a	drbd: disambiguation, s/ERR_DISCARD/ERR_DISCARD_IMPOSSIBLE/ If for some reason (typically "split-brained" cluster manager) drbd replica data has diverged, we can chose a victim, and reconnect using "--discard-my-data", causing the victim to become sync-target, fetching all changed blocks from the peer. If we are Primary, we are potentially in use, and we refuse to "roll back" changes to the data below the page cache and other users. Rename the error symbol for this to ERR_DISCARD_IMPOSSIBLE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:50 +01:00
Lars Ellenberg	427c0434fc	drbd: disambiguation, s/DISCARD_CONCURRENT/RESOLVE_CONFLICTS/ We don't discard anything here, really. We resolve conflicting, concurrent writes to overlapping data blocks. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:49 +01:00
Lars Ellenberg	d4dabbe22d	drbd: disambiguation, s/P_DISCARD_WRITE/P_SUPERSEDED/ To avoid confusion with REQ_DISCARD aka TRIM, rename our "discard concurrent write acks" from P_DISCARD_WRITE to P_SUPERSEDED. At the same time, rename the drbd request event DISCARD_WRITE to CONFLICT_RESOLVED. It already triggers both successful completion or restart of the request, depending on our RQ_POSTPONED flag. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:49 +01:00
Lars Ellenberg	232fd3f4a0	drbd: cleanup, drop unused struct Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:48 +01:00
Lars Ellenberg	46e21bbadb	drbd: NEG_ACK does not imply a barrier-ack Don't drop a request from the transfer log just because it was NEG_ACKED. We need it around to be able to verify P_BARRIER_ACKs against the transver log. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:48 +01:00
Lars Ellenberg	99b4d8fe6d	drbd: only start a new epoch, if the current epoch contains writes Almost all code paths calling start_new_tl_epoch() guarded it with if (... current_tle_writes > 0 ... ). Just move that inside start_new_tl_epoch(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:47 +01:00
Philipp Reisner	8a0bab2a6d	drbd: Finish requests that completed while IO was frozen Requests of an acked epoch are stored on the barrier_acked_requests list. In case the private bio of such a request completes while IO on the drbd device is suspended [req_mod(completed_ok)] then the request stays there. When thawing IO because the fence_peer handler returned, then we use tl_clear() to apply the connection_lost_while_pending event to all requests on the transfer-log and the barrier_acked_requests list. Up to now the connection_lost_while_pending event was not applied on requests on the barrier_acked_requests list. Fixed that. I.e. now the connection_lost_while_pending and resend events are applied to requests on the barrier_acked_requests list. For that it is necessary that the resend event finishes (local only) READS correctly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:47 +01:00
Lars Ellenberg	e959d08d3e	drbd: Fix a potential issue with the DISCARD_CONCURRENT flag The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:47 +01:00
Lars Ellenberg	519b6d3eac	drbd: fix drbd wire compatibility for empty flushes DRBD has a concept of request epochs or reorder-domains, which are separated on the wire by P_BARRIER packets. Older DRBD is not able to handle zero-sized requests at all, so we need to map empty flushes to these drbd barriers. These are the equivalent of empty flushes, and by default trigger flushes on the receiving side anyways (unless not supported or explicitly disabled), so there is no need to handle this differently in newer drbd either. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:46 +01:00
Philipp Reisner	80c6eed49d	drbd: More random to the connect logic Since the listening socket is open all the time, it was possible to get into stable "initial packet S crossed" loops. * when both sides realize in the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with repeated "packet S crossed" messages. * when both sides do not realize with the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with alternating "packet S crossed" "packet M crossed" messages. In order to break out these stable loops randomize the behaviour if such a crossing of P_INITIAL_DATA or P_INITIAL_META packets is detected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:46 +01:00
Philipp Reisner	92f14951c0	drbd: Try to connec to peer only once per cycle Since now our listening socket is open all the time we will get connection tries of the peer always in. No need to try it three times. This is valid when connecting to older peers as well, it simply increases the probability that the new version DRBD will accept a connection instead that it will establish one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:45 +01:00
Philipp Reisner	b666dbf819	drbd: Remove redundant and wrong test for NULL simplification in conn_connect() Since the drbd_socket_okay() function itself tests if the the socket is NULL, the explicit test "if (sock.socket && &msock.socket)" was redundent. Apart from that the address opperator ('&') before msock.socket rendered the test pointless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:45 +01:00
Philipp Marek	3174f8c504	drbd: pass some more information to userspace. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:45 +01:00
Lars Ellenberg	81a3537a97	drbd: announce FLUSH/FUA capability to upper layers In 8.4, we may have bios spanning two activity log extents. Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:44 +01:00
Lars Ellenberg	58ffa580a7	drbd: introduce stop-sector to online verify We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Had to bump the protocol version differently, we are now 101. Added verify_can_do_stop_sector() { protocol >= 97 && protocol != 100; } Also, the return value convention for worker callbacks has changed, we returned "true/false" for "keep the connection up" in 8.3, we return 0 for success and <= for failure in 8.4. Affected: receive_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-09 14:05:32 +01:00
Lars Ellenberg	970fbde1f1	drbd: flush drbd work queue before invalidate/invalidate remote If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:41 +01:00
Lars Ellenberg	6f1a656325	drbd: call local-io-error handler early In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:40 +01:00
Lars Ellenberg	a324896b17	drbd: do not reset rs_pending_cnt too early Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:40 +01:00
Lars Ellenberg	8a94317071	drbd: reset congestion information before reporting it in /proc/drbd We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:39 +01:00
Lars Ellenberg	6f3465ed82	drbd: report congestion if we are waiting for some userland callback If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:39 +01:00
Lars Ellenberg	0c84966601	drbd: differentiate between normal and forced detach Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:39 +01:00
Lars Ellenberg	bf709c8552	drbd: cleanup, remove two unused global flags The two unused "global flags" in 8.3 are "per volume" flags in 8.4. Still, they are unused, so lose them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:38 +01:00
Lars Ellenberg	3b9ef85e05	drbd: fix null pointer dereference with on-congestion policy when diskless We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:38 +01:00
Lars Ellenberg	27012382bc	drbd: take error path in drbd_adm_down if interrupted by signal drbd_adm_down() does adm_detach(), which can fail with various error codes, or be interrupted by a signal. The interrupted by signal case was not properly handled, leading to block drbd0: ASSERT( mdev->state.disk == D_DISKLESS && mdev->state.conn == C_STANDALONE ) in drbd/drbd_worker.c and further to destroying objects while still in use, and resulting crashes. Detect the interruption, and take the error path out. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:37 +01:00
Lars Ellenberg	9a278a7906	drbd: allow read requests to be retried after force-detach Sometimes, a lower level block device turns into a tar-pit, not completing requests at all, not even doing error completion. We can force-detach from such a tar-pit block device, either by disk-timeout, or by drbdadm detach --force. Queueing for retry only from the request destruction path (kref hit 0) makes it impossible to retry affected read requests from the peer, until the local IO completion happened, as the locally submitted bio holds a reference on the drbd request object. If we can only complete READs when the local completion finally happens, we would not need to force-detach in the first place. Instead, queue for retry where we otherwise had done the error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:37 +01:00
Lars Ellenberg	934722a2db	drbd: __req_mod: make DISCARD_WRITE and independend case cherry-picked and adapted from drbd 9 devel branch This looks cleaner to me, and also gets rid of the other ugly if-inside-case-fall-through. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:37 +01:00
Lars Ellenberg	a0d856dfae	drbd: base completion and destruction of requests on ref counts cherry-picked and adapted from drbd 9 devel branch The logic for when to get or put a reference is in mod_rq_state(). To not get confused in the freeze/thaw respectively resend/restart paths, or when cleaning up requests waiting for P_BARRIER_ACK, this also introduces additional state flags: RQ_COMPLETION_SUSP, and RQ_EXP_BARR_ACK. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:36 +01:00
Lars Ellenberg	b406777e64	drbd: introduce completion_ref and kref to struct drbd_request cherry-picked and adapted from drbd 9 devel branch completion_ref will count pending events necessary for completion. kref is for destruction. This only introduces these new members of struct drbd_request, a followup patch will make actual use of them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:36 +01:00
Lars Ellenberg	5df69ece6e	drbd: __drbd_make_request() is now void The previous commit causes __drbd_make_request() to always return 0. Change it to void. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:35 +01:00
Lars Ellenberg	5da9c83644	drbd: better separate WRITE and READ code paths in drbd_make_request cherry-picked and adapted from drbd 9 devel branch READs will be interesting to at most one connection, WRITEs should be interesting for all established connections. Introduce some helper functions to hopefully make this easier to follow. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:35 +01:00
Lars Ellenberg	b6dd1a8976	drbd: remove struct drbd_tl_epoch objects (barrier works) cherry-picked and adapted from drbd 9 devel branch DRBD requests (struct drbd_request) are already on the per resource transfer log list, and carry their epoch number. We do not need to additionally link them on other ring lists in other structs. The drbd sender thread can recognize itself when to send a P_BARRIER, by tracking the currently processed epoch, and how many writes have been processed for that epoch. If the epoch of the request to be processed does not match the currently processed epoch, any writes have been processed in it, a P_BARRIER for this last processed epoch is send out first. The new epoch then becomes the currently processed epoch. To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK, the sender thread also needs to handle the case when the current epoch was closed already, but no new requests are queued yet, and send out P_BARRIER as soon as possible. This is done by comparing the per resource "current transfer log epoch" (tconn->current_tle_nr) with the per connection "currently processed epoch number" (tconn->send.current_epoch_nr), while waiting for new requests to be processed in wait_for_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:35 +01:00
Lars Ellenberg	d5b27b01f1	drbd: move the drbd_work_queue from drbd_socket to drbd_connection cherry-picked and adapted from drbd 9 devel branch In 8.4, we don't distinguish between "resource work" and "connection work" yet, we have one worker for both, as we still have only one connection. We only ever used the "data.work", no need to keep the "meta.work" around. Move tconn->data.work to tconn->sender_work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:34 +01:00
Lars Ellenberg	8c0785a5c9	drbd: allow to dequeue batches of work at a time cherry-picked and adapted from drbd 9 devel branch In 8.4, we still use drbd_queue_work_front(), so in normal operation, we can not dequeue batches, but only single items. Still, followup commits will wake the worker without explicitly queueing a work item, so up() is replaced by a simple wake_up(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:34 +01:00
Lars Ellenberg	b379c41ed7	drbd: transfer log epoch numbers are now per resource cherry-picked from drbd 9 devel branch. In preparation of multiple connections, the "barrier number" or "epoch number" needs to be tracked per-resource, not per connection. The sequence number space will not be reset anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:33 +01:00
Lars Ellenberg	9d05e7c4e7	drbd: rename drbd_restart_write to drbd_restart_request Meanwhile, this is used to restart failed READ requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:33 +01:00
Lars Ellenberg	629663c942	drbd: fix wrong assert in completion/retry path of failed local reads Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:33 +01:00
Lars Ellenberg	ab53b90e89	drbd: fix local read error hung forever The commit drbd: simplify retry path of failed READ requests simplified it too much: it just did not do anything for local read errors. Add the missing req_may_be_completed_not_susp() to the READ_COMPLETED_WITH_ERROR case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:32 +01:00
Lars Ellenberg	1b6f19740d	drbd: fix access of unallocated pages and kernel panic BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:32 +01:00
Philipp Reisner	7a426fd8d5	drbd: Keep the listening socket open while trying to connect to the peer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:31 +01:00
Philipp Reisner	1f3e509b76	drbd: pull prepare_listen_socket() out of drbd_wait_for_connect() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:31 +01:00
Philipp Reisner	9a51ab1c1b	drbd: New disk option al-updates By disabling al-updates one might increase performace. The price for that is that in case a crashed primary (that had al-updates disabled) is reintegraded, it will receive a full-resync instead of a bitmap based resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:31 +01:00
Andreas Gruenbacher	26ec92871b	drbd: Stop using NLA_PUT*(). These macros no longer exist in kernel version v3.5-rc1. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:30 +01:00
Philipp Reisner	7e0f096b8d	drbd: Remove drbd_accept() and use kernel_accept() instead Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:30 +01:00
Philipp Reisner	2820fd3969	drbd: Move the call to listen() out of drbd_accept() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:29 +01:00
Philipp Reisner	c5b005ab70	drbd: use bitmap_parse instead of __bitmap_parse The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:29 +01:00
Lars Ellenberg	1882e22df7	drbd: grammar fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:29 +01:00
Lars Ellenberg	f66ee69746	drbd: bm_page_async_io: properly initialize page->private If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:28 +01:00
Lars Ellenberg	a220d29180	drbd: allow bitmap to change during writeout from resync_finished Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout, pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:28 +01:00
Lars Ellenberg	5016b82a49	drbd: fix race between drbdadm invalidate/verify and finishing resync When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If in that very moment a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:27 +01:00
Lars Ellenberg	07be15b12c	drbd: fix resend/resubmit of frozen IO DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith), or because we lost access to data (on-no-data-accessible suspend-io). Resuming from there (re-connect, or re-attach, or explicit admin intervention) should "just work". Unfortunately, if the re-attach/re-connect did not happen within the timeout, since the commit drbd: Implemented real timeout checking for request processing time if so configured, the request_timer_fn() would timeout and detach/disconnect virtually immediately. This change tracks the most recent attach and connect, and does not timeout within <configured timeout interval> after attach/connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:27 +01:00
Philipp Reisner	3ea35df83f	drbd: fix spelling, remove boring development log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:27 +01:00
Philipp Reisner	e4bad1bcac	drbd: Ensure that data_size is not 0 before using data_size-1 as index This could be exploited by a peer which runs modified code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:26 +01:00
Philipp Reisner	a1096a6e9d	drbd: Delay/reject other state changes while establishing a connection Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:26 +01:00
Philipp Reisner	27eb13e99b	drbd: Fixed processing of disk-barrier, disk-flushes and disk-drain Since drbd_bump_write_ordering() is called in the attaching process while the disk state is D_ATTACHING, it was not considering these three flags during attach. A call to this function was missing form drbd_adm_disk_opts(). Fixed both issues. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:25 +01:00
Lars Ellenberg	9ed57dcbda	drbd: ignore volume number for drbd barrier packet exchange Transfer log epochs, and therefore P_BARRIER packets, are per resource, not per volume. We must not associate them with "some random volume". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:25 +01:00
Lars Ellenberg	648e46b531	drbd: complete_conflicting_writes() should not care about connections complete_conflicting_writes() should not cause -EIO. It should not timeout either, or care for connection states. Connection timeout is detected elsewhere, and it's cleanup path is supposed to remove any pending requests or peer_requests from the write_requests tree. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:25 +01:00
Lars Ellenberg	4439c400ab	drbd: simplify retry path of failed READ requests If a local or remote READ request fails, just push it back to the retry workqueue. It will re-enter __drbd_make_request, and be re-assigned to a suitable local or remote path, or failed, if we do not have access to good data anymore. This obsoletes w_read_retry_remote(), and eliminates two goto...retry blocks in __req_mod() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:24 +01:00
Lars Ellenberg	2415308eb9	drbd: move put_ldev from __req_mod() to the endio callback Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:24 +01:00
Lars Ellenberg	6870ca6d46	drbd: factor out master_bio completion and drbd_request destruction paths In preparation for multiple connections and reference counting, separate the code paths for completion of the master bio and destruction of the request object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:23 +01:00
Lars Ellenberg	8d6cdd7848	drbd: conflicting writes: make wake_up of waiting peer_requests explicit Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:23 +01:00
Lars Ellenberg	0afd569a40	drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:23 +01:00
Lars Ellenberg	ea9d6729bd	drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:22 +01:00
Lars Ellenberg	27a434fe40	drbd: make OOS_HANDED_TO_NETWORK its own case Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:22 +01:00
Lars Ellenberg	2312f0b3c5	drbd: fix potential deadlock during "restart" of conflicting writes w_restart_write(), run from worker context, calls __drbd_make_request() and further drbd_al_begin_io(, delegate=true), which then potentially deadlocks. The previous patch moved a BUG_ON to expose such call paths, which would now be triggered. Also, if we call __drbd_make_request() from resource worker context, like w_restart_write() did, and that should block for whatever reason (!drbd_state_is_stable(), resource suspended, ...), we potentially deadlock the whole resource, as the worker is needed for state changes and other things. Create a dedicated retry workqueue for this instead. Also make sure that inc_ap_bio()/dec_ap_bio() are properly paired, even if do_retry() needs to retry itself, in case __drbd_make_request() returns != 0. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:21 +01:00
Lars Ellenberg	f9916d61a4	drbd: don't pretend that barrier_nr == 0 was special Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:21 +01:00
Lars Ellenberg	0642d5f8e0	drbd: remove unused static helper function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:21 +01:00
Lars Ellenberg	e44d71f36c	drbd: remove some very outdated comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:20 +01:00
Lars Ellenberg	9dab3842b5	drbd: fix memleak in error path in bm_rw and drbd_bm_write_range Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:20 +01:00
Lars Ellenberg	a6a7d4f0c1	drbd: missing wakeup after drbd_rs_del_all Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:19 +01:00
Lars Ellenberg	5cdb0bf322	drbd: remove now unused seq_num member from struct drbd_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:19 +01:00
Lars Ellenberg	4b8514ee28	drbd: fix potential data corruption and protocol error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:19 +01:00
Lars Ellenberg	e8744f5aca	drbd: Fixed detach Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:18 +01:00
Lars Ellenberg	d93f63028f	drbd: Fix a potential write ordering issue on SyncTarget nodes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:18 +01:00
Lars Ellenberg	81f448629a	drbd: Fix a potential race that could case data inconsistency When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE at the same time, and it happens that the line remote = remote && drbd_should_do_remote(s); stills sees C_WF_BITMAP_S, and send_oos = rw == WRITE && drbd_should_send_oos(s); already sees C_SYNC_SOURCE both are 0. This causes the write to not be mirrored, but marked as out-of-sync on the Sync_Source node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	38a05c16b8	drbd: Consider that bio->bi_bdev might be modified below DRBD Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	72585d2428	drbd: add missing part_round_stats to _drbd_start_io_acct Without this, iostat frequently sees bogus svctime and >= 100% "utilization". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:17 +01:00
Philipp Reisner	dd9b360475	drbd: Fix module refcount leak in drbd_accept() drbd_accept was modelled after kernel_accept with drbd commit 53eb779 in July 2008. Only, kernel_accept was then broken, and only fixed later with kernel commit `1b08534e` in Dec 2008: net: Fix module refcount leak in kernel_accept() Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp, would soon have their reference count become negative, preventing them from being unloaded (likely), or worse, hit zero without actually being unused, allowing them to be unloaded while still in use (unlikely, but if triggered, causing a kernel crash). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:16 +01:00
Philipp Reisner	93f5afe956	drbd: If disk timeout expires fail only the affected volume ...and not all volumes of the resource Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:16 +01:00
Philipp Reisner	32db80f6f6	drbd: Consider the disk-timeout also for meta-data IO operations If the backing device is already frozen during attach, we failed to recognize that. The current disk-timeout code works on top of the drbd_request objects. During attach we do not allow IO and therefore never generate a drbd_request object but block before that in drbd_make_request(). This patch adds the timeout to all drbd_md_sync_page_io(). Before this patch we used to go from D_ATTACHING directly to D_DISKLESS if IO failed during attach. We can no longer do this since we have to stay in D_FAILED until all IO ops issued to the backing device returned. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Philipp Reisner	25b0d6c8c1	drbd: Reinstate disabling AL updates with invalidate-remote Commit d0ef827e (drbd: switch configuration interface from connector to genetlink) introduced a regression by removing the ability to set all bits in the out of sync bitmap and to suspend updates to the activity log of a disconnected device via the invalidate-remote management call. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Lars Ellenberg	b17f33cb0a	drbd: explicitly clear unused dp_flags in drbd_send_block We send left-over garbage from the previous packet in P_DATA_REPLY and P_RS_DATA_REPLY packets. That's bad behaviour. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:15 +01:00
Philipp Reisner	4d0fc3fdc3	drbd: Fixed compat issue with disconnecting 8.4 from a primary 8.3 For compatibility reasons 8.4 has to send P_STATE_CHG_REQ (instead of P_CONN_ST_CHG_REQ) when disconnecting. In the receiving code path we missed to convert the old answer (P_STATE_CHG_REPLY) back to 8.4 logic. Therefore the CL_ST_CHG_SUCCESS or CL_ST_CHG_FAIL bit in the flags word of mdev got set, while the state code was waiting for the CONN_WD_ST_CHG_OKAY or CONN_WD_ST_CHG_FAIL bits in tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:14 +01:00
Andreas Gruenbacher	1a3cde4406	drbd: drbd_bm_ALe_set_all(): Remove unused function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:14 +01:00
Philipp Reisner	69b6a3b159	drbd: restart loop in drbd_make_request() [prepare for Linux-3.2] With Linux-3.2 generic_make_request() will no longer loop over the request function until it finally returns 0. Move this loop into our drbd_make_request() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	7da358625c	drbd: Restore late assigning of tconn->data.sock and meta.sock With commit from Mon Mar 28 16:33:12 2011 +0200 "drbd: drbd_connect(): Initialize struct drbd_socket before sending anything" tconn->data.sock and tconn->meta.sock get assigned early, in conn_connect. The early assigning can trigger an OOPS, because it may released the socket without acquiring the mutex protecting the socket. An other thread (worker) might use setsockopt() on the socket while it gets free()ed. Restored the (proven) 8.3 behavior of assigning these sockets after the two connections are established. Credits for reporting the issue are going to Arne Redlich. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	a01842ebee	drbd: Log failures of connection state changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:13 +01:00
Philipp Reisner	e8cdc34335	drbd: Consider that read requests could be NEG_ACKEDed ap_in_flight only counts writes. NEG_ACKED is an action on a request that might be called for reads and writes. This bug was there forever, but it becomes much more relevant with the read balincing code. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	6ab9b1b60b	drbd: Do not send state packets while lower than C_CONNECTED cstate I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION. Sending may already work in these cstates, but the peer still expects the HandShake / ConnectionFeatures packet. Actually triggered by the Testuite on kugel. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	b8853dbd8c	drbd: fix race between disconnect and receive_state If the asender thread, or request_timer_fn(), or some other part of the code, decided to drop the connection (because of timeout or other), but the receiver just now was processing a P_STATE packet, there was a chance that receive_state() would do a hard state change "re-establishing" an already failed connection without additional handshake. Log excerpt: Remote failed to finish a request within ko-count * timeout peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) asender terminated ... peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) ... Connection closed peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 ) receiver terminated Impact: while the connection state is erroneously "Connected", requests may be queued and even sent, which would never be acknowledged, and may have been missed by the cleanup. These requests would never be completed. The next drbd_suspend_io() will then lock up, waiting forever for these requests to complete. Fixed in several code paths: Make sure the connection state is NetworkFailure or worse before starting the cleanup in drbd_disconnect(). This should make sure the cleanup won't miss any requests. Disallow receive_state() to "upgrade" the connection state from an error state. This will make sure the "illegal" state transition won't happen. For all connection failure states, relax the safe-guard in sanitize_state() again to silently mask out those state changes (e.g. Timeout -> Connected becomes Timeout -> Timeout). Note by Philipp Reisner: The 3rd chunk described as "relax the safe-guard..." is not there in 8.4 as it is relaxed to the maximum in 8.4 already Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:12 +01:00
Philipp Reisner	57bcb6cf1d	drbd: Do not call generic_make_request() while holding req_lock Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:11 +01:00
Philipp Reisner	d60de03a66	drbd: Load balancing method: striping Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:11 +01:00
Philipp Reisner	380207d08e	drbd: Load balancing of read requests New config option for the disk secition "read-balancing", with the values: prefer-local, prefer-remote, round-robin, when-congested-remote. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Philipp Reisner	d10b4ea32b	drbd: Get rid of "ASSERTION FAILED: tconn->current_epoch->list not empty" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Lars Ellenberg	615e087fbd	drbd: add missing rcu locks around recently introduced idr_for_each Recent commit drbd: Move write_ordering from mdev to tconn introduced a new idr_for_each loop over all volumes, but did not take necessary rcu locks or krefs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:10 +01:00
Andreas Gruenbacher	03d63e1d1e	drbd: Remove leftover prototype Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:09 +01:00
Philipp Reisner	975b297947	drbd: fix potential spinlock deadlock drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks left to be resynced (rs_left) in the current resync extent. If it detects a mismatch, it complains, and forces a disconnect using drbd_force_state(mdev, NS(conn, C_DISCONNECTING)); Unfortunately, this may be called while holding the req_lock, and drbd_force_state() want's to aquire that lock itself. Deadlock. Don't force a disconnect, but fix up rs_left by recounting and reassigning the number of dirty blocks in that extent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:09 +01:00
Philipp Reisner	77fede5137	drbd: Fix the WO=drain implementation for multiple volumes Wait until IO is drained in all volumes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	1e9dd2912e	drbd: Switch drbd_may_finish_epoch() from mdev to tconn Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	12038a3a71	drbd: Move list of epochs from mdev to tconn This is necessary since the transfer_log on the sending is also per tconn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:08 +01:00
Philipp Reisner	1d2783d532	drbd: Prepare epochs per connection An epoch object needs a pointer to the mdev it was received for. This is necessary to be able to send the barrier ack packet for the same volume as the original barrier packet was assigned to. This prepares the next step, in which the (receiver side) epoch list is moved from the device (mdev) to the connection (tconn) object. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:07 +01:00
Philipp Reisner	4b0007c0e8	drbd: Move write_ordering from mdev to tconn This is necessary in order to prepare the move of the (receiver side) epoch list from the device (mdev) to the connection (tconn) objects. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:07 +01:00
Philipp Reisner	6936fcb49a	drbd: Move the CREATE_BARRIER flag from connection to device That is necessary since the whole transfer log is per connection(tconn) and not per device(mdev). This bug caused list corruption on the worker list. When a barrier is queued for sending in the context of one device, another device did not see the CREATE_BARRIER bit, and queued the same object again -> list corruption. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	36baf6117b	drbd: Fixed an obvious copy-n-paste mistake Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00
Philipp Reisner	43de7c852b	drbd: Fixes from the drbd-8.3 branch * drbd-8.3: drbd: O_SYNC gives EIO on ramdisks for some kernels (eg. RHEL6). drbd: send intermediate state change results to the peer Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-11-08 16:58:06 +01:00

... 2 3 4 5 6 ...

3503 Commits