Commit Graph

285 Commits

Author SHA1 Message Date
Zhen Wei 925037bcba ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages
When there is a lot of multithreaded I/O usage, two threads can collide
while sending out a message to the other nodes. This is due to the lack of
locking between threads while sending out the messages.

When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
itself by acquiring a lock at invocation by calling lock_sock().
tcp_sendmsg() then loops over the buffers in the iovec, allocating
associated sk_buff's and cache pages for use in the actual send. As it does
so, it pushes the data out to tcp for actual transmission. However, if one
of those allocation fails (because a large number of large sends is being
processed, for example), it must wait for memory to become available. It
does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
contains the call to release_sock().

The following patch adds a lock to the socket container in order to
properly serialize outbound requests.

From: Zhen Wei <zwei@novell.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:11 -08:00
Srinivas Eeda 1faf289454 ocfs2_dlm: disallow a domain join if node maps mismatch
There is a small window where a joining node may not see the node(s) that
just died but are still part of the domain. To fix this, we must disallow
join requests if the joining node has a different node map.

A new field node_map is added to dlm_query_join_request to send the current
nodes nodemap along with join request. On the receiving end the nodes that
are part of the cluster verifies if this new node sees all the nodes that
are still part of the cluster. They disallow the join if the maps mismatch.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:09:14 -08:00
Sunil Mushran ab81afd30b ocfs2: Binds listener to the configured ip address
This patch binds the o2net listener to the configured ip address
instead of INADDR_ANY for security. Fixes oss.oracle.com bugzilla#814.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:49 -08:00
Kurt Hackel d74c9803a9 ocfs2: Added post handler callable function in o2net message handler
Currently o2net allows one handler function per message type. This
patch adds the ability to call another function to be called after
the handler has returned the message to the other node.

Handlers are now given the option of returning a context (in the form of a
void **) which will be passed back into the post message handler function.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:56 -08:00
Kurt Hackel ba2bf21851 ocfs2_dlm: fix cluster-wide refcounting of lock resources
This was previously broken and migration of some locks had to be temporarily
disabled. We use a new (and backward-incompatible) set of network messages
to account for all references to a lock resources held across the cluster.
once these are all freed, the master node may then free the lock resource
memory once its local references are dropped.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:07 -08:00
Zhen Wei 92efc15241 ocfs2: export heartbeat thread pid via configfs
The patch allows the ocfs2 heartbeat thread to prioritize I/O which may
help cut down on spurious fencing. Most of this will be in the tools -
we can have a pid configfs attribute and let userspace (ocfs2_hb_ctl)
calls the ioprio_set syscall after starting heartbeat, but only cfq
scheduler supports I/O priorities now.

Signed-off-by: Zhen Wei <zwei@novell.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:40:32 -08:00
Robert P. J. Day cd86128088 [PATCH] Fix numerous kcalloc() calls, convert to kzalloc()
All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:52 -08:00
Andrew Beekhof 828ae6afbe [patch 3/3] OCFS2 Configurable timeouts - Protocol changes
Modify the OCFS2 handshake to ensure essential timeouts are configured
identically on all nodes.

Only allow changes when there are no connected peers

Improves the logic in o2net_advance_rx() which broke now that
sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg)

Included is the field for userspace-heartbeat timeout to avoid the need for
further protocol changes.

Uses a global spinlock to ensure the decisions to update configfs entries
are made on the correct value.  The region covered by the spinlock when
incrementing the counter is much larger as this is the more critical case.

Small cleanup contributed by Adrian Bunk <bunk@stusta.de>

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-11 14:26:44 -08:00
Jeff Mahoney b5dd80304d [patch 2/3] OCFS2 Configurable timeouts
Allow configuration of OCFS2 timeouts from userspace via configfs

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:13:20 -08:00
Andrew Beekhof 296b75ed6a [patch 1/3] OCFS2 - Expose struct o2nm_cluster
Subsequent patches (namely userspace heartbeat and configurable timeouts)
require access to the o2nm_cluster struct.  This patch does the necessary
shuffling.

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:13:01 -08:00
David Howells c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Akinobu Mita 79cd22d3ac ocfs2: delete redundant memcmp()
This patch deletes redundant memcmp() while looking up in rb tree.

Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:26:06 -07:00
Mark Fasheh 24c19ef404 ocfs2: Remove i_generation from inode lock names
OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
Typically, i_generation is encoded in the lock name so that a deleted inode
on and a new one in the same block don't share the same lvb.

Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
is potentially thrown away as soon as the meta data lock is taken - we
cannot encode the lock name without first knowing i_generation, which
requires a disk read.

This patch encodes i_generation in the inode meta data lvb, and removes the
value from the inode meta data lock name. This way, the read can be covered
by a lock, and at the same time we can distinguish between an up to date and
a stale LVB.

This will help cold-cache stat(2) performance in particular.

Since this patch changes the protocol version, we take the opportunity to do
a minor re-organization of two of the LVB fields.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:46 -07:00
Mark Fasheh 379dfe9d0d ocfs2: Hook rest of the file system into dentry locking API
Actually replace the vote calls with the new dentry operations. Make any
necessary adjustments to get the scheme to work.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24 13:50:43 -07:00
Mathieu Avila 471e3f5728 ocfs2: Fix heartbeat sector calculation
This fixes things for devices which set max_sectors to 8.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20 15:50:53 -07:00
Sunil Mushran 781ee3e2b1 ocfs2: Cleanup message prints
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:56:26 -07:00
Mark Fasheh 0db638f44e ocfs2: warn the user on a dead timeout mismatch
Print a warning to the user when a node with a different dead count joins
the region.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 15:45:35 -07:00
Joel Becker 2b388c6790 ocfs2: Compile-time disabling of ocfs2 debugging output.
Give gcc the chance to compile out the debug logging code in ocfs2.
This saves some size at the expense of being able to debug the code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 14:48:30 -07:00
Ingo Molnar 34af946a22 [PATCH] spin/rwlock init cleanups
locking init cleanups:

 - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
 - convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

 - cleanliness
 - lockdep needs control of lock initialization, which the open-coded
   variants do not give
 - it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:39 -07:00
Mark Fasheh a9e2ae3917 ocfs2: Better I/O error handling in heartbeat
Propagate errors received in o2hb_bio_end_io() back to the heartbeat thread
so it can skip re-arming the timer.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-04-07 18:03:09 -07:00
Mark Fasheh ea8aa68d36 ocfs2: finally remove MLF* macros
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:29 -08:00
Mark Fasheh 70bacbdbfa ocfs2: don't use MLF* in cluster/ files
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24 14:58:26 -08:00
Sunil Mushran b7668c72d2 [PATCH] ocfs2: added source addr to bind() in o2net_start_connect()
to prevent confusion when a virtual ip is created on the same interface

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-01 12:17:04 -08:00
Joel Becker 93cc9ac455 ocfs2: Set .owner on masklog sysfs attributes.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-01 11:43:20 -08:00
Jeff Mahoney 895928b838 [PATCH] ocfs2: complete failure recovery for nodemanager init
This patch finishes cleaning up the node manager allocations if it fails
 to initialize.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-01 11:29:30 -08:00
Mark Fasheh 362342f68e [PATCH] ocfs2: remove non existing function prototypes
Remove some prototypes from tcp.h for functions which have long been gone.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-01 11:19:47 -08:00
Jeff Mahoney 6b7a6c94c9 [PATCH] ocfs2: fix -Wformat warnings when building UML on x86-64
The check to determine which format string is appopriate for u64 and
 friends works in most cases, but UML on x86_64 doesn't define CONFIG_X86_64,
 so it results in screen fulls of compile-time warnings.

 This patch fixes it to handle that case.

 fs/ocfs2/cluster/masklog.h |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-01 11:15:49 -08:00
Mark Fasheh 215c7f9fa1 [PATCH] ocfs2: fix compile warnings
Fix a couple of compile warnings found when compiling on a ppc64 build box.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-03 13:55:26 -08:00
Jes Sorensen 1b1dcc1b57 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-09 15:59:24 -08:00
Andrew Morton a136564702 [PATCH] remove gcc-2 checks
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.

From: Adrian Bunk <bunk@stusta.de>

    Some documentation updates and removes some code paths for gcc < 3.2.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Adrian Bunk 82353b594c [PATCH] This patch contains the following cleanups:
- cluster/sys.c: make needlessly global code static
- dlm/: "extern" declarations for variables belong into header files
        (and in this case, they are already in dlmdomain.h)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03 11:45:55 -08:00
Zach Brown 98211489d4 [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Node messaging via tcp. Used by the dlm and the file system for point
to point communication between nodes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:46 -08:00
Mark Fasheh a7f6a5fb4b [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Disk based heartbeat. Configured and started from userspace, the
kernel component handles I/O submission and event generation via
callback mechanism.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:46 -08:00
Kurt Hackel 0c83ed8eeb [PATCH] OCFS2: The Second Oracle Cluster Filesystem
A simple node information service, filled and updated from
userspace. The rest of the stack queries this service for simple node
information.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:46 -08:00
Zach Brown 52fd3d6fea [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Very simple printk wrapper which adds the ability to enable various
sets of debug messages at run-time.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:45 -08:00