Now when all devices are cleaned up, bond can be cleaned up as well
- remove bond->vlgrp
- remove bond_vlan_rx_register
- substitute necessary occurences of vlan_group_get_device
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for a configuring the minimum number of links that
must be active before asserting carrier. It is similar to the Cisco
EtherChannel min-links feature. This allows setting the minimum number
of member ports that must be up (link-up state) before marking the
bond device as up (carrier on). This is useful for situations where
higher level services such as clustering want to ensure a minimum
number of low bandwidth links are active before switchover.
See:
http://bugzilla.vyatta.com/show_bug.cgi?id=7196
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all received packets are handled by bond_handle_frame,
and arp_mon_pt isn't used any more.
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For backward compatibility, we should retain the module parameters and
sysfs attributes to control the number of peer notifications
(gratuitous ARPs and unsolicited NAs) sent after bonding failover.
Also, it is possible for failover to take place even though the new
active slave does not have link up, and in that case the peer
notification should be deferred until it does.
Change ipv4 and ipv6 so they do not automatically send peer
notifications on bonding failover.
Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
notifications when the link is up, as many times as requested. Since
it does not directly control which protocols send notifications, make
num_grat_arp and num_unsol_na aliases for a single parameter. Bump
the bonding version number and update its documentation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now when bonding uses rx_handler, all traffic going into bond
device goes thru bond_handle_frame. So there's no need to go back into
bonding code later via ptype handlers. This patch converts
original ptype handlers into "bonding receive probes". These functions
are called from bond_handle_frame and they are registered per-mode.
Note that vlan packets are also handled because they are always untagged
thanks to vlan_untag()
Note that this also allows arpmon for eth-bond-bridge-vlan topology.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is undesirable for the bonding driver to be poking into higher
level protocols, and notifiers provide a way to avoid that. This does
mean removing the ability to configure reptitition of gratuitous ARPs
and unsolicited NAs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This updates the bonding driver to support v2.6.27-rc3 enhancements
(b11f8d8c aka. "ethtool: Expand ethtool_cmd.speed to 32 bits") which
allow to encode the Mbps link speed on 32-bits (Max 4 Pbps) instead of
16 (Max 65536 Mbps).
This patch also attempts to compact struct slave by reordering its
fields.
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets rid of minor sparse complaints:
drivers/net/bonding/bond_main.c:4361:4: warning: do-while statement is not a compound statement
drivers/net/bonding/bond_main.c:243:12: warning: symbol 'bond_mode_name' was not declared. Should it be static?
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This prevents possible race between bond_enslave and bond_handle_frame
as reported by Nicolas by moving rx_handler register/unregister.
slave->bond is added to hold pointer to master bonding sructure. That
way dev->master is no longer used in bond_handler_frame.
Also, this removes "BUG: scheduling while atomic" message
Reported-by: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Tested-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since bond-related code was moved from net/core/dev.c into bonding,
IFF_SLAVE_INACTIVE is no longer needed. Replace is with flag "inactive"
stored in slave structure
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
transfers slave->state into slave->backup (that it's going to transfer
into bitfield. Introduce wrapper inlines to do the work with it.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when bond-related code is moved from net/core/dev.c into bonding
code, multiple priv_flags are not needed anymore. So let them rot.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register slave pointer as rx_handler data. That would eventually prevent
need to loop over slave devices to find the right slave.
Use synchronize_net to ensure that bond_handle_frame does not get slave
structure freed when working with that.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
V2: Move #ifdef CONFIG_PROC_FS into bonding.h, as suggested by David.
bond_main.c is bloating, separate the procfs code out,
move them to bond_procfs.c
Signed-off-by: WANG Cong <amwang@redhat.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix use of zero where NULL expected. And wrap long line.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V4: rebase to net-next-2.6
This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
we have netpoll_tx_running() now.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
V4: rebase to net-next-2.6
V3: remove an useless #ifdef.
This patch unifies the netpoll code in bonding with netpoll code in bridge,
thanks to Herbert that code is much cleaner now.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The returned slave is incorrect, if the net device under check is not
charged yet by the master.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides the debugfs facility to the bonding driver.
The "bonding" directory is created in the debugfs root and directories of
each bonding interface (like bond0, bond1...) are created in that.
# mount -t debugfs none /sys/kernel/debug
# ls /sys/kernel/debug/bonding
bond0 bond1
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A while back I made some changes to enable netpoll in the bonding driver. Among
them was a per-cpu flag that indicated we were in a path that held locks which
could cause the netpoll path to block in during tx, and as such the tx path
should queue the frame for later use. This appears to have given rise to a
regression. If one of those paths on which we hold the per-cpu flag yields the
cpu, its possible for us to come back on a different cpu, leading to us clearing
a different flag than we set. This results in odd netpoll drops, and BUG
backtraces appearing in the log, as we check to make sure that we only clear set
bits, and only set clear bits. I had though briefly about changing the
offending paths so that they wouldn't sleep, but looking at my origional work
more closely, it doesn't appear that a per-cpu flag is warranted. We alrady
gate the checking of this flag on IFF_IN_NETPOLL, so we don't hit this in the
normal tx case anyway. And practically speaking, the normal use case for
netpoll is to only have one client anyway, so we're not going to erroneously
queue netpoll frames when its actually safe to do so. As such, lets just
convert that per-cpu flag to an atomic counter. It fixes the rescheduling bugs,
is equivalent from a performance perspective and actually eliminates some code
in the process.
Tested by the reporter and myself, successfully
Reported-by: Liang Zheng <lzheng@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only used in main file.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The monitoring paths in the bonding driver take write locks that are shared by
the tx path. If netconsole is in use, these paths can call printk which puts us
in the netpoll tx path, which, if netconsole is attached to the bonding driver,
result in deadlock (the xmit_lock guards are useless in netpoll_send_skb, as the
monitor paths in the bonding driver don't claim the xmit_lock, nor should they).
The solution is to use a per cpu flag internal to the driver to indicate when a
cpu is holding the lock in a path that might recusrse into the tx path for the
driver via netconsole. By checking this flag on transmit, we can defer the
sending of the netconsole frames until a later time using the retransmit feature
of netpoll_send_skb that is triggered on the return code NETDEV_TX_BUSY. I've
tested this and am able to transmit via netconsole while causing failover
conditions on the bond slave links.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow sysadmins to configure the number of multicast
membership report sent on a link failure event.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During a failover, the IGMP membership is sent to update
the switch restoring the traffic, but it misses groups added
to VLAN devices running on top of bonding devices.
This patch changes it to iterate over all VLAN devices
on top of it sending IGMP memberships too.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: changed bonding module version, modified to apply on top of changes
from previous patch in series, and updated documentation to elaborate on
multiqueue awareness that now exists in bonding driver.
This patch give the user the ability to control the output slave for
round-robin and active-backup bonding. Similar functionality was
discussed in the past, but Jay Vosburgh indicated he would rather see a
feature like this added to existing modes rather than creating a
completely new mode. Jay's thoughts as well as Neil's input surrounding
some of the issues with the first implementation pushed us toward a
design that relied on the queue_mapping rather than skb marks.
Round-robin and active-backup modes were chosen as the first users of
this slave selection as they seemed like the most logical choices when
considering a multi-switch environment.
Round-robin mode works without any modification, but active-backup does
require inclusion of the first patch in this series and setting
the 'all_slaves_active' flag. This will allow reception of unicast traffic on
any of the backup interfaces.
This was tested with IPv4-based filters as well as VLAN-based filters
with good results.
More information as well as a configuration example is available in the
patch to Documentation/networking/bonding.txt.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: changed parameter name from 'keep_all' to 'all_slaves_active' and
skipped setting slaves to inactive rather than creating a new flag at
Jay's suggestion.
In an effort to suppress duplicate frames on certain bonding modes
(specifically the modes that do not require additional configuration on
the switch or switches connected to the host), code was added in the
generic receive patch in 2.6.16. The current behavior works quite well
for most users, but there are some times it would be nice to restore old
functionality and allow all frames to make their way up the stack.
This patch adds support for a new module option and sysfs file called
'all_slaves_active' that will restore pre-2.6.16 functionality if the
user desires. The default value is '0' and retains existing behavior,
but the user can set it to '1' and allow all frames up if desired.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is stored but never restored. So remove this as it is useless.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts the list and the core manipulating with it to be the same as uc_list.
+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to maintain stats in the bonding structure.
Use the instance of net_device_stats in netdevice.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.
Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)
Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch delegates the work of creating the sysfs groups
to the netdev layer and ultimately to the device layer. This
closes races between uevents.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the bonding device is no longer used in determining the device to
which to send packets, it can be dropped from the argument list of the various
xmit_hash_policy calls.
Signed-off-by: Jasper Spaans <spaans@fox-it.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases there is not desirable to switch back to primary interface when
it's link recovers and rather stay with currently active one. We need to avoid
packetloss as much as we can in some cases. This is solved by introducing
primary_reselect option. Note that enslaved primary slave is set as current
active no matter what.
Patch modified by Jay Vosburgh as follows: fixed bug in action
after change of option setting via sysfs, revised the documentation
update, and bumped the bonding version number.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can speedup ether addresses compares using compare_ether_addr_64bits()
instead of memcmp(). We make sure all operands are at least 8 bytes long and
16bits aligned (or better, long word aligned if possible)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use a network device destructor that is a function in
the module, since it can be called after module is unloaded if sysfs
handle is open.
When eventually using netlink, the device cleanup code needs to be done
via uninit function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole read/write semaphore locking can be removed. It doesn't add any
protection that isn't already done by using the RTNL mutex properly.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_create() is always called with same parameters so move the argument
down.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this sparse warnings:
drivers/net/bonding/bond_main.c:104:20: warning: symbol 'bonding_defaults' was not declared. Should it be static?
drivers/net/bonding/bond_main.c:204:22: warning: symbol 'ad_select_tbl' was not declared. Should it be static?
drivers/net/bonding/bond_sysfs.c:60:21: warning: symbol 'bonding_rwsem' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_parse_parm() parses a parameter table for a particular value and
is therefore not modifying the table at all. Therefore make the 2nd
argument const, thus allowing to make the tables const later.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pr_debug() instead of own macros.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce and use bond_is_lb(), it is usefull to shorten the repetitive
check for either ALB or TLB mode.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.
This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements alternative aggregator selection policies
for 802.3ad. The existing policy, now termed "stable," selects the active
aggregator by greatest bandwidth, and only reselects a new aggregator
if the active aggregator is entirely disabled (no more ports or all ports
down).
This patch adds two new policies: bandwidth and count, selecting
the active aggregator by total bandwidth (like the stable policy) or by
the number of ports in the aggregator, respectively. These two policies
also differ from the stable policy in that they will reselect the active
aggregator when availability-related changes occur in the bond (e.g.,
link state change).
This permits "gang failover" within 802.3ad, allowing redundant
aggregators along parallel paths to always maintain the "best" aggregator
as the active aggregator (rather than having to wait for the active to
entirely fail).
This patch also updates the driver version to 3.5.0.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch adds better IPv6 failover support for bonding devices,
especially when in active-backup mode and there are only IPv6 addresses
configured, as reported by Alex Sidorenko.
- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
IPv6-specific routines. Both regular bonds and VLANs over bonds
are supported.
- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
IPv6 Neighbor Advertisements that are sent on a failover event.
Default is 1.
- Creates two new IPv6 neighbor discovery functions:
ndisc_build_skb()
ndisc_send_skb()
These were required to support VLANs since we have to be able to
add the VLAN id to the skb since ndisc_send_na() and friends
shouldn't be asked to do this. These two routines are basically
__ndisc_send() split into two pieces, in a slightly different order.
- Updates Documentation/networking/bonding.txt and bumps the rev of bond
support to 3.4.0.
On failover, this new code will generate one packet:
- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
learn that the address has moved to the new slave.
Testing has shown that sending just the NA results in pretty good
behavior when in active-back mode, I saw no lost ping packets for example.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The following sparse warnings are being generated
because bonding.h is missing definitons for items
declared in bond_main.c but also used in bond_sysfs.h
Also export bond_dev_list as this is also declared
in bond_main but used elsewhere in drivers/net/bonding.
bond_main.c:105:20: warning: symbol 'bonding_defaults' was not declared. Should it be static?
bond_main.c:148:1: warning: symbol 'bond_dev_list' was not declared. Should it be static?
bond_main.c:162:22: warning: symbol 'bond_lacp_tbl' was not declared. Should it be static?
bond_main.c:168:22: warning: symbol 'bond_mode_tbl' was not declared. Should it be static?
bond_main.c:179:22: warning: symbol 'xmit_hashtype_tbl' was not declared. Should it be static?
bond_main.c:186:22: warning: symbol 'arp_validate_tbl' was not declared. Should it be static?
bond_main.c:194:22: warning: symbol 'fail_over_mac_tbl' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Support for sending multiple gratuitous ARPs during failovers
was added by commit:
commit 7893b2491a
Author: Moni Shoua <monis@voltaire.com>
Date: Sat May 17 21:10:12 2008 -0700
bonding: Send more than one gratuitous ARP when slave takes over
This change modifies that support to remove duplicated code,
add support for ARP monitor (the original only supported miimon), clear
the grat ARP counter in bond_close (lest a later "ifconfig up" immediately
start spewing ARPs), and add documentation for the module parameter.
Also updated driver version to 3.3.0.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add a "follow" selection for fail_over_mac. This option
causes the MAC address to move from slave to slave as the active
slave changes. This is in addition to the existing fail_over_mac option
that causes the bond's MAC address to change during failover.
This new option is useful for devices that cannot tolerate
multiple ports using the same MAC address simultaneously, either
because it confuses them or incurs a performance penalty (as is the
case with some LPAR-aware multiport devices). Because the MAC of the
bond itself does not change, the "follow" option is slightly more
reliable during failover and doesn't change the MAC of the bond during
operation.
This patch requires a previous ARP monitor change to properly
handle RTNL during failovers.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Refactor ARP monitor for active-backup mode. The motivation for
this is to take care of locking issues in a clear manner (particularly to
correctly handle RTNL vs. the bonding locks). Currently, the a-b ARP
monitor does not hold RTNL at all, but future changes will require RTNL
during ARP monitor failovers.
Rather than using conditional locking, this patch instead breaks
up the ARP monitor into three discrete steps: inspection, commit changes,
and probe. The inspection phase marks slaves that require link state
changes. The commit phase is only called if inspection detects that
changes are needed, and is called with RTNL. Lastly, the probe phase
issues the ARP probes that the inspection phase uses to determine link
state.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
With IPoIB, reception of gratuitous ARP by neighboring hosts
is essential for a successful change of slaves in case of failure.
Otherwise, they won't learn about the HW address change and need
to wait a long time until the neighboring system gives up and sends
an ARP request to learn the new HW address. This patch decreases
the chance for a lost of a gratuitous ARP packet by sending it more
than once. The number retries is configurable and can be set with a
module param.
Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
While we're fixing the bond_create, I hope it's OK to polish it
a bit after the fixes.
The third argument is NULL at the first caller and is ignored by
the second one, so remove it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Update bonding to version 3.2.4.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent change to add an additional hash policy modified
bond_parse_parm, but it now does not correctly match parameters passed in
via sysfs.
Rewrote bond_parse_parm to handle (a) parameter matches that
are substrings of one another and (b) user input with whitespace (e.g.,
sysfs input often has a trailing newline).
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fixes a race condition in module unload. Without this change,
workqueue events may fire while bonding data structures are partially
freed but before bond_close() is invoked by unregister_netdevice().
Update version to 3.2.3.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add new hash for balance-xor and 802.3ad modes. Originally
submitted by "Glenn Griffin" <ggriffin.kernel@gmail.com>; modified by
Jay Vosburgh to move setting of hash policy out of line, tweak the
documentation update and add version update to 3.2.2.
Glenn's original comment follows:
Included is a patch for a new xmit_hash_policy for the bonding driver
that selects slaves based on MAC and IP information. This is a middle
ground between what currently exists in the layer2 only policy and the
layer3+4 policy. This policy strives to be fully 802.3ad compliant by
transmitting every packet of any particular flow over the same link.
As documented the layer3+4 policy is not fully compliant for extreme
cases such as ip fragmentation, so this policy is a nice compromise
for environments that require full compliance but desire more than the
layer2 only policy.
Signed-off-by: "Glenn Griffin" <ggriffin.kernel@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Change locking in balance-rr transmit processing to use a free
running counter to determine which slave to transmit on. Instead, a
free-running counter is maintained, and modulo arithmetic used to select
a slave for transmit.
This removes lock operations from the TX path, and eliminates
a deadlock introduced by the conversion to work queues.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert bonding timers to workqueues. This converts the various
monitor functions to run in periodic work queues instead of timers. This
patch introduces the framework and convers the calls, but does not resolve
various locking issues, and does not stand alone.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Two small fixes to IPoIB support for bonding:
1- copy header_ops from slave to bonding for IPoIB slaves
2- move release and destroy logic to UNREGISTER from GOING_DOWN
notifier to avoid double release
Set bonding to version 3.2.1.
Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to
IB devices, as it's occasionally useful for regular ethernet devices as
well.
Adds "fail_over_mac" option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.
Includes documentation update.
Updates bonding driver version to 3.2.0.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When bonding enslaves non Ethernet devices it takes pointers to functions
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after last slave was unenslaved
because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero
ensures that these functions be used anymore.
Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev->state field is on. This improves the chances for the arp packet to
be transmitted.
Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not good when it enslaves non Ethernet devices (such as InfiniBand).
Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).
Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Chad Tindel <ctindel@users.sourceforge.net>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Modify carrier state determination for 802.3ad mode to comply
with section 43.3.9 of IEEE 802.3, which requires that "Links that are
not successful candidates for aggregation (e.g., links that are attached
to other devices that cannot perform aggregation or links that have been
manually configured to be non-aggregatable) are enabled to operate as
individual IEEE 802.3 links."
Bug reported by Laurent Chavey <chavey@google.com>. This patch
is an updated version of his patch that changes the wording of
commentary and adds an update to the driver version.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
"extern inline" generates a warning with -Wmissing-prototypes and I'm
currently working on getting the kernel cleaned up for adding this to
the CFLAGS since it will help us to avoid a nasty class of runtime
errors.
If there are places that really need a forced inline, __always_inline
would be the correct solution.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
While working with the latest bonding code I noticed a nasty problem that
will prevent arp monitoring from always functioning correctly on x86_64
systems. Comparing ints to longs and expecting reliable results on x86_64
is a bad idea. With this patch, arp monitoring works correctly again.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@osdl.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
I neglected to properly update the version number in the recent
patch series; this sets it to something reasonable.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add logic to check ARP request / reply packets used for ARP
monitor link integrity checking.
The current method simply examines the slave device to see if it
has sent and received traffic; this can be fooled by extraneous traffic.
For example, if multiple hosts running bonding are behind a common
switch, the probe traffic from the multiple instances of bonding will
update the tx/rx times on each other's slave devices.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The value of "downdelay/miimon" and "updelay/miimon" are stored in
slave->delay. The type of downdelay, updelay, and miimon are all int.
However, slave->delay is type short, and it is not possible to store the
value of "downdelay/miimon" or "updelay/miimon" in some cases. (For example,
miimon=1 downdelay=32768)
The attached patch fixes this problem.
Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add support for the bonding master to specify its carrier state
based upon the state of the slaves. For 802.3ad, the bond is up if
there is an active, parterned aggregator. For other modes, the bond is
up if any slaves are up. Updates driver version to 3.0.3.
Based on a patch by jamal <hadi@cyberus.ca>.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Originally submitted by Kenzo Iwami; his original description is:
The current bonding driver receives duplicate packets when broadcast/
multicast packets are sent by other devices or packets are flooded by the
switch. In this patch, new flags are added in priv_flags of net_device
structure to let the bonding driver discard duplicate packets in
dev.c:skb_bond().
Modified by Jay Vosburgh to change a define name, update some
comments, rearrange the new skb_bond() for clarity, clear all bonding
priv_flags on slave release, and update the driver version.
Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
I believe I see the race Michael refers to (tlb_choose_channel
may set head, which tlb_init_slave clears), although I was not able to
reproduce it. I have updated his patch for the current netdev-2.6.git
tree and added a version update. His original comment follows:
Our systems have been crashing during testing of PCI HotPlug
support in the various networking components. We've faulted in
the bonding driver due to a bug in bond_alb.c:tlb_clear_slave()
In that routine, the last modification to the TLB hash table is
made without protection of the lock, allowing a race that can lead
tlb_choose_channel() to select an invalid table element.
-J
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
"extern inline" doesn't make much sense.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Bonding source files still have changelogs in the comments. This, then,
is an update to that changelog.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Minor spelling and whitespace corrections.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Update the version number for the bonding module. Since we've just
added a significant new feature (sysfs support), bump the major number.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This large patch adds sysfs functionality to the channel bonding module.
Bonds can be added, removed, and reconfigured at runtime without having
to reload the module. Multiple bonds with different configurations are
easily configured, and ifenslave is no longer required to configure bonds.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sysfs interface can create bonds at runtime, so we need a separate
function to do this, instead of just doing it in the module init code.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sysfs code needs access these functions, so make them
not static, and move the protos to the header file.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sysfs code needs to know what these structs look like, so make them
not static, and move the definition to the header.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This should resolve http://bugzilla.kernel.org/show_bug.cgi?id=5519
The current feature computation loses bits that it doesn't know about,
resulting in an inability to add VLANs and possibly other havoc.
Rewrote function to preserve bits it doesn't know about, remove an
unneeded state variable, and simplify the code.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
David S. Miller <davem@davemloft.net> wrote:
>I think removing support for older ifenslave binaries is
>the least painful solution to this problem.
This patch removes backwards compatibility for old ifenslave
binaries (ifenslave prior to verison 1.0.0).
I did not similarly modify ifenslave itself; with sysfs on the
horizon, I don't see that as being worthwhile.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Add support for alternate slave selection algorithms to bonding
balance-xor and 802.3ad modes. Default mode (what we have now: xor of
MAC addresses) is "layer2", new choice is "layer3+4", using IP and port
information for hashing to select peer.
Originally submitted by Jason Gabler for balance-xor mode;
modified by Jay Vosburgh to additionally support 802.3ad mode. Jason's
original comment is as follows:
The attached patch to the Linux Etherchannel Bonding driver modifies the
driver's "balance-xor" mode as follows:
- alternate hashing policy support for mode 2
* Added kernel parameter "xmit_policy" to allow the specification
of different hashing policies for mode 2. The original mode 2
policy is the default, now found in xmit_hash_policy_layer2().
* Added xmit_hash_policy_layer34()
This patch was inspired by hashing policies implemented by Cisco,
Foundry and IBM, which are explained in
Foundry documentation found at:
http://www.foundrynet.com/services/documentation/sribcg/Trunking.html#112750
Signed-off-by: Jason Gabler <jygabler@lbl.gov>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Add support for generating gratuitous ARPs in bonding
active-backup mode when failovers occur. Includes support for VLAN
tagging the ARPs as needed.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!