Commit Graph

322348 Commits

Author SHA1 Message Date
Ben Hutchings 8f8b3d5189 sfc: Fix the initial device operstate
Following commit 8f4cccb ('net: Set device operstate at registration
time') it is now correct and preferable to set the carrier off before
registering a device.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:24 +01:00
Ben Hutchings adeb15aa1c sfc: Assign efx and efx->type as early as possible in efx_pci_probe()
We also stop clearing *efx in efx_init_struct().  This is safe because
alloc_etherdev_mq() already clears it for us.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:23 +01:00
Ben Hutchings 3f65ea5b2a sfc: Remove bogus comment about MTU change and RX buffer overrun
RX DMA is limited by the length specified in each descriptor and not
by the MAC.  Over-length frames may get into the RX FIFO regardless of
the MAC settings, due to a hardware bug, but they will be truncated by
the packet DMA engine and reported as such in the completion event.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:23 +01:00
Ben Hutchings 7bde852afc sfc: Remove overly paranoid locking assertions from netdev operations
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:22 +01:00
Ben Hutchings 7153f623ea sfc: Fix reset vs probe/remove/PM races involving efx_nic::state
We try to defer resets while the device is not READY, but we're not
doing this quite correctly.  In particular, changes to efx_nic::state
are documented as serialised by the RTNL lock, but they aren't.

1. We check whether a reset was requested during probe (suggesting
broken hardware) before we allow requested resets to be scheduled.
This leaves a window where a requested reset would be deferred
indefinitely.

2. Although we cancel the reset work item during device removal,
there are still later operations that can cause it to be scheduled
again.  We need to check the state before scheduling it.

3. Since the state can change between scheduling and running of
the work item, we still need to check it there, and we need to
do so *after* acquiring the RTNL lock which serialises state
changes.

4. We must cancel the reset work item during device removal, if the
state could ever have been READY.  This wasn't done in some of the
failure paths from efx_pci_probe().  Move the cancellation to
efx_pci_remove_main().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:22 +01:00
Ben Hutchings b812f8b7a9 sfc: Improve log messages in case we abort probe due to a pending reset
The current informational message doesn't properly explain what
happens, and could also appear if we defer a reset during
suspend/resume.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:21 +01:00
Ben Hutchings 8b7325b4e2 sfc: Never try to stop and start a NIC that is disabled
efx_change_mtu() and efx_realloc_channels() each stop and start much
of the NIC, even if it has been disabled.  Since efx_start_all() is a
no-op when the NIC is disabled, this is probably harmless in the case
of efx_change_mtu(), but efx_realloc_channels() also reenables
interrupts which could be a bad thing to do.

Change efx_start_all() and efx_start_interrupts() to assert that the
NIC is not disabled, but make efx_stop_interrupts() do nothing if the
NIC is disabled (since it is already stopped), consistent with
efx_stop_all().

Update comments for efx_start_all() and efx_stop_all() to describe
their purpose and preconditions more accurately.

Add a common function to check and log if the NIC is disabled, and use
it in efx_net_open(), efx_change_mtu() and efx_realloc_channels().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:20 +01:00
Ben Hutchings 5642ceef46 sfc: Hold RTNL lock (only) when calling efx_stop_interrupts()
Interrupt state should be consistently guarded by the RTNL lock once
the net device is registered.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:20 +01:00
Ben Hutchings 6032fb56c5 sfc: Keep disabled NICs quiescent during suspend/resume
Currently we ignore and clear the disabled state.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:19 +01:00
Ben Hutchings 61da026d86 sfc: Hold the RTNL lock for more of the suspend/resume cycle
I don't think these PM functions can race with userland net device
operations, but it's much easier to reason about locking if state is
consistently guarded by the same lock.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:19 +01:00
Ben Hutchings f16aeea0e6 sfc: Change state names to be clearer, and comment them
STATE_INIT and STATE_FINI are equivalent and represent incompletely
initialised states; combine them as STATE_UNINIT.

Rename STATE_RUNNING to STATE_READY, to avoid confusion with
netif_running() and IFF_RUNNING.

The comments do not quite match current usage, but this will be
corrected in subsequent fixes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:18 +01:00
Ben Hutchings 9714284f83 sfc: Stash header offsets for TSO in struct tso_state
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:17 +01:00
Ben Hutchings 53cb13c680 sfc: Replace tso_state::full_packet_space with ip_base_len
We only use tso_state::full_packet_space to calculate the IPv4 tot_len
or IPv6 payload_len, not to set tso_state::packet_space.  Replace it
with an ip_base_len field holding the value of tot_len or payload_len
before including the TCP payload, which is much more useful when
constructing the new headers.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:17 +01:00
Ben Hutchings f7251a9ce9 sfc: Simplify TSO header buffer allocation
TSO header buffers contain a control structure immediately followed by
the packet headers, and are kept on a free list when not in use.  This
complicates buffer management and tends to result in cache read misses
when we recycle such buffers (particularly if DMA-coherent memory
requires caches to be disabled).

Replace the free list with a simple mapping by descriptor index.  We
know that there is always a payload descriptor between any two
descriptors with TSO header buffers, so we can allocate only one
such buffer for each two descriptors.

While we're at it, use a standard error code for allocation failure,
not -1.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 20:10:11 +01:00
Ben Hutchings 14bf718fb9 sfc: Stop TX queues before they fill up
We now have a definite upper bound on the number of descriptors per
skb; use that to stop the queue when the next packet might not fit.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 19:00:27 +01:00
Ben Hutchings 7668ff9c2a sfc: Refactor struct efx_tx_buffer to use a flags field
Add a flags field to struct efx_tx_buffer, replacing the
continuation and map_single booleans.

Since a single descriptor cannot be both a TSO header and the last
descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add
flags for validity of these fields.

Clear all flags in free buffers (whereas previously the continuation
flag would be set).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2012-08-24 19:00:26 +01:00
Ben Hutchings 8f4cccbbd9 net: Set device operstate at registration time
The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver.  The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.

For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened.  However, we must not activate linkwatch for a
unregistered device, and commit b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't.  But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.

The same issue exists with the dormant state.

The proper initialisation sequence, avoiding a race with opening of
the device, is:

        rtnl_lock();
        rc = register_netdevice(dev);
        if (rc)
                goto out_unlock;
        netif_carrier_off(dev); /* or netif_dormant_on(dev) */
        rtnl_unlock();

but it seems silly that this should have to be repeated in so many
drivers.  Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.

Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.

This initialises the operstate synchronously at registration time
only.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:46:13 -04:00
Timur Tabi 9f35a7342c net/fsl: introduce Freescale 10G MDIO driver
Similar to fsl_pq_mdio.c, this driver is for the 10G MDIO controller on
Freescale Frame Manager Ethernet controllers.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:42:42 -04:00
Neil Horman 3afa6d00fb cls_cgroup: Allow classifier cgroups to have their classid reset to 0
The network classifier cgroup initalizes each cgroups instance classid value to
0.  However, the sock_update_classid function only updates classid's in sockets
if the tasks cgroup classid is not zero, and if it differs from the current
classid.  The later check is to prevent cache line dirtying, but the former is
detrimental, as it prevents resetting a classid for a cgroup to 0.  While this
is not a common action, it has administrative usefulness (if the admin wants to
disable classification of a certain group temporarily for instance).

Easy fix, just remove the zero check.  Tested successfully by myself

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:41:17 -04:00
David S. Miller e6e94e392f Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:

====================
Included changes:
- a set of codestyle rearrangements/fixes
- new feature to early detect new joining (mesh-unaware) clients
- a minor fix for the gw-feature
- substitution of shift operations with the BIT() macro
- reorganization of the main batman-adv structure (struct batadv_priv)
- some more (very) minor cleanups and fixes
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 11:30:50 -04:00
Rami Rosen f63c45e0e6 packet: fix broken build.
This patch fixes a broken build due to a missing header:
...
  CC      net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
                 from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...

The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.

See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2) patch,

Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:29:45 -07:00
Eric Dumazet 748e2d9396 net: reinstate rtnl in call_netdevice_notifiers()
Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.

This patch is a direct transcription his feedback
against commit 0115e8e30d (net: remove delay at device dismantle)

Thanks Eric !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:24:42 -07:00
Sven Eckelmann fa4f0afcf4 batman-adv: Start new development cycle
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:23 +02:00
Antonio Quartulli 371351731e batman-adv: change interface_rx to get orig node
In order to understand  where a broadcast packet is coming from and use
this information to detect not yet announced clients, this patch modifies the
interface_rx() function by passing a new argument: the orig node
corresponding to the node that originated the received packet (if known).
This new argument if not NULL for broadcast packets only (other packets does not
have source field).

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:22 +02:00
Antonio Quartulli 30cfd02b60 batman-adv: detect not yet announced clients
With the current TT mechanism a new client joining the network is not
immediately able to communicate with other hosts because its MAC address has not
been announced yet. This situation holds until the first OGM containing its
joining event will be spread over the mesh network.

This behaviour can be acceptable in networks where the originator interval is a
small value (e.g. 1sec) but if that value is set to an higher time (e.g. 5secs)
the client could suffer from several malfunctions like DHCP client timeouts,
etc.

This patch adds an early detection mechanism that makes nodes in the network
able to recognise "not yet announced clients" by means of the broadcast packets
they emitted on connection (e.g. ARP or DHCP request). The added client will
then be confirmed upon receiving the OGM claiming it or purged if such OGM
is not received within a fixed amount of time.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:22 +02:00
Sven Eckelmann c67893d17a batman-adv: Reduce accumulated length of simple statements
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:21 +02:00
Sven Eckelmann bbb1f90efb batman-adv: Don't break statements after assignment operator
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:20 +02:00
Sven Eckelmann 8de47de575 batman-adv: Use BIT(x) macro to calculate bit positions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:19 +02:00
Martin Hundebøll 74ee3634dc batman-adv: Drop tt queries with foreign dest
When enabling promiscuous mode, tt queries for other hosts might be
received. Before this patch, "foreign" tt queries were processed like
any other query and thus forwarded to its destination again and thereby
causing a loop.

This patch adds a check to drop foreign tt queries.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:19 +02:00
Martin Hundebøll ff51fd70ad batman-adv: Move batadv_check_unicast_packet()
batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so
move the former to before the latter.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:18 +02:00
Sven Eckelmann 807736f6e0 batman-adv: Split batadv_priv in sub-structures for features
The structure batadv_priv grows everytime a new feature is introduced. It gets
hard to find the parts of the struct that belongs to a specific feature. This
becomes even harder by the fact that not every feature uses a prefix in the
member name.

The variables for bridge loop avoidence, gateway handling, translation table
and visualization server are moved into separate structs that are included in
the bat_priv main struct.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:13 +02:00
Simon Wunderlich 624463079e batman-adv: check batadv_orig_hash_add_if() return code
If this call fails, some of the orig_nodes spaces may have been
resized for the increased number of interface, and some may not.
If we would just continue with the larger number of interfaces,
this would lead to access to not allocated memory later.

We better check the return code, and don't add the interface if
no memory is available. OTOH, keeping some of the orig_nodes
with too much memory allocated should hurt no one (except for
a few too many bytes allocated).

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:46 +02:00
Antonio Quartulli a51fb9b2ac batman-adv: fix typos in comments
the word millisecond is misspelled in several comments. This patch fixes it.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:45 +02:00
Antonio Quartulli d657e621a0 batman-adv: add reference counting for type batadv_tt_orig_list_entry
The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so
far. This patch introduces it and makes the structure being usable in much more
complex context.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:44 +02:00
Jonathan Corbet 3a7f291bf6 batman-adv: remove a misleading comment
As much as I'm happy to see LWN links sprinkled through the kernel by the
dozen, this one in particular reflects a very old state of reality; the
associated comment is now incorrect.  So just delete it.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:44 +02:00
Marek Lindner 1c9b0550f4 batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:43 +02:00
Simon Wunderlich 3eb8773e3a batman-adv: rename bridge loop avoidance claim types
for consistency reasons within the code and with the documentation,
we should always call it "claim" and "unclaim".

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:42 +02:00
Simon Wunderlich 99e966fc96 batman-adv: correct comments in bridge loop avoidance
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:41 +02:00
Simon Wunderlich 536a23f119 batman-adv: Add the backbone gateway list to debugfs
This is especially useful if there are no claims yet, but we still want
to know which gateways are using bridge loop avoidance in the network.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:41 +02:00
Antonio Quartulli c70437289c batman-adv: move function arguments on one line
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:40 +02:00
Pavel Emelyanov 0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Allan, Bruce W b32607dd47 mdio: translation of MMD EEE registers to/from ethtool settings
The helper functions which translate IEEE MDIO Manageable Device (MMD)
Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from
the comparable ethtool supported/advertised settings will be needed by
drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch).

In the same fashion as similar translation functions in linux/mii.h, move
these functions from the PHYLIB core to the linux/mdio.h header file so the
code will not have to be duplicated in each driver needing MMD-to-ethtool
(and vice-versa) translations.  The function and some variable names have
been renamed to be more descriptive.

Not tested on the only hardware that currently calls the related functions,
stmmac, because I don't have access to any.  Has been compile tested and
the translations have been tested on a locally modified version of e1000e.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
danborkmann@iogearbox.net 9e67030af3 af_packet: use define instead of constant
Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.

Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Ying Xue bfdc587c5a rds: Don't disable BH on BH context
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:52:04 -07:00
John Eaglesham 6b923cb718 bonding: support for IPv6 transmit hashing
Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Tested-by: John Eaglesham <linux@8192.net>
Signed-off-by: John Eaglesham <linux@8192.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:49:30 -07:00
Eric Dumazet b87fb39e39 ipv6: gre: fix ip6gre_err()
ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
not 40 as expected), and should take into account 'offset' parameter.

Also uses pskb_may_pull() to cope with some fragged skbs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:48:32 -07:00
Eric Dumazet ef8531b64c xfrm: fix RCU bugs
This patch reverts commit 56892261ed (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit 418a99ac6a ( Replace rwlock on xfrm_policy_afinfo
with rcu )

1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH

2) We must defer some writes after the synchronize_rcu() call or a reader
 can crash dereferencing NULL pointer.

3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()

4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()

5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
  and also move xfrm_policy_get_afinfo() declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:39:46 -07:00
Eric Dumazet 0115e8e30d net: remove delay at device dismantle
I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

With help from Gao feng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 21:50:36 -07:00
David S. Miller bf277b0cce Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 18:48:52 -07:00
David S. Miller bba6ec7e49 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
This series contains updates to ethtool.h, e1000, e1000e, and igb to
implement MDI/MDIx control.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 14:23:43 -07:00