119915 Commits

Author SHA1 Message Date
Eric Dumazet a21bba9454 net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames()
We can reduce pressure on the dst entry refcount, which slows down the UDP
transmit path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousands of streams. Several CPUs send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 16:07:50 -08:00
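
The reference "stealing" described in the commit above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct dst`, `struct cork`, `struct pkt`, `refcnt_ops` and both `push_*` helpers are hypothetical stand-ins, and the counter is a plain int rather than an atomic. It only shows why handing over an already-owned reference avoids a hold/release pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dst entry, the socket cork and a packet. */
struct dst  { int refcnt; };
struct cork { struct dst *dst; };
struct pkt  { struct dst *dst; };

static int refcnt_ops;  /* counts refcount updates, for illustration only */

static void dst_hold(struct dst *d)    { d->refcnt++; refcnt_ops++; }
static void dst_release(struct dst *d) { d->refcnt--; refcnt_ops++; }

/* Old pattern: take a new reference for the packet, then drop the cork's. */
static void push_with_hold(struct cork *c, struct pkt *p)
{
    assert(c->dst != NULL);
    dst_hold(c->dst);
    p->dst = c->dst;
    dst_release(c->dst);
    c->dst = NULL;
}

/* New pattern: the packet takes over the reference the cork already owned,
 * so the shared counter is not touched at all. */
static void push_with_steal(struct cork *c, struct pkt *p)
{
    assert(c->dst != NULL);
    p->dst = c->dst;
    c->dst = NULL;
}
```

Both helpers leave the count unchanged and the cork empty; the stealing variant simply gets there without two writes to a counter that, in the real kernel, is shared between CPUs.
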
Eric Dumazet 2e77d89b2f net: avoid a pair of dst_hold()/dst_release() in ip_append_data()
We can reduce pressure on the dst entry refcount, which slows down the UDP
transmit path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousands of streams. Several CPUs send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 15:52:46 -08:00
Jarek Poplawski 4db0acf3c0 net: gen_estimator: Fix gen_kill_estimator() lookups
gen_kill_estimator() linear list lookups are very slow, and e.g. while
deleting a large number of HTB classes soft lockups were reported. Here
is another try to fix this problem: this time internally, with rbtree,
so similarly to Jamal's hashing idea IIRC. (Looking for next hits could
be still optimized, but it's really fast as it is.)

Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 15:48:05 -08:00
Patrick McHardy 3f0947c3ff pkt_sched: sch_drr: fix drr_dequeue loop()
Jarek Poplawski points out:

If all child qdiscs of sch_drr are non-work-conserving (e.g. sch_tbf)
drr_dequeue() will busy-loop waiting for skbs instead of leaving the
job for a watchdog. Checking for list_empty() in each loop isn't
necessary either, because this can never be true except the first time.

Using non-work-conserving qdiscs as children of DRR makes no sense,
simply bail out in that case.

Reported-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 15:46:08 -08:00
Wang Chen 4b40eed73e infiniband: Kill directly reference of netdev->priv
This use of netdev->priv is wrong.
The right way is:
call alloc_netdev() with no memory for private data, then
make netdev->ml_priv point to c2_dev.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 15:34:00 -08:00
Wang Chen 486bf8de17 netdevice sbni: Convert directly reference of netdev->priv
1. convert netdev->priv to netdev_priv().
2. make sbni_pci_probe() be static.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 14:52:16 -08:00
Jirka Pirko 5c94afd79c tokenring/3c359.c: Prevent possible mem leak when open failed
Freeing previously allocated buffers in case of error.

Signed-off-by: Jirka Pirko <jirka@pirko.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 14:49:11 -08:00
Jirka Pirko 138a5cdf2f tokenring/3c359.c: Fix error message when allocating tx_ring
Pointed out by Joe Perches. Error message after tx_ring allocation check was
wrong.

Signed-off-by: Jirka Pirko <jirka@jirka.pirko.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 14:48:25 -08:00
Jirka Pirko d0cc10ab0e tokenring/3c359.c: fix allocation null check
Fixed a typo when allocating rx_ring: tx_ring was checked for null instead.

Signed-off-by: Jirka Pirko <jirka@pirko.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 14:47:53 -08:00
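
The three 3c359 fixes above concern one pattern: check the pointer that was just allocated, and release earlier allocations when a later one fails. A minimal userspace sketch of that pattern follows, assuming plain `malloc()` in place of the driver's allocator; `struct rings` and `alloc_rings()` are hypothetical names, not taken from the driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the two descriptor rings. */
struct rings { void *rx_ring; void *tx_ring; };

/* Returns 0 on success, -1 on failure. On failure nothing stays allocated. */
static int alloc_rings(struct rings *r, size_t rx_bytes, size_t tx_bytes)
{
    assert(r != NULL);

    r->rx_ring = malloc(rx_bytes);
    if (r->rx_ring == NULL)       /* check the pointer just allocated */
        return -1;

    r->tx_ring = malloc(tx_bytes);
    if (r->tx_ring == NULL) {
        free(r->rx_ring);         /* undo the earlier allocation */
        r->rx_ring = NULL;
        return -1;
    }
    return 0;
}
```
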
Stephen Hemminger 85920d43bd 8139too: use err.h macros
Instead of using call by reference, use the PTR_ERR macros to handle
the return value in the error case. Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 14:47:01 -08:00
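
The err.h idiom mentioned in the commit above encodes a small negative errno in the returned pointer, so a function can return either an object or an error without an extra output argument. The following is a userspace imitation of that idea, written as a sketch: the three helpers mirror the kernel names but are reimplemented here, and `struct board` and `init_board()` are hypothetical, not the 8139too code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's err.h helpers. */
static inline void *ERR_PTR(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for error codes. */
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

struct board { int irq; };  /* hypothetical device state */

/* Returns a board, or an errno value encoded in the pointer. */
static struct board *init_board(int irq)
{
    struct board *b;

    if (irq < 0)
        return ERR_PTR(-EINVAL);
    b = malloc(sizeof(*b));
    if (b == NULL)
        return ERR_PTR(-ENOMEM);
    b->irq = irq;
    return b;
}
```

A caller tests the result with `IS_ERR()` and, if it is an error, recovers the code with `PTR_ERR()`.
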
Eric Dumazet 3755810ceb net: Make sure BHs are disabled in sock_prot_inuse_add()
There is still a call to sock_prot_inuse_add() in af_netlink
while in a preemptable section. Add explicit BH disable around
this call.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 14:05:22 -08:00
Eric Dumazet 920de804bc net: Make sure BHs are disabled in sock_prot_inuse_add()
The rule of calling sock_prot_inuse_add() is that BHs must
be disabled.  Some new calls were added where this was not
true and this triggers warnings as reported by Ilpo.

Fix this by adding explicit BH disabling around those call sites,
or moving sock_prot_inuse_add() call inside an existing BH disabled
section.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-24 00:09:29 -08:00
Eric Dumazet 1f87e235e6 eth: Declare an optimized compare_ether_addr_64bits() function
Linus mentioned we could try to perform long word operations, even
on potentially unaligned addresses, on x86 at least. David mentioned
the HAVE_EFFICIENT_UNALIGNED_ACCESS test to handle this on all
arches that have efficient unaligned accesses.

I tried this idea and got nice assembly on 32 bits:

158:   33 82 38 01 00 00       xor    0x138(%edx),%eax
15e:   33 8a 34 01 00 00       xor    0x134(%edx),%ecx
164:   c1 e0 10                shl    $0x10,%eax
167:   09 c1                   or     %eax,%ecx
169:   74 0b                   je     176 <eth_type_trans+0x87>

And very nice assembly on 64 bits of course (one xor, one shl)

Nice oprofile improvement in eth_type_trans(), 0.17 % instead of 0.41 %,
expected since we remove 8 instructions on a fast path.

This patch implements a compare_ether_addr_64bits() function, that
uses the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ifdef to efficiently
perform the 6 bytes comparison on all capable arches.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 23:24:32 -08:00
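
The idea in the commit above, comparing a 6-byte address with one 64-bit XOR and a shift, can be sketched in portable userspace C. This is an illustration under assumptions, not the kernel function: `ether_addr_differs_64bits()` is a hypothetical name, `memcpy()` stands in for the unaligned load, and byte order is detected with the GCC/Clang `__BYTE_ORDER__` macro (little endian is assumed when that macro is absent). Both buffers must provide 8 readable bytes, since two padding bytes are loaded and then discarded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if the first 6 bytes of a and b differ.
 * Bytes 6 and 7 are read but do not affect the result. */
static int ether_addr_differs_64bits(const uint8_t a[8], const uint8_t b[8])
{
    uint64_t x, y, fold;

    assert(a != NULL && b != NULL);
    memcpy(&x, a, sizeof(x));  /* typically compiled to a single load */
    memcpy(&y, b, sizeof(y));
    fold = x ^ y;

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big endian: the padding bytes are the low 16 bits. */
    return (fold >> 16) != 0;
#else
    /* Little endian (assumed default): the padding bytes are the high 16 bits. */
    return (fold << 16) != 0;
#endif
}
```
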
David S. Miller 70eb1bfd52 axnet_cs: Fix build after net device ops ne2k conversion.
Commit 4e4fd4e485 ("ne2k: convert to
net_device_ops") exported some ei_* symbols from the 8390 library,
but the axnet_cs driver defines local static versions of the same
functions.

Rename them to avoid the namespace conflict.

Reported by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 20:01:59 -08:00
David S. Miller 6f756a8c36 net: Make sure BHs are disabled in sock_prot_inuse_add()
The rule of calling sock_prot_inuse_add() is that BHs must
be disabled.  Some new calls were added where this was not
true and this triggers warnings as reported by Ilpo.

Fix this by adding explicit BH disabling around those call sites.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 17:34:03 -08:00
Alexey Dobriyan be77e59307 net: fix tunnels in netns after ndo_ changes
dev_net_set() should be the very first thing after alloc_netdev().

"ndo_" changes turned a simple assignment (which is OK to do before netns
assignment) into a quite non-trivial operation (which is not OK, init_net was
used). This leads to incomplete initialisation of the tunnel device in netns.

BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f
*pde = 00000000 
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/lo/operstate

Pid: 10, comm: netns Not tainted (2.6.28-rc6 #1) 
EIP: 0060:[<c02efdb5>] EFLAGS: 00010246 CPU: 0
EIP is at ip6_tnl_exit_net+0x37/0x4f
EAX: 00000000 EBX: 00000020 ECX: 00000000 EDX: 00000003
ESI: c5caef30 EDI: c782bbe8 EBP: c7909f50 ESP: c7909f48
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process netns (pid: 10, ti=c7908000 task=c7905780 task.ti=c7908000)
Stack:
 c03e75e0 c7390bc8 c7909f60 c0245448 c7390bd8 c7390bf0 c7909fa8 c012577a
 00000000 00000002 00000000 c0125736 c782bbe8 c7909f90 c0308fe3 c782bc04
 c7390bd4 c0245406 c084b718 c04f0770 c03ad785 c782bbe8 c782bc04 c782bc0c
Call Trace:
 [<c0245448>] ? cleanup_net+0x42/0x82
 [<c012577a>] ? run_workqueue+0xd6/0x1ae
 [<c0125736>] ? run_workqueue+0x92/0x1ae
 [<c0308fe3>] ? schedule+0x275/0x285
 [<c0245406>] ? cleanup_net+0x0/0x82
 [<c0125ae1>] ? worker_thread+0x81/0x8d
 [<c0128344>] ? autoremove_wake_function+0x0/0x33
 [<c0125a60>] ? worker_thread+0x0/0x8d
 [<c012815c>] ? kthread+0x39/0x5e
 [<c0128123>] ? kthread+0x0/0x5e
 [<c0103b9f>] ? kernel_thread_helper+0x7/0x10
Code: db e8 05 ff ff ff 89 c6 e8 dc 04 f6 ff eb 08 8b 40 04 e8 38 89 f5 ff 8b 44 9e 04 85 c0 75 f0 43 83 fb 20 75 f2 8b 86 84 00 00 00 <8b> 40 04 e8 1c 89 f5 ff e8 98 04 f6 ff 89 f0 e8 f8 63 e6 ff 5b 
EIP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f SS:ESP 0068:c7909f48
---[ end trace 6c2f2328fccd3e0c ]---

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 17:26:26 -08:00
Eric Dumazet c25eb3bfb9 net: Convert TCP/DCCP listening hash tables to use RCU
This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.

The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now move between two different tables
(established and listening) during an RCU grace period, so we
must use different 'nulls' end-of-chain values for the two tables.

We define a large value :

#define LISTENING_NULLS_BASE (1U << 29)

So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 17:22:55 -08:00
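
The 'nulls' end-of-chain trick in the commit above can be shown with a small single-threaded sketch. It assumes the usual encoding where a chain ends in an odd "pointer" carrying an integer instead of NULL; `struct node`, `make_nulls()`, `is_nulls()`, `nulls_value()` and `lookup()` are hypothetical names, and RCU itself is not modelled. Only `LISTENING_NULLS_BASE` is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LISTENING_NULLS_BASE (1U << 29)  /* value quoted in the commit message */

struct node { struct node *next; int key; };

/* End-of-chain marker: an odd "pointer" that carries a small integer.
 * Real nodes are at least 2-byte aligned, so their addresses are even. */
static struct node *make_nulls(unsigned long value)
{
    return (struct node *)(uintptr_t)((value << 1) | 1UL);
}

static int is_nulls(const struct node *p)
{
    return ((uintptr_t)p & 1) != 0;
}

static unsigned long nulls_value(const struct node *p)
{
    return (unsigned long)((uintptr_t)p >> 1);
}

/* Walk a chain looking for key. On a miss, *right_chain reports whether the
 * terminator matches the chain the walk was supposed to be in. */
static const struct node *lookup(const struct node *head, int key,
                                 unsigned long expected_nulls, int *right_chain)
{
    const struct node *p = head;

    assert(right_chain != NULL);
    while (!is_nulls(p)) {
        if (p->key == key) {
            *right_chain = 1;
            return p;
        }
        p = p->next;
    }
    *right_chain = (nulls_value(p) == expected_nulls);
    return NULL;
}
```

A concurrent reader that misses with `*right_chain == 0` was carried into another chain while walking and would restart the lookup. Offsetting listening slots by `LISTENING_NULLS_BASE` keeps their terminators distinct from those of established slots with the same index.
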
Gerrit Renker 8c862c23e2 dccp: Header option insertion routine for feature-negotiation
The patch extends existing code:
 * Confirm options divide into the confirmed value plus an optional preference
   list for SP values. Previously only the preference list was echoed for SP
   values, now the confirmed value is added as per RFC 4340, 6.1;
 * length and sanity checks are added to avoid illegal memory (or NULL) access.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 16:10:23 -08:00
Gerrit Renker d371056695 dccp: Support for Mandatory options
Support for Mandatory options is provided by this patch, which will
be used by subsequent feature-negotiation patches.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 16:09:11 -08:00
Gerrit Renker 02fa460ef5 dccp: Increase the scope of variable-length htonl/ntohl functions
This extends the scope of two available functions,
encode|decode_value_var, to work up to 6 (8) bytes, to match maximum
requirements in the RFC.

These functions are going to be used both by general option processing
and feature negotiation code, hence declarations have been put into
feat.h.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 16:07:53 -08:00
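
A variable-length counterpart to htonl/ntohl, as described in the commit above, writes or reads a value of a given byte length in network (big-endian) order. The sketch below is a userspace illustration, not the DCCP code: the function names echo the commit message, but the signatures and the loop-based implementation are assumptions, and the sketch stops at 6 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low len bytes of value to to[], most significant byte first. */
static void encode_value_var(uint64_t value, uint8_t *to, size_t len)
{
    assert(len >= 1 && len <= 6);
    while (len-- > 0) {
        to[len] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
}

/* Read len bytes in network byte order from from[] into an integer. */
static uint64_t decode_value_var(const uint8_t *from, size_t len)
{
    uint64_t value = 0;
    size_t i;

    assert(len >= 1 && len <= 6);
    for (i = 0; i < len; i++)
        value = (value << 8) | from[i];
    return value;
}
```

Encoding with a length shorter than the value needs truncates it to the low bytes, so callers are expected to pick a length that fits.
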
Gerrit Renker 71c262a3dd dccp: API to query the current TX/RX CCID
This provides a function to query the current TX/RX CCID dynamically,
without reliance on the minisock value, using dynamic information
available in the currently loaded CCID module.

This query function is then used to
 (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
 (b) replace the current test for "which CCID is in use" in probe.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 16:04:59 -08:00
Gerrit Renker b20a9c24d5 dccp: Set per-connection CCIDs via socket options
With this patch, TX/RX CCIDs can now be changed on a per-connection
basis, which overrides the defaults set by the global sysctl variables
for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch
set are needed, which track dependencies and activate negotiated
feature values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 16:02:31 -08:00
Brice Goglin 2c62ad7b56 myri10ge: update firmware headers
Update myri10ge firmware headers.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 15:49:54 -08:00
Brice Goglin 4ee2ac5135 myri10ge: update DCA comments
Update DCA sections closing comments.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 15:49:28 -08:00
Eric Dumazet c1fd3b9455 net: af_netlink should update its inuse counter
In order to have relevant information for NETLINK protocol, in
/proc/net/protocols, we should use sock_prot_inuse_add() to
update a (percpu and pernamespace) counter of inuse sockets.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 15:48:22 -08:00
Eric Dumazet 04f258ce7f net: some optimizations in af_inet
1) Use eq_net() in inet_netns_ok() to speedup socket creation if
   !CONFIG_NET_NS

2) Reorder the tests about inet_ehash_secret generation (once only)
   Use the unlikely() macro when testing if inet_ehash_secret is already
   generated.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-23 15:42:23 -08:00
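
The second point of the commit above, marking a once-only initialisation test as unlikely, can be sketched in userspace as follows. The `unlikely()` macro is redefined here on top of the GCC/Clang `__builtin_expect` builtin; `hash_secret`, `secret_builds`, `build_secret()` and `get_secret()` are hypothetical stand-ins, and the secret is a fixed placeholder rather than a random value.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)   /* fallback: no branch hint */
#endif

static uint32_t hash_secret;   /* stand-in for inet_ehash_secret */
static int secret_builds;      /* how many times the secret was generated */

static void build_secret(void)
{
    assert(hash_secret == 0);
    hash_secret = 0x9e3779b9u; /* placeholder constant, not random */
    secret_builds++;
}

/* Fast path: the secret already exists on every call but the first. */
static uint32_t get_secret(void)
{
    if (unlikely(!hash_secret))
        build_secret();
    return hash_secret;
}
```
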
David S. Miller c46920dadb Merge branch 'for-david' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6
2008-11-21 21:30:58 -08:00
Alexander Duyck f5f4cf0846 igb: do not use phy ops in ethtool test cleanup for non-copper parts
Currently the igb driver is experiencing a panic due to a null function
pointer being used during the cleanup of the ethtool loopback test on
fiber/serdes parts.  This patch prevents that and adds a check prior to
calling any phy function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 21:30:24 -08:00
Scott Feldman 21fc578dca enic: misc cleanup items:
Clarify that reading PBA has no side-effect (clearing).
Add missing GPL license text.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 21:29:25 -08:00
Scott Feldman 845964515a enic: move wmb closer to where needed: before writing posted_index to hw
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 21:29:01 -08:00
Scott Feldman cb3c766975 enic: mask off some reserved bits in CQ descriptor for future use
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 21:28:40 -08:00
Scott Feldman 27372bf5fa enic: driver/firmware API updates
Add driver/firmware compatibility check.
Update firmware notify cmd to honor notify area size.
Add new version of init cmd.
Add link_down_cnt to notify area to track link down count.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 21:28:18 -08:00
Scott Feldman 86ca9db794 enic: enable ethtool LRO support
Enable ethtool support for get/set_flags so LRO can be turned on/off
by forwarding drivers such as the bridge driver.  LRO is not compatible
with forwarding drivers.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 21:26:55 -08:00
Krzysztof Hałasa 6476a907b5 WAN pc300too.c: Fix PC300-X.21 detection
pc300too driver works around a bug in PCI9050 bridge.  Unfortunately
it was doing that too late.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:55:21 +01:00
Krzysztof Hałasa 72364706c3 WAN: syncppp.c is no longer used by any kernel code. Remove it.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa e022c2f07a WAN: new synchronous PPP implementation for generic HDLC.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa e1f024eb5d WAN: Simplify sca_init_port() in HD64572 driver.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa fcfe9ff3e2 WAN: Correct comments in hd6457[02].c
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa 0b59cef885 WAN: HD64572 drivers don't use next_desc() anymore.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa 61e0a6a268 WAN: Simplify HD64572 drivers.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa 967834361a WAN: don't print HD64572 driver versions anymore.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:48 +01:00
Krzysztof Hałasa 0954ed8269 WAN: Simplify HD64572 status handling.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa 0446c3b1e6 WAN: rework HD64572 interrupts a bit.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa b0942f78dd WAN: HD64572 already handles TX underruns with DMAC.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa 09fd65aa8a WAN: TX-done handler now uses the ownership bit in HD64572 drivers.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa abc9d91a35 WAN: convert HD64572-based drivers to NAPI.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa 302243922b WAN: remove SCA support from SCA-II drivers
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa 8859736457 WAN: remove SCA II support from SCA drivers
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Krzysztof Hałasa 6b40aba304 WAN: split hd6457x.c into hd64570.c and hd64572.c
Supporting both original SCA and SCA-II in one file was nice at some
point but now it's increasingly painful.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
2008-11-22 02:49:47 +01:00
Stephen Hemminger 4e4fd4e485 ne2k: convert to net_device_ops
Convert driver to new net_device_ops. Compile tested only.
This required some additional work to export common code ei_XXX.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-21 17:39:02 -08:00