* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
smc91c92_cs: fix the problem of "Unable to find hardware address"
r8169: clean up my printk uglyness
net: Hook up cxgb4 to Kconfig and Makefile
cxgb4: Add main driver file and driver Makefile
cxgb4: Add remaining driver headers and L2T management
cxgb4: Add packet queues and packet DMA code
cxgb4: Add HW and FW support code
cxgb4: Add register, message, and FW definitions
netlabel: Fix several rcu_dereference() calls used without RCU read locks
bonding: fix potential deadlock in bond_uninit()
net: check the length of the socket address passed to connect(2)
stmmac: add documentation for the driver.
stmmac: fix kconfig for crc32 build error
be2net: fix bug in vlan rx path for big endian architecture
be2net: fix flashing on big endian architectures
be2net: fix a bug in flashing the redboot section
bonding: bond_xmit_roundrobin() fix
drivers/net: Add missing unlock
net: gianfar - align BD ring size console messages
net: gianfar - initialize per-queue statistics
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: saving negative to unsigned char
9p: return on mutex_lock_interruptible()
9p: Creating files with names too long should fail with ENAMETOOLONG.
9p: Make sure we are able to clunk the cached fid on umount
9p: drop nlink remove
fs/9p: Clunk the fid resulting from partial walk of the name
9p: documentation update
9p: Fix setting of protocol flags in v9fs_session_info structure.
Saving -EINVAL as unsigned char truncates the high bits and changes it
into 234 instead of -22. This breaks the test for "if (ret == -EINVAL)"
in parse_opts().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
RPC6 requires that it be possible to create endpoints that listen
exclusively for IPv4 or IPv6 connection requests. This is not currently
supported by the RDMA API.
This fixes a server RDMA regression introduced by 37498292a "NFSD:
Create PF_INET6 listener in write_ports".
Signed-off-by: Tom Tucker<tom@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
dcache prune happen on umount. So we cannot mark the client
satus disconnect. That will prevent a 9p call to the server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The recent changes to add RCU lock verification to rcu_dereference() calls
caught out a problem with netlbl_unlhsh_hash(), see below.
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
net/netlabel/netlabel_unlabeled.c:246 invoked rcu_dereference_check()
without protection!
This patch fixes this problem as well as others like it in the NetLabel
code. Also included in this patch is the identification of future work
to eliminate the RCU read lock in netlbl_domhsh_add(), but in the interest
of getting this patch out quickly that work will happen in another patch
to be finished later.
Thanks to Eric Dumazet and Paul McKenney for their help in understanding
the recent RCU changes.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reported-by: David Howells <dhowells@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
check the length of the socket address passed to connect(2).
Check the length of the socket address passed to connect(2). If the
length is invalid, -EINVAL will be returned.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/bluetooth/l2cap.c | 3 ++-
net/bluetooth/rfcomm/sock.c | 3 ++-
net/bluetooth/sco.c | 3 ++-
net/can/bcm.c | 3 +++
net/ieee802154/af_ieee802154.c | 3 +++
net/ipv4/af_inet.c | 5 +++++
net/netlink/af_netlink.c | 3 +++
7 files changed, 20 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_read_sock() can have a eat skbs without immediately advancing copied_seq.
This can cause a panic in tcp_collapse() if it is called as a result
of the recv_actor dropping the socket lock.
A userspace program that splices data from a socket to either another
socket or to a file can trigger this bug.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"mac80211: fix skb buffering issue" still left a race
between enabling the hardware queues and the virtual
interface queues. In hindsight it's totally obvious
that enabling the netdev queues for a hardware queue
when the hardware queue is enabled is wrong, because
it could well possible that we can fill the hw queue
with packets we already have pending. Thus, we must
only enable the netdev queues once all the pending
packets have been processed and sent off to the device.
In testing, I haven't been able to trigger this race
condition, but it's clearly there, possibly only when
aggregation is being enabled.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
1st) a PREQ should only be processed, if it has the same SN and better
metric (instead of better or equal).
2nd) next_hop[ETH_ALEN] now actually used to buffer
mpath->next_hop->sta.addr for use out of lock.
Signed-off-by: Marco Porsch <marco.porsch@siemens.com>
Acked-by: Javier Cardona <javier@cozybit.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Stanse discovered that kmalloc is being called with GFP_KERNEL while
holding this spinlock. The spinlock can be a mutex instead, which also
enables the removal of the unlock/lock around the lock/unlock of
cfg80211_mutex and the call to set_regdom.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
r8169: offical fix for CVE-2009-4537 (overlength frame DMAs)
ipv6: Don't drop cache route entry unless timer actually expired.
tulip: Add missing parens.
r8169: fix broken register writes
pcnet_cs: add new id
bonding: fix broken multicast with round-robin mode
drivers/net: Fix continuation lines
e1000: do not modify tx_queue_len on link speed change
net: ipmr/ip6mr: prevent out-of-bounds vif_table access
ixgbe: Do not run all Diagnostic offline tests when VFs are active
igb: use correct bits to identify if managability is enabled
benet: Fix compile warnnings in drivers/net/benet/be_ethtool.c
net: Add MSG_WAITFORONE flag to recvmmsg
e1000e: do not modify tx_queue_len on link speed change
igbvf: do not modify tx_queue_len on link speed change
ipv4: Restart rt_intern_hash after emergency rebuild (v2)
ipv4: Cleanup struct net dereference in rt_intern_hash
net: fix netlink address dumping in IPv4/IPv6
tulip: Fix null dereference in uli526x_rx_packet()
gianfar: fix undo of reserve()
...
This is ipv6 variant of the commit 5e016cbf6.. ("ipv4: Don't drop
redirected route cache entry unless PTMU actually expired")
by Guenter Roeck <guenter.roeck@ericsson.com>.
Remove cache route entry in ipv6_negative_advice() only if
the timer is expired.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cache is unresolved, c->mf[6]c_parent is set to 65535 and
minvif, maxvif are not initialized, hence we must avoid to
parse IIF and OIF.
A second problem can happen when the user dumps a cache entry
where a VIF, that was referenced at creation time, has been
removed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new flag MSG_WAITFORONE for the recvmmsg() syscall.
When this flag is specified for a blocking socket, recvmmsg()
will only block until at least 1 packet is available. The
default behavior is to block until all vlen packets are
available. This flag has no effect on non-blocking sockets
or when used in combination with MSG_DONTWAIT.
Signed-off-by: Brandon L Black <blblack@gmail.com>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The the rebuild changes the genid which in turn is used at
the hash calculation. Thus if we don't restart and go on with
inserting the rt will happen in wrong chain.
(Fixed Neil's comment about the index passed into the rt_intern_hash)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need in getting it 3 times and gcc isn't smart enough
to understand this himself.
This is just a cleanup before the fix (next patch).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a dump is interrupted at the last device in a hash chain and
then continued, "idx" won't get incremented past s_idx, so s_ip_idx
is not reset when moving on to the next device. This means of all
following devices only the last n - s_ip_idx addresses are dumped.
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
A missing break statement in hashlimit_ipv6_mask(), and masks
between /64 and /95 are not working at all...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The order of the IPv6 raw table is currently reversed, that makes impossible
to use the NOTRACK target in IPv6: for example if someone enters
ip6tables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK
and if we receive fragmented packets then the first fragment will be
untracked and thus skip nf_ct_frag6_gather (and conntrack), while all
subsequent fragments enter nf_ct_frag6_gather and reassembly will never
successfully be finished.
Singed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
If dl_seq_start() memory allocation fails, we crash later in
dl_seq_stop(), trying to kfree(ERR_PTR(-ENOMEM))
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: don't try to decode GETATTR if DELEGRETURN returned error
sunrpc: handle allocation errors from __rpc_lookup_create()
SUNRPC: Fix the return value of rpc_run_bc_task()
SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel
SUNRPC: Fix a potential memory leak in auth_gss
NFS: Prevent another deadlock in nfs_release_page()
The original code saved the error value but just returned 0 in the end.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updates real_num_tx_queues in case underlying real device
has changed real_num_tx_queues.
-v2
As per Eric Dumazet<eric.dumazet@gmail.com> comment:-
-- adds BUG_ON to catch case of real_num_tx_queues exceeding num_tx_queues.
-- created this self contained patch to just update real_num_tx_queues.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to correctly select vlan tx queue for a driver
supporting multi tx queue with ndo_select_queue implemented since
currently selected vlan tx queue is unaligned to selected queue by
real net_devce ndo_select_queue.
Unaligned vlan tx queue selection causes thrash with higher vlan
tx lock contention for least fcoe traffic and wrong socket tx
queue_mapping for ixgbe having ndo_select_queue implemented.
-v2
As per Eric Dumazet<eric.dumazet@gmail.com> comments, mirrored
vlan net_device_ops to have them with and without vlan_dev_select_queue
and then select according to real dev ndo_select_queue present or not
for a vlan net_device. This is to completely skip vlan_dev_select_queue
calling for real net_device not supporting ndo_select_queue.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows the net_cls cgroup subsystem to be compiled as a module
This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem
to be optionally compiled as a module instead of builtin. The
cgroup_subsys struct is moved around a bit to allow the subsys_id to be
either declared as a compile-time constant by the cgroup_subsys.h include
in cgroup.h, or, if it's a module, initialized within the struct by
cgroup_load_subsys.
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2: update according to Frans' comments.
Currently, if we leave spaces before dst port,
netconsole will silently accept it as 0. Warn about this.
Also, when spaces appear in other places, make them
visible in error messages.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8ccb92ad (netfilter: xt_recent: fix false match) fixed supposedly
false matches in rules using a zero hit_count. As it turns out there is
nothing false about these matches and people are actually using entries
with a hit_count of zero to make rules dependant on addresses inserted
manually through /proc.
Since this slipped past the eyes of three reviewers, instead of
reverting the commit in question, this patch explicitly checks
for a hit_count of zero to make the intentions more clear.
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (38 commits)
ip_gre: include route header_len in max_headroom calculation
if_tunnel.h: add missing ams/byteorder.h include
ipv4: Don't drop redirected route cache entry unless PTMU actually expired
net: suppress lockdep-RCU false positive in FIB trie.
Bluetooth: Fix kernel crash on L2CAP stress tests
Bluetooth: Convert debug files to actually use debugfs instead of sysfs
Bluetooth: Fix potential bad memory access with sysfs files
netfilter: ctnetlink: fix reliable event delivery if message building fails
netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
NET_DMA: free skbs periodically
netlink: fix unaligned access in nla_get_be64()
tcp: Fix tcp_mark_head_lost() with packets == 0
net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
KS8695: update ksp->next_rx_desc_read at the end of rx loop
igb: Add support for 82576 ET2 Quad Port Server Adapter
ixgbevf: Message formatting cleanups
ixgbevf: Shorten up delay timer for watchdog task
ixgbevf: Fix VF Stats accounting after reset
ixgbe: Set IXGBE_RSC_CB(skb)->DMA field to zero after unmapping the address
ixgbe: fix for real_num_tx_queues update issue
...
Currently rpc_run_bc_task() will return NULL if the task allocation failed.
However the only caller is bc_send, which assumes that the return value
will be an ERR_PTR.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The ->release_request() callback was designed to allow the transport layer
to do housekeeping after the RPC call is done. It cannot be used to free
the request itself, and doing so leads to a use-after-free bug in
xprt_release().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Taking route's header_len into account, and updating gre device
needed_headroom will give better hints on upper bound of required
headroom. This is useful if the gre traffic is xfrm'ed.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP sessions over IPv4 can get stuck if routers between endpoints
do not fragment packets but implement PMTU instead, and we are using
those routers because of an ICMP redirect.
Setup is as follows
MTU1 MTU2 MTU1
A--------B------C------D
with MTU1 > MTU2. A and D are endpoints, B and C are routers. B and C
implement PMTU and drop packets larger than MTU2 (for example because
DF is set on all packets). TCP sessions are initiated between A and D.
There is packet loss between A and D, causing frequent TCP
retransmits.
After the number of retransmits on a TCP session reaches tcp_retries1,
tcp calls dst_negative_advice() prior to each retransmit. This results
in route cache entries for the peer to be deleted in
ipv4_negative_advice() if the Path MTU is set.
If the outstanding data on an affected TCP session is larger than
MTU2, packets sent from the endpoints will be dropped by B or C, and
ICMP NEEDFRAG will be returned. A and D receive NEEDFRAG messages and
update PMTU.
Before the next retransmit, tcp will again call dst_negative_advice(),
causing the route cache entry (with correct PMTU) to be deleted. The
retransmitted packet will be larger than MTU2, causing it to be
dropped again.
This sequence repeats until the TCP session aborts or is terminated.
Problem is fixed by removing redirected route cache entries in
ipv4_negative_advice() only if the PMTU is expired.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow fib_find_node() to be called either under rcu_read_lock()
protection or with RTNL held.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function alloc_enc_pages() currently fails to release the pointer
rqstp->rq_enc_pages in the error path.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: stable@kernel.org
Some of the debug files ended up wrongly in sysfs, because at that point
of time, debugfs didn't exist. Convert these files to use debugfs and
also seq_file. This patch converts all of these files at once and then
removes the exported symbol for the Bluetooth sysfs class.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When creating a high number of Bluetooth sockets (L2CAP, SCO
and RFCOMM) it is possible to scribble repeatedly on arbitrary
pages of memory. Ensure that the content of these sysfs files is
always less than one page. Even if this means truncating. The
files in question are scheduled to be moved over to debugfs in
the future anyway.
Based on initial patches from Neil Brown and Linus Torvalds
Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes a bug that allows to lose events when reliable
event delivery mode is used, ie. if NETLINK_BROADCAST_SEND_ERROR
and NETLINK_RECV_NO_ENOBUFS socket options are set.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, ENOBUFS errors are reported to the socket via
netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
that should not happen. This fixes this problem and it changes the
prototype of netlink_set_err() to return the number of sockets that
have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
value is used in the next patch in these bugfix series.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under NET_DMA, data transfer can grind to a halt when userland issues a
large read on a socket with a high RCVLOWAT (i.e., 512 KB for both).
This appears to be because the NET_DMA design queues up lots of memcpy
operations, but doesn't issue or wait for them (and thus free the
associated skbs) until it is time for tcp_recvmesg() to return.
The socket hangs when its TCP window goes to zero before enough data is
available to satisfy the read.
Periodically issue asynchronous memcpy operations, and free skbs for ones
that have completed, to prevent sockets from going into zero-window mode.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>