Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... and make it handle multi-segment iovecs - deals with that
"fix this later" issue for free. A bit of shame, really - it
had been there since 2.3.15pre3 when the whole thing went into the
tree, practically a historical artefact by now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".
When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.
Having a helper like this means there will be less places to touch
during that transformation.
Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.
Current handling of IFF_XMIT_DST_RELEASE is not optimal.
Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y
The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.
As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.
This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.
Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :
dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
netif_keep_dst(dev);
Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.
The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LECS response contains the MTU that should be used. Correctly
synchronize with other layers when updating.
Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
b67bfe0d42 (hlist: drop the node
parameter from iterators) dropped the node parameter from
iterators which lec_tbl_walk() was using to iterate the list.
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
One should not call blocking primitives inside a wait loop, since both
require task_struct::state to sleep, so the inner will destroy the
outer state.
sigd_enq() will possibly sleep for alloc_skb(). Move sigd_enq() before
prepare_to_wait() to avoid sleeping while waiting interruptibly. You do
not actually need to call sigd_enq() after the initial prepare_to_wait()
because we test the termination condition before calling schedule().
Based on suggestions from Peter Zijlstra.
Signed-off-by: Chas Williams <chas@cmf.n4rl.navy.mil>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
This preprocessor check is commented out ever since this file was added
during the v2.3 development cycle. It is unclear what it purpose might
have been. Whatever it was, it can safely be removed now.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.
Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.
And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.
So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.
Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use del_timer_sync to ensure that the timer is stopped on all CPUs before
the driver exists.
This change was suggested by Thomas Gleixner.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
declarer name module_exit;
identifier ex;
@@
module_exit(ex);
@@
identifier r.ex;
@@
ex(...) {
<...
- del_timer
+ del_timer_sync
(...)
...>
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/appletalk/aarp.c: In function ‘__aarp_send_query’:
net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
...
net/atm/lec.c: In function ‘send_to_lecd’:
net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default]
In file included from net/atm/lec.c:17:0:
include/linux/etherdevice.h:227:20: note: expected ‘u8 *’ but argument is of type ‘unsigned char (*)[6]’
...
net/caif/caif_usb.c: In function ‘cfusbl_create’:
net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration]
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 351638e7de (net: pass info struct via netdevice notifier)
breaks booting of my KVM guest, this is due to we still forget to pass
struct netdev_notifier_info in several places. This patch completes it.
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflicts:
drivers/nfc/microread/mei.c
net/netfilter/nfnetlink_queue_core.c
Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
an 802.3 frame. Frames with a lower value in the ethernet type field
are Ethernet II.
Also update all the users of this value that David Miller and
I could find to use the new constant.
Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
should be >= not >.
As suggested by Jesse Gross.
Compile tested only.
Cc: David Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-wireless@vger.kernel.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-media@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: dev@openvswitch.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.
this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Chen Gang <gang.chen@asianux.com>, we should ensure there
is enough space when formatting the sysfs buffers.
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to schedule the wakeup tasklet on *every* unlock; only if we
actually blocked the channel in the first place.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
The br2684 does not check if used vcc is in connected state,
causing potential Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.
Now br2684 can be assigned only on connected sockets; otherwise
-EINVAL error is returned.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The br2684 code used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.
Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
Now that we can return zero from pppoatm_send() for reasons *other* than
the queue being full, that means we can't depend on a subsequent call to
pppoatm_pop() waking the queue, and we might leave it stalled
indefinitely.
Use the ->release_cb() callback to wake the queue after the sock is
unlocked.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
Avoid submitting packets to a vcc which is being closed. Things go badly
wrong when the ->pop method gets later called after everything's been
torn down.
Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
which clears the ATM_VF_READY bit under the same lock. Otherwise, we
could end up submitting a packet to the device driver even after its
->ops->close method has been called. And it could call the vcc's ->pop
method after the protocol has been shut down. Which leads to a panic.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
The immediate use case for this is that it will allow us to ensure that a
pppoatm queue is woken after it has to drop a packet due to the sock being
locked.
Note that 'release_cb' is called when the socket is *unlocked*. This is
not to be confused with vcc_release() — which probably ought to be called
vcc_close().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
The pppoatm_may_send() is quite heavy and it's called three times
in pppoatm_send() and inlining costs more than 200 bytes of code
(more than 10% of total pppoatm driver code size).
add/remove: 1/0 grow/shrink: 0/1 up/down: 132/-367 (-235)
function old new delta
pppoatm_may_send - 132 +132
pppoatm_send 900 533 -367
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The vcc_destroy_socket() closes vcc before the protocol is detached
from vcc by calling vcc->push() with NULL skb. This leaves some time
window, where the protocol may call vcc->send() on closed vcc
and crash.
Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
indicate that vcc is not ready. If the vcc is not ready we just
drop frame. Queueing frames is much more complicated because we
don't have callbacks that inform us about vcc flags changes.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The pppoatm_send() does not take any lock that will prevent concurrent
vcc_sendmsg(). This causes two problems:
- there is no locking between checking the send queue size
with atm_may_send() and incrementing sk_wmem_alloc,
and the real queue size can be a little higher than sk_sndbuf
- the vcc->sendmsg() can be called concurrently. I'm not sure
if it's allowed. Some drivers (eni, nicstar, ...) seem
to assume it will never happen.
Now pppoatm_send() takes ATM socket lock, the same that is used
in vcc_sendmsg() and other ATM socket functions. The pppoatm_send()
is called with BH disabled, so bh_lock_sock() is used instead
of lock_sock().
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The pppoatm used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.
Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The pppoatm does not check if used vcc is in connected state,
causing an Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.
Now pppoatm can be assigned only on connected sockets; otherwise
-EINVAL error is returned.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The atm is using atmvcc->push(vcc, NULL) callback to notify protocol
that vcc will be closed and protocol must detach from it. This callback
is usually used by protocol to decrement module usage count by module_put(),
but it leaves small window then module is still used after module_put().
Now the owner of push() callback is kept in atmvcc and
module_put(atmvcc->owner) is called after the protocol is detached from vcc.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.
If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.
cf. commit 9d02daf754 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.
Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.
An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...
Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.
Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
At commit 07d106d0, Linus pointed out that ENOIOCTLCMD should be
translated as ENOTTY to user mode.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.
For example, this cast:
int y;
int *p = (int *)&y;
I used the coccinelle script below to find and remove these
unnecessary casts. I manually removed the conversions this
script produces of casts with __force and __user.
@@
type T;
T *p;
@@
- (T *)p
+ p
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some dropwatch/drop_monitor false positives.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial updates from Jiri Kosina:
"As usual, it's mostly typo fixes, redundant code elimination and some
documentation updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
edac, mips: don't change code that has been removed in edac/mips tree
xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
lib: Change mail address of Oskar Schirmer
net: Change mail address of Oskar Schirmer
arm/m68k: Change mail address of Sebastian Hess
i2c: Change mail address of Oskar Schirmer
net: Fix tcp_build_and_update_options comment in struct tcp_sock
atomic64_32.h: fix parameter naming mismatch
Kconfig: replace "--- help ---" with "---help---"
c2port: fix bogus Kconfig "default no"
edac: Fix spelling errors.
qla1280: Remove redundant NULL check before release_firmware() call
remoteproc: remove redundant NULL check before release_firmware()
qla2xxx: Remove redundant NULL check before release_firmware() call.
aic94xx: Get rid of redundant NULL check before release_firmware() call
tehuti: delete redundant NULL check before release_firmware()
qlogic: get rid of a redundant test for NULL before call to release_firmware()
bna: remove redundant NULL test before release_firmware()
tg3: remove redundant NULL test before release_firmware() call
typhoon: get rid of redundant conditional before all to release_firmware()
...
Standardize the net core ratelimited logging functions.
Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new bool function ether_addr_equal to add
some clarity and reduce the likelihood for misuse
of compare_ether_addr for sorting.
Done via cocci script:
$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)
@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)
@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We discovered that PPPoATM has an excessively deep transmit queue. A
queue the size of the default socket send buffer (wmem_default) is
maintained between the PPP generic core and the ATM device.
Fix it to queue a maximum of *two* packets. The one the ATM device is
currently working on, and one more for the ATM driver to process
immediately in its TX done interrupt handler. The PPP core is designed
to feed packets to the channel with minimal latency, so that really
ought to be enough to keep the ATM device busy.
While we're at it, fix the fact that we were triggering the wakeup
tasklet on *every* pppoatm_pop() call. The comment saying "this is
inefficient, but doing it right is too hard" turns out to be overly
pessimistic... I think :)
On machines like the Traverse Geos, with a slow Geode CPU and two
high-speed ADSL2+ interfaces, there were reports of extremely high CPU
usage which could partly be attributed to the extra wakeups.
(The wakeup handling could actually be made a whole lot easier if we
stop checking sk->sk_sndbuf altogether. Given that we now only queue
*two* packets ever, one wonders what the point is. As it is, you could
already deadlock the thing by setting the sk_sndbuf to a value lower
than the MTU of the device, and it'd just block for ever.)
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge with latest Linus' tree, as I have incoming patches
that fix code that is newer than current HEAD of for-next.
Conflicts:
drivers/net/ethernet/realtek/r8169.c
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
Since all that include/linux/if_ppp.h does is #include <linux/ppp-ioctl.h>,
this replaces the occurrences of #include <linux/if_ppp.h> with
#include <linux/ppp-ioctl.h>.
It also corrects an error in Documentation/networking/l2tp.txt, where
it referenced include/linux/if_ppp.h as the source of some definitions
that are actually now defined in include/linux/if_pppol2tp.h.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/sfc/rx.c
Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 32092ecf06 (atm: clip: Use device neigh support on top of
"arp_tbl".) introduced a bug since clip_tbl is zeroed : Crash occurs in
__neigh_for_each_release()
idle_timer_check() must use instead arp_tbl and neigh_check_cb() should
ignore non clip neighbours.
Idea from David Miller.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
CLIP only support ipv4, and this is evidenced by the fact that it
is a device specific extension of arp_tbl, so this conversion is
pretty straightforward.
Signed-off-by: David S. Miller <davem@davemloft.net>
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
Instead of instantiating an entire new neigh_table instance
just for ATM handling, use the neigh device private facility.
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev->neigh_priv_len records the private area length.
This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memcmp() instead of cast to u16 when checking the PAD field.
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Routed payload requires less headroom than bridged payload.
So do not reallocate headroom if not needed.
Also, add worst case AAL5 overhead to netdev->hard_header_len.
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the vcc backends do the right thing with respect the receive
queue on registration, allow MSK_PEEK for atm sockets.
This allows a userspace program to inspect the packets and decide what
backend to use to handle them.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function moves the implementation found in the clip and br2684
modules to common code, correctly unlinks the skb from the queue
before pushing it and makes pppoatm use it.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't see the point on substracting the skb len from the netdev
stats.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This snippet has caused several bugs in the past, and I don't see the
point on substracting the skb len from netdev stats.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> > ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit. The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.
Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.
[PATCH V5 net-next] neigh: new unresolved queue limits
unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.
$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This oops have been already fixed with commit
27141666b6
atm: [br2684] Fix oops due to skb->dev being NULL
It happens that if a packet arrives in a VC between the call to open it on
the hardware and the call to change the backend to br2684, br2684_regvcc
processes the packet and oopses dereferencing skb->dev because it is
NULL before the call to br2684_push().
but have been introduced again with commit
b6211ae7f2
atm: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: Daniel Schwierzeck <daniel.schwierzeck@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.
We will also be able to make dst entries neigh-less.
Signed-off-by: David S. Miller <davem@davemloft.net>
Unnecessary casts of void * clutter the code.
These are the remainder casts after several specific
patches to remove netdev_priv and dev_priv.
Done via coccinelle script:
$ cat cast_void_pointer.cocci
@@
type T;
T *pt;
void *pv;
@@
- pt = (T *)pv;
+ pt = pv;
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Kill ratelimit.h dependency in linux/net.h
net: Add linux/sysctl.h includes where needed.
net: Kill ether_table[] declaration.
inetpeer: fix race in unused_list manipulations
atm: expose ATM device index in sysfs
IPVS: bug in ip_vs_ftp, same list heaad used in all netns.
bug.h: Move ratelimit warn interfaces to ratelimit.h
bonding: cleanup module option descriptions
net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
net: davinci_emac: fix dev_err use at probe
can: convert to %pK for kptr_restrict support
net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags
netfilter: Fix several warnings in compat_mtw_from_user().
netfilter: ipset: fix ip_set_flush return code
netfilter: ipset: remove unused variable from type_pf_tdel()
netfilter: ipset: Use proper timeout value to jiffies conversion
It's currently exposed only through /proc which, besides requiring
screen-scraping, doesn't allow userspace to distinguish between two
identical ATM adapters with different ATM indexes. The ATM device index
is required when using PPPoATM on a system with multiple ATM adapters.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: David Woodhouse <dwmw2@infradead.org>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
gfs2: Drop __TIME__ usage
isdn/diva: Drop __TIME__ usage
atm: Drop __TIME__ usage
dlm: Drop __TIME__ usage
wan/pc300: Drop __TIME__ usage
parport: Drop __TIME__ usage
hdlcdrv: Drop __TIME__ usage
baycom: Drop __TIME__ usage
pmcraid: Drop __DATE__ usage
edac: Drop __DATE__ usage
rio: Drop __DATE__ usage
scsi/wd33c93: Drop __TIME__ usage
scsi/in2000: Drop __TIME__ usage
aacraid: Drop __TIME__ usage
media/cx231xx: Drop __TIME__ usage
media/radio-maxiradio: Drop __TIME__ usage
nozomi: Drop __TIME__ usage
cyclades: Drop __TIME__ usage
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.
Acked-by: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.
If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".
The supporting code for kptr_restrict and %pK are currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable 'eth' is set but unused in lec_handle_bridge(). Also,
the variable 'priv' is set but unused in lane_module_cleanup().
Just kill them off.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't flap VCs when carrier state changes; higher-level protocols
can detect loss of connectivity and act accordingly. This is more
consistent with how other network interfaces work.
We no longer use release_vccs() so we can delete it.
release_vccs() was duplicated from net/atm/common.c; make the
corresponding function exported, since other code duplicates it
and could leverage it if it were public.
Signed-off-by: Philip A. Prindeville <philipp@redfish-solutions.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.
Signed-off-by: David S. Miller <davem@davemloft.net>