linux

Commit Graph

Author	SHA1	Message	Date
Ying Xue	1b78414047	net: Remove iocb argument from sendmsg and recvmsg After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 13:06:31 -05:00
Gu Zheng	f95b414edb	net: introduce helper macro for_each_cmsghdr Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating cmsghdr from msghdr, just cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-10 22:41:55 -05:00
Al Viro	6ce8e9ce59	new helper: memcpy_from_msg() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-24 04:28:48 -05:00
David S. Miller	51f3d02b98	net: Add and use skb_copy_datagram_msg() helper. This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:46:40 -05:00
Ursula Braun	1042cab862	af_iucv: avoid path quiesce of severed path in shutdown() An af_iucv stress test showed -EPIPE results for sendmsg() calls. They are caused by quiescing a path even though it has been already severed by peer. For IUCV transport shutdown() consists of 2 steps: (1) sending the shutdown message to peer (2) quiescing the iucv path If the iucv path between these 2 steps is severed due to peer closing the path, the quiesce step is no longer needed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-21 20:21:40 -07:00
Fabian Frederick	089b03227a	af_iucv: remove unnecessary break after goto Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:27:00 -07:00
Ursula Braun	4d520f62e0	af_iucv: correct cleanup if listen backlog is full In case of transport HIPER a sock struct is allocated for an incoming connect request. If the backlog queue is full this socket is not needed, but is left in the list of af_iucv sockets. Final socket release posts console message "Attempt to release alive iucv socket". This patch makes sure the new created socket is cleaned up correctly if the backlog queue is full. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:35:23 -07:00
Philipp Hachtmann	53a4b4995e	af_iucv: Add automatic (source) iucv_name to bind If a socket is bound to an address using before calling connect it is usual to leave it to the network system to choose an appropriate outgoing application name respective port address. af_iucv on VM uses a counter and uses simple numbers as unique identifiers. This behaviour was missing when af_iucv is used with HiperSockets. This patch contains a simple approach to harmonize af_iucv's behaviour. Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-30 17:35:23 -07:00
Ursula Braun	f5738e2ef8	af_iucv: wrong mapping of sent and confirmed skbs When sending data through IUCV a MESSAGE COMPLETE interrupt signals that sent data memory can be freed or reused again. With commit `f9c41a62bb` "af_iucv: fix recvmsg by replacing skb_pull() function" the MESSAGE COMPLETE callback iucv_callback_txdone() identifies the wrong skb as being confirmed, which leads to data corruption. This patch fixes the skb mapping logic in iucv_callback_txdone(). Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:38:39 -04:00
David S. Miller	676d23690f	net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:15:36 -04:00
Ursula Braun	2f139a5d82	af_iucv: recvmsg problem for SOCK_STREAM sockets Commit `f9c41a62bb` introduced a problem for SOCK_STREAM sockets, when only part of the incoming iucv message is received by user space. In this case the remaining data of the iucv message is lost. This patch makes sure an incompletely received iucv message is queued back to the receive queue. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reported-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-20 00:06:55 -04:00
Hannes Frederic Sowa	f3d3342602	net: rework recvmsg handler msg_name and msg_namelen logic This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr \|\| msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-20 21:52:30 -05:00
Jiri Pirko	351638e7de	net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-28 13:11:01 -07:00
David S. Miller	6e0895c2ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-22 20:32:51 -04:00
Ursula Braun	f9c41a62bb	af_iucv: fix recvmsg by replacing skb_pull() function When receiving data messages, the "BUG_ON(skb->len < skb->data_len)" in the skb_pull() function triggers a kernel panic. Replace the skb_pull logic by a per skb offset as advised by Eric Dumazet. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-08 17:16:57 -04:00
David S. Miller	d978a6361a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/nfc/microread/mei.c net/netfilter/nfnetlink_queue_core.c Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which some cleanups are going to go on-top. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-07 18:37:01 -04:00
Mathias Krause	a5598bd9c0	iucv: Fix missing msg_namelen update in iucv_sock_recvmsg() The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about iucv_sock_recvmsg() not filling the msg_name in case it was set. Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-07 16:28:01 -04:00
Jacob Keller	8facd5fb73	net: fix smatch warnings inside datagram_poll Commit `7d4c04fc17` ("net: add option to enable error queue packets waking select") has an issue due to operator precedence causing the bit-wise OR to bind to the sock_flags call instead of the result of the terniary conditional. This fixes the *_poll functions to work properly. The old code results in "mask \|= POLLPRI" instead of what was intended, which is to only include POLLPRI when the socket option is enabled. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-02 16:59:16 -04:00
Keller, Jacob E	7d4c04fc17	net: add option to enable error queue packets waking select Currently, when a socket receives something on the error queue it only wakes up the socket on select if it is in the "read" list, that is the socket has something to read. It is useful also to wake the socket if it is in the error list, which would enable software to wait on error queue packets without waking up for regular data on the socket. The main use case is for receiving timestamped transmit packets which return the timestamp to the socket via the error queue. This enables an application to select on the socket for the error queue only instead of for the regular traffic. -v2- * Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file * Modified every socket poll function that checks error queue Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Matthew Vick <matthew.vick@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-31 19:44:20 -04:00
Sasha Levin	b67bfe0d42	hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S \| hlist_for_each_entry_continue(a, - b, c) S \| hlist_for_each_entry_from(a, - b, c) S \| hlist_for_each_entry_rcu(a, - b, c, d) S \| hlist_for_each_entry_rcu_bh(a, - b, c, d) S \| hlist_for_each_entry_continue_rcu_bh(a, - b, c) S \| for_each_busy_worker(a, c, - b, d) S \| ax25_uid_for_each(a, - b, c) S \| ax25_for_each(a, - b, c) S \| inet_bind_bucket_for_each(a, - b, c) S \| sctp_for_each_hentry(a, - b, c) S \| sk_for_each(a, - b, c) S \| sk_for_each_rcu(a, - b, c) S \| sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S \| sk_for_each_safe(a, - b, c, d) S \| sk_for_each_bound(a, - b, c) S \| hlist_for_each_entry_safe(a, - b, c, d, e) S \| hlist_for_each_entry_continue_rcu(a, - b, c) S \| nr_neigh_for_each(a, - b, c) S \| nr_neigh_for_each_safe(a, - b, c, d) S \| nr_node_for_each(a, - b, c) S \| nr_node_for_each_safe(a, - b, c, d) S \| - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S \| - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S \| for_each_host(a, - b, c) S \| for_each_host_safe(a, - b, c, d) S \| for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:24 -08:00
Eric Dumazet	62b1a8ab9b	net: remove skb_orphan_try() Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts : Once packets pass the bonding device and come to real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big) We could try to defer the orphaning adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of NIC TX ring, in cases where performance matters. Reverts commits : `fc6055a5ba` net: Introduce skb_orphan_try() `87fd308cfc` net: skb_tx_hash() fix relative to skb_orphan_try() and removes SKBTX_DRV_NEEDS_SK_REF flag Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-15 15:30:15 -07:00
Ursula Braun	82492a355f	af_iucv: add shutdown for HS transport AF_IUCV sockets offer a shutdown function. This patch makes sure shutdown works for HS transport as well. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-07 22:52:24 -08:00
Ursula Braun	9fbd87d413	af_iucv: handle netdev events In case of transport through HiperSockets the underlying network interface may switch to DOWN state or the underlying network device may recover. In both cases the socket must change to IUCV_DISCONN state. If the interface goes down, af_iucv has a chance to notify its connection peer in addition. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-07 22:52:24 -08:00
Ursula Braun	51363b8751	af_iucv: allow retrieval of maximum message size For HS transport the maximum message size depends on the MTU-size of the HS-device bound to the AF_IUCV socket. This patch adds a getsockopt option MSGSIZE returning the maximum message size that can be handled for this AF_IUCV socket. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-08 18:50:19 -05:00
Ursula Braun	800c5eb7b5	af_iucv: change net_device handling for HS transport This patch saves the net_device in the iucv_sock structure during bind in order to fasten skb sending. In addition some other small improvements are made for HS transport: - error checking when sending skbs - locking changes in afiucv_hs_callback_txnotify - skb freeing in afiucv_hs_callback_txnotify And finally it contains code cleanup to get rid of iucv_skb_queue_purge. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-08 18:50:19 -05:00
Ursula Braun	7f1b0ea42a	af_iucv: block writing if msg limit is exceeded When polling on an AF_IUCV socket, writing should be blocked if the number of pending messages exceeds a defined limit. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-08 18:50:19 -05:00
Ursula Braun	7d316b9453	af_iucv: remove IUCV-pathes completely A SEVER is missing in the callback of a receiving SEVERED. This may inhibit z/VM to remove the corresponding IUCV-path completely. This patch adds a SEVER in iucv_callback_connrej (together with additional locking. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-08 18:50:19 -05:00
Ursula Braun	aac6399c6a	af_iucv: get rid of state IUCV_SEVERED af_iucv differs unnecessarily between state IUCV_SEVERED and IUCV_DISCONN. This patch removes state IUCV_SEVERED. While simplifying af_iucv, this patch removes the 2nd invocation of cpcmd as well. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 14:05:03 -05:00
Ursula Braun	9e8ba5f3ec	af_iucv: remove unused timer infrastructure af_iucv contains timer infrastructure which is not exploited. This patch removes the timer related code parts. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 14:05:03 -05:00
Ursula Braun	816abbadf9	af_iucv: release reference to HS device For HiperSockets transport skbs sent are bound to one of the available HiperSockets devices. Add missing release of reference to a HiperSockets device before freeing an skb. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 14:05:03 -05:00
Ursula Braun	42bd48e014	af_iucv: accelerate close for HS transport Closing an af_iucv socket may wait for confirmation of outstanding send requests. This patch adds confirmation code for the new HiperSockets transport. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 14:05:03 -05:00
Ursula Braun	c64d3f8f59	af_iucv: support ancillary data with HS transport The AF_IUCV address family offers support for ancillary data. This patch enables usage of ancillary data with the new HiperSockets transport. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 14:05:03 -05:00
Eric Dumazet	87fb4b7b53	net: more accurate skb truesize skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, its time to take it into account for better memory accounting. This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place. At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before. Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But its a necessary step for a more accurate memory accounting. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-13 16:05:07 -04:00
Ursula Braun	3881ac441f	af_iucv: add HiperSockets transport The current transport mechanism for af_iucv is the z/VM offered communications facility IUCV. To provide equivalent support when running Linux in an LPAR, HiperSockets transport is added to the AF_IUCV address family. It requires explicit binding of an AF_IUCV socket to a HiperSockets device. A new packet_type ETH_P_AF_IUCV is announced. An af_iucv specific transport header is defined preceding the skb data. A small protocol is implemented for connecting and for flow control/congestion management. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-13 01:10:16 -07:00
Ursula Braun	493d3971a6	af_iucv: cleanup - use iucv_sk(sk) early Code cleanup making make use of local variable for struct iucv_sock. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-13 01:10:16 -07:00
Frank Blaschka	6fcd61f7bf	af_iucv: use loadable iucv interface For future af_iucv extensions the module should be able to run in LPAR mode too. For this we use the new dynamic loading iucv interface. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-13 01:10:16 -07:00
Ursula Braun	9f6298a6ca	af_iucv: get rid of compile warning -Wunused-but-set-variable generates compile warnings. The affected variables are removed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 14:55:21 -04:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Julia Lawall	a56635a56f	net/iucv: Add missing spin_unlock Add a spin_unlock missing on the error path. There seems like no reason why the lock should continue to be held if the kzalloc fail. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-26 21:09:51 -07:00
Joe Perches	3fa21e07e6	net: Remove unnecessary returns from void function()s This patch removes from net/ (but not any netfilter files) all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ \| \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 23:23:14 -07:00
Eric Dumazet	4381548237	net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-01 15:00:15 -07:00
Eric Dumazet	aa39514516	net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t sk_sleep(struct sock sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 16:37:13 -07:00
Alexey Dobriyan	471452104b	const: constify remaining dev_pm_ops Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-15 08:53:25 -08:00
Eric Paris	3f378b6844	net: pass kern to net_proto_family create function The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-05 22:18:14 -08:00
Ursula Braun	9a4ff8d417	af_iucv: remove duplicate sock_set_flag Remove duplicate sock_set_flag(sk, SOCK_ZAPPED) in iucv_sock_close, which has been overlooked in September-commit `7514bab04e`. Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-17 23:57:20 -07:00
Hendrik Brueckner	49f5eba757	af_iucv: use sk functions to modify sk->sk_ack_backlog Instead of modifying sk->sk_ack_backlog directly, use respective socket functions. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-17 23:57:19 -07:00
Stephen Hemminger	ec1b4cf74c	net: mark net_proto_ops as const All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-07 01:10:46 -07:00
David S. Miller	b7058842c9	net: Make setsockopt() optlen be unsigned. This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-30 16:12:20 -07:00
Hendrik Brueckner	bf95d20fdb	af_iucv: fix race when queueing skbs on the backlog queue iucv_sock_recvmsg() and iucv_process_message()/iucv_fragment_skb race for dequeuing an skb from the backlog queue. If iucv_sock_recvmsg() dequeues first, iucv_process_message() calls sock_queue_rcv_skb() with an skb that is NULL. This results in the following kernel panic: <1>Unable to handle kernel pointer dereference at virtual kernel address (null) <4>Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC <4>Modules linked in: af_iucv sunrpc qeth_l3 dm_multipath dm_mod vmur qeth ccwgroup <4>CPU: 0 Not tainted 2.6.30 #4 <4>Process client-iucv (pid: 4787, task: 0000000034e75940, ksp: 00000000353e3710) <4>Krnl PSW : 0704000180000000 000000000043ebca (sock_queue_rcv_skb+0x7a/0x138) <4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 <4>Krnl GPRS: 0052900000000000 000003e0016e0fe8 0000000000000000 0000000000000000 <4> 000000000043eba8 0000000000000002 0000000000000001 00000000341aa7f0 <4> 0000000000000000 0000000000007800 0000000000000000 0000000000000000 <4> 00000000341aa7f0 0000000000594650 000000000043eba8 000000003fc2fb28 <4>Krnl Code: 000000000043ebbe: a7840006 brc 8,43ebca <4> 000000000043ebc2: 5930c23c c %r3,572(%r12) <4> 000000000043ebc6: a724004c brc 2,43ec5e <4> >000000000043ebca: e3c0b0100024 stg %r12,16(%r11) <4> 000000000043ebd0: a7190000 lghi %r1,0 <4> 000000000043ebd4: e310b0200024 stg %r1,32(%r11) <4> 000000000043ebda: c010ffffdce9 larl %r1,43a5ac <4> 000000000043ebe0: e310b0800024 stg %r1,128(%r11) <4>Call Trace: <4>([<000000000043eba8>] sock_queue_rcv_skb+0x58/0x138) <4> [<000003e0016bcf2a>] iucv_process_message+0x112/0x3cc [af_iucv] <4> [<000003e0016bd3d4>] iucv_callback_rx+0x1f0/0x274 [af_iucv] <4> [<000000000053a21a>] iucv_message_pending+0xa2/0x120 <4> [<000000000053b5a6>] iucv_tasklet_fn+0x176/0x1b8 <4> [<000000000014fa82>] tasklet_action+0xfe/0x1f4 <4> [<0000000000150a56>] __do_softirq+0x116/0x284 <4> [<0000000000111058>] do_softirq+0xe4/0xe8 <4> [<00000000001504ba>] irq_exit+0xba/0xd8 <4> [<000000000010e0b2>] do_extint+0x146/0x190 <4> [<00000000001184b6>] ext_no_vtime+0x1e/0x22 <4> [<00000000001fbf4e>] kfree+0x202/0x28c <4>([<00000000001fbf44>] kfree+0x1f8/0x28c) <4> [<000000000044205a>] __kfree_skb+0x32/0x124 <4> [<000003e0016bd8b2>] iucv_sock_recvmsg+0x236/0x41c [af_iucv] <4> [<0000000000437042>] sock_aio_read+0x136/0x160 <4> [<0000000000205e50>] do_sync_read+0xe4/0x13c <4> [<0000000000206dce>] vfs_read+0x152/0x15c <4> [<0000000000206ed0>] SyS_read+0x54/0xac <4> [<0000000000117c8e>] sysc_noemu+0x10/0x16 <4> [<00000042ff8def3c>] 0x42ff8def3c Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-16 20:57:39 -07:00
Hendrik Brueckner	7514bab04e	af_iucv: do not call iucv_sock_kill() twice For non-accepted sockets on the accept queue, iucv_sock_kill() is called twice (in iucv_sock_close() and iucv_sock_cleanup_listen()). This typically results in a kernel oops as shown below. Remove the duplicate call to iucv_sock_kill() and set the SOCK_ZAPPED flag in iucv_sock_close() only. The iucv_sock_kill() function frees a socket only if the socket is zapped and orphaned (sk->sk_socket == NULL): - Non-accepted sockets are always orphaned and, thus, iucv_sock_kill() frees the socket twice. - For accepted sockets or sockets created with iucv_sock_create(), sk->sk_socket is initialized. This caused the first call to iucv_sock_kill() to return immediately. To free these sockets, iucv_sock_release() uses sock_orphan() before calling iucv_sock_kill(). <1>Unable to handle kernel pointer dereference at virtual kernel address 000000003edd3000 <4>Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC <4>Modules linked in: af_iucv sunrpc qeth_l3 dm_multipath dm_mod qeth vmur ccwgroup <4>CPU: 0 Not tainted 2.6.30 #4 <4>Process iucv_sock_close (pid: 2486, task: 000000003aea4340, ksp: 000000003b75bc68) <4>Krnl PSW : 0704200180000000 000003e00168e23a (iucv_sock_kill+0x2e/0xcc [af_iucv]) <4> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 <4>Krnl GPRS: 0000000000000000 000000003b75c000 000000003edd37f0 0000000000000001 <4> 000003e00168ec62 000000003988d960 0000000000000000 000003e0016b0608 <4> 000000003fe81b20 000000003839bb58 00000000399977f0 000000003edd37f0 <4> 000003e00168b000 000003e00168f138 000000003b75bcd0 000000003b75bc98 <4>Krnl Code: 000003e00168e22a: c0c0ffffe6eb larl %r12,3e00168b000 <4> 000003e00168e230: b90400b2 lgr %r11,%r2 <4> 000003e00168e234: e3e0f0980024 stg %r14,152(%r15) <4> >000003e00168e23a: e310225e0090 llgc %r1,606(%r2) <4> 000003e00168e240: a7110001 tmll %r1,1 <4> 000003e00168e244: a7840007 brc 8,3e00168e252 <4> 000003e00168e248: d507d00023c8 clc 0(8,%r13),968(%r2) <4> 000003e00168e24e: a7840009 brc 8,3e00168e260 <4>Call Trace: <4>([<000003e0016b0608>] afiucv_dbf+0x0/0xfffffffffffdea20 [af_iucv]) <4> [<000003e00168ec6c>] iucv_sock_close+0x130/0x368 [af_iucv] <4> [<000003e00168ef02>] iucv_sock_release+0x5e/0xe4 [af_iucv] <4> [<0000000000438e6c>] sock_release+0x44/0x104 <4> [<0000000000438f5e>] sock_close+0x32/0x50 <4> [<0000000000207898>] __fput+0xf4/0x250 <4> [<00000000002038aa>] filp_close+0x7a/0xa8 <4> [<00000000002039ba>] SyS_close+0xe2/0x148 <4> [<0000000000117c8e>] sysc_noemu+0x10/0x16 <4> [<00000042ff8deeac>] 0x42ff8deeac Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-16 20:57:38 -07:00

1 2

96 Commits