linux

Commit Graph

Author	SHA1	Message	Date
Eric W. Biederman	11aa9c28b4	net: Pass kern from net_proto_family.create to sk_alloc In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:17 -04:00
Richard Alpe	b063bc5ea7	tipc: send explicit not supported error in nl compat The legacy netlink API treated EPERM (permission denied) as "operation not supported". Reported-by: Tomi Ollila <tomi.ollila@iki.fi> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 16:40:03 -04:00
Richard Alpe	670f4f8818	tipc: add broadcast link window set/get to nl api Add the ability to get or set the broadcast link window through the new netlink API. The functionality was unintentionally missing from the new netlink API. Adding this means that we also fix the breakage in the old API when coming through the compat layer. Fixes: `37e2d4843f` (tipc: convert legacy nl link prop set to nl compat) Reported-by: Tomi Ollila <tomi.ollila@iki.fi> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 16:40:02 -04:00
Richard Alpe	c3d6fb85b2	tipc: fix default link prop regression in nl compat Default link properties can be set for media or bearer. This functionality was missed when introducing the NL compatibility layer. This patch implements this functionality in the compat netlink layer. It works the same way as it did in the old API. We search for media and bearers matching the "link name". If we find a matching media or bearer the link tolerance, priority or window is used as default for new links on that media or bearer. Fixes: `37e2d4843f` (tipc: convert legacy nl link prop set to nl compat) Reported-by: Tomi Ollila <tomi.ollila@iki.fi> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 16:40:02 -04:00
Ying Xue	90bdfcb76f	tipc: deal with return value of tipc_conn_new callback Once tipc_conn_new() returns NULL, the connection should be shut down immediately, otherwise, oops may happen due to the NULL pointer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 15:04:01 -04:00
Ying Xue	a13683f292	tipc: adjust locking policy of subscription Currently subscriber's lock protects not only subscriber's subscription list but also all subscriptions linked into the list. However, as all members of subscription are never changed after they are initialized, it's unnecessary for subscription to be protected under subscriber's lock. If the lock is used to only protect subscriber's subscription list, the adjustment not only makes the locking policy simpler, but also helps to avoid a deadlock which may happen once creating a subscription is failed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 15:04:01 -04:00
Ying Xue	00bc00a938	tipc: involve reference counter for subscriber At present subscriber's lock is used to protect the subscription list of subscriber as well as subscriptions linked into the list. While one or all subscriptions are deleted through iterating the list, the subscriber's lock must be held. Meanwhile, as deletion of subscription may happen in subscription timer's handler, the lock must be grabbed in the function as well. When subscription's timer is terminated with del_timer_sync() during above iteration, subscriber's lock has to be temporarily released, otherwise, deadlock may occur. However, the temporary release may cause the double free of a subscription as the subscription is not disconnected from the subscription list. Now if a reference counter is introduced to subscriber, subscription's timer can be asynchronously stopped with del_timer(). As a result, the issue is not only able to be fixed, but also relevant code is pretty readable and understandable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 15:04:01 -04:00
Ying Xue	1b764828ad	tipc: introduce tipc_subscrb_create routine Introducing a new function makes the purpose of tipc_subscrb_connect_cb callback routine more clear. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 15:04:01 -04:00
Ying Xue	57f1d1868f	tipc: rename functions defined in subscr.c When a topology server accepts a connection request from its client, it allocates a connection instance and a tipc_subscriber structure object. The former is used to communicate with client, and the latter is often treated as a subscriber which manages all subscription events requested from a same client. When a topology server receives a request of subscribing name services from a client through the connection, it creates a tipc_subscription structure instance which is seen as a subscription recording what name services are subscribed. In order to manage all subscriptions from a same client, topology server links them into the subscrp_list of the subscriber. So subscriber and subscription completely represents different meanings respectively, but function names associated with them make us so confused that we are unable to easily tell which function is against subscriber and which is to subscription. So we want to eliminate the confusion by renaming them. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-04 15:04:00 -04:00
Jon Paul Maloy	0d699f28ee	tipc: fix problem with parallel link synchronization mechanism Currently, we try to accumulate arrived packets in the links's 'deferred' queue during the parallel link syncronization phase. This entails two problems: - With an unlucky combination of arriving packets the algorithm may go into a lockstep with the out-of-sequence handling function, where the synch mechanism is adding a packet to the deferred queue, while the out-of-sequence handling is retrieving it again, thus ending up in a loop inside the node_lock scope. - Even if this is avoided, the link will very often send out unnecessary protocol messages, in the worst case leading to redundant retransmissions. We fix this by just dropping arriving packets on the upcoming link during the synchronization phase, thus relying on the retransmission protocol to resolve the situation once the two links have arrived to a synchronized state. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-29 15:08:59 -04:00
Nicolas Dichtel	f2f67390a4	tipc: remove wrong use of NLM_F_MULTI NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: `35b9dd7607` ("tipc: add bearer get/dump to new netlink api") Fixes: `7be57fc691` ("tipc: add link get/dump to new netlink api") Fixes: `46f15c6794` ("tipc: add media get/dump to new netlink api") CC: Richard Alpe <richard.alpe@ericsson.com> CC: Jon Maloy <jon.maloy@ericsson.com> CC: Ying Xue <ying.xue@windriver.com> CC: tipc-discussion@lists.sourceforge.net Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-29 14:59:17 -04:00
Erik Hugne	73a3173773	tipc: fix node refcount issue When link statistics is dumped over netlink, we iterate over the list of peer nodes and append each links statistics to the netlink msg. In the case where the dump is resumed after filling up a nlmsg, the node refcnt is decremented without having been incremented previously which may cause the node reference to be freed. When this happens, the following info/stacktrace will be generated, followed by a crash or undefined behavior. We fix this by removing the erroneous call to tipc_node_put inside the loop that iterates over nodes. [ 384.312303] INFO: trying to register non-static key. [ 384.313110] the code is fine but needs lockdep annotation. [ 384.313290] turning off the locking correctness validator. [ 384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13 [ 384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 384.313290] ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007 [ 384.313290] ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200 [ 384.313290] ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001 [ 384.313290] Call Trace: [ 384.313290] <IRQ> [<ffffffff8170adf1>] dump_stack+0x4c/0x65 [ 384.313290] [<ffffffff810a6a6d>] __lock_acquire+0xf3d/0xf50 [ 384.313290] [<ffffffff810a7375>] lock_acquire+0xd5/0x290 [ 384.313290] [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc] [ 384.313290] [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc] [ 384.313290] [<ffffffff81712890>] _raw_spin_lock_bh+0x40/0x80 [ 384.313290] [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc] [ 384.313290] [<ffffffffa0043e8c>] link_timeout+0x1c/0x170 [tipc] [ 384.313290] [<ffffffff810c4698>] call_timer_fn+0xb8/0x490 [ 384.313290] [<ffffffff810c45e0>] ? process_timeout+0x10/0x10 [ 384.313290] [<ffffffff810c5a2c>] run_timer_softirq+0x21c/0x420 [ 384.313290] [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc] [ 384.313290] [<ffffffff8105a954>] __do_softirq+0xf4/0x630 [ 384.313290] [<ffffffff8105afdd>] irq_exit+0x5d/0x60 [ 384.313290] [<ffffffff8103ade1>] smp_apic_timer_interrupt+0x41/0x50 [ 384.313290] [<ffffffff817144a0>] apic_timer_interrupt+0x70/0x80 [ 384.313290] <EOI> [<ffffffff8100db10>] ? default_idle+0x20/0x210 [ 384.313290] [<ffffffff8100db0e>] ? default_idle+0x1e/0x210 [ 384.313290] [<ffffffff8100e61a>] arch_cpu_idle+0xa/0x10 [ 384.313290] [<ffffffff81099803>] cpu_startup_entry+0x2c3/0x530 [ 384.313290] [<ffffffff810d2893>] ? clockevents_register_device+0x113/0x200 [ 384.313290] [<ffffffff81038b0f>] start_secondary+0x13f/0x170 Fixes: `8a0f6ebe84` ("tipc: involve reference counter for node structure") Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-23 11:50:34 -04:00
Erik Hugne	9871b27f67	tipc: fix random link reset problem In the function tipc_sk_rcv(), the stack variable 'err' is only initialized to TIPC_ERR_NO_PORT for the first iteration over the link input queue. If a chain of messages are received from a link, failure to lookup the socket for any but the first message will cause the message to bounce back out on a random link. We fix this by properly initializing err. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-23 11:50:34 -04:00
Ying Xue	def81f69bf	tipc: fix topology server broken issue When a new topology server is launched in a new namespace, its listening socket is inserted into the "init ns" namespace's socket hash table rather than the one owned by the new namespace. Although the socket's namespace is forcedly changed to the new namespace later, the socket is still stored in the socket hash table of "init ns" namespace. When a client created in the new namespace connects its own topology server, the connection is failed as its server's socket could not be found from its own namespace's socket table. If __sock_create() instead of original sock_create_kern() is used to create the server's socket through specifying an expected namesapce, the socket will be inserted into the specified namespace's socket table, thereby avoiding to the topology server broken issue. Fixes: `76100a8a64` ("tipc: fix netns refcnt leak") Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-23 11:50:34 -04:00
David Miller	79b16aadea	udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). That was we can make sure the output path of ipv4/ipv6 operate on the UDP socket rather than whatever random thing happens to be in skb->sk. Based upon a patch by Jiri Pirko. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>	2015-04-07 15:29:08 -04:00
Jon Paul Maloy	ed193ece26	tipc: simplify link mtu negotiation When a link is being established, the two endpoints advertise their respective interface MTU in the transmitted RESET and ACTIVATE messages. If there is any difference, the lower of the two MTUs will be selected for use by both endpoints. However, as a remnant of earlier attempts to introduce TIPC level routing. there also exists an MTU discovery mechanism. If an intermediate node has a lower MTU than the two endpoints, they will discover this through a bisectional approach, and finally adopt this MTU for common use. Since there is no TIPC level routing, and probably never will be, this mechanism doesn't make any sense, and only serves to make the link level protocol unecessarily complex. In this commit, we eliminate the MTU discovery algorithm,and fall back to the simple MTU advertising approach. This change is fully backwards compatible. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:27:12 -04:00
Jon Paul Maloy	dff29b1a88	tipc: eliminate delayed link deletion at link failover When a bearer is disabled manually, all its links have to be reset and deleted. However, if there is a remaining, parallel link ready to take over a deleted link's traffic, we currently delay the delete of the removed link until the failover procedure is finished. This is because the remaining link needs to access state from the reset link, such as the last received packet number, and any partially reassembled buffer, in order to perform a successful failover. In this commit, we do instead move the state data over to the new link, so that it can fulfill the procedure autonomously, without accessing any data on the old link. This means that we can now proceed and delete all pertaining links immediately when a bearer is disabled. This saves us from some unnecessary complexity in such situations. We also choose to change the confusing definitions CHANGEOVER_PROTOCOL, ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL, FAILOVER_MSG and SYNCH_MSG respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:27:12 -04:00
Jon Paul Maloy	2da7142516	tipc: drop tunneled packet duplicates at reception In commit `8b4ed8634f` ("tipc: eliminate race condition at dual link establishment") we introduced a parallel link synchronization mechanism that guarentees sequential delivery even for users switching from an old to a newly established link. The new mechanism makes it unnecessary to deliver the tunneled duplicate packets back to the old link, as we are currently doing. It is now sufficient to use the last tunneled packet's inner sequence number as synchronization point between the two parallel links, whereafter it can be dropped. In this commit, we drop the duplicate packets arriving on the new link, after updating the synchronization point at each new arrival. Although it would now have been sufficient for the other endpoint to only tunnel the last packet in its send queue, and not the entire queue, we must still do this to maintain compatibility with older nodes. This commit makes it possible to get rid if some complex interaction between the two parallel links. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:27:12 -04:00
David S. Miller	9f0d34bc34	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/asix_common.c drivers/net/usb/sr9800.c drivers/net/usb/usbnet.c include/linux/usb/usbnet.h net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c The TCP conflicts were overlapping changes. In 'net' we added a READ_ONCE() to the socket cached RX route read, whilst in 'net-next' Eric Dumazet touched the surrounding code dealing with how mini sockets are handled. With USB, it's a case of the same bug fix first going into net-next and then I cherry picked it back into net. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:16:53 -04:00
Ying Xue	7e43690578	tipc: fix a slab object leak When remove TIPC module, there is a warning to remind us that a slab object is leaked like: root@localhost:~# rmmod tipc [ 19.056226] ============================================================================= [ 19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close() [ 19.058736] ----------------------------------------------------------------------------- [ 19.058736] [ 19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080 [ 19.061915] INFO: Object 0xffff880014668000 @offset=0 [ 19.062717] kmem_cache_destroy TIPC: Slab cache still has objects This is because the listening socket of TIPC topology server is not closed before TIPC proto handler is unregistered with proto_unregister(). However, as the socket is closed in tipc_exit_net() which is called by unregister_pernet_subsys() during unregistering TIPC namespace operation, the warning can be eliminated if calling unregister_pernet_subsys() is moved before calling proto_unregister(). Fixes: `e05b31f4bf` ("tipc: make tipc socket support net namespace") Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-31 23:10:08 -04:00
Jon Paul Maloy	d482994fca	tipc: fix two bugs in secondary destination lookup A message sent to a node after a successful name table lookup may still find that the destination socket has disappeared, because distribution of name table updates is non-atomic. If so, the message will be rejected back to the sender with error code TIPC_ERR_NO_PORT. If the source socket of the message has disappeared in the meantime, the message should be dropped. However, in the currrent code, the message will instead be subject to an unwanted tertiary lookup, because the function tipc_msg_lookup_dest() doesn't check if there is an error code present in the message before performing the lookup. In the worst case, the message may now find the old destination again, and be redirected once more, instead of being dropped directly as it should be. A second bug in this function is that the "prev_node" field in the message is not updated after successful lookup, something that may have unpredictable consequences. The problems arising from those bugs occur very infrequently. The third change in this function; the test on msg_reroute_msg_cnt() is purely cosmetic, reflecting that the returned value never can be negative. This commit corrects the two bugs described above. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-29 13:47:36 -07:00
Ying Xue	8a0f6ebe84	tipc: involve reference counter for node structure TIPC node hash node table is protected with rcu lock on read side. tipc_node_find() is used to look for a node object with node address through iterating the hash node table. As the entire process of what tipc_node_find() traverses the table is guarded with rcu read lock, it's safe for us. However, when callers use the node object returned by tipc_node_find(), there is no rcu read lock applied. Therefore, this is absolutely unsafe for callers of tipc_node_find(). Now we introduce a reference counter for node structure. Before tipc_node_find() returns node object to its caller, it first increases the reference counter. Accordingly, after its caller used it up, it decreases the counter again. This can prevent a node being used by one thread from being freed by another thread. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-29 12:40:28 -07:00
Ying Xue	b952b2befb	tipc: fix potential deadlock when all links are reset [ 60.988363] ====================================================== [ 60.988754] [ INFO: possible circular locking dependency detected ] [ 60.989152] 3.19.0+ #194 Not tainted [ 60.989377] ------------------------------------------------------- [ 60.989781] swapper/3/0 is trying to acquire lock: [ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.990743] [ 60.990743] but task is already holding lock: [ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.991738] [ 60.991738] which lock already depends on the new lock. [ 60.991738] [ 60.992174] [ 60.992174] the existing dependency chain (in reverse order) is: [ 60.992174] -> #1 (&(&bclink->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc] [ 60.992174] [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc] [ 60.992174] [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc] [ 60.992174] [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc] [ 60.992174] [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] -> #0 (&(&n_ptr->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0 [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.992174] [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc] [ 60.992174] [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] [ 60.992174] other info that might help us debug this: [ 60.992174] [ 60.992174] Possible unsafe locking scenario: [ 60.992174] [ 60.992174] CPU0 CPU1 [ 60.992174] ---- ---- [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] [ 60.992174] * DEADLOCK * [ 60.992174] [ 60.992174] 3 locks held by swapper/3/0: [ 60.992174] #0: (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980 [ 60.992174] #1: (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc] [ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] The correct the sequence of grabbing n_ptr->lock and bclink->lock should be that the former is first held and the latter is then taken, which exactly happened on CPU1. But especially when the retransmission of broadcast link is failed, bclink->lock is first held in tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure() called by tipc_link_retransmit() subsequently, which is demonstrated on CPU0. As a result, deadlock occurs. If the order of holding the two locks happening on CPU0 is reversed, the deadlock risk will be relieved. Therefore, the node lock taken in link_retransmit_failure() originally is moved to tipc_bclink_rcv() so that it's obtained before bclink lock. But the precondition of the adjustment of node lock is that responding to bclink reset event must be moved from tipc_bclink_unlock() to tipc_node_unlock(). Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-29 12:40:27 -07:00
Jon Paul Maloy	8b4ed8634f	tipc: eliminate race condition at dual link establishment Despite recent improvements, the establishment of dual parallel links still has a small glitch where messages can bypass each other. When the second link in a dual-link configuration is established, part of the first link's traffic will be steered over to the new link. Although we do have a mechanism to ensure that packets sent before and after the establishment of the new link arrive in sequence to the destination node, this is not enough. The arriving messages will still be delivered upwards in different threads, something entailing a risk of message disordering during the transition phase. To fix this, we introduce a synchronization mechanism between the two parallel links, so that traffic arriving on the new link cannot be added to its input queue until we are guaranteed that all pre-establishment messages have been delivered on the old, parallel link. This problem seems to always have been around, but its occurrence is so rare that it has not been noticed until recent intensive testing. Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-25 14:05:56 -04:00
Jon Paul Maloy	3127a0200d	tipc: clean up handling of link congestion After the recent changes in message importance handling it becomes possible to simplify handling of messages and sockets when we encounter link congestion. We merge the function tipc_link_cong() into link_schedule_user(), and simplify the code of the latter. The code should now be easier to follow, especially regarding return codes and handling of the message that caused the situation. In case the scheduling function is unable to pre-allocate a wakeup message buffer, it now returns -ENOBUFS, which is a more correct code than the previously used -EHOSTUNREACH. Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-25 14:05:56 -04:00
Jon Paul Maloy	1f66d161ab	tipc: introduce starvation free send algorithm Currently, we only use a single counter; the length of the backlog queue, to determine whether a message should be accepted to the queue or not. Each time a message is being sent, the queue length is compared to a threshold value for the message's importance priority. If the queue length is beyond this threshold, the message is rejected. This algorithm implies a risk of starvation of low importance senders during very high load, because it may take a long time before the backlog queue has decreased enough to accept a lower level message. We now eliminate this risk by introducing a counter for each importance priority. When a message is sent, we check only the queue level for that particular message's priority. If that is ok, the message can be added to the backlog, irrespective of the queue level for other priorities. This way, each level is guaranteed a certain portion of the total bandwidth, and any risk of starvation is eliminated. Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-25 14:05:56 -04:00
Ying Xue	bc14b8d6a9	tipc: fix a link reset issue due to retransmission failures When a node joins a cluster while we are transmitting a fragment stream over the broadcast link, it's missing the preceding fragments needed to build a meaningful message. As a result, the node has to drop it. However, as the fragment message is not acknowledged to its sender before it's dropped, it accidentally causes link reset of retransmission failure on the node. Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-25 11:43:32 -04:00
Thomas Graf	b5e2c150ac	rhashtable: Disable automatic shrinking by default Introduce a new bool automatic_shrinking to require the user to explicitly opt-in to automatic shrinking of tables. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-24 17:48:40 -04:00
Ying Xue	ed3e852aa5	tipc: fix compile error when IPV6=m and TIPC=y When IPV6=m and TIPC=y, below error will appear during building kernel image: net/tipc/udp_media.c:196: undefined reference to `ip6_dst_lookup' make: *** [vmlinux] Error 1 As ip6_dst_lookup() is implemented in IPV6 and IPV6 is compiled as module, ip6_dst_lookup() is not built-in core kernel image. As a result, compiler cannot find 'ip6_dst_lookup' reference while compiling TIPC code into core kernel image. But with the method introduced by commit `5f81bd2e5d` ("ipv6: export a stub for IPv6 symbols used by vxlan"), we can avoid the compile error through "ipv6_stub" pointer to access ip6_dst_lookup(). Fixes: `d0f91938be` ("tipc: add ip/udp media type") Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-24 15:20:29 -04:00
Sasha Levin	610600c8c5	tipc: validate length of sockaddr in connect() for dgram/rdm Commit `f2f8036` ("tipc: add support for connect() on dgram/rdm sockets") hasn't validated user input length for the sockaddr structure which allows a user to overwrite kernel memory with arbitrary input. Fixes: `f2f8036` ("tipc: add support for connect() on dgram/rdm sockets") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-24 12:50:39 -04:00
Herbert Xu	6d02294981	tipc: Use default rhashtable hashfn This patch removes the explicit jhash value for the hashfn parameter of rhashtable. The default is now jhash so removing the setting makes no difference apart from making one less copy of jhash in the kernel. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-23 22:07:51 -04:00
Herbert Xu	6cca7289d5	tipc: Use inlined rhashtable interface This patch converts tipc to the inlined rhashtable interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-20 16:16:24 -04:00
Marcelo Ricardo Leitner	446981e5fc	tipc: fix build issue when building without IPv6 We can't directly call ipv6_sock_mc_join() but should use the stub instead and protect it around IS_ENABLED. Fixes: `d0f91938be` ("tipc: add ip/udp media type") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-19 16:06:27 -04:00
Erik Hugne	f2f8036e39	tipc: add support for connect() on dgram/rdm sockets Following the example of ip4_datagram_connect, we store the address in the socket structure for dgram/rdm sockets and use that as the default destination for subsequent send() calls. It is allowed to connect to any address types, and the behaviour of send() will be the same as a normal sendto() with this address provided. Binding to an AF_UNSPEC address clears the association. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-19 12:25:54 -04:00
Erik Hugne	3bd88ee7a2	tipc: do not report -EHOSTUNREACH for failed local delivery Since commit `1186adf7df` ("tipc: simplify message forwarding and rejection in socket layer") -EHOSTUNREACH is propagated back to the sending process if we fail to deliver the message to another socket local to the node. This is wrong, host unreachable should only be reported when the destination port/name does not exist in the cluster, and that check is always done before sending the message. Also, this introduces inconsistent sendmsg() behavior for local/remote destinations. Errors occurring on the receiving side should not trickle up to the sender. If message delivery fails TIPC should either discard the packet or reject it back to the sender based on the destination droppable option. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-19 12:25:54 -04:00
Erik Hugne	18d6c58415	tipc: remove redundant call to tipc_node_remove_conn tipc_node_remove_conn may be called twice if shutdown() is called on a socket that have messages in the receive queue. Calling this function twice does no harm, but is unnecessary and we remove the redundant call. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-19 12:25:54 -04:00
Marcelo Ricardo Leitner	54ff9ef36b	ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop} in favor of their inner __ ones, which doesn't grab rtnl. As these functions need to operate on a locked socket, we can't be grabbing rtnl by then. It's too late and doing so causes reversed locking. So this patch: - move rtnl handling to callers instead while already fixing some reversed locking situations, like on vxlan and ipvs code. - renames __ ones to not have the __ mark: __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop} Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-18 22:05:09 -04:00
Herbert Xu	446c89ac1f	tipc: Use rhashtable max/min_size instead of max/min_shift This patch converts tipc to use rhashtable max/min_size instead of the obsolete max/min_shift. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-18 12:46:40 -04:00
Ying Xue	2b9bb7f338	tipc: withdraw tipc topology server name when namespace is deleted The TIPC topology server is a per namespace service associated with the tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn before we call sk_release_kernel because the kernel socket release is done in init_net and trying to withdraw a TIPC name published in another namespace will fail with an error as: [ 170.093264] Unable to remove local publication [ 170.093264] (type=1, lower=1, ref=2184244004, key=2184244005) We fix this by breaking the association between the topology server name and socket before calling sk_release_kernel. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-17 22:11:26 -04:00
Ying Xue	8460504bdd	tipc: fix a potential deadlock when nametable is purged [ 28.531768] ============================================= [ 28.532322] [ INFO: possible recursive locking detected ] [ 28.532322] 3.19.0+ #194 Not tainted [ 28.532322] --------------------------------------------- [ 28.532322] insmod/583 is trying to acquire lock: [ 28.532322] (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [ 28.532322] but task is already holding lock: [ 28.532322] (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc] [ 28.532322] [ 28.532322] other info that might help us debug this: [ 28.532322] Possible unsafe locking scenario: [ 28.532322] [ 28.532322] CPU0 [ 28.532322] ---- [ 28.532322] lock(&(&nseq->lock)->rlock); [ 28.532322] lock(&(&nseq->lock)->rlock); [ 28.532322] [ 28.532322] * DEADLOCK * [ 28.532322] [ 28.532322] May be due to missing lock nesting notation [ 28.532322] [ 28.532322] 3 locks held by insmod/583: [ 28.532322] #0: (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50 [ 28.532322] #1: (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc] [ 28.532322] #2: (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc] [ 28.532322] [ 28.532322] stack backtrace: [ 28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194 [ 28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.532322] ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007 [ 28.532322] ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998 [ 28.532322] ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38 [ 28.532322] Call Trace: [ 28.532322] [<ffffffff81792f3e>] dump_stack+0x4c/0x65 [ 28.532322] [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0 [ 28.532322] [<ffffffff810a4df3>] ? __bfs+0x23/0x270 [ 28.532322] [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0 [ 28.532322] [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0 [ 28.532322] [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 28.532322] [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 28.532322] [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc] [ 28.532322] [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc] [ 28.532322] [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc] [ 28.532322] [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc] [ 28.532322] [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc] [ 28.532322] [<ffffffff8163dece>] ops_init+0x4e/0x150 [ 28.532322] [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10 [ 28.532322] [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190 [ 28.532322] [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50 [ 28.532322] [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc] [ 28.532322] [<ffffffffa0024000>] ? 0xffffffffa0024000 [ 28.532322] [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0 [ 28.532322] [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0 [ 28.532322] [<ffffffff810e725b>] ? do_init_module+0x2b/0x200 [ 28.532322] [<ffffffff810e7294>] do_init_module+0x64/0x200 [ 28.532322] [<ffffffff810e9353>] load_module+0x12f3/0x18e0 [ 28.532322] [<ffffffff810e5890>] ? show_initstate+0x50/0x50 [ 28.532322] [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110 [ 28.532322] [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to remove a publication with a name sequence, the name sequence's lock is held. However, when tipc_nametbl_remove_publ() calling tipc_nameseq_remove_publ() to remove the publication, it first tries to query name sequence instance with the publication, and then holds the lock of the found name sequence. But as the lock may be already taken in tipc_purge_publications(), deadlock happens like above scenario demonstrated. As tipc_nameseq_remove_publ() doesn't grab name sequence's lock, the deadlock can be avoided if it's directly invoked by tipc_purge_publications(). Fixes: `97ede29e80` ("tipc: convert name table read-write lock to RCU") Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-17 22:11:26 -04:00
Ying Xue	76100a8a64	tipc: fix netns refcnt leak When the TIPC module is loaded, we launch a topology server in kernel space, which in its turn is creating TIPC sockets for communication with topology server users. Because both the socket's creator and provider reside in the same module, it is necessary that the TIPC module's reference count remains zero after the server is started and the socket created; otherwise it becomes impossible to perform "rmmod" even on an idle module. Currently, we achieve this by defining a separate "tipc_proto_kern" protocol struct, that is used only for kernel space socket allocations. This structure has the "owner" field set to NULL, which restricts the module reference count from being be bumped when sk_alloc() for local sockets is called. Furthermore, we have defined three kernel-specific functions, tipc_sock_create_local(), tipc_sock_release_local() and tipc_sock_accept_local(), to avoid the module counter being modified when module local sockets are created or deleted. This has worked well until we introduced name space support. However, after name space support was introduced, we have observed that a reference count leak occurs, because the netns counter is not decremented in tipc_sock_delete_local(). This commit remedies this problem. But instead of just modifying tipc_sock_delete_local(), we eliminate the whole parallel socket handling infrastructure, and start using the regular sk_create_kern(), kernel_accept() and sk_release_kernel() calls. Since those functions manipulate the module counter, we must now compensate for that by explicitly decrementing the counter after module local sockets are created, and increment it just before calling sk_release_kernel(). Fixes: `a62fbccecd` ("tipc: make subscriber server support net namespace") Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Cong Wang <cwang@twopensource.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-17 22:11:26 -04:00
Jon Paul Maloy	e3eea1eb47	tipc: clean up handling of message priorities Messages transferred by TIPC are assigned an "importance priority", -an integer value indicating how to treat the message when there is link or destination socket congestion. There is no separate header field for this value. Instead, the message user values have been chosen in ascending order according to perceived importance, so that the message user field can be used for this. This is not a good solution. First, we have many more users than the needed priority levels, so we end up with treating more priority levels than necessary. Second, the user field cannot always accurately reflect the priority of the message. E.g., a message fragment packet should really have the priority of the enveloped user data message, and not the priority of the MSG_FRAGMENTER user. Until now, we have been working around this problem in different ways, but it is now time to implement a consistent way of handling such priorities, although still within the constraint that we cannot allocate any more bits in the regular data message header for this. In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE, that will be the only one used apart from the four (lower) user data levels. All non-data messages map down to this priority. Furthermore, we take some free bits from the MSG_FRAGMENTER header and allocate them to store the priority of the enveloped message. We then adjust the functions msg_importance()/msg_set_importance() so that they read/set the correct header fields depending on user type. This small protocol change is fully compatible, because the code at the receiving end of a link currently reads the importance level only from user data messages, where there is no change. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	05dcc5aa4d	tipc: split link outqueue struct tipc_link contains one single queue for outgoing packets, where both transmitted and waiting packets are queued. This infrastructure is hard to maintain, because we need to keep a number of fields to keep track of which packets are sent or unsent, and the number of packets in each category. A lot of code becomes simpler if we split this queue into a transmission queue, where sent/unacknowledged packets are kept, and a backlog queue, where we keep the not yet sent packets. In this commit we do this separation. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	2cdf3918e4	tipc: eliminate unnecessary call to broadcast ack function The unicast packet header contains a broadcast acknowledge sequence number, that may need to be conveyed to the broadcast link for proper treatment. Currently, the function tipc_rcv(), which is on the most critical data path, calls the function tipc_bclink_acknowledge() to have this done. This call is made for each received packet, and results in the unconditional grabbing of the broadcast link spinlock. This is unnecessary, since we can see directly from tipc_rcv() if the acknowledged number differs from what has been previously acked from the node in question. In the vast majority of cases the numbers won't differ, and there is nothing to update. We now make the call to tipc_bclink_acknowledge() conditional to that the two ack values differ. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	c1336ee472	tipc: extract bundled buffers by cloning instead of copying When we currently extract a bundled buffer from a message bundle in the function tipc_msg_extract(), we allocate a new buffer and explicitly copy the linear data area. This is unnecessary, since we can just clone the buffer and do skb_pull() on the clone to move the data pointer to the correct position. This is what we do in this commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	1149557d64	tipc: eliminate unnecessary linearization of incoming buffers Currently, TIPC linearizes all incoming buffers directly at reception before passing them upwards in the stack. This is clearly a waste of CPU resources, and must be avoided. In this commit, we eliminate this unnecessary linearization. We still ensure that at least the message header is linear, and that the buffer is linearized where this is still needed, i.e. when unbundling and when reversing messages. In addition, we ensure that fragmented messages are validated after reassembly before delivering them upwards in the stack. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	cf2157f88a	tipc: move message validation function to msg.c The function link_buf_validate() is in reality re-entrant and context independent, and will in later commits be called from several locations. Therefore, we move it to msg.c, make it outline and rename the it to tipc_msg_validate(). We also redesign the function to make proper use of pskb_may_pull() Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	7764d6e83d	tipc: add framework for node capabilities exchange The TIPC protocol spec has defined a 13 bit capability bitmap in the neighbor discovery header, as a means to maintain compatibility between different code and protocol generations. Until now this field has been unused. We now introduce the basic framework for exchanging capabilities between nodes at first contact. After exchange, a peer node's capabilities are stored as a 16 bit bitmap in struct tipc_node. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-14 14:38:32 -04:00
Jon Paul Maloy	169bf9121b	tipc: ensure that idle links are deleted when a bearer is disabled commit `afaa3f65f6` (tipc: purge links when bearer is disabled) was an attempt to resolve a problem that turned out to have a more profound reason. When we disable a bearer, we delete all its pertaining links if there is no other bearer to perform failover to, or if the module is shutting down. In case there are dual bearers, we wait with deleting links until the failover procedure is finished. However, this misses the case when a link on the removed bearer was already down, so that there will be no failover procedure to finish the link delete. This causes confusion if a new bearer is added to replace the removed one, and also entails a small memory leak. This commit takes the current state of the link into account when deciding when to delete it, and also reverses the above-mentioned commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-10 18:37:36 -04:00
David S. Miller	3cef5c5b0b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/cadence/macb.c Overlapping changes in macb driver, mostly fixes and cleanups in 'net' overlapping with the integration of at91_ether into macb in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 23:38:02 -04:00
Jon Paul Maloy	e6441bae32	tipc: fix bug in link failover handling In commit `c637c10355` ("tipc: resolve race problem at unicast message reception") we introduced a new mechanism for delivering buffers upwards from link to socket layer. That code contains a bug in how we handle the new link input queue during failover. When a link is reset, some of its users may be blocked because of congestion, and in order to resolve this, we add any pending wakeup pseudo messages to the link's input queue, and deliver them to the socket. This misses the case where the other, remaining link also may have congested users. Currently, the owner node's reference to the remaining link's input queue is unconditionally overwritten by the reset link's input queue. This has the effect that wakeup events from the remaining link may be unduely delayed (but not lost) for a potentially long period. We fix this by adding the pending events from the reset link to the input queue that is currently referenced by the node, whichever one it is. This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 16:20:41 -04:00
Erik Hugne	143fe22f50	tipc: fix inconsistent signal handling regression Commit `9bbb4ecc68` ("tipc: standardize recvmsg routine") changed the sleep/wakeup behaviour for sockets entering recv() or accept(). In this process the order of reporting -EAGAIN/-EINTR was reversed. This caused problems with wrong errno being reported back if the timeout expires. The same problem happens if the socket is nonblocking and recv()/accept() is called when the process have pending signals. If there is no pending data read or connections to accept, -EINTR will be returned instead of -EAGAIN. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reported-by László Benedek <laszlo.benedek@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:42:19 -04:00
Erik Hugne	4fee6be813	tipc: sparse: fix htons conversion warnings Commit `d0f91938be` ("tipc: add ip/udp media type") introduced some new sparse warnings. Clean them up. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-09 15:37:04 -04:00
Erik Hugne	d0f91938be	tipc: add ip/udp media type The ip/udp bearer can be configured in a point-to-point mode by specifying both local and remote ip/hostname, or it can be enabled in multicast mode, where links are established to all tipc nodes that have joined the same multicast group. The multicast IP address is generated based on the TIPC network ID, but can be overridden by using another multicast address as remote ip. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:08:42 -05:00
Erik Hugne	948fa2d115	tipc: increase size of tipc discovery messages The payload area following the TIPC discovery message header is an opaque area defined by the media. INT_H_SIZE was enough for Ethernet/IB/IPv4 but needs to be expanded to carry IPv6 addressing information. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-05 22:08:42 -05:00
David S. Miller	71a83a6db6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/rocker/rocker.c The rocker commit was two overlapping changes, one to rename the ->vport member to ->pport, and another making the bitmask expression use '1ULL' instead of plain '1'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 21:16:48 -05:00
Ying Xue	1b78414047	net: Remove iocb argument from sendmsg and recvmsg After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 13:06:31 -05:00
Ying Xue	39a0295f90	tipc: Don't use iocb argument in socket layer Currently the iocb argument is used to idenfiy whether or not socket lock is hold before tipc_sendmsg()/tipc_send_stream() is called. But this usage prevents iocb argument from being dropped through sendmsg() at socket common layer. Therefore, in the commit we introduce two new functions called __tipc_sendmsg() and __tipc_send_stream(). When they are invoked, it assumes that their callers have taken socket lock, thereby avoiding the weird usage of iocb argument. Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-02 13:06:31 -05:00
Erik Hugne	d76a436d50	tipc: make media address offset a common define With the exception of infiniband media which does not use media offsets, the media address is always located at offset 4 in the media info field as defined by the protocol, so we move the definition to the generic bearer.h Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:48 -05:00
Erik Hugne	91e2eb5684	tipc: rename media/msg related definitions The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names are misleading, as they actually define the size and offset of the whole media info field and not the address part. This patch does not have any functional changes. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:48 -05:00
Erik Hugne	afaa3f65f6	tipc: purge links when bearer is disabled If a bearer is disabled by manual intervention, all links over that bearer should be purged, indicated with the 'shutting_down' flag. Otherwise tipc will get confused if a new bearer is enabled using a different media type. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:47 -05:00
Erik Hugne	7fe8097cef	tipc: fix nullpointer bug when subscribing to events If a subscription request is sent to a topology server connection, and any error occurs (malformed request, oom or limit reached) while processing this request, TIPC should terminate the subscriber connection. While doing so, it tries to access fields in an already freed (or never allocated) subscription element leading to a nullpointer exception. We fix this by removing the subscr_terminate function and terminate the connection immediately upon any subscription failure. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:47 -05:00
Erik Hugne	3622c36f37	tipc: only create header copy for name distr messages The TIPC name distributor pushes topology updates to the cluster neighbors. Currently this is done in a unicast manner, and the skb holding the update is cloned for each cluster member. This is unnecessary, as we only modify the destnode field in the header so we change it to do pskb_copy instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 18:18:47 -05:00
Daniel Borkmann	4c4b52d9b2	rhashtable: remove indirection for grow/shrink decision functions Currently, all real users of rhashtable default their grow and shrink decision functions to rht_grow_above_75() and rht_shrink_below_30(), so that there's currently no need to have this explicitly selectable. It can/should be generic and private inside rhashtable until a real use case pops up. Since we can make this private, we'll save us this additional indirection layer and can improve insertion/deletion time as well. Reference: http://patchwork.ozlabs.org/patch/443040/ Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-27 16:06:02 -05:00
Richard Alpe	941787b829	tipc: remove tipc_snprintf tipc_snprintf() was heavily utilized by the old netlink API which no longer exists (now netlink compat). In this patch we swap tipc_snprintf() to the identical scnprintf() in the only remaining occurrence. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	22ae7cff50	tipc: nl compat add noop and remove legacy nl framework Add TIPC_CMD_NOOP to compat layer and remove the old framework. All legacy nl commands are now converted to the compat layer in netlink_compat.c. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	5a81a6377b	tipc: convert legacy nl stats show to nl compat Convert TIPC_CMD_SHOW_STATS to compat layer. This command does not have any counterpart in the new API, meaning it now solely exists as a function in the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	3c26181c5b	tipc: convert legacy nl net id get to nl compat Convert TIPC_CMD_GET_NETID to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	964f9501c1	tipc: convert legacy nl net id set to nl compat Convert TIPC_CMD_SET_NETID to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	d7cc75d3cb	tipc: convert legacy nl node addr set to nl compat Convert TIPC_CMD_SET_NODE_ADDR to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	4b28cb581d	tipc: convert legacy nl node dump to nl compat Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	5bfc335a63	tipc: convert legacy nl media dump to nl compat Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	487d2a3a13	tipc: convert legacy nl socket dump to nl compat Convert socket (port) listing to compat dumpit call. If a socket (port) has publications a second dumpit call is issued to collect them and format then into the legacy buffer before continuing to process the sockets (ports). Command converted in this patch: TIPC_CMD_SHOW_PORTS Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	44a8ae94fd	tipc: convert legacy nl name table dump to nl compat Add functionality for printing a dump header and convert TIPC_CMD_SHOW_NAME_TABLE to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	1817877b3c	tipc: convert legacy nl link stat reset to nl compat Convert TIPC_CMD_RESET_LINK_STATS to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	37e2d4843f	tipc: convert legacy nl link prop set to nl compat Convert setting of link proprieties to compat doit calls. Commands converted in this patch: TIPC_CMD_SET_LINK_TOL TIPC_CMD_SET_LINK_PRI TIPC_CMD_SET_LINK_WINDOW Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	357ebdbfca	tipc: convert legacy nl link dump to nl compat Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	f2b3b2d4cc	tipc: convert legacy nl link stat to nl compat Add functionality for safely appending string data to a TLV without keeping write count in the caller. Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	9ab154658a	tipc: convert legacy nl bearer enable/disable to nl compat Introduce a framework for transcoding legacy nl action into actions (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately due to the randomness of the legacy API we can't do this generically so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	d0796d1ef6	tipc: convert legacy nl bearer dump to nl compat Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	bfb3e5dd8d	tipc: move and rename the legacy nl api to "nl compat" The new netlink API is no longer "v2" but rather the standard API and the legacy API is now "nl compat". We split them into separate start/stop and put them in different files in order to further distinguish them. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Jon Paul Maloy	51a00daf73	tipc: fix bug in socket reception function In commit `c637c10355` ("tipc: resolve race problem at unicast message reception") we introduced a time limit for how long the function tipc_sk_eneque() would be allowed to execute its loop. Unfortunately, the test for when this limit is passed was put in the wrong place, resulting in a lost message when the test is true. We fix this by moving the test to before we dequeue the next buffer from the input queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 13:09:25 -08:00
Jon Paul Maloy	cb1b728096	tipc: eliminate race condition at multicast reception In a previous commit in this series we resolved a race problem during unicast message reception. Here, we resolve the same problem at multicast reception. We apply the same technique: an input queue serializing the delivery of arriving buffers. The main difference is that here we do it in two steps. First, the broadcast link feeds arriving buffers into the tail of an arrival queue, which head is consumed at the socket level, and where destination lookup is performed. Second, if the lookup is successful, the resulting buffer clones are fed into a second queue, the input queue. This queue is consumed at reception in the socket just like in the unicast case. Both queues are protected by the same lock, -the one of the input queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:03 -08:00
Jon Paul Maloy	3c724acdd5	tipc: simplify socket multicast reception The structure 'tipc_port_list' is used to collect port numbers representing multicast destination socket on a receiving node. The list is not based on a standard linked list, and is in reality optimized for the uncommon case that there are more than one multicast destinations per node. This makes the list handling unecessarily complex, and as a consequence, even the socket multicast reception becomes more complex. In this commit, we replace 'tipc_port_list' with a new 'struct tipc_plist', which is based on a standard list. We give the new list stack (push/pop) semantics, someting that simplifies the implementation of the function tipc_sk_mcast_rcv(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:03 -08:00
Jon Paul Maloy	708ac32cb5	tipc: simplify connection abort notifications when links break The new input message queue in struct tipc_link can be used for delivering connection abort messages to subscribing sockets. This makes it possible to simplify the code for such cases. This commit removes the temporary list in tipc_node_unlock() used for transforming abort subscriptions to messages. Instead, the abort messages are now created at the moment of lost contact, and then added to the last failed link's generic input queue for delivery to the sockets concerned. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	c637c10355	tipc: resolve race problem at unicast message reception TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upcall from link to socket no locks are held. It is therefore possible, and we see it happen occasionally, that messages arriving in different threads and delivered in sequence still bypass each other before they reach the destination socket. This must not happen, since it violates the sequentiality guarantee. We solve this by adding a new input buffer queue to the link structure. Arriving messages are added safely to the tail of that queue by the link, while the head of the queue is consumed, also safely, by the receiving socket. Sequentiality is secured per socket by only allowing buffers to be dequeued inside the socket lock. Since there may be multiple simultaneous readers of the queue, we use a 'filter' parameter to reduce the risk that they peek the same buffer from the queue, hence also reducing the risk of contention on the receiving socket locks. This solves the sequentiality problem, and seems to cause no measurable performance degradation. A nice side effect of this change is that lock handling in the functions tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that will enable future simplifications of those functions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	94153e36e7	tipc: use existing sk_write_queue for outgoing packet chain The list for outgoing traffic buffers from a socket is currently allocated on the stack. This forces us to initialize the queue for each sent message, something costing extra CPU cycles in the most critical data path. Later in this series we will introduce a new safe input buffer queue, something that would force us to initialize even the spinlock of the outgoing queue. A closer analysis reveals that the queue always is filled and emptied within the same lock_sock() session. It is therefore safe to use a queue aggregated in the socket itself for this purpose. Since there already exists a queue for this in struct sock, sk_write_queue, we introduce use of that queue in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	e3a77561e7	tipc: split up function tipc_msg_eval() The function tipc_msg_eval() is in reality doing two related, but different tasks. First it tries to find a new destination for named messages, in case there was no first lookup, or if the first lookup failed. Second, it does what its name suggests, evaluating the validity of the message and its destination, and returning an appropriate error code depending on the result. This is confusing, and in this commit we choose to break it up into two functions. A new function, tipc_msg_lookup_dest(), first attempts to find a new destination, if the message is of the right type. If this lookup fails, or if the message should not be subject to a second lookup, the already existing tipc_msg_reverse() is called. This function performs prepares the message for rejection, if applicable. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	d570d86497	tipc: enqueue arrived buffers in socket in separate function The code for enqueuing arriving buffers in the function tipc_sk_rcv() contains long code lines and currently goes to two indentation levels. As a cosmetic preparaton for the next commits, we break it out into a separate function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	1186adf7df	tipc: simplify message forwarding and rejection in socket layer Despite recent improvements, the handling of error codes and return values at reception of messages in the socket layer is still confusing. In this commit, we try to make it more comprehensible. First, we separate between the return values coming from the functions called by tipc_sk_rcv(), -those are TIPC specific error codes, and the return values returned by tipc_sk_rcv() itself. Second, we don't use the returned TIPC error code as indication for whether a buffer should be forwarded/rejected or not; instead we use the buffer pointer passed along with filter_msg(). This separation is necessary because we sometimes want to forward messages even when there is no error (i.e., protocol messages and successfully secondary looked up data messages). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:01 -08:00
Jon Paul Maloy	c5898636c4	tipc: reduce usage of context info in socket and link The most common usage of namespace information is when we fetch the own node addess from the net structure. This leads to a lot of passing around of a parameter of type 'struct net *' between functions just to make them able to obtain this address. However, in many cases this is unnecessary. The own node address is readily available as a member of both struct tipc_sock and tipc_link, and can be fetched from there instead. The fact that the vast majority of functions in socket.c and link.c anyway are maintaining a pointer to their respective base structures makes this option even more compelling. In this commit, we introduce the inline functions tsk_own_node() and link_own_node() to make it easy for functions to fetch the node address from those structs instead of having to pass along and dereference the namespace struct. In particular, we make calls to the msg_xx() functions in msg.{h,c} context independent by directly passing them the own node address as parameter when needed. Those functions should be regarded as leaves in the code dependency tree, and it is hence desirable to keep them namspace unaware. Apart from a potential positive effect on cache behavior, these changes make it easier to introduce the changes that will follow later in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:01 -08:00
David S. Miller	f2683b743f	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs More iov_iter work from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-04 20:46:55 -08:00
Jon Paul Maloy	af9946fde9	tipc: separate link starting event from link timeout event When a new link instance is created, it is trigged to start by sending it a TIPC_STARTING_EVT, whereafter a regular link reset is applied to it. The starting event is codewise treated as a timeout event, and prompts a link RESET message to be sent to the peer node, carrying a link session identifier. The later link_reset() call nudges this session identifier, whereafter all subsequent RESET messages will be sent out with the new identifier. The latter session number overrides the former, causing the peer to unconditionally accept it irrespective of its current working state. We don't think that this causes any problem, but it is not in accordance with the protocol spec, and may cause confusion when debugging TIPC sessions. To avoid this, we make the starting event distinct from the subsequent timeout events, by not allowing the former to send out any RESET message. This eliminates the described problem. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-04 16:09:31 -08:00
Jon Paul Maloy	b45db71b52	tipc: eliminate race during node creation Instances of struct node are created in the function tipc_disc_rcv() under the assumption that there is no race between received discovery messages arriving from the same node. This assumption is wrong. When we use more than one bearer, it is possible that discovery messages from the same node arrive at the same moment, resulting in creation of two instances of struct tipc_node. This may later cause confusion during link establishment, and may result in one of the links never becoming activated. We fix this by making lookup and potential creation of nodes atomic. Instead of first looking up the node, and in case of failure, create it, we now start with looking up the node inside node_link_create(), and return a reference to that one if found. Otherwise, we go ahead and create the node as we did before. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-04 16:09:31 -08:00
Jon Paul Maloy	7d24dcdb3f	tipc: avoid stale link after aborted failover During link failover it may happen that the remaining link goes down while it is still in the process of taking over traffic from a previously failed link. When this happens, we currently abort the failover procedure and reset the first failed link to non-failover mode, so that it will be ready to re-establish contact with its peer when it comes available. However, if the first link goes down because its bearer was manually disabled, it is not enough to reset it; it must also be deleted; which is supposed to happen when the failover procedure is finished. Otherwise it will remain a zombie link: attached to the owner node structure, in mode LINK_STOPPED, and permanently blocking any re- establishing of the link to the peer via the interface in question. We fix this by amending the failover abort procedure. Apart from resetting the link to non-failover state, we test if the link is also in LINK_STOPPED mode. If so, we delete it, using the conditional tipc_link_delete() function introduced in the previous commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-04 16:09:31 -08:00
Jon Paul Maloy	2d72d49553	tipc: add reference count to struct tipc_link When a bearer is disabled, all pertaining links will be reset and deleted. However, if there is a second active link towards a killed link's destination, the delete has to be postponed until the failover is finished. During this interval, we currently put the link in zombie mode, i.e., we take it out of traffic, delete its timer, but leave it attached to the owner node structure until all missing packets have been received. When this is done, we detach the link from its node and delete it, assuming that the synchronous timer deletion that was initiated earlier in a different thread has finished. This is unsafe, as the failover may finish before del_timer_sync() has returned in the other thread. We fix this by adding an atomic reference counter of type kref in struct tipc_link. The counter keeps track of the references kept to the link by the owner node and the timer. We then do a conditional delete, based on the reference counter, both after the failover has been finished and when the timer expires, if applicable. Whoever comes last, will actually delete the link. This approach also implies that we can make the deletion of the timer asynchronous. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-04 16:09:31 -08:00
Al Viro	f25dcc7687	tipc: tipc ->sendmsg() conversion This one needs to copy the same data from user potentially more than once. Sadly, MTU changes can trigger that ;-/ Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-02-04 01:34:15 -05:00
Erik Hugne	3fa9cacd69	tipc: fix excessive network event logging If a large number of namespaces is spawned on a node and TIPC is enabled in each of these, the excessive printk tracing of network events will cause the system to grind down to a near halt. The traces are still of debug value, so instead of removing them completely we fix it by changing the link state and node availability logging debug traces. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-26 16:58:08 -08:00
Richard Alpe	d6e164e321	tipc: fix socket list regression in new nl api Commit `07f6c4bc` (tipc: convert tipc reference table to use generic rhashtable) introduced a problem with port listing in the new netlink API. It broke the resume functionality resulting in a never ending loop. This was caused by starting with the first hash table every time subsequently never returning an empty skb (terminating). This patch fixes the resume mechanism by keeping a logical reference to the last hash table along with a logical reference to the socket (port) that didn't fit in the previous message. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-18 00:27:05 -05:00
Sasha Levin	357c4774b5	tipc: correctly handle releasing a not fully initialized sock Commit `f2f9800d49` "tipc: make tipc node table aware of net namespace" has added a dereference of sock->sk before making sure it's not NULL, which makes releasing a tipc socket NULL pointer dereference for sockets that are not fully initialized. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-13 17:26:27 -05:00
Ying Xue	3721e9c7c1	tipc: remove redundant timer defined in tipc_sock struct Remove the redundant timer defined in tipc_sock structure, instead we can directly reuse the sk_timer defined in sock structure. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-13 16:45:55 -05:00
Ying Xue	d49e204161	tipc: make netlink support net namespace Currently tipc module only allows users sitting on "init_net" namespace to configure it through netlink interface. But now almost each tipc component is able to be aware of net namespace, so it's time to open the permission for users residing in other namespaces, allowing them to configure their own tipc stack instance through netlink interface. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:34 -05:00
Ying Xue	bafa29e341	tipc: make tipc random value aware of net namespace After namespace is supported, each namespace should own its private random value. So the global variable representing the random value must be moved to tipc_net structure. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:33 -05:00
Ying Xue	a62fbccecd	tipc: make subscriber server support net namespace TIPC establishes one subscriber server which allows users to subscribe their interesting name service status. After tipc supports namespace, one dedicated tipc stack instance is created for each namespace, and each instance can be deemed as one independent TIPC node. As a result, subscriber server must be built for each namespace. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:33 -05:00
Ying Xue	3474753954	tipc: make tipc node address support net namespace If net namespace is supported in tipc, each namespace will be treated as a separate tipc node. Therefore, every namespace must own its private tipc node address. This means the "tipc_own_addr" global variable of node address must be moved to tipc_net structure to satisfy the requirement. It's turned out that users also can assign node address for every namespace. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:33 -05:00
Ying Xue	4ac1c8d0ee	tipc: name tipc name table support net namespace TIPC name table is used to store the mapping relationship between TIPC service name and socket port ID. When tipc supports namespace, it allows users to publish service names only owned by a certain namespace. Therefore, every namespace must have its private name table to prevent service names published to one namespace from being contaminated by other service names in another namespace. Therefore, The name table global variable (ie, nametbl) and its lock must be moved to tipc_net structure, and a parameter of namespace must be added for necessary functions so that they can obtain name table variable defined in tipc_net structure. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:33 -05:00
Ying Xue	e05b31f4bf	tipc: make tipc socket support net namespace Now tipc socket table is statically allocated as a global variable. Through it, we can look up one socket instance with port ID, insert a new socket instance to the table, and delete a socket from the table. But when tipc supports net namespace, each namespace must own its specific socket table. So the global variable of socket table must be redefined in tipc_net structure. As a concequence, a new socket table will be allocated when a new namespace is created, and a socket table will be deallocated when namespace is destroyed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:33 -05:00
Ying Xue	1da465683a	tipc: make tipc broadcast link support net namespace TIPC broadcast link is statically established and its relevant states are maintained with the global variables: "bcbearer", "bclink" and "bcl". Allowing different namespace to own different broadcast link instances, these variables must be moved to tipc_net structure and broadcast link instances would be allocated and initialized when namespace is created. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:33 -05:00
Ying Xue	7f9f95d9d9	tipc: make bearer list support net namespace Bearer list defined as a global variable is used to store bearer instances. When tipc supports net namespace, bearers created in one namespace must be isolated with others allocated in other namespaces, which requires us that the bearer list(bearer_list) must be moved to tipc_net structure. As a result, a net namespace pointer has to be passed to functions which access the bearer list. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:32 -05:00
Ying Xue	f2f9800d49	tipc: make tipc node table aware of net namespace Global variables associated with node table are below: - node table list (node_htable) - node hash table list (tipc_node_list) - node table lock (node_list_lock) - node number counter (tipc_num_nodes) - node link number counter (tipc_num_links) To make node table support namespace, above global variables must be moved to tipc_net structure in order to keep secret for different namespaces. As a consequence, these variables are allocated and initialized when namespace is created, and deallocated when namespace is destroyed. After the change, functions associated with these variables have to utilize a namespace pointer to access them. So adding namespace pointer as a parameter of these functions is the major change made in the commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:32 -05:00
Ying Xue	c93d3baa24	tipc: involve namespace infrastructure Involve namespace infrastructure, make the "tipc_net_id" global variable aware of per namespace, and rename it to "net_id". In order that the conversion can be successfully done, an instance of networking namespace must be passed to relevant functions, allowing them to access the "net_id" variable of per namespace. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:32 -05:00
Ying Xue	54fef04ad0	tipc: remove unused tipc_link_get_max_pkt routine Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:32 -05:00
Ying Xue	f2f2a96a20	tipc: feed tipc sock pointer to tipc_sk_timeout routine In order to make tipc socket table aware of namespace, a networking namespace instance must be passed to tipc_sk_lookup(), allowing it to look up tipc socket instance with a given port ID from a concrete socket table. However, as now tipc_sk_timeout() only has one port ID parameter and is not namespace aware, it's unable to obtain a correct socket instance through tipc_sk_lookup() just with a port ID, especially after namespace is completely supported. If port ID is replaced with socket instance as tipc_sk_timeout()'s parameter, it's unnecessary to look up socket table. But as the timer handler - tipc_sk_timeout() is run asynchronously, socket reference must be held before its timer is launched, and must be carefully checked to identify whether the socket reference needs to be put or not when its timer is terminated. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:32 -05:00
Ying Xue	859fc7c0ce	tipc: cleanup core.c and core.h files Only the works of initializing and shutting down tipc module are done in core.h and core.c files, so all stuffs which are not closely associated with the two tasks should be moved to appropriate places. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:31 -05:00
Ying Xue	2f55c43788	tipc: remove unnecessary wrapper functions of kernel timer APIs Not only some wrapper function like k_term_timer() is empty, but also some others including k_start_timer() and k_cancel_timer() don't return back any value to its caller, what's more, there is no any component in the kernel world to do such thing. Therefore, these timer interfaces defined in tipc module should be purged. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:31 -05:00
Ying Xue	6b8326ed14	tipc: remove tipc_core_start/stop routines Remove redundant wrapper functions like tipc_core_start() and tipc_core_stop(), and directly move them to their callers, such as tipc_init() and tipc_exit(), having us clearly know what are really done in both initialization and deinitialzation functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:31 -05:00
Jon Maloy	703068eee6	tipc: fix bug in broadcast retransmit code In commit `58dc55f256` ("tipc: use generic SKB list APIs to manage link transmission queue") we replace all list traversal loops with the macros skb_queue_walk() or skb_queue_walk_safe(). While the previous loops were based on the assumption that the list was NULL-terminated, the standard macros stop when the iterator reaches the list head, which is non-NULL. In the function bclink_retransmit_pkt() this macro replacement has lead to a bug. When we receive a BCAST STATE_MSG we unconditionally call the function bclink_retransmit_pkt(), whether there really is anything to retransmit or not, assuming that the sequence number comparisons will lead to the correct behavior. However, if the transmission queue is empty, or if there are no eligible buffers in the transmission queue, we will by mistake pass the list head pointer to the function tipc_link_retransmit(). Since the list head is not a valid sk_buff, this leads to a crash. In this commit we fix this by only calling tipc_link_retransmit() if we actually found eligible buffers in the transmission queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-12 16:24:31 -05:00
Ying Xue	07f6c4bc04	tipc: convert tipc reference table to use generic rhashtable As tipc reference table is statically allocated, its memory size requested on stack initialization stage is quite big even if the maximum port number is just restricted to 8191 currently, however, the number already becomes insufficient in practice. But if the maximum ports is allowed to its theory value - 2^32, its consumed memory size will reach a ridiculously unacceptable value. Apart from this, heavy tipc users spend a considerable amount of time in tipc_sk_get() due to the read-lock on ref_table_lock. If tipc reference table is converted with generic rhashtable, above mentioned both disadvantages would be resolved respectively: making use of the new resizable hash table can avoid locking on the lookup; smaller memory size is required at initial stage, for example, 256 hash bucket slots are requested at the beginning phase instead of allocating the entire 8191 slots in old mode. The hash table will grow if entries exceeds 75% of table size up to a total table size of 1M, and it will automatically shrink if usage falls below 30%, but the minimum table size is allowed down to 256. Also converts ref_table_lock to a separate mutex to protect hash table mutations on write side. Lastly defers the release of the socket reference using call_rcu() to allow using an RCU read-side protected call to rhashtable_lookup(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-01-08 19:47:14 -08:00
Fabian Frederick	886eaa1fe6	tipc: replace 0 by NULL for pointers Fix sparse warning: net/tipc/link.c:1924:40: warning: Using plain integer as NULL pointer Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-31 13:11:39 -05:00
Richard Alpe	340b6e59fb	tipc: fix broadcast wakeup contention after congestion commit `908344cdda` ("tipc: fix bug in multicast congestion handling") introduced a race in the broadcast link wakeup functionality. This patch eliminates this broadcast link wakeup race caused by operation on the wakeup list without proper locking. If this race hit and corrupted the list all subsequent wakeup messages would be lost, resulting in a considerable memory leak. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-10 14:45:33 -05:00
David S. Miller	6e5f59aacb	Merge branch 'for-davem-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs More iov_iter work for the networking from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-10 13:17:23 -05:00
Ying Xue	023160bc8f	tipc: avoid double lock 'spin_lock:&seq->lock' The commit `fb9962f3ce` ("tipc: ensure all name sequences are properly protected with its lock") involves below errors: net/tipc/name_table.c:980 tipc_purge_publications() error: double lock 'spin_lock:&seq->lock' Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-09 18:27:03 -05:00
Al Viro	c0371da604	put iov_iter into msghdr Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-09 16:29:03 -05:00
Erik Hugne	4988bb4a3f	tipc: fix missing spinlock init and nullptr oops commit `908344cdda` ("tipc: fix bug in multicast congestion handling") introduced two bugs with the bclink wakeup function. This commit fixes the missing spinlock init for the waiting_sks list. We also eliminate the race condition between the waiting_sks length check/dequeue operations in tipc_bclink_wakeup_users by simply removing the redundant length check. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Tero Aho <Tero.Aho@coriant.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-09 13:41:54 -05:00
Erik Hugne	88b17b6a22	tipc: drop tx side permission checks Part of the old remote management feature is a piece of code that checked permissions on the local system to see if a certain operation was permitted, and if so pass the command to a remote node. This serves no purpose after the removal of remote management with commit `5902385a24` ("tipc: obsolete the remote management feature") so we remove it. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-09 13:30:13 -05:00
Ying Xue	97ede29e80	tipc: convert name table read-write lock to RCU Convert tipc name table read-write lock to RCU. After this change, a new spin lock is used to protect name table on write side while RCU is applied on read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:57 -05:00
Ying Xue	834caafa3e	tipc: remove unnecessary INIT_LIST_HEAD When a list_head variable is seen as a new entry to be added to a list head, it's unnecessary to be initialized with INIT_LIST_HEAD(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:57 -05:00
Ying Xue	5492390a94	tipc: simplify relationship between name table lock and node lock When tipc name sequence is published, name table lock is released before name sequence buffer is delivered to remote nodes through its underlying unicast links. However, when name sequence is withdrawn, the name table lock is held until the transmission of the removal message of name sequence is finished. During the process, node lock is nested in name table lock. To prevent node lock from being nested in name table lock, while withdrawing name, we should adopt the same locking policy of publishing name sequence: name table lock should be released before message is sent. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:57 -05:00
Ying Xue	3493d25cfb	tipc: any name table member must be protected under name table lock As tipc_nametbl_lock is used to protect name_table structure, the lock must be held while all members of name_table structure are accessed. However, the lock is not obtained while a member of name_table structure - local_publ_count is read in tipc_nametbl_publish(), as a consequence, an inconsistent value of local_publ_count might be got. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:57 -05:00
Ying Xue	fb9962f3ce	tipc: ensure all name sequences are properly protected with its lock TIPC internally created a name table which is used to store name sequences. Now there is a read-write lock - tipc_nametbl_lock to protect the table, and each name sequence saved in the table is protected with its private lock. When a name sequence is inserted or removed to or from the table, its members might need to change. Therefore, in normal case, the two locks must be held while TIPC operates the table. However, there are still several places where we only hold tipc_nametbl_lock without proprerly obtaining name sequence lock, which might cause the corruption of name sequence. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:56 -05:00
Ying Xue	38622f4195	tipc: ensure all name sequences are released when name table is stopped As TIPC subscriber server is terminated before name table, no user depends on subscription list of name sequence when name table is stopped. Therefore, all name sequences stored in name table should be released whatever their subscriptions lists are empty or not, otherwise, memory leak might happen. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:56 -05:00
Ying Xue	993bfe5daf	tipc: make name table allocated dynamically Name table locking policy is going to be adjusted from read-write lock protection to RCU lock protection in the future commits. But its essential precondition is to convert the allocation way of name table from static to dynamic mode. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:56 -05:00
Ying Xue	1b61e70ad1	tipc: remove size variable from publ_list struct The size variable is introduced in publ_list struct to help us exactly calculate SKB buffer sizes needed by publications when all publications in name table are delivered in bulk in named_distribute(). But if publication SKB buffer size is assumed to MTU, the size variable in publ_list struct can be completely eliminated at the cost of wasting a bit memory space for last SKB. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tero Aho <tero.aho@coriant.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:39:55 -05:00
Ying Xue	a6ca109443	tipc: use generic SKB list APIs to manage TIPC outgoing packet chains Use standard SKB list APIs associated with struct sk_buff_head to manage socket outgoing packet chain and name table outgoing packet chain, having relevant code simpler and more readable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	f03273f1e2	tipc: use generic SKB list APIs to manage link receive queue Use standard SKB list APIs associated with struct sk_buff_head to manage link's receive queue to simplify its relevant code cemplexity. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	bc6fecd409	tipc: use generic SKB list APIs to manage deferred queue of link Use standard SKB list APIs associated with struct sk_buff_head to manage link's deferred queue, simplifying relevant code. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	58dc55f256	tipc: use generic SKB list APIs to manage link transmission queue Use standard SKB list APIs associated with struct sk_buff_head to manage link transmission queue, having relevant code more clean. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	58d78b328a	tipc: use skb_queue_walk_safe marco to simplify link_prepare_wakeup routine Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	99315ad43d	tipc: remove unused between routine Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	58311d1690	tipc: eliminate two pseudo message types of BUNDLE_OPEN and BUNDLE_CLOSED The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are used to flag whether or not more messages can be bundled into a data packet in the outgoing transmission queue. Obviously, no more messages can be appended after the packet has been sent and is waiting to be acknowledged and deleted. These message types do in reality represent a send-side local implementation flag, and are not defined as part of the protocol. It is therefore safe to move it to to where it belongs, that is, the control area (TIPC_SKB_CB) of the buffer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:17 -05:00
Ying Xue	47b4c9a82f	tipc: clean up the process of link pushing packets In original tipc_link_push_packet(), it pushes messages from protocol message queue, retransmission queue and next_out queue. But as the two first queues are removed, we can simplify its relevant code through deleting tipc_link_push_queue(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:16 -05:00
Ying Xue	7b6f087f98	tipc: remove retransmission queue TIPC retransmission queue is intended to record which messages should be retransmitted when bearer is not congested. However, as the retransmission queue becomes useless with the removal of bearer congestion mechanism, it should be removed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:16 -05:00
Ying Xue	8965d250c2	tipc: remove protocol message queue TIPC protocol message queue is intended to save one protocol message when bearer is congested so that the message stored in the queue can be immediately transmitted when bearer congestion is released. However, as now the protocol queue has no mission any more with the removal of bearer congestion mechanism, it should be removed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:16 -05:00
Ying Xue	a8f48af587	tipc: remove node subscription infrastructure The node subscribe infrastructure represents a virtual base class, so its users, such as struct tipc_port and struct publication, can derive its implemented functionalities. However, after the removal of struct tipc_port, struct publication is left as its only single user now. So defining an abstract infrastructure for one user becomes no longer reasonable. If corresponding new functions associated with the infrastructure are moved to name_table.c file, the node subscription infrastructure can be removed as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 12:30:16 -05:00
David S. Miller	d3fc6b3fdd	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs More work from Al Viro to move away from modifying iovecs by using iov_iter instead. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-25 20:02:51 -05:00
Richard Alpe	d8182804cf	tipc: fix sparse warnings in new nl api Fix sparse warnings about non-static declaration of static functions in the new tipc netlink API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-24 16:10:23 -05:00
Al Viro	45dcc687f7	tipc_msg_build(): pass msghdr instead of its ->msg_iov Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-24 05:16:41 -05:00
Al Viro	562640f3c3	tipc_sendmsg(): pass msghdr instead of its ->msg_iov Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-24 05:16:40 -05:00
Richard Alpe	1593123a6a	tipc: add name table dump to new netlink api Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API. This command supports dumping the name table of all nodes. Netlink logical layout of name table response message: -> name table -> publication -> type -> lower -> upper -> scope -> node -> ref -> key Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:32 -05:00
Richard Alpe	27c2141672	tipc: add net set to new netlink api Add TIPC_NL_NET_SET command to the new tipc netlink API. This command can set the network id and network (tipc) address. Netlink logical layout of network set message: -> net [ -> id ] [ -> address ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:31 -05:00
Richard Alpe	fd3cf2ad51	tipc: add net dump to new netlink api Add TIPC_NL_NET_GET command to the new tipc netlink API. This command dumps the network id of the node. Netlink logical layout of returned network data: -> net -> id Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:31 -05:00
Richard Alpe	3e4b6ab58d	tipc: add node get/dump to new netlink api Add TIPC_NL_NODE_GET to the new tipc netlink API. This command can dump the address and node status of all nodes in the tipc cluster. Netlink logical layout of returned node/address data: -> node -> address -> up flag Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:31 -05:00
Richard Alpe	1e55417d8f	tipc: add media set to new netlink api Add TIPC_NL_MEDIA_SET command to the new tipc netlink API. This command can set one or more link properties for a particular media. Netlink logical layout of bearer set message: -> media -> name -> link properties [ -> tolerance ] [ -> priority ] [ -> window ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:31 -05:00
Richard Alpe	46f15c6794	tipc: add media get/dump to new netlink api Add TIPC_NL_MEDIA_GET command to the new tipc netlink API. This command supports dumping all information about all defined media as well as getting all information about a specific media. The information about a media includes name and link properties. Netlink logical layout of media get response message: -> media -> name -> link properties -> tolerance -> priority -> window Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:31 -05:00
Richard Alpe	ae36342b50	tipc: add link stat reset to new netlink api Add TIPC_NL_LINK_RESET_STATS command to the new netlink API. This command resets the link statistics for a particular link. Netlink logical layout of link reset message: -> link -> name Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:31 -05:00
Richard Alpe	f96ce7a20d	tipc: add link set to new netlink api Add TIPC_NL_LINK_SET to the new tipc netlink API. This command can set one or more link properties for a particular link. Netlink logical layout of link set message: -> link -> name -> properties [ -> tolerance ] [ -> priority ] [ -> window ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:30 -05:00
Richard Alpe	7be57fc691	tipc: add link get/dump to new netlink api Add TIPC_NL_LINK_GET command to the new tipc netlink API. This command supports dumping all information about all links (including the broadcast link) or getting all information about a specific link (not the broadcast link). The information about a link includes name, transmission info, properties and link statistics. As the tipc broadcast link is special we unfortunately have to treat it specially. It is a deliberate decision not to abstract the broadcast link on this (API) level. Netlink logical layout of link response message: -> port -> name -> MTU -> RX -> TX -> up flag -> active flag -> properties -> priority -> tolerance -> window -> statistics -> rx_info -> rx_fragments -> rx_fragmented -> rx_bundles -> rx_bundled -> tx_info -> tx_fragments -> tx_fragmented -> tx_bundles -> tx_bundled -> msg_prof_tot -> msg_len_cnt -> msg_len_tot -> msg_len_p0 -> msg_len_p1 -> msg_len_p2 -> msg_len_p3 -> msg_len_p4 -> msg_len_p5 -> msg_len_p6 -> rx_states -> rx_probes -> rx_nacks -> rx_deferred -> tx_states -> tx_probes -> tx_nacks -> tx_acks -> retransmitted -> duplicates -> link_congs -> max_queue -> avg_queue Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:30 -05:00
Richard Alpe	1a1a143daf	tipc: add publication dump to new netlink api Add TIPC_NL_PUBL_GET command to the new tipc netlink API. This command supports dumping of all publications for a specific socket. Netlink logical layout of request message: -> socket -> reference Netlink logical layout of response message: -> publication -> type -> lower -> upper Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:30 -05:00
Richard Alpe	34b78a127c	tipc: add sock dump to new netlink api Add TIPC_NL_SOCK_GET command to the new tipc netlink API. This command supports dumping of all available sockets with their associated connection or publication(s). It could be extended to reply with a single socket if the NLM_F_DUMP isn't set. The information about a socket includes reference, address, connection information / publication information. Netlink logical layout of response message: -> socket -> reference -> address [ -> connection -> node -> socket [ -> connected flag -> type -> instance ] ] [ -> publication flag ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:30 -05:00
Richard Alpe	315c00bc9f	tipc: add bearer set to new netlink api Add TIPC_NL_BEARER_SET command to the new tipc netlink API. This command can set one or more link properties for a particular bearer. Netlink logical layout of bearer set message: -> bearer -> name -> link properties [ -> tolerance ] [ -> priority ] [ -> window ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:30 -05:00
Richard Alpe	35b9dd7607	tipc: add bearer get/dump to new netlink api Add TIPC_NL_BEARER_GET command to the new tipc netlink API. This command supports dumping all data about all bearers or getting all information about a specific bearer. The information about a bearer includes name, link priorities and domain. Netlink logical layout of bearer get message: -> bearer -> name Netlink logical layout of returned bearer information: -> bearer -> name -> link properties -> priority -> tolerance -> window -> domain Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:29 -05:00
Richard Alpe	0655f6a863	tipc: add bearer disable/enable to new netlink api A new netlink API for tipc that can disable or enable a tipc bearer. The new API is separated from the old API because of a bug in the user space client (tipc-config). The problem is that older versions of tipc-config has a very low receive limit and adding commands to the legacy genl_opts struct causes the ctrl_getfamily() response message to grow, subsequently breaking the tool. The new API utilizes netlink policies for input validation. Where the top-level netlink attributes are tipc-logical entities, like bearer. The top level entities then contain nested attributes. In this case a name, nested link properties and a domain. Netlink commands implemented in this patch: TIPC_NL_BEARER_ENABLE TIPC_NL_BEARER_DISABLE Netlink logical layout of bearer enable message: -> bearer -> name [ -> domain ] [ -> properties -> priority ] Netlink logical layout of bearer disable message: -> bearer -> name Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 15:01:29 -05:00
Holger Brunck	0372bf5c09	tipc: allow one link per bearer to neighboring nodes There is no reason to limit the amount of possible links to a neighboring node to 2. If we have more then two bearers we can also establish more links. Signed-off-by: Holger Brunck <holger.brunck@keymile.com> Reviewed-By: Jon Maloy <jon.maloy@ericsson.com> cc: Ying Xue <ying.xue@windriver.com> cc: Erik Hugne <erik.hugne@ericsson.com> cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-16 14:27:17 -05:00
David S. Miller	51f3d02b98	net: Add and use skb_copy_datagram_msg() helper. This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:46:40 -05:00
stephen hemminger	b2ad5e5fcc	tipc: spelling errors Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-30 16:56:41 -04:00
Ying Xue	1a194c2d59	tipc: fix lockdep warning when intra-node messages are delivered When running tipcTC&tipcTS test suite, below lockdep unsafe locking scenario is reported: [ 1109.997854] [ 1109.997988] ================================= [ 1109.998290] [ INFO: inconsistent lock state ] [ 1109.998575] 3.17.0-rc1+ #113 Not tainted [ 1109.998762] --------------------------------- [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] {SOFTIRQ-ON-W} state was registered at: [ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80 [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc] [ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc] [ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc] [ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0 [ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10 [ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200 [ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f [ 1109.998762] irq event stamp: 241060 [ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0 [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0 [ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50 [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [ 1109.998762] other info that might help us debug this: [ 1109.998762] Possible unsafe locking scenario: [ 1109.998762] [ 1109.998762] CPU0 [ 1109.998762] ---- [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] <Interrupt> [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] [ 1109.998762] * DEADLOCK * [ 1109.998762] [ 1109.998762] 2 locks held by swapper/7/0: [ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [ 1109.998762] stack backtrace: [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113 [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007 [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001 [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000 [ 1109.998762] Call Trace: [ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68 [ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202 [ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50 [ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0 [ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0 [ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80 [ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10 [ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc] [ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc] [ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70 [ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0 [ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70 [ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0 [ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240 [ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530 [ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310 [ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150 [ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0 [ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110 [ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f [ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250 [ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250 [ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20 [ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0 [ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0 When intra-node messages are delivered from one process to another process, tipc_link_xmit() doesn't disable BH before it directly calls tipc_sk_rcv() on process context to forward messages to destination socket. Meanwhile, if messages delivered by remote node arrive at the node and their destinations are also the same socket, tipc_sk_rcv() running on process context might be preempted by tipc_sk_rcv() running BH context. As a result, the latter cannot obtain the socket lock as the lock was obtained by the former, however, the former has no chance to be run as the latter is owning the CPU now, so headlock happens. To avoid it, BH should be always disabled in tipc_sk_rcv(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Ying Xue	7b8613e0a1	tipc: fix a potential deadlock Locking dependency detected below possible unsafe locking scenario: CPU0 CPU1 T0: tipc_named_rcv() tipc_rcv() T1: [grab nametble write lock]* [grab node lock]* T2: tipc_update_nametbl() tipc_node_link_up() T3: tipc_nodesub_subscribe() tipc_nametbl_publish() T4: [grab node lock]* [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-21 15:28:15 -04:00
Jon Paul Maloy	643566d4b4	tipc: fix bug in bundled buffer reception In commit `ec8a2e5621` ("tipc: same receive code path for connection protocol and data messages") we omitted the the possiblilty that an arriving message extracted from a bundle buffer may be a multicast message. Such messages need to be to be delivered to the socket via a separate function, tipc_sk_mcast_rcv(). As a result, small multicast messages arriving as members of a bundle buffer will be silently dropped. This commit corrects the error by considering this case in the function tipc_link_bundle_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-17 23:50:53 -04:00
Jon Maloy	908344cdda	tipc: fix bug in multicast congestion handling One aim of commit `50100a5e39` ("tipc: use pseudo message to wake up sockets after link congestion") was to handle link congestion abatement in a uniform way for both unicast and multicast transmit. However, the latter doesn't work correctly, and has been broken since the referenced commit was applied. If a user now sends a burst of multicast messages that is big enough to cause broadcast link congestion, it will be put to sleep, and not be waked up when the congestion abates as it should be. This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS, is set correctly, but in the wrong field. Instead of setting it in the 'action_flags' field of the arrival node struct, it is by mistake set in the dummy node struct that is owned by the broadcast link, where it will never tested for. Second, we cannot use the same flag for waking up unicast and multicast users, since the function tipc_node_unlock() needs to pick the wakeup pseudo messages to deliver from different queues. It must hence be able to distinguish between the two cases. This commit solves this problem by adding a new flag TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user(). The latter is to be called by tipc_node_unlock() when the named flag, now set in the correct field, is encountered. v2: using explicit 'unsigned int' declaration instead of 'uint', as per comment from David Miller. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 14:50:15 -04:00
Erik Hugne	0fc4dffad1	tipc: fix sparse warnings This fixes the following sparse warnings: sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static? Also, the function is changed to return bool upon success, rather than a potentially freed pointer. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 14:00:58 -07:00
Erik Hugne	a5325ae5b8	tipc: add name distributor resiliency queue TIPC name table updates are distributed asynchronously in a cluster, entailing a risk of certain race conditions. E.g., if two nodes simultaneously issue conflicting (overlapping) publications, this may not be detected until both publications have reached a third node, in which case one of the publications will be silently dropped on that node. Hence, we end up with an inconsistent name table. In most cases this conflict is just a temporary race, e.g., one node is issuing a publication under the assumption that a previous, conflicting, publication has already been withdrawn by the other node. However, because of the (rtt related) distributed update delay, this may not yet hold true on all nodes. The symptom of this failure is a syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error". In this commit we add a resiliency queue at the receiving end of the name table distributor. When insertion of an arriving publication fails, we retain it in this queue for a short amount of time, assuming that another update will arrive very soon and clear the conflict. If so happens, we insert the publication, otherwise we drop it. The (configurable) retention value defaults to 2000 ms. Knowing from experience that the situation described above is extremely rare, there is no risk that the queue will accumulate any large number of items. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
Erik Hugne	f4ad8a4b8b	tipc: refactor name table updates out of named packet receive routine We need to perform the same actions when processing deferred name table updates, so this functionality is moved to a separate function. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-01 17:51:48 -07:00
Ying Xue	cc086fcf92	tipc: fix a potential oops Commit `6c9808ce09` ("tipc: remove port_lock") accidentally involves a potential bug: when tipc socket instance(tsk) is not got with given reference number in tipc_sk_get(), tsk is set to NULL. Subsequently we jump to exit label where to decrease socket reference counter pointed by tsk pointer in tipc_sk_put(). However, As now tsk is NULL, oops may happen because of touching a NULL pointer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-29 20:22:43 -07:00
Jon Paul Maloy	301bae56f2	tipc: merge struct tipc_port into struct tipc_sock We complete the merging of the port and socket layer by aggregating the fields of struct tipc_port directly into struct tipc_sock, and moving the combined structure into socket.c. We also move all functions and macros that are not any longer exposed to the rest of the stack into socket.c, and rename them accordingly. Despite the size of this commit, there are no functional changes. We have only made such changes that are necessary due of the removal of struct tipc_port. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	808d90f9c5	tipc: remove files ref.h and ref.c The reference table is now 'socket aware' instead of being generic, and has in reality become a socket internal table. In order to be able to minimize the API exposed by the socket layer towards the rest of the stack, we now move the reference table definitions and functions into the file socket.c, and rename the functions accordingly. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	2e84c60b77	tipc: remove include file port.h We move the inline functions in the file port.h to socket.c, and modify their names accordingly. We move struct tipc_port and some macros to socket.h. Finally, we remove the file port.h. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	0fc87aaebd	tipc: remove source file port.c In this commit, we move the remaining functions in port.c to socket.c, and give them new names that correspond to their new location. We then remove the file port.c. There are only cosmetic changes to the moved functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:35 -07:00
Jon Paul Maloy	6c9808ce09	tipc: remove port_lock In previous commits we have reduced usage of port_lock to a minimum, and complemented it with usage of bh_lock_sock() at the remaining locations. The purpose has been to remove this lock altogether, since it largely duplicates the role of bh_lock_sock. We are now ready to do this. However, we still need to protect the BH callers from inadvertent release of the socket while they hold a reference to it. We do this by replacing port_lock by a combination of a rw-lock protecting the reference table as such, and updating the socket reference counter while the socket is referenced from BH. This technique is more standard and comprehensible than the previous approach, and turns out to have a positive effect on overall performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	9b50fd087a	tipc: replace port pointer with socket pointer in registry In order to make tipc_sock the only entity referencable from other parts of the stack, we add a tipc_sock pointer instead of a tipc_port pointer to the registry. As a consequence, we also let the function tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port. We keep the function's name for now, since the lock still is owned by the port. This is another step in the direction of eliminating port_lock, replacing its usage with lock_sock() and bh_lock_sock(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5a9ee0be33	tipc: use registry when scanning sockets The functions tipc_port_get_ports() and tipc_port_reinit() scan over all sockets/ports to access each of them. This is done by using a dedicated linked list, 'tipc_socks' where all sockets are members. The list is in turn protected by a spinlock, 'port_list_lock', while each socket is locked by using port_lock at the moment of access. In order to reduce complexity and risk of deadlock, we want to get rid of the linked list and the accompanying spinlock. This is what we do in this commit. Instead of the linked list, we use the port registry to scan across the sockets. We also add usage of bh_lock_sock() inside the scope of port_lock in both functions, as a preparation for the complete removal of port_lock. Finally, we move the functions from port.c to socket.c, and rename them to tipc_sk_sock_show() and tipc_sk_reinit() repectively. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5b8fa7ce82	tipc: eliminate functions tipc_port_init and tipc_port_destroy After the latest changes to the socket/port layer the existence of the functions tipc_port_init() and tipc_port_destroy() cannot be justified. They are both called only once, from tipc_sk_create() and tipc_sk_delete() respectively, and their functionality can better be merged into the latter two functions. This also entails that all remaining references to port_lock now are made from inside socket.c, something that will make it easier to remove this lock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	739f5e4efc	tipc: redefine message acknowledge function The function tipc_acknowledge() is a remnant from the obsolete native API. Currently, it grabs port_lock, before building an acknowledge message and sending it to the peer. Since all access to socket members now is protected by the socket lock, it has become unnecessary to grab port_lock here. In this commit, we remove the usage of port_lock, simplify the function, and move it to socket.c, renaming it to tipc_sk_send_ack(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	dadebc0029	tipc: eliminate port_connect()/port_disconnect() functions tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete native API. Their only task is to grab port_lock and call the functions __tipc_port_connect()/__tipc_port_disconnect() respectively, which will perform the actual state change. Since socket/port exection now is single-threaded the use of port_lock is not needed any more, so we can safely replace the two functions with their lock-free counterparts. In this commit, we remove the two functions. Furthermore, the contents of __tipc_port_disconnect() is so trivial that we choose to eliminate that function too, expanding its functionality into tipc_shutdown(). __tipc_port_connect() is simplified, moved to socket.c, and given the more correct name tipc_sk_finish_conn(). Finally, we eliminate the function auto_connect(), and expand its contents into filter_connect(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	80e44c2225	tipc: eliminate function tipc_port_shutdown() tipc_port_shutdown() is a remnant from the now obsolete native interface. As such it grabs port_lock in order to protect itself from concurrent BH processing. However, after the recent changes to the port/socket upcalls, sockets are now basically single-threaded, and all execution, except the read-only tipc_sk_timer(), is executing within the protection of lock_sock(). So the use of port_lock is not needed here. In this commit we eliminate the whole function, and merge it into its only caller, tipc_shutdown(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:34 -07:00
Jon Paul Maloy	5728901581	tipc: clean up socket timer function The last remaining BH upcall to the socket, apart for the message reception function tipc_sk_rcv(), is the timer function. We prefer to let this function continue executing in BH, since it only does read-acces to semi-permanent data, but we make three changes to it: 1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope of port_lock. This is a preparation for replacing port_lock with bh_lock_sock() at the locations where it is still used. 2) We move the function from port.c to socket.c, as a further step of eliminating the port code level altogether. 3) We let it make use of the newly introduced tipc_msg_create() function. This enables us to get rid of three context specific functions (port_create_self_abort_msg() etc.) in port.c Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	02be61a981	tipc: use message to abort connections when losing contact to node In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	50100a5e39	tipc: use pseudo message to wake up sockets after link congestion The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
Jon Paul Maloy	1dd0bd2b14	tipc: introduce new function tipc_msg_create() The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-23 11:18:33 -07:00
David S. Miller	02784f1b05	tipc: Fix build. Missing semicolon in range check fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-19 11:16:38 -07:00
Erik Hugne	ac32c7f705	tipc: fix message importance range check Commit `3b4f302d85` ("tipc: eliminate redundant locking") introduced a bug by removing the sanity check for message importance, allowing programs to assign any value to the msg_user field. This will mess up the packet reception logic and may cause random link resets. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-16 20:17:34 -07:00
Wei Yongjun	ad025a56a5	tipc: remove duplicated include from socket.c Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-29 15:51:14 -07:00
Jon Paul Maloy	13e9b9972f	tipc: make tipc_buf_append() more robust As per comment from David Miller, we try to make the buffer reassembly function more resilient to user errors than it is today. - We check that the "buf" parameter always is set, since this is mandatory input. - We ensure that buf->next always is set to NULL before linking in the buffer, instead of relying of the caller to have done this. - We ensure that the "tail" pointer in the head buffer's control block is initialized to NULL when the first fragment arrives. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-28 18:34:01 -07:00
Wei Yongjun	52f50ce556	tipc: fix sparse non static symbol warnings Fixes the following sparse warnings: net/tipc/socket.c:545:5: warning: symbol 'tipc_sk_proto_rcv' was not declared. Should it be static? net/tipc/socket.c:2015:5: warning: symbol 'tipc_ioctl' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 22:19:04 -07:00
Jon Paul Maloy	6f92ee54b3	tipc: ensure sequential message delivery across dual bearers When we run broadcast packets over dual bearers/interfaces, the current transmission code is flipping bearers between each sent packet, with the purpose of leveraging the double bandwidth available. The receiving bclink is resequencing the packets if needed, so all messages are delivered upwards from the broadcast link in the correct order, even if they may arrive in concurrent interrupts. However, at the moment of delivery upwards to the socket, we release all spinlocks (bclink_lock, node_lock), so it is still possible that arriving messages bypass each other before they reach the socket queue. We fix this by applying the same technique we are using for unicast traffic. We use a link selector (i.e., the last bit of sending port number) to ensure that messages from the same sender socket always are sent over the same bearer. This guarantees sequential delivery between socket pairs, which is sufficient to satisfy the protocol spec, as well as all known user requirements. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	9fbfb8b120	tipc: rename temporarily named functions After the previous commit, we can now give the functions with temporary names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper names. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	c4116e1057	tipc: remove unreferenced functions We can now remove a number of functions which have become obsolete and unreferenced through this commit series. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:19 -07:00
Jon Paul Maloy	0abd8ff21f	tipc: start using the new multicast functions In this commit, we convert the socket multicast send function to directly call the new multicast/broadcast function (tipc_bclink_xmit2()) introduced in the previous commit. We do this instead of letting the call go via the now obsolete tipc_port_mcast_xmit(), hence saving a call level and some code complexity. We also remove the initial destination lookup at the message sending side, and replace that with an unconditional lookup at the receiving side, including on the sending node itself. This makes the destination lookup and message transfer more uniform than before. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	078bec826f	tipc: add new functions for multicast and broadcast distribution We add a new broadcast link transmit function in bclink.c and a new receive function in socket.c. The purpose is to move the branching between external and internal destination down to the link layer, just as we have done with unicast in earlier commits. We also make use of the new link-independent fragmentation support that was introduced in an earlier commit series. This gives a shorter and simpler code path, and makes it possible to obtain copy-free buffer delivery to all node local destination sockets. The new transmission code is added in parallel with the existing one, and will be used by the socket multicast send function in the next commit in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	25b660c7e2	tipc: let internal link users call the new link send function We convert the link internal users (changeover protocol, broadcast synchronization) to use the new packet send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
Jon Paul Maloy	dbdf6d24ad	tipc: make name table distributor use new send function In a previous commit series ("tipc: new unicast transmission code") we introduced a new message sending function, tipc_link_xmit2(), and moved the unicast data users over to use that function. We now let the internal name table distributor do the same. The interaction between the name distributor and the node/link layer also becomes significantly simpler, so we can eliminate the function tipc_link_names_xmit(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 21:38:18 -07:00
David S. Miller	1a98c69af1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-16 14:09:34 -07:00
Fabian Frederick	7ceaa583be	tipc: remove unnecessary break after return Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:26:59 -07:00
Jon Paul Maloy	999417549c	tipc: clear 'next'-pointer of message fragments before reassembly If the 'next' pointer of the last fragment buffer in a message is not zeroed before reassembly, we risk ending up with a corrupt message, since the reassembly function itself isn't doing this. Currently, when a buffer is retrieved from the deferred queue of the broadcast link, the next pointer is not cleared, with the result as described above. This commit corrects this, and thereby fixes a bug that may occur when long broadcast messages are transmitted across dual interfaces. The bug has been present since `40ba3cdf54` ("tipc: message reassembly using fragment chain") This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-11 15:02:10 -07:00
Erik Hugne	70452dcb6d	tipc: fix a memleak when sending data This fixes a regression bug caused by: `067608e9d0` ("tipc: introduce direct iovec to buffer chain fragmentation function") If data is sent on a nonblocking socket and the destination link is congested, the buffer chain is leaked. We fix this by freeing the chain in this case. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 16:10:01 -07:00
Jon Paul Maloy	29322d0db9	tipc: fix bug in multicast/broadcast message reassembly Since commit `37e22164a8` ("tipc: rename and move message reassembly function") reassembly of long broadcast messages has been broken. This is because we test for a non-NULL return value of the *buf parameter as criteria for succesful reassembly. However, this parameter is left defined even after reception of the first fragment, when reassebly is still incomplete. This leads to a kernel crash as soon as a the first fragment of a long broadcast message is received. We fix this with this commit, by implementing a stricter behavior of the function and its return values. This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 15:55:09 -07:00
Erik Hugne	3f53bd8f8b	tipc: fix link acknowledge logic in receive path Link state acks triggered from the receive path is done before the last received packet have been processed by the link layer. The effect of this is that the last received packet will not be included in the ack. This causes problems if the link window is set to TIPC_MIN_LINK_WIN, where the ack interval will be equal to the link tolerance, and the link enters a stop-and-go behavior. We move the ack logic to after link state processing, just before the packet is delivered to higher layers. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Carl Sigurjonsson <carl.sigurjonsson@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:55:07 -07:00
Erik Hugne	7ae934bebe	tipc: refactor message delivery out of tipc_rcv This is a cosmetic change, separating message delivery from the link state processing. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:55:07 -07:00
Jon Paul Maloy	60120526c2	tipc: simplify connection congestion handling As a consequence of the recently introduced serialized access to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa ("tipc: same receive code path for connection protocol and data messages") we can make a number of simplifications in the detection and handling of connection congestion situations. - We don't need to keep two counters, one for sent messages and one for acked messages. There is no longer any risk for races between acknowledge messages arriving in BH and data message sending running in user context. So we merge this into one counter, 'sent_unacked', which is incremented at sending and subtracted from at acknowledge reception. - We don't need to set the 'congested' field in tipc_port to true before we sent the message, and clear it when sending is successful. (As a matter of fact, it was never necessary; the field was set in link_schedule_port() before any wakeup could arrive anyway.) - We keep the conditions for link congestion and connection connection congestion separated. There would otherwise be a risk that an arriving acknowledge message may wake up a user sleeping because of link congestion. - We can simplify reception of acknowledge messages. We also make some cosmetic/structural changes: - We rename the 'congested' field to the more correct 'link_cong´. - We rename 'conn_unacked' to 'rcv_unacked' - We move the above mentioned fields from struct tipc_port to struct tipc_sock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	ac0074ee70	tipc: clean up connection protocol reception function We simplify the code for receiving connection probes, leveraging the recently introduced tipc_msg_reverse() function. We also stick to the principle of sending a possible response message directly from the calling (tipc_sk_rcv or backlog_rcv) functions, hence making the call chain shallower and easier to follow. We make one small protocol change here, allowed according to the spec. If a protocol message arrives from a remote socket that is not the one we are connected to, we are currently generating a connection abort message and send it to the source. This behavior is unnecessary, and might even be a security risk, so instead we now choose to only ignore the message. The consequnce for the sender is that he will need longer time to discover his mistake (until the next timeout), but this is an extreme corner case, and may happen anyway under other circumstances, so we deem this change acceptable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	ec8a2e5621	tipc: same receive code path for connection protocol and data messages As a preparation to eliminate port_lock we need to bring reception of connection protocol messages under proper protection of bh_lock_sock or socket owner. We fix this by letting those messages follow the same code path as incoming data messages. As a side effect of this change, the last reference to the function net_route_msg() disappears, and we can eliminate that function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:56 -07:00
Jon Paul Maloy	b786e2b0fa	tipc: let port protocol senders use new link send function Several functions in port.c, related to the port protocol and connection shutdown, need to send messages. We now convert them to use the new link send function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4ccfe5e041	tipc: connection oriented transport uses new send functions We move the message sending across established connections to use the message preparation and send functions introduced earlier in this series. We now do the message preparation and call to the link send function directly from the socket, instead of going via the port layer. As a consequence of this change, the functions tipc_send(), tipc_port_iovec_rcv(), tipc_port_iovec_reject() and tipc_reject_msg() become unreferenced and can be eliminated from port.c. For the same reason, the functions tipc_link_xmit_fast(), tipc_link_iovec_xmit_long() and tipc_link_iovec_fast() can be eliminated from link.c. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	e2dafe87d3	tipc: RDM/DGRAM transport uses new fragmenting and sending functions We merge the code for sending port name and port identity addressed messages into the corresponding send functions in socket.c, and start using the new fragmenting and transmit functions we just have introduced. This saves a call level and quite a few code lines, as well as making this part of the code easier to follow. As a consequence, the functions tipc_send2name() and tipc_send2port() in port.c can be removed. For practical reasons, we break out the code for sending multicast messages from tipc_sendmsg() and move it into a separate function, tipc_sendmcast(), but we do not yet convert it into using the new build/send functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	5a379074a7	tipc: introduce message evaluation function When a message arrives in a node and finds no destination socket, we may need to drop it, reject it, or forward it after a secondary destination lookup. The latter two cases currently results in a code path that is perceived as complex, because it follows a deep call chain via obscure functions such as net_route_named_msg() and net_route_msg(). We now introduce a function, tipc_msg_eval(), that takes the decision about whether such a message should be rejected or forwarded, but leaves it to the caller to actually perform the indicated action. If the decision is 'reject', it is still the task of the recently introduced function tipc_msg_reverse() to take the final decision about whether the message is rejectable or not. In the latter case it drops the message. As a result of this change, we can finally eliminate the function net_route_named_msg(), and hence become independent of net_route_msg(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	8db1bae30b	tipc: separate building and sending of rejected messages The way we build and send rejected message is currenty perceived as hard to follow, partly because we let the transmission go via deep call chains through functions such as tipc_reject_msg() and net_route_msg(). We want to remove those functions, and make the call sequences shallower and simpler. For this purpose, we separate building and sending of rejected messages. We build the reject message using the new function tipc_msg_reverse(), and let the transmission go via the newly introduced tipc_link_xmit2() function, as all transmission eventually will do. We also ensure that all calls to tipc_link_xmit2() are made outside port_lock/bh_lock_sock. Finally, we replace all calls to tipc_reject_msg() with the two new calls at all locations in the code that we want to keep. The remaining calls are made from code that we are planning to remove, along with tipc_reject_msg() itself. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	067608e9d0	tipc: introduce direct iovec to buffer chain fragmentation function Fragmentation at message sending is currently performed in two places in link.c, depending on whether data to be transmitted is delivered in the form of an iovec or as a big sk_buff. Those functions are also tightly entangled with the send functions that are using them. We now introduce a re-entrant, standalone function, tipc_msg_build2(), that builds a packet chain directly from an iovec. Each fragment is sized according to the MTU value given by the caller, and is prepended with a correctly built fragment header, when needed. The function is independent from who is calling and where the chain will be delivered, as long as the caller is able to indicate a correct MTU. The function is tested, but not called by anybody yet. Since it is incompatible with the existing tipc_msg_build(), and we cannot yet remove that function, we have given it a temporary name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	16e166b88c	tipc: make link mtu easily accessible from socket Message fragmentation is currently performed at link level, inside the protection of node_lock. This potentially binds up the sending link structure for a long time, instead of letting it do other tasks, such as handle reception of new packets. In this commit, we make the MTUs of each active link become easily accessible from the socket level, i.e., without taking any spinlock or dereferencing the target link pointer. This way, we make it possible to perform fragmentation in the sending socket, before sending the whole fragment chain to the link for transport. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:55 -07:00
Jon Paul Maloy	4f1688b2c6	tipc: introduce send functions for chained buffers in link The current link implementation provides several different transmit functions, depending on the characteristics of the message to be sent: if it is an iovec or an sk_buff, if it needs fragmentation or not, if the caller holds the node_lock or not. The permutation of these options gives us an unwanted amount of unnecessarily complex code. As a first step towards simplifying the send path for all messages, we introduce two new send functions at link level, tipc_link_xmit2() and __tipc_link_xmit2(). The former looks up a link to the message destination, and if one is found, it grabs the node lock and calls the second function, which works exclusively inside the node lock protection. If no link is found, and the destination is on the same node, it delivers the message directly to the local destination socket. The new functions take a buffer chain where all packet headers are already prepared, and the correct MTU has been used. These two functions will later replace all other link-level transmit functions. The functions are not backwards compatible, so we have added them as new functions with temporary names. They are tested, but have no users yet. Those will be added later in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Jon Paul Maloy	e4de5fab80	tipc: use negative error return values in functions In some places, TIPC functions returns positive integers as return codes. This goes against standard Linux coding practice, and may even cause problems in some cases. We now change the return values of the functions filter_rcv() and filter_connect() to become signed integers, and return negative error codes when needed. The codes we use in these particular cases are still TIPC specific, since they are both part of the TIPC API and have no correspondence in errno.h Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Jon Paul Maloy	3d09fc4244	tipc: eliminate case of writing to freed memory In the function tipc_nodesub_notify() we call a function pointer aggregated into the object to be notified, whereafter we set the function pointer to NULL. However, in some cases the function pointed to will free the struct containing the function pointer, resulting in a write to already freed memory. This bug seems to always have been there, without causing any notable harm. In this commit we fix the problem by inverting the order of the zeroing and the function call. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 12:50:54 -07:00
Octavian Purdila	bad93e9d4e	net: add __pskb_copy_fclone and pskb_copy_for_clone There are several instances where a pskb_copy or __pskb_copy is immediately followed by an skb_clone. Add a couple of new functions to allow the copy skb to be allocated from the fclone cache and thus speed up subsequent skb_clone calls. Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:38:02 -07:00
Jon Paul Maloy	02c00c2ab0	tipc: fix potential bug in function tipc_backlog_rcv In commit `4f4482dcd9` ("tipc: compensate for double accounting in socket rcv buffer") we access 'truesize' of a received buffer after it might have been released by the function filter_rcv(). In this commit we correct this by reading the value of 'truesize' to the stack before delivering the buffer to filter_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:01:30 -07:00
Arnaldo Carvalho de Melo	85d3fc9418	tipc: Don't reset the timeout when restarting As it may then take longer than what the user specified using setsockopt(SO_RCVTIMEO). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-24 14:11:41 -04:00
Jon Paul Maloy	9816f0615d	tipc: merge port message reception into socket reception function In order to reduce complexity and save a call level during message reception at port/socket level, we remove the function tipc_port_rcv() and merge its functionality into tipc_sk_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	c82910e2a8	tipc: clean up neigbor discovery message reception The function tipc_disc_rcv(), which is handling received neighbor discovery messages, is perceived as messy, and it is hard to verify its correctness by code inspection. The fact that the task it is set to resolve is fairly complex does not make the situation better. In this commit we try to take a more systematic approach to the problem. We define a decision machine which takes three state flags as input, and produces three action flags as output. We then walk through all permutations of the state flags, and for each of them we describe verbally what is going on, plus that we set zero or more of the action flags. The action flags indicate what should be done once the decision machine has finished its job, while the last part of the function deals with performing those actions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	38504c28a2	tipc: improve and extend media address conversion functions TIPC currently handles two media specific addresses: Ethernet MAC addresses and InfiniBand addresses. Those are kept in three different formats: 1) A "raw" format as obtained from the device. This format is known only by the media specific adapter code in eth_media.c and ib_media.c. 2) A "generic" internal format, in the form of struct tipc_media_addr, which can be referenced and passed around by the generic media- unaware code. 3) A serialized version of the latter, to be conveyed in neighbor discovery messages. Conversion between the three formats can only be done by the media specific code, so we have function pointers for this purpose in struct tipc_media. Here, the media adapters can install their own conversion functions at startup. We now introduce a new such function, 'raw2addr()', whose purpose is to convert from format 1 to format 2 above. We also try to as far as possible uniform commenting, variable names and usage of these functions, with the purpose of making them more comprehensible. We can now also remove the function tipc_l2_media_addr_set(), whose job is done better by the new function. Finally, we expand the field for serialized addresses (format 3) in discovery messages from 20 to 32 bytes. This is permitted according to the spec, and reduces the risk of problems when we add new media in the future. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	37e22164a8	tipc: rename and move message reassembly function The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	5074ab89c5	tipc: mark head of reassembly buffer as non-linear The message reassembly function does not update the 'len' and 'data_len' fields of the head skbuff correctly when fragments are chained to it. This may sometimes lead to obsure errors, such as fragment reordering when we receive fragments which are cloned buffers. This commit fixes this, by ensuring that the two fields are updated correctly. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	ec37dcd382	tipc: don't record link RESET or ACTIVATE messages as traffic In the current code, all incoming LINK_PROTOCOL messages, irrespective of type, nudge the "last message received" checkpoint, informing the link state machine that a message was received from the peer since last supervision timeout event. This inhibits the link from starting probing the peer unnecessarily. However, not only STATE messages are recorded as legitimate incoming traffic this way, but even RESET and ACTIVATE messages, which in reality are there to inform the link that the peer endpoint has been reset. At the same time, some RESET messages may be dropped instead of causing a link reset. This happens when the link endpoint thinks it is fully up and working, and the session number of the RESET is lower than or equal to the current link session. In such cases the RESET is perceived as a delayed remnant from an earlier session, or the current one, and dropped. Now, if a TIPC module is removed and then immediately reinserted, e.g. when using a script, RESET messages may arrive at the peer link endpoint before this one has had time to discover the failure. The RESET may be dropped because of the session number, but only after it has been recorded as a legitimate traffic event. Hence, the receiving link will not start probing, and not discover that the peer endpoint is down, at the same time ignoring the periodic RESET messages coming from that endpoint. We have ended up in a stale state where a failed link cannot be re-established. In this commit, we remedy this by nudging the checkpoint only for received STATE messages, not for RESET or ACTIVATE messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:48 -04:00
Jon Paul Maloy	4f4482dcd9	tipc: compensate for double accounting in socket rcv buffer The function net/core/sock.c::__release_sock() runs a tight loop to move buffers from the socket backlog queue to the receive queue. As a security measure, sk_backlog.len of the receiving socket is not set to zero until after the loop is finished, i.e., until the whole backlog queue has been transferred to the receive queue. During this transfer, the data that has already been moved is counted both in the backlog queue and the receive queue, hence giving an incorrect picture of the available queue space for new arriving buffers. This leads to unnecessary rejection of buffers by sk_add_backlog(), which in TIPC leads to unnecessarily broken connections. In this commit, we compensate for this double accounting by adding a counter that keeps track of it. The function socket.c::backlog_rcv() receives buffers one by one from __release_sock(), and adds them to the socket receive queue. If the transfer is successful, it increases a new atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the transferred buffer. If a new buffer arrives during this transfer and finds the socket busy (owned), we attempt to add it to the backlog. However, when sk_add_backlog() is called, we adjust the 'limit' parameter with the value of the new counter, so that the risk of inadvertent rejection is eliminated. It should be noted that this change does not invalidate the original purpose of zeroing 'sk_backlog.len' after the full transfer. We set an upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that doesn't respect the send window) keeps pumping in buffers to sk_add_backlog(), he will eventually reach an upper limit, (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added to the backlog, and the connection will be broken. Ordinary, well- behaved senders will never reach this buffer limit at all. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
Jon Paul Maloy	6163a194e0	tipc: decrease connection flow control window Memory overhead when allocating big buffers for data transfer may be quite significant. E.g., truesize of a 64 KB buffer turns out to be 132 KB, 2 x the requested size. This invalidates the "worst case" calculation we have been using to determine the default socket receive buffer limit, which is based on the assumption that 1024x64KB = 67MB buffers may be queued up on a socket. Since TIPC connections cannot survive hitting the buffer limit, we have to compensate for this overhead. We do that in this commit by dividing the fix connection flow control window from 1024 (2512) messages to 512 (2256). Since older version nodes send out acks at 512 message intervals, compatibility with such nodes is guaranteed, although performance may be non-optimal in such cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-14 15:19:47 -04:00
David S. Miller	5f013c9bc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 13:19:14 -04:00
Ying Xue	ca9cf06a06	tipc: don't directly overwrite node action_flags Each node action flag should be set or cleared separately, instead we now set the whole flags variable in one shot, and it's turned out to be hard to see which other flags are affected. Therefore, for instance, we explicitly clear TIPC_WAIT_OWN_LINKS_DOWN bit in node_lost_contact(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 01:41:01 -04:00
Ying Xue	aecb9bb89c	tipc: rename enum names of node flags Rename node flags to action_flags as well as its enum names so that they can reflect its real meanings. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-09 01:41:01 -04:00
Ying Xue	52ff872055	tipc: purge signal handler infrastructure In the previous commits of this series, we removed all asynchronous actions which were based on the tasklet handler - "tipc_k_signal()". So the moment has now come when we can completely remove the tasklet handler infrastructure. That is done with this commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	3f5a12bd9f	tipc: avoid to asynchronously reset all links Postpone the actions of resetting all links until after bclink lock is released, avoiding to asynchronously reset all links. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	eb8b00f5f2	tipc: convert allocations of global variables associated with bclink Convert allocations of global variables associated with bclink from static way to dynamical way for the convenience of bclink instance initialisation. Meanwhile, this also helps TIPC support name space in the future easily. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:45 -04:00
Ying Xue	d69afc90b8	tipc: define new functions to operate bc_lock As we are going to do more jobs when bc_lock is released, the two operations of holding/releasing the lock should be encapsulated with functions. In addition, we move bc_lock spin lock into tipc_bclink structure avoiding to define the global variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	ca0c42732c	tipc: avoid to asynchronously deliver name tables to peer node Postpone the actions of delivering name tables until after node lock is released, avoiding to do it under asynchronous context. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	9d56194968	tipc: remove TIPC_NAMES_GONE node flag Since previously what all publications pertaining to the lost node were removed from name table was finished in tasklet context asynchronously, we need to TIPC_NAMES_GONE flag indicating whether the node cleanup work is finished or not. But now as the cleanup work has been finished when node lock is released, the flag becomes meaningless for us. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	9db9fdd198	tipc: avoid to asynchronously notify subscriptions Postpone the actions of notifying subscriptions until after node lock is released, avoiding to asynchronously execute registered handlers when node is lost. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	10f465c496	tipc: rename setup_blocked variable of node struct to flags Rename setup_blocked variable of node struct to a more common name called "flags", which will be used to represent kinds of node states. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	486f930ac5	tipc: adjust order of variables in tipc_node structure Move more frequently used variables up to the head of tipc_node structure, hopefully improving a bit performance. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:44 -04:00
Ying Xue	5356f3d7d4	tipc: always use tipc_node_lock() to hold node lock Although we obtain node lock with tipc_node_lock() in most time, there are still places where we directly use native spin lock interface to grab node lock. But as we will do more jobs in the future when node lock is released, we should ensure that tipc_node_lock() is always called when node lock is taken. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-05 17:26:43 -04:00
Ying Xue	1621b94d2a	tipc: fix memory leak of publications Commit `1bb8dce57f` ("tipc: fix memory leak during module removal") introduced a memory leak issue: when name table is stopped, it's forgotten that publication instances are freed properly. Additionally the useless "continue" statement in tipc_nametbl_stop() is removed as well. Reported-by: Jason <huzhijiang@gmail.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-30 13:31:26 -04:00
Ying Xue	eab8c04573	tipc: move the delivery of named messages out of nametbl lock Commit `a89778d8ba` ("tipc: add support for link state subscriptions") introduced below possible deadlock scenario: CPU0 CPU1 T0: tipc_publish() link_timeout() T1: tipc_nametbl_publish() [grab node lock]* T2: [grab nametbl write lock]* link_state_event() T3: named_cluster_distribute() link_activate() T4: [grab node lock]* tipc_node_link_up() T5: tipc_nametbl_publish() T6: [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the delivery of named messages via link out of name nametbl lock, the reverse order of holding locks will be eliminated, as a result, the deadlock will be killed as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:49:54 -04:00
Erik Hugne	d7bb74c38c	tipc: fix out of bounds indexing Commit `78acb1f9b8` ("tipc: add ioctl to fetch link names") introduced a buffer overflow bug where specially crafted ioctl requests could cause out-of-bounds indexing of the node->links array. This was caused by an incorrect check vs MAX_BEARERS, and the static code checker complaint is: net/tipc/node.c:459 tipc_node_get_linkname() error: buffer overflow 'node->links' 2 <= 2 Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-28 14:43:35 -04:00
Ying Xue	22e7987ae7	tipc: fix a possible memory leak The commit `a8b9b96e95` ("tipc: fix race in disc create/delete") leads to the following static checker warning: net/tipc/discover.c:352 tipc_disc_create() warn: possible memory leak of 'req' The risk of memory leak really exists in practice. Especially when it's failed to allocate memory for "req->buf", tipc_disc_create() doesn't free its allocated memory, instead just directly returns with ENOMEM error code. In this situation, memory leak, of course, happens. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:08:06 -04:00
Erik Hugne	78acb1f9b8	tipc: add ioctl to fetch link names We add a new ioctl for AF_TIPC that can be used to fetch the logical name for a link to a remote node on a given bearer. This should be used in combination with link state subscriptions. The logical name size limit definitions are moved to tipc.h, as they are now also needed by the new ioctl. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Erik Hugne	a89778d8ba	tipc: add support for link state subscriptions When links are established over a bearer plane, we create a node local publication containing information about the peer node and bearer plane. This allows TIPC applications to use the standard TIPC topology server subscription mechanism to get notifications when a link goes up or down. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-26 12:13:24 -04:00
Eric W. Biederman	90f62cf30a	net: Use netlink_ns_capable to verify the permisions of netlink messages It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-24 13:44:54 -04:00
Ying Xue	a8b9b96e95	tipc: fix race in disc create/delete Commit `a21a584d67` (tipc: fix neighbor detection problem after hw address change) introduces a race condition involving tipc_disc_delete() and tipc_disc_add/remove_dest that can cause TIPC to dereference the pointer to the bearer discovery request structure after it has been freed since a stray pointer is left in the bearer structure. In order to fix the issue, the process of resetting the discovery request handler is optimized: the discovery request handler and request buffer are just reset instead of being freed, allocated and initialized. As the request point is always valid and the request's lock is taken while the request handler is reset, the race doesn't happen any more. Reported-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	28dd94187a	tipc: use bc_lock to protect node map in bearer structure The node map variable - 'nodes' in bearer structure is only used by bclink. When bclink accesses it, bc_lock is held. But when change it, for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest() the bc_lock is not taken at all. To avoid any inconsistent data, we should always grab bc_lock while accessing node map variable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	4ae88c94d3	tipc: use bearer_disable to disable bearer in tipc_l2_device_event As bearer pointer is known in tipc_l2_device_event(), it's unnecessary to search it again in tipc_disable_bearer(). If tipc_disable_bearer() is replaced with bearer_disable() in tipc_l2_device_event(), this will help us save a bit time when bearer is disabled. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f1c8d8cb82	tipc: make media_ptr pointed netdevice valid The 'media_ptr' pointer in bearer structure which points to network device, is protected by RCU. So, before netdevice is released, synchronize_net() should be involved to prevent no any user of the netdevice on read side from accessing it after it is freed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7216cd949c	tipc: purge tipc_net_lock lock Now tipc routing hierarchy comprises the structures 'node', 'link'and 'bearer'. The whole hierarchy is protected by a big read/write lock, tipc_net_lock, to ensure that nothing is added or removed while code is accessing any of these structures. Obviously the locking policy makes node, link and bearer components closely bound together so that their relationship becomes unnecessarily complex. In the worst case, such locking policy not only has a negative influence on performance, but also it's prone to lead to deadlock occasionally. In order o decouple the complex relationship between bearer and node as well as link, the locking policy is adjusted as follows: - Bearer level RTNL lock is used on update side, and RCU is used on read side. Meanwhile, all bearer instances including broadcast bearer are saved into bearer_list array. - Node and link level All node instances are saved into two tipc_node_list and node_htable lists. The two lists are protected by node_list_lock on write side, and they are guarded with RCU lock on read side. All members in node structure including link instances are protected by node spin lock. - The relationship between bearer and node When link accesses bearer, it first needs to find the bearer with its bearer identity from the bearer_list array. When bearer accesses node, it can iterate the node_htable hash list with the node address to find the corresponding node. In the new locking policy, every component has its private locking solution and the relationship between bearer and node is very simple, that is, they can find each other with node address or bearer identity from node_htable hash list or bearer_list array. Until now above all changes have been done, so tipc_net_lock can be removed safely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	2231c5af45	tipc: use RCU to protect media_ptr pointer Now the media_ptr pointer is protected with tipc_net_lock write lock on write side; tipc_net_lock read lock is used to read side. As the part of effort of eliminating tipc_net_lock, we decide to adjust the locking policy of media_ptr pointer protection: on write side, RTNL lock is use while on read side RCU read lock is applied. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	7a2f7d18e7	tipc: decouple the relationship between bearer and link Currently on both paths of message transmission and reception, the read lock of tipc_net_lock must be held before bearer is accessed, while the write lock of tipc_net_lock has to be taken before bearer is configured. Although it can ensure that bearer is always valid on the two data paths, link and bearer is closely bound together. So as the part of effort of removing tipc_net_lock, the locking policy of bearer protection will be adjusted as below: on the two data paths, RCU is used, and on the configuration path of bearer, RTNL lock is applied. Now RCU just covers the path of message reception. To make it possible to protect the path of message transmission with RCU, link should not use its stored bearer pointer to access bearer, but it should use the bearer identity of its attached bearer as index to get bearer instance from bearer_list array, which can help us decouple the relationship between bearer and link. As a result, bearer on the path of message transmission can be safely protected by RCU when we access bearer_list array within RCU lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:53 -04:00
Ying Xue	f8322dfce5	tipc: convert bearer_list to RCU list Convert bearer_list to RCU list. It's protected by RTNL lock on update side, and RCU read lock is applied to read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	f97e455abf	tipc: use RTNL lock to protect tipc_net_stop routine As the tipc network initialization(ie, tipc_net_start routine) is under RTNL protection, its corresponding deinitialization part(ie, tipc_net_stop routine) should be protected by RTNL too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ca07fb07c9	tipc: adjust locking policy of protecting tipc_ptr pointer of net_device Currently the 'tipc_ptr' pointer is protected by tipc_net_lock write lock on write side, and RCU read lock is applied to read side. In addition, there have two paths on write side where we may change variables pointed by the 'tipc_ptr' pointer: one is to configure bearer by tipc-config tool and another one is that bearer status is changed by notification events of its attached interface. But on the latter path, we improperly deem that accessing 'tipc_ptr' pointer happens on read side with rcu_read_lock() although some variables pointed by the 'tipc_ptr' pointer are changed possibly. Moreover, as now the both paths are guarded by RTNL lock, it's better to adjust the locking policy of 'tipc_ptr' pointer protection, allowing RTNL instead of tipc_net_lock write lock to protect it on write side, which will help us purge tipc_net_lock in the future. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
Ying Xue	ef13a262c3	tipc: replace config_mutex lock with RTNL lock There have two paths where we can configure or change bearer status: one is that bearer is configured from user space with tipc-config tool; another one is that bearer is changed by notification events from its attached interface. On the first path, one dedicated config_mutex lock is guarded; on the latter path, RTNL lock has been placed to serialize the process of dealing with interface events. So, if RTNL lock is also used to protect the first path, this will not only extremely help us simplify current locking policy, but also config_mutex lock can be deleted as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-22 21:17:52 -04:00
David S. Miller	676d23690f	net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:15:36 -04:00
Geert Uytterhoeven	065d7e3956	tipc: Let tipc_release() return 0 net/tipc/socket.c: In function ‘tipc_release’: net/tipc/socket.c:352: warning: ‘res’ is used uninitialized in this function Introduced by commit `24be34b5a0` ("tipc: eliminate upcall function pointers between port and socket"), which removed the sole initializer of "res". Just return 0 to fix it. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-07 15:08:17 -04:00
Erik Hugne	a5e7ac5ce1	tipc: fix regression bug where node events are not being generated Commit `5902385a24` ("tipc: obsolete the remote management feature") introduces a regression where node topology events are not being generated because the publication that triggers this: {0, <z.c.n>, <z.c.n>} is no longer available. This will break applications that rely on node events to discover when nodes join/leave a cluster. We fix this by advertising the node publication when TIPC enters networking mode, and withdraws it upon shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-03 16:03:57 -04:00
Erik Hugne	16470111ed	tipc: make discovery domain a bearer attribute The node discovery domain is assigned when a bearer is enabled. In the previous commit we reflect this attribute directly in the bearer structure since it's needed to reinitialize the node discovery mechanism after a hardware address change. There's no need to replicate this attribute anywhere else, so we remove it from the tipc_link_req structure. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 14:46:29 -04:00
Erik Hugne	a21a584d67	tipc: fix neighbor detection problem after hw address change If the hardware address of a underlying netdevice is changed, it is not enough to simply reset the bearer/links over this device. We also need to reflect this change in the TIPC bearer and node discovery structures aswell. This patch adds the necessary reinitialization of the node disovery mechanism following a hardware address change so that the correct originating media address is advertised in the discovery messages. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Dong Liu <dliu.cn@gmail.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 14:46:29 -04:00
Ying Xue	dde2026608	tipc: use node list lock to protect tipc_num_links variable Without properly implicit or explicit read memory barrier, it's unsafe to read an atomic variable with atomic_read() from another thread which is different with the thread of changing the atomic variable with atomic_inc() or atomic_dec(). So a stale tipc_num_links may be got with atomic_read() in tipc_node_get_links(). If the tipc_num_links variable type is converted from atomic to unsigned integer and node list lock is used to protect it, the issue would be avoided. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:38 -04:00
Ying Xue	2220646a53	tipc: use node_list_lock to protect tipc_num_nodes variable As tipc_node_list is protected by rcu read lock on read side, it's unnecessary to hold node_list_lock to protect tipc_node_list in tipc_node_get_links(). Instead, node_list_lock should just protects tipc_num_nodes in the function. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	6c7a762e70	tipc: tipc: convert node list and node hlist to RCU lists Convert tipc_node_list list and node_htable hash list to RCU lists. On read side, the two lists are protected with RCU read lock, and on update side, node_list_lock is applied to them. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	46651c59c4	tipc: rename node create lock to protect node list and hlist When a node is created, tipc_net_lock read lock is first held and then node_create_lock is grabbed in order to prevent the same node from being created and inserted into both node list and hlist twice. But when we query node from the two node lists, we only hold tipc_net_lock read lock without grabbing node_create_lock. Obviously this locking policy is unable to guarantee that the two node lists are always synchronized especially when the operation of changing and accessing them occurs in different contexts like currently doing. Therefore, rename node_create_lock to node_list_lock to protect the two node lists, that is, whenever node is inserted into them or node is queried from them, the node_list_lock should be always held. As a result, tipc_net_lock read lock becomes redundant and then can be removed from the node query functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	987b58be37	tipc: make broadcast bearer store in bearer_list array Now unicast bearer is dynamically allocated and placed into its identity specified slot of bearer_list array. When we search bearer_list array with a bearer identity, the corresponding bearer instance can be found. But broadcast bearer is statically allocated and it is not located in the bearer_list array yet. So we decide to enlarge bearer_list array into MAX_BEARERS + 1 slots, and its last slot stores the broadcast bearer so that the broadcast bearer can be found from bearer_list array with MAX_BEARERS as index. The change will help us reduce the complex relationship between bearer and link in the future. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	f47de12b06	tipc: remove active flag from tipc_bearer structure After the allocation of tipc_bearer structure instance is converted from statical way to dynamical way, we identify whether a certain tipc_bearer structure pointer is valid by checking whether the pointer is NULL or not. So the active flag in tipc_bearer structure becomes redundant. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	3874ccbba8	tipc: convert tipc_bearers array to pointer list As part of the effort to introduce RCU protection for the bearer list, we first need to change it to a list of pointers. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:37 -04:00
Ying Xue	78dfb789b6	tipc: acquire necessary locks in named_cluster_distribute routine The 'tipc_node_list' is guarded by tipc_net_lock and 'links' array defined in 'tipc_node' structure is protected by node lock as well. Without acquiring the two locks in named_cluster_distribute() a fatal oops may happen in case that a destroyed link might be got and then accessed. Therefore, above mentioned two locks must be held in named_cluster_distribute() to prevent the issue from happening accidentally. As 'links' array in node struct must be protected by node lock, we have to move the code of selecting an active link from tipc_link_xmit() to named_cluster_distribute() and then call __tipc_link_xmit() with the selected link to deliver name messages. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:36 -04:00
Ying Xue	5902385a24	tipc: obsolete the remote management feature Due to the lacking of any credential, it's allowed to accept commands requested from remote nodes to query the local node status, which is prone to involve potential security risks. Instead, if we login to a remote node with ssh command, this approach is not only more safe than the remote management feature, but also it can give us more permissions like changing the remote node configuration. So it's reasonable for us to obsolete the remote management feature now. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:36 -04:00
Ying Xue	76d7882420	tipc: remove unnecessary checking for node object tipc_node_create routine doesn't need to check whether a node object specified with a node address exists or not because its caller(ie, tipc_disc_recv_msg routine) has checked this before calling it. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 13:08:36 -04:00
David S. Miller	04f58c8854	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/devicetree/bindings/net/micrel-ks8851.txt net/core/netpoll.c The net/core/netpoll.c conflict is a bug fix in 'net' happening to code which is completely removed in 'net-next'. In micrel-ks8851.txt we simply have overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-25 20:29:20 -04:00
Erik Hugne	a5d0e7c037	tipc: fix spinlock recursion bug for failed subscriptions If a topology event subscription fails for any reason, such as out of memory, max number reached or because we received an invalid request the correct behavior is to terminate the subscribers connection to the topology server. This is currently broken and produces the following oops: [27.953662] tipc: Subscription rejected, illegal request [27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6 [27.957066] lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1 [27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5 [27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc] [27.961430] ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050 [27.962292] ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a [27.963152] ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520 [27.964023] Call Trace: [27.964292] [<ffffffff815c0207>] dump_stack+0x45/0x56 [27.964874] [<ffffffff815beec5>] spin_dump+0x8c/0x91 [27.965420] [<ffffffff815beeeb>] spin_bug+0x21/0x26 [27.965995] [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140 [27.966631] [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20 [27.967256] [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc] [27.968051] [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc] [27.968722] [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc] [27.969436] [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc] [27.970209] [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc] [27.970972] [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc] [27.971633] [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0 [27.972267] [<ffffffff8105e869>] worker_thread+0x119/0x3a0 [27.972896] [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0 [27.973622] [<ffffffff810648af>] kthread+0xdf/0x100 [27.974168] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 [27.974893] [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0 [27.975466] [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0 The recursion occurs when subscr_terminate tries to grab the subscriber lock, which is already taken by subscr_conn_msg_event. We fix this by checking if the request to establish a new subscription was successful, and if not we initiate termination of the subscriber after we have released the subscriber lock. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-24 15:36:56 -04:00
David S. Miller	85dcce7a73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/r8152.c drivers/net/xen-netback/netback.c Both the r8152 and netback conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:31:55 -04:00
Jon Paul Maloy	5c311421a2	tipc: eliminate redundant lookups in registry As an artefact from the native interface, the message sending functions in the port takes a port ref as first parameter, and then looks up in the registry to find the corresponding port pointer. This despite the fact that the only currently existing caller, tipc_sock, already knows this pointer. We change the signature of these functions to take a struct tipc_port* argument, and remove the redundant lookups. We also remove an unmotivated extra lookup in the function socket.c:auto_connect(), and, as the lookup functions tipc_port_deref() and ref_deref() now become unused, we remove these two functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	58ed944241	tipc: align usage of variable names and macros in socket The practice of naming variables in TIPC is inconistent, sometimes even within the same file. In this commit we align variable names and declarations within socket.c, and function and macro names within socket.h. We also reduce the number of conversion macros to two, in order to make usage less obsure. These changes are purely cosmetic. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	3b4f302d85	tipc: eliminate redundant locking The three functions tipc_portimportance(), tipc_portunreliable() and tipc_portunreturnable() and their corresponding tipc_set* functions, are all grabbing port_lock when accessing the targeted port. This is unnecessary in the current code, since these calls only are made from within socket downcalls, already protected by sock_lock. We remove the redundant locking. Also, since the functions now become trivial one-liners, we move them to port.h and make them inline. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	24be34b5a0	tipc: eliminate upcall function pointers between port and socket Due to the original one-to-many relation between port and user API layers, upcalls to the API have been performed via function pointers, installed in struct tipc_port at creation. Since this relation now always is one-to-one, we can instead use ordinary function calls. We remove the function pointers 'dispatcher' and ´wakeup' from struct tipc_port, and replace them with calls to the renamed functions tipc_sk_rcv() and tipc_sk_wakeup(). At the same time we change the name and signature of the functions tipc_createport() and tipc_deleteport() to reflect their new role as mere initialization/destruction functions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	8826cde655	tipc: aggregate port structure into socket structure After the removal of the tipc native API the relation between a tipc_port and its API types is strictly one-to-one, i.e, the latter can now only be a socket API. There is therefore no need to allocate struct tipc_port and struct sock independently. In this commit, we aggregate struct tipc_port into struct tipc_sock, hence saving both CPU cycles and structure complexity. There are no functional changes in this commit, except for the elimination of the separate allocation/freeing of tipc_port. All other changes are just adaptatons to the new data structure. This commit also opens up for further code simplifications and code volume reduction, something we will do in later commits. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	f9fef18c6d	tipc: remove redundant 'peer_name' field in struct tipc_sock The field 'peer_name' in struct tipc_sock is redundant, since this information already is available from tipc_port, to which tipc_sock has a reference. We remove the field, and ensure that peer node and peer port info instead is fetched via the functions that already exist for this purpose. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Jon Paul Maloy	978813ee89	tipc: replace reference table rwlock with spinlock The lock for protecting the reference table is declared as an RWLOCK, although it is only used in write mode, never in read mode. We redefine it to become a spinlock. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:53:49 -04:00
Joe Perches	184593c734	tipc: Convert uses of __constant_<foo> to <foo> The use of __constant_<foo> has been unnecessary for quite awhile now. Make these uses consistent with the rest of the kernel. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-12 15:28:06 -04:00
Erik Hugne	2892505ea1	tipc: don't log disabled tasklet handler errors Failure to schedule a TIPC tasklet with tipc_k_signal because the tasklet handler is disabled is not an error. It means TIPC is currently in the process of shutting down. We remove the error logging in this case. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:24 -05:00
Erik Hugne	1bb8dce57f	tipc: fix memory leak during module removal When the TIPC module is removed, the tasklet handler is disabled before all other subsystems. This will cause lingering publications in the name table because the node_down tasklets responsible to clean up publications from an unreachable node will never run. When the name table is shut down, these publications are detected and an error message is logged: tipc: nametbl_stop(): orphaned hash chain detected This is actually a memory leak, introduced with commit `993b858e37` ("tipc: correct the order of stopping services at rmmod") Instead of just logging an error and leaking memory, we free the orphaned entries during nametable shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:24 -05:00
Erik Hugne	edcc0511b5	tipc: drop subscriber connection id invalidation When a topology server subscriber is disconnected, the associated connection id is set to zero. A check vs zero is then done in the subscription timeout function to see if the subscriber have been shut down. This is unnecessary, because all subscription timers will be cancelled when a subscriber terminates. Setting the connection id to zero is actually harmful because id zero is the identity of the topology server listening socket, and can cause a race that leads to this socket being closed instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
Ying Xue	fe8e464939	tipc: avoid to unnecessary process switch under non-block mode When messages are received via tipc socket under non-block mode, schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is, the process of receiving messages will be scheduled once although timeout value passed to schedule_timeout() is 0. The same issue exists in accept()/wait_for_accept(). To avoid this unnecessary process switch, we only call schedule_timeout() if the timeout value is non-zero. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
Ying Xue	4652edb70e	tipc: fix connection refcount leak When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a connection instance, its reference count value is increased if it's found. But subsequently if it's found that the connection is closed, the work of sending message is not queued into its server send workqueue, and the connection reference count is not decreased. This will cause a reference count leak. To reproduce this problem, an application would need to open and closes topology server connections with high intensity. We fix this by immediately decrementing the connection reference count if a send fails due to the connection being closed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
Ying Xue	6d4ebeb4df	tipc: allow connection shutdown callback to be invoked in advance Currently connection shutdown callback function is called when connection instance is released in tipc_conn_kref_release(), and receiving packets and sending packets are running in different threads. Even if connection is closed by the thread of receiving packets, its shutdown callback may not be called immediately as the connection reference count is non-zero at that moment. So, although the connection is shut down by the thread of receiving packets, the thread of sending packets doesn't know it. Before its shutdown callback is invoked to tell the sending thread its connection has been closed, the sending thread may deliver messages by tipc_conn_sendmsg(), this is why the following error information appears: "Sending subscription event failed, no memory" To eliminate it, allow connection shutdown callback function to be called before connection id is removed in tipc_close_conn(), which makes the sending thread know the truth in time that its socket is closed so that it doesn't send message to it. We also remove the "Sending XXX failed..." error reporting for topology and config services. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 14:46:23 -05:00
David S. Miller	67ddc87f16	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-05 20:32:02 -05:00
Ying Xue	970122fdf4	tipc: make bearer set up in module insertion stage Accidentally a side effect is involved by commit 6e967adf7(tipc: relocate common functions from media to bearer). Now tipc stack handler of receiving packets from netdevices as well as netdevice notification handler are registered when bearer is enabled rather than tipc module initialization stage, but the two handlers are both unregistered in tipc module exit phase. If tipc module is inserted and then immediately removed, the following warning message will appear: "dev_remove_pack: ffffffffa0380940 not found" This is because in module insertion stage tipc stack packet handler is not registered at all, but in module exit phase dev_remove_pack() needs to remove it. Of course, dev_remove_pack() cannot find tipc protocol handler from the kernel protocol handler list so that the warning message is printed out. But if registering the two handlers is adjusted from enabling bearer phase into inserting module stage, the warning message will be eliminated. Due to this change, tipc_core_start_net() and tipc_core_stop_net() can be deleted as well. Reported-by: Wang Weidong <wangweidong1@huawei.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:00:15 -05:00
Ying Xue	9fe7ed4749	tipc: remove all enabled flags from all tipc components When tipc module is inserted, many tipc components are initialized one by one. During the initialization period, if one of them is failed, tipc_core_stop() will be called to stop all components whatever corresponding components are created or not. To avoid to release uncreated ones, relevant components have to add necessary enabled flags indicating whether they are created or not. But in the initialization stage, if one component is unsuccessfully created, we will just destroy successfully created components before the failed component instead of all components. All enabled flags defined in components, in turn, become redundant. Additionally it's also unnecessary to identify whether table.types is NULL in tipc_nametbl_stop() because name stable has been definitely created successfully when tipc_nametbl_stop() is called. Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-22 00:00:15 -05:00
Erik Hugne	63fa01c147	tipc: failed transmissions should return error When a message could not be sent out because the destination node or link could not be found, the full message size is returned from sendmsg() as if it had been sent successfully. An application will then get a false indication that it's making forward progress. This problem has existed since the initial commit in 2.6.16. We change this to return -ENETUNREACH if the message cannot be delivered due to the destination node/link being unavailable. We also get rid of the redundant tipc_reject_msg call since freeing the buffer and doing a tipc_port_iovec_reject accomplishes exactly the same thing. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-19 16:40:57 -05:00
David S. Miller	1e8d6421cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_3ad.h drivers/net/bonding/bond_main.c Two minor conflicts in bonding, both of which were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-19 01:24:22 -05:00
Ying Xue	247f0f3c31	tipc: align tipc function names with common naming practice in the network Rename the following functions, which are shorter and more in line with common naming practice in the network subsystem. tipc_bclink_send_msg->tipc_bclink_xmit tipc_bclink_recv_pkt->tipc_bclink_rcv tipc_disc_recv_msg->tipc_disc_rcv tipc_link_send_proto_msg->tipc_link_proto_xmit link_recv_proto_msg->tipc_link_proto_rcv link_send_sections_long->tipc_link_iovec_long_xmit tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast tipc_link_send_sync->tipc_link_sync_xmit tipc_link_recv_sync->tipc_link_sync_rcv tipc_link_send_buf->__tipc_link_xmit tipc_link_send->tipc_link_xmit tipc_link_send_names->tipc_link_names_xmit tipc_named_recv->tipc_named_rcv tipc_link_recv_bundle->tipc_link_bundle_rcv tipc_link_dup_send_queue->tipc_link_dup_queue_xmit link_send_long_buf->tipc_link_frag_xmit tipc_multicast->tipc_port_mcast_xmit tipc_port_recv_mcast->tipc_port_mcast_rcv tipc_port_reject_sections->tipc_port_iovec_reject tipc_port_recv_proto_msg->tipc_port_proto_rcv tipc_connect->tipc_port_connect __tipc_connect->__tipc_port_connect __tipc_disconnect->__tipc_port_disconnect tipc_disconnect->tipc_port_disconnect tipc_shutdown->tipc_port_shutdown tipc_port_recv_msg->tipc_port_rcv tipc_port_recv_sections->tipc_port_iovec_rcv release->tipc_release accept->tipc_accept bind->tipc_bind get_name->tipc_getname poll->tipc_poll send_msg->tipc_sendmsg send_packet->tipc_send_packet send_stream->tipc_send_stream recv_msg->tipc_recvmsg recv_stream->tipc_recv_stream connect->tipc_connect listen->tipc_listen shutdown->tipc_shutdown setsockopt->tipc_setsockopt getsockopt->tipc_getsockopt Above changes have no impact on current users of the functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-18 17:31:59 -05:00
Jon Paul Maloy	a11607f5a1	tipc: correct usage of spin_lock() vs spin_lock_bh() I commit `e099e86c9e` ("tipc: add node_lock protection to link lookup function") we are calling spin_lock(&node->lock) directly instead of indirectly via the tipc_node_lock(node) function. However, tipc_node_lock() is using spin_lock_bh(), not spin_lock(), something leading to unbalanced usage in one place, and a smatch warning. We fix this by consistently using tipc_node_lock()/unlock() in in the places touched by the mentioned commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:26:34 -05:00
Jon Paul Maloy	074bb43e9e	tipc: fix a loop style problem In commit `7d33939f47` ("tipc: delay delete of link when failover is needed") we introduced a loop for finding and removing a link pointer in an array. The removal is done after we have left the loop, giving the impression that one may remove the wrong pointer if no matching element is found. This is not really a bug, since we know that there will always be a matching element, but it looks wrong, and causes a smatch warning. We fix this loop with this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-17 00:26:26 -05:00
Jon Paul Maloy	e099e86c9e	tipc: add node_lock protection to link lookup function In an earlier commit, ("tipc: remove links list from bearer struct") we described three issues that need to be pre-emptively resolved before we can remove tipc_net_lock. Here we resolve issue a) described in that commit: "a) In access method #2, we access the link before taking the protecting node_lock. This will not work once net_lock is gone, so we will have to change the access order. We will deal with this in a later commit in this series." Here, we change that access order, by ensuring that the function link_find_link() returns only a safe reference for finding the link, i.e., a node pointer and an index into its 'links' array, not the link pointer itself. We also change all callers of this function to first take the node lock before they can check if there still is a valid link pointer at the returned index. Since the function now returns a node pointer rather than a link pointer, we rename it to the more appropriate 'tipc_link_find_owner(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:07 -05:00
Ying Xue	a83045292d	tipc: remove bearer_lock from tipc_bearer struct After the earlier commits ("tipc: remove 'links' list from tipc_bearer struct") and ("tipc: introduce new spinlock to protect struct link_req"), there is no longer any need to protect struct link_req or or any link list by use of bearer_lock. Furthermore, we have eliminated the need for using bearer_lock during downcalls (send) from the link to the bearer, since we have ensured that bearers always have a longer life cycle that their associated links, and always contain valid data. So, the only need now for a lock protecting bearers is for guaranteeing consistency of the bearer list itself. For this, it is sufficient, at least for the time being, to continue applying 'net_lock´ in write mode. By removing bearer_lock we also pre-empt introduction of issue b) descibed in the previous commit "tipc: remove 'links' list from tipc_bearer struct": "b) When the outer protection from net_lock is gone, taking bearer_lock and node_lock in opposite order of method 1) and 2) will become an obvious deadlock hazard". Therefore, we now eliminate the bearer_lock spinlock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:07 -05:00
Jon Paul Maloy	7d33939f47	tipc: delay delete of link when failover is needed When a bearer is disabled, all its attached links are deleted. Ideally, we should do link failover to redundant links on other bearers, if there are any, in such cases. This would be consistent with current behavior when a link is reset, but not deleted. However, due to the complexity involved, and the (wrongly) perceived low demand for this feature, it was never implemented until now. We mark the doomed link for deletion with a new flag, but wait until the failover process is finished before we actually delete it. With the improved link tunnelling/failover code introduced earlier in this commit series, it is now easy to identify a spot in the code where the failover is finished and it is safe to delete the marked link. Moreover, the test for the flag and the deletion can be done synchronously, and outside the most time critical data path. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:07 -05:00
Jon Paul Maloy	a5377831eb	tipc: changes to general packet reception algorithm We change the order of checking for destination users when processing incoming packets. By placing the checks for users that may potentially replace the processed buffer, i.e., CHANGEOVER_PROTOCOL and MSG_FRAGMENTER, in a separate step before we check for the true end users, we get rid of a label and a 'goto', at the same time making the code more comprehensible and easy to follow. This commit does not change any functionality, it is just a cosmetic code reshuffle. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	02842f718d	tipc: rename stack variables in function tipc_link_tunnel_rcv After the previous redesign of the tunnel reception algorithm and functions, we finalize it by renaming a couple of stack variables in tipc_tunnel_rcv(). This makes it more consistent with the naming scheme elsewhere in this part of the code. This change is purely cosmetic, with no functional changes. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	1e9d47a948	tipc: more cleanup of tunnelling reception function We simplify and slim down the code in function tipc_tunnel_rcv() No impact on the users of this function. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	3bb533800c	tipc: change signature of tunnelling reception function After the earlier commits in this series related to the function tipc_link_tunnel_rcv(), we can now go further and simplify its signature. The function now consumes all DUPLICATE packets, and only returns such ORIGINAL packets that are ready for immediate delivery, i.e., no more link level protocol processing needs to be done by the caller. As a consequence, the the caller, tipc_rcv(), does not access the link pointer after call return, and it becomes unnecessary to pass a link pointer reference in the call. Instead, we now only pass it the tunnel link's owner node, which is sufficient to find the destination link for the tunnelled packet. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	f006c9c70f	tipc: change reception of tunnelled failover packets When a link is reset, and there is a redundant link available, all sender sockets will steer their subsequent traffic through the remaining link. In order to guarantee preserved packet order and cardinality during the transition, we tunnel the failing link's send queue through the remaining link before we allow any sockets to use it. In this commit, we change the algorithm for receiving failover ("ORIGINAL_MSG") packets in tipc_link_tunnel_rcv(), at the same time delegating it to a new subfuncton, tipc_link_failover_rcv(). Instead of directly returning an extracted inner packet to the packet reception loop in tipc_rcv(), we first check if it is a message fragment, in which case we append it to the reset link's fragment chain. If the fragment chain is complete, we return the whole chain instead of the individual buffer, eliminating any need for the tipc_rcv() loop to do reassembly of tunneled packets. This change makes it possible to further simplify tipc_link_tunnel_rcv(), as well as the calling tipc_rcv() loop. We will do that in later commits. It also makes it possible to identify a single spot in the code where we can tell that a failover procedure is finished, something that is useful when we are deleting links after a failover. This will also be done in a later commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Jon Paul Maloy	1dab3d5ac2	tipc: change reception of tunnelled duplicate packets When a second link to a destination comes up, some sender sockets will steer their subsequent traffic through the new link. In order to guarantee preserved packet order and cardinality for those sockets, we tunnel a duplicate of the old link's send queue through the new link before we open it for regular traffic. The last arriving packet copy, on whichever link, will be dropped at the receiving end based on the original sequence number, to ensure that only one copy is delivered to the end receiver. In this commit, we change the algorithm for receiving DUPLICATE_MSG packets, at the same time delegating it to a new subfunction, tipc_link_dup_rcv(). Instead of returning an extracted inner packet to the packet reception loop in tipc_rcv(), we just add it to the receiving (new) link's deferred packet queue. The packet will then be processed by that link when it receives its first non-tunneled packet, i.e., at latest when the changeover procedure is finished. Because tipc_link_tunnel_rcv()/tipc_link_dup_rcv() now is consuming all packets of type DUPLICATE_MSG, the calling tipc_rcv() function can omit testing for this. This in turn means that the current conditional jump to the label 'protocol_check' becomes redundant, and we can remove that label. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:06 -05:00
Ying Xue	c61dd61dec	tipc: remove 'links' list from tipc_bearer struct In our ongoing effort to simplify the TIPC locking structure, we see a need to remove the linked list for tipc_links in the bearer. This can be explained as follows. Currently, we have three different ways to access a link, via three different lists/tables: 1: Via a node hash table: Used by the time-critical outgoing/incoming data paths. (e.g. link_send_sections_fast() and tipc_recv_msg() ): grab net_lock(read) find node from node hash table grab node_lock select link grab bearer_lock send_msg() release bearer_lock release node lock release net_lock 2: Via a global linked list for nodes: Used by configuration commands (link_cmd_set_value()) grab net_lock(read) find node and link from global node list (using link name) grab node_lock update link release node lock release net_lock (Same locking order as above. No problem.) 3: Via the bearer's linked link list: Used by notifications from interface (e.g. tipc_disable_bearer() ) grab net_lock(write) grab bearer_lock get link ptr from bearer's link list get node from link grab node_lock delete link release node lock release bearer_lock release net_lock (Different order from above, but works because we grab the outer net_lock in write mode first, excluding all other access.) The first major goal in our simplification effort is to get rid of the "big" net_lock, replacing it with rcu-locks when accessing the node list and node hash array. This will come in a later patch series. But to get there we first need to rewrite access methods ##2 and 3, since removal of net_lock would introduce three major problems: a) In access method #2, we access the link before taking the protecting node_lock. This will not work once net_lock is gone, so we will have to change the access order. We will deal with this in a later commit in this series, "tipc: add node lock protection to link found by link_find_link()". b) When the outer protection from net_lock is gone, taking bearer_lock and node_lock in opposite order of method 1) and 2) will become an obvious deadlock hazard. This is fixed in the commit ("tipc: remove bearer_lock from tipc_bearer struct") later in this series. c) Similar to what is described in problem a), access method #3 starts with using a link pointer that is unprotected by node_lock, in order to via that pointer find the correct node struct and lock it. Before we remove net_lock, this access order must be altered. This is what we do with this commit. We can avoid introducing problem problem c) by even here using the global node list to find the node, before accessing its links. When we loop though the node list we use the own bearer identity as search criteria, thus easily finding the links that are associated to the resetting/disabling bearer. It should be noted that although this method is somewhat slower than the current list traversal, it is in no way time critical. This is only about resetting or deleting links, something that must be considered relatively infrequent events. As a bonus, we can get rid of the mutual pointers between links and bearers. After this commit, pointer dependency go in one direction only: from the link to the bearer. This commit pre-empts introduction of problem c) as described above. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Ying Xue	135daee6d3	tipc: redefine 'started' flag in struct link to bitmap Currently, the 'started' field in struct tipc_link represents only a binary state, 'started' or 'not started'. We need it to represent more link execution states in the coming commits in this series. Hence, we rename the field to 'flags', and define the current started/non-started state to be represented by the LSB bit of that field. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Ying Xue	8d8439b686	tipc: move code for deleting links from bearer.c to link.c We break out the code for deleting attached links in the function bearer_disable(), and define a new function named tipc_link_delete_list() to do this job. This commit incurs no functional changes, but makes the code of function bearer_disable() cleaner. It is also a preparation for a more important change to the bearer code, in a subsequent commit in this series. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Ying Xue	e0ca2c30b1	tipc: move code for resetting links from bearer.c to link.c We break out the code for resetting attached links in the function tipc_reset_bearer(), and define a new function named tipc_link_reset_list() to do this job. This commit incurs no functional changes, but makes the code of function tipc_reset_bearer() cleaner. It is also a preparation for a more important change to the bearer code, in a subsequent commit in this series. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Jon Paul Maloy	03b9201793	tipc: stricter behavior of message reassembly function The function tipc_link_recv_fragment(struct sk_buff **buf) currently leaves the value of the input buffer pointer undefined when it returns, except when the return code indicates that the reassembly is complete. This despite the fact that it always consumes the input buffer. Here, we enforce a stricter behavior by this function, ensuring that the returned buffer pointer is non-NULL if and only if the reassembly is complete. This makes it possible to test for the buffer pointer as criteria for successful reassembly. We also rename the function to tipc_link_frag_rcv(), which is both shorter and more in line with common naming practice in the network subsystem. Apart from the new name, these changes have no impact on current users of the function, but makes it more practical for use in some planned future commits. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:57:05 -05:00
Andreas Bofjäll	b3f0f5c357	tipc: explicitly include core.h in addr.h The inline functions in addr.h uses tipc_own_addr which is exported by core.h, but addr.h never actually includes it. It works because it is explicitly included where this is used, but it looks a bit strange. Include core.h in addr.h explicitly to make the dependency clearer. Signed-off-by: Andreas Bofjäll <andreas.bofjall@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 17:49:13 -05:00
Erik Hugne	64380a04de	tipc: fix message corruption bug for deferred packets If a packet received on a link is out-of-sequence, it will be placed on a deferred queue and later reinserted in the receive path once the preceding packets have been processed. The problem with this is that it will be subject to the buffer adjustment from link_recv_buf_validate twice. The second adjustment for 20 bytes header space will corrupt the packet. We solve this by tagging the deferred packets and bail out from receive buffer validation for packets that have already been subjected to this. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-13 16:35:05 -05:00
Steffen Hurrle	342dfc306f	net: add build-time checks for msg->msg_name size This is a follow-up patch to `f3d3342602` ("net: rework recvmsg handler msg_name and msg_namelen logic"). DECLARE_SOCKADDR validates that the structure we use for writing the name information to is not larger than the buffer which is reserved for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR consistently in sendmsg code paths. Signed-off-by: Steffen Hurrle <steffen@hurrle.net> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-18 23:04:16 -08:00
Ying Xue	9bbb4ecc68	tipc: standardize recvmsg routine Standardize the behaviour of waiting for events in TIPC recvmsg() so that all variables of socket or port structures are protected within socket lock, allowing the process of calling recvmsg() to be woken up at appropriate time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	391a6dd1da	tipc: standardize sendmsg routine of connected socket Standardize the behaviour of waiting for events in TIPC send_packet() so that all variables of socket or port structures are protected within socket lock, allowing the process of calling sendmsg() to be woken up at appropriate time. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	3f40504f7e	tipc: standardize sendmsg routine of connectionless socket Comparing the behaviour of how to wait for events in TIPC sendmsg() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. For instance, sk_sleep() and tport->congested variables associated with socket are exposed without socket lock protection while wait_event_interruptible_timeout() accesses them. So standardizing it with similar implementation in other stacks can help us correct these errors which the process of calling sendmsg() cannot be woken up event if an expected event arrive at socket or improperly woken up although the wake condition doesn't match. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	6398e23cdb	tipc: standardize accept routine Comparing the behaviour of how to wait for events in TIPC accept() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. As sk_sleep() and sk->sk_receive_queue variables associated with socket are not protected by socket lock, the process of calling accept() may be woken up improperly or sometimes cannot be woken up at all. After standardizing it with inet_csk_wait_for_connect routine, we can get benefits including: avoiding 'thundering herd' phenomenon, adding a timeout mechanism for accept(), coping with a pending signal, and having sk_sleep() and sk->sk_receive_queue being always protected within socket lock scope and so on. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
Ying Xue	78eb3a5379	tipc: standardize connect routine Comparing the behaviour of how to wait for events in TIPC connect() with other stacks, the TIPC implementation might be perceived as different, and sometimes even incorrect. For instance, as both sock->state and sk_sleep() are directly fed to wait_event_interruptible_timeout() as its arguments, and socket lock has to be released before we call wait_event_interruptible_timeout(), the two variables associated with socket are exposed out of socket lock protection, thereby probably getting stale values so that the process of calling connect() cannot be woken up exactly even if correct event arrives or it is woken up improperly even if the wake condition is not satisfied in practice. Therefore, standardizing its behaviour with sk_stream_wait_connect routine can avoid these risks. Additionally the implementation of connect routine is simplified as a whole, allowing it to return correct values in all different cases. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-16 19:10:34 -08:00
stephen hemminger	963a185539	tipc: spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-14 18:18:22 -08:00
David S. Miller	0a379e21c5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-01-14 14:42:42 -08:00
Jon Paul Maloy	581465fa28	tipc: make link start event synchronous When a link is created we delay the start event by launching it to be executed later in a tasklet. As we hold all the necessary locks at the moment of creation, and there is no risk of deadlock or contention, this delay serves no purpose in the current code. We remove this obsolete indirection step, and the associated function link_start(). At the same time, we rename the function tipc_link_stop() to the more appropriate tipc_link_purge_queues(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:26 -05:00
Ying Xue	f9a2c80b8b	tipc: introduce new spinlock to protect struct link_req Currently, only 'bearer_lock' is used to protect struct link_req in the function disc_timeout(). This is unsafe, since the member fields 'num_nodes' and 'timer_intv' might be accessed by below three different threads simultaneously, none of them grabbing bearer_lock in the critical region: link_activate() tipc_bearer_add_dest() tipc_disc_add_dest() req->num_nodes++; tipc_link_reset() tipc_bearer_remove_dest() tipc_disc_remove_dest() req->num_nodes-- disc_update() read req->num_nodes write req->timer_intv disc_timeout() read req->num_nodes read/write req->timer_intv Without lock protection, the only symptom of a race is that discovery messages occasionally may not be sent out. This is not fatal, since such messages are best-effort anyway. On the other hand, since discovery messages are not time critical, adding a protecting lock brings no serious overhead either. So we add a new, dedicated spinlock in order to guarantee absolute data consistency in link_req objects. This also helps reduce the overall role of the bearer_lock, which we want to remove completely in a later commit series. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Jon Paul Maloy	b9d4c33935	tipc: remove 'has_redundant_link' flag from STATE link protocol messages The flag 'has_redundant_link' is defined only in RESET and ACTIVATE protocol messages. Due to an ambiguity in the protocol specification it is currently also transferred in STATE messages. Its value is used to initialize a link state variable, 'permit_changeover', which is used to inhibit futile link failover attempts when it is known that the peer node has no working links at the moment, although the local node may still think it has one. The fact that 'has_redundant_link' incorrectly is read from STATE messages has the effect that 'permit_changeover' sometimes gets a wrong value, and permanently blocks any links from being re-established. Such failures can only occur in in dual-link systems, and are extremely rare. This bug seems to have always been present in the code. Furthermore, since commit `b4b5610223` ("tipc: Ensure both nodes recognize loss of contact between them"), the 'permit_changeover' field serves no purpose any more. The task of enforcing 'lost contact' cycles at both peer endpoints is now taken by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN in struct tipc_node to abort unnecessary failover attempts. We therefore remove the 'has_redundant_link' flag from STATE messages, as well as the now redundant 'permit_changeover' variable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Jon Paul Maloy	170b3927b4	tipc: rename functions related to link failover and improve comments The functionality related to link addition and failover is unnecessarily hard to understand and maintain. We try to improve this by renaming some of the functions, at the same time adding or improving the explanatory comments around them. Names such as "tipc_rcv()" etc. also align better with what is used in other networking components. The changes in this commit are purely cosmetic, no functional changes are made. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 18:44:25 -05:00
Erik Hugne	732256b933	tipc: correctly unlink packets from deferred packet queue When we pull a received packet from a link's 'deferred packets' queue for processing, its 'next' pointer is not cleared, and still refers to the next packet in that queue, if any. This is incorrect, but caused no harm before commit `40ba3cdf54` ("tipc: message reassembly using fragment chain") was introduced. After that commit, it may sometimes lead to the following oops: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: tipc CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W 3.13.0-rc2+ #6 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880017af4880 ti: ffff880017aee000 task.ti: ffff880017aee000 RIP: 0010:[<ffffffff81710694>] [<ffffffff81710694>] skb_try_coalesce+0x44/0x3d0 RSP: 0018:ffff880016603a78 EFLAGS: 00010212 RAX: 6b6b6b6bd6d6d6d6 RBX: ffff880013106ac0 RCX: ffff880016603ad0 RDX: ffff880016603ad7 RSI: ffff88001223ed00 RDI: ffff880013106ac0 RBP: ffff880016603ab8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001223ed00 R13: ffff880016603ad0 R14: 000000000000058c R15: ffff880012297650 FS: 0000000000000000(0000) GS:ffff880016600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000805b000 CR3: 0000000011f5d000 CR4: 00000000000006e0 Stack: ffff880016603a88 ffffffff810a38ed ffff880016603aa8 ffff88001223ed00 0000000000000001 ffff880012297648 ffff880016603b68 ffff880012297650 ffff880016603b08 ffffffffa0006c51 ffff880016603b08 00ffffffa00005fc Call Trace: <IRQ> [<ffffffff810a38ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffa0006c51>] tipc_link_recv_fragment+0xd1/0x1b0 [tipc] [<ffffffffa0007214>] tipc_recv_msg+0x4e4/0x920 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffffa000177c>] tipc_l2_rcv_msg+0xcc/0x250 [tipc] [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc] [<ffffffff8171e65b>] __netif_receive_skb_core+0x80b/0xd00 [<ffffffff8171df94>] ? __netif_receive_skb_core+0x144/0xd00 [<ffffffff8171eb76>] __netif_receive_skb+0x26/0x70 [<ffffffff8171ed6d>] netif_receive_skb+0x2d/0x200 [<ffffffff8171fe70>] napi_gro_receive+0xb0/0x130 [<ffffffff815647c2>] e1000_clean_rx_irq+0x2c2/0x530 [<ffffffff81565986>] e1000_clean+0x266/0x9c0 [<ffffffff81985f7b>] ? notifier_call_chain+0x2b/0x160 [<ffffffff8171f971>] net_rx_action+0x141/0x310 [<ffffffff81051c1b>] __do_softirq+0xeb/0x480 [<ffffffff819817bb>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810b8c42>] ? handle_fasteoi_irq+0x72/0x100 [<ffffffff81052346>] irq_exit+0x96/0xc0 [<ffffffff8198cbc3>] do_IRQ+0x63/0xe0 [<ffffffff81981def>] common_interrupt+0x6f/0x6f <EOI> This happens when the last fragment of a message has passed through the the receiving link's 'deferred packets' queue, and at least one other packet was added to that queue while it was there. After the fragment chain with the complete message has been successfully delivered to the receiving socket, it is released. Since 'next' pointer of the last fragment in the released chain now is non-NULL, we get the crash shown above. We fix this by clearing the 'next' pointer of all received packets, including those being pulled from the 'deferred' queue, before they undergo any further processing. Fixes: `40ba3cdf54` ("tipc: message reassembly using fragment chain") Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reported-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-07 16:15:24 -05:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
stephen hemminger	eec73f1c96	tipc: remove unused code Remove dead code; tipc_bearer_find_interface tipc_node_redundant_links This may break out of tree version of TIPC if there still is one. But that maybe a good thing :-) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:18:50 -05:00
stephen hemminger	9805696399	tipc: make local function static Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:18:50 -05:00
wangweidong	b055597697	tipc: make the code look more readable In commit `3b8401fe9d` ("tipc: kill unnecessary goto's") didn't make the code look most readable, so fix it. This patch is cosmetic and does not change the operation of TIPC in any way. Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-02 03:30:35 -05:00
Ying Xue	84602761ca	tipc: fix deadlock during socket release A deadlock might occur if name table is withdrawn in socket release routine, and while packets are still being received from bearer. CPU0 CPU1 T0: recv_msg() release() T1: tipc_recv_msg() tipc_withdraw() T2: [grab node lock] [grab port lock] T3: tipc_link_wakeup_ports() tipc_nametbl_withdraw() T4: [grab port lock]* named_cluster_distribute() T5: wakeupdispatch() tipc_link_send() T6: [grab node lock]* The opposite order of holding port lock and node lock on above two different paths may result in a deadlock. If socket lock instead of port lock is used to protect port instance in tipc_withdraw(), the reverse order of holding port lock and node lock will be eliminated, as a result, the deadlock is killed as well. Reported-by: Lars Everbrand <lars.everbrand@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-29 22:24:07 -05:00
David S. Miller	143c905494	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-18 16:42:06 -05:00
wangweidong	d3fbccf2b0	tipc: change lock_sock order in connect() Instead of reaquiring the socket lock and taking the normal exit path when a connection times out, we bail out early with a return -ETIMEDOUT. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-16 12:48:35 -05:00
wangweidong	776a74ce07	tipc: Use <linux/uaccess.h> instead of <asm/uaccess.h> As warned by checkpatch.pl, use #include <linux/uaccess.h> instead of <asm/uaccess.h> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-16 12:48:35 -05:00
wangweidong	3b8401fe9d	tipc: kill unnecessary goto's Remove a number of needless 'goto exit' in send_stream when the socket is in an unconnected state. This patch is cosmetic and does not alter the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-16 12:48:35 -05:00
wangweidong	0cee6bbe06	tipc: remove unnecessary variables and conditions We remove a number of unnecessary variables and branches in TIPC. This patch is cosmetic and does not change the operation of TIPC in any way. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-16 12:48:35 -05:00
Ying Xue	77a7e07a78	tipc: remove unused 'blocked' flag from tipc_link struct In early versions of TIPC it was possible to administratively block individual links through the use of the member flag 'blocked'. This functionality was deemed redundant, and since commit 7368dd ("tipc: clean out all instances of #if 0'd unused code"), this flag has been unused. In the current code, a link only needs to be blocked for sending and reception if it is subject to an ongoing link failover. In that case, it is sufficient to check if the number of expected failover packets is non-zero, something which is done via the funtion 'link_blocked()'. This commit finally removes the redundant 'blocked' flag completely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:43 -05:00
Ying Xue	e4d050cbf7	tipc: eliminate code duplication in media layer Currently TIPC supports two L2 media types, Ethernet and Infiniband. Because both these media are accessed through the common net_device API, several functions in the two media adaptation files turn out to be fully or almost identical, leading to unnecessary code duplication. In this commit we extract this common code from the two media files and move them to the generic bearer.c. Additionally, we change the function names to reflect their real role: to access L2 media, irrespective of type. Signed-off-by: Ying Xue <ying.xue@windriver.com> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:43 -05:00
Ying Xue	6e967adf79	tipc: relocate common functions from media to bearer Currently, registering a TIPC stack handler in the network device layer is done twice, once for Ethernet (eth_media) and Infiniband (ib_media) repectively. But, as this registration is not media specific, we can avoid some code duplication by moving the registering function to the generic bearer layer, to the file bearer.c, and call it only once. The same is true for the network device event notifier. As a side effect, the two workqueues we are using for for setting up/ cleaning up media can now be eliminated. Furthermore, the array for storing the specific media type structs, media_array[], can be entirely deleted. Note that the eth_started and ib_started flags were removed during the code relocation. There is now only one call to bearer_setup and bearer_cleanup, and these can logically not race against each other. Despite its size, this cleanup work incurs no functional changes in TIPC. In particular, it should be noted that the sequence ordering of received packets is unaffected by this change, since packet reception never was subject to any work queue handling in the first place. Signed-off-by: Ying Xue <ying.xue@windriver.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:43 -05:00
Ying Xue	37cb062007	tipc: remove TIPC usage of field af_packet_priv in struct net_device TIPC is currently using the field 'af_packet_priv' in struct net_device as a handle to find the bearer instance associated to the given network device. But, by doing so it is blocking other networking cleanups, such as the one discussed here: http://patchwork.ozlabs.org/patch/178044/ This commit removes this usage from TIPC. Instead, we introduce a new field, 'tipc_ptr', to the net_device structure, to serve this purpose. When TIPC bearer is enabled, the bearer object is associated to 'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall from a networking device, the bearer object can now be obtained from 'tipc_ptr'. When a bearer is disabled, the bearer object is detached from its underlying network device by setting 'tipc_ptr' to NULL. Additionally, an RCU lock is used to protect the new pointer. Henceforth, the existing tipc_net_lock is used in write mode to serialize write accesses to this pointer, while the new RCU lock is applied on the read side to ensure that the pointer is 100% valid within its wrapped area for all readers. Signed-off-by: Ying Xue <ying.xue@windriver.com> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:42 -05:00
Jon Paul Maloy	ef72a7e02a	tipc: improve naming and comment consistency in media layer struct 'tipc_media' represents the specific info that the media layer adaptors (eth_media and ib_media) expose to the generic bearer layer. We clarify this by improved commenting, and by giving the 'media_list' array the more appropriate name 'media_info_array'. There are no functional changes in this commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:42 -05:00
Jon Paul Maloy	5702dbab68	tipc: initiate media type array at compile time Communication media types are abstracted through the struct 'tipc_media', one per media type. These structs are allocated statically inside their respective media file. Furthermore, in order to be able to reach all instances from a central location, we keep a static array with pointers to these structs. This array is currently initialized at runtime, under protection of tipc_net_lock. However, since the contents of the array itself never changes after initialization, we can just as well initialize it at compile time and make it 'const', at the same time making it obvious that no lock protection is needed here. This commit makes the array constant and removes the redundant lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:42 -05:00
Ying Xue	d77b3831f7	tipc: eliminate redundant code with kfree_skb_list routine sk_buff lists are currently relased by looping over the list and explicitly releasing each buffer. We replace all occurrences of this loop with a call to kfree_skb_list(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-11 00:17:42 -05:00
Ying Xue	00ede97709	tipc: protect handler_enabled variable with qitem_lock spin lock 'handler_enabled' is a global flag indicating whether the TIPC signal handling service is enabled or not. The lack of lock protection for this flag incurs a risk for contention, so that a tipc_k_signal() call might queue a signal handler to a destroyed signal queue, with unpredictable results. To correct this, we let the already existing 'qitem_lock' protect the flag, as it already does with the queue itself. This way, we ensure that the flag always is consistent across all cores. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 22:35:49 -05:00
Jon Paul Maloy	993b858e37	tipc: correct the order of stopping services at rmmod The 'signal handler' service in TIPC is a mechanism that makes it possible to postpone execution of functions, by launcing them into a job queue for execution in a separate tasklet, independent of the launching execution thread. When we do rmmod on the tipc module, this service is stopped after the network service. At the same time, the stopping of the network service may itself launch jobs for execution, with the risk that these functions may be scheduled for execution after the data structures meant to be accessed by the job have already been deleted. We have seen this happen, most often resulting in an oops. This commit ensures that the signal handler is the very first to be stopped when TIPC is shut down, so there are no surprises during the cleanup of the other services. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 22:35:49 -05:00

... 5 6 7 8 9 ...

1258 Commits