linux

Commit Graph

Author	SHA1	Message	Date
Jon Paul Maloy	dc8d1eb305	tipc: fix node reference count bug Commit `5405ff6e15` ("tipc: convert node lock to rwlock") introduced a bug to the node reference counter handling. When a message is successfully sent in the function tipc_node_xmit(), we return directly after releasing the node lock, instead of continuing and decrementing the node reference counter as we should do. This commit fixes this bug. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-03 15:19:40 -05:00
Jon Paul Maloy	1a90632da8	tipc: eliminate remnants of hungarian notation The number of variables with Hungarian notation (l_ptr, n_ptr etc.) has been significantly reduced over the last couple of years. We now root out the last traces of this practice. There are no functional changes in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	38206d5939	tipc: narrow down interface towards struct tipc_link We move the definition of struct tipc_link from link.h to link.c in order to minimize its exposure to the rest of the code. When needed, we define new functions to make it possible for external entities to access and set data in the link. Apart from the above, there are no functional changes. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	5be9c08671	tipc: narrow down exposure of struct tipc_node In our effort to have less code and include dependencies between entities such as node, link and bearer, we try to narrow down the exposed interface towards the node as much as possible. In this commit, we move the definition of struct tipc_node, along with many of its associated function declarations, from node.h to node.c. We also move some function definitions from link.c and name_distr.c to node.c, since they access fields in struct tipc_node that should not be externally visible. The moved functions are renamed according to new location, and made static whenever possible. There are no functional changes in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	5405ff6e15	tipc: convert node lock to rwlock According to the node FSM a node in state SELF_UP_PEER_UP cannot change state inside a lock context, except when a TUNNEL_PROTOCOL (SYNCH or FAILOVER) packet arrives. However, the node's individual links may still change state. Since each link now is protected by its own spinlock, we finally have the conditions in place to convert the node spinlock to an rwlock_t. If the node state and arriving packet type are rigth, we can let the link directly receive the packet under protection of its own spinlock and the node lock in read mode. In all other cases we use the node lock in write mode. This enables full concurrent execution between parallel links during steady-state traffic situations, i.e., 99+ % of the time. This commit implements this change. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	2312bf61ae	tipc: introduce per-link spinlock As a preparation to allow parallel links to work more independently from each other we introduce a per-link spinlock, to be stored in the struct nodes's link entry area. Since the node lock still is a regular spinlock there is no increase in parallellism at this stage. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	1d7e1c2595	tipc: reduce code dependency between binding table and node layer The file name_distr.c currently contains three functions, named_cluster_distribute(), tipc_publ_subcscribe() and tipc_publ_unsubscribe() that all directly access fields in struct tipc_node. We want to eliminate such dependencies, so we move those functions to the file node.c and rename them to tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe() respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Jon Paul Maloy	5c10e97940	tipc: small cleanup of function tipc_node_check_state() The function tipc_node_check_state() contains the core logics for handling link synchronization and failover. For this reason, it is important to keep it as comprehensible as possible. In this commit, we make three small cleanups. 1) If the node is in state SELF_DOWN_PEER_LEAVING and the received packet confirms that the peer has lost contact, there will be no further action in this function. To make this clearer, we return from the function directly after the state change. 2) Since commit `0f8b8e28fb` ("tipc: eliminate risk of stalled link synchronization") only the logically first TUNNEL_PROTO/SYNCH packet can alter the link state and set the synch point, independently of arrival order. Hence, there is not any longer any need to adjust the synch value in case such packets arrive in disorder. We remove this adjustment. 3) It is the intention that any message arriving on any of the links may trig a check for and possible termination of a node SYNCH state. A redundant and unnoticed check for tipc_link_is_synching() obviously beats this purpose, with the effect that only packets arriving on the synching link may currently end the synch state. We remove this check. This change will further shorten the synchronization period between parallel links. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-20 14:06:10 -05:00
Wu Fengguang	742e038330	tipc: link_is_bc_sndlink() can be static TO: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org CC: Jon Maloy <jon.maloy@ericsson.com> CC: Ying Xue <ying.xue@windriver.com> CC: tipc-discussion@lists.sourceforge.net CC: linux-kernel@vger.kernel.org Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-25 06:31:52 -07:00
Jon Paul Maloy	2af5ae372a	tipc: clean up unused code and structures After the previous changes in this series, we can now remove some unused code and structures, both in the broadcast, link aggregation and link code. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:47 -07:00
Jon Paul Maloy	c49a0a8439	tipc: ensure binding table initial distribution is sent via first link Correct synchronization of the broadcast link at first contact between two nodes is dependent on the assumption that the binding table "bulk" update passes via the same link as the initial broadcast syncronization message, i.e., via the first link that is established. This is not guaranteed in the current implementation. If two link come up very close to each other in time, the "bulk" may quite well pass via the second link, and hence void the guarantee of a correct initial synchronization before the broadcast link is opened. This commit makes two small changes to strengthen this guarantee. 1) We let the second established link occupy slot 1 of the "active_links" array, while the first link will retain slot 0. (This is in reality a cosmetic change, we could just as well keep the current, opposite order) 2) We let the name distributor always use link selector/slot 0 when it sends it binding table updates. The extra traffic bias on the first link caused by this change should be negligible, since binding table updates constitutes a very small fraction of the total traffic. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:46 -07:00
Jon Paul Maloy	c72fa872a2	tipc: eliminate link's reference to owner node With the recent commit series, we have established a one-way dependency between the link aggregation (struct tipc_node) instances and their pertaining tipc_link instances. This has enabled quite significant code and structure simplifications. In this commit, we eliminate the field 'owner', which points to an instance of struct tipc_node, from struct tipc_link, and replace it with a pointer to struct net, which is the only external reference now needed by a link instance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:45 -07:00
Jon Paul Maloy	b06b281e79	tipc: simplify bearer level broadcast Until now, we have been keeping track of the exact set of broadcast destinations though the help structure tipc_node_map. This leads us to have to maintain a whole infrastructure for supporting this, including a pseudo-bearer and a number of functions to manipulate both the bearers and the node map correctly. Apart from the complexity, this approach is also limiting, as struct tipc_node_map only can support cluster local broadcast if we want to avoid it becoming excessively large. We want to eliminate this limitation, in order to enable introduction of scoped multicast in the future. A closer analysis reveals that it is unnecessary maintaining this "full set" overview; it is sufficient to keep a counter per bearer, indicating how many nodes can be reached via this bearer at the moment. The protocol is now robust enough to handle transitional discrepancies between the nominal number of reachable destinations, as expected by the broadcast protocol itself, and the number which is actually reachable at the moment. The initial broadcast synchronization, in conjunction with the retransmission mechanism, ensures that all packets will eventually be acknowledged by the correct set of destinations. This commit introduces these changes. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:39 -07:00
Jon Paul Maloy	5266698661	tipc: let broadcast packet reception use new link receive function The code path for receiving broadcast packets is currently distinct from the unicast path. This leads to unnecessary code and data duplication, something that can be avoided with some effort. We now introduce separate per-peer tipc_link instances for handling broadcast packet reception. Each receive link keeps a pointer to the common, single, broadcast link instance, and can hence handle release and retransmission of send buffers as if they belonged to the own instance. Furthermore, we let each unicast link instance keep a reference to both the pertaining broadcast receive link, and to the common send link. This makes it possible for the unicast links to easily access data for broadcast link synchronization, as well as for carrying acknowledges for received broadcast packets. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:37 -07:00
Jon Paul Maloy	fd556f209a	tipc: introduce capability bit for broadcast synchronization Until now, we have tried to support both the newer, dedicated broadcast synchronization mechanism along with the older, less safe, RESET_MSG/ ACTIVATE_MSG based one. The latter method has turned out to be a hazard in a highly dynamic cluster, so we find it safer to disable it completely when we find that the former mechanism is supported by the peer node. For this purpose, we now introduce a new capabability bit, TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast syncronization is supported by the present node. The new bit is conveyed between peers in the 'capabilities' field of neighbor discovery messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:35 -07:00
Jon Paul Maloy	0e05498e9e	tipc: make link implementation independent from struct tipc_bearer In reality, the link implementation is already independent from struct tipc_bearer, in that it doesn't store any reference to it. However, we still pass on a pointer to a bearer instance in the function tipc_link_create(), just to have it extract some initialization information from it. I later commits, we need to create instances of tipc_link without having any associated struct tipc_bearer. To facilitate this, we want to extract the initialization data already in the creator function in node.c, before calling tipc_link_create(), and pass this info on as individual parameters in the call. This commit introduces this change. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:56:30 -07:00
David S. Miller	26440c835f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-20 06:08:27 -07:00
Jon Paul Maloy	c819930090	tipc: update node FSM when peer RESET message is received The change made in the previous commit revealed a small flaw in the way the node FSM is updated. When the function tipc_node_link_down() is called for the last link to a node, we should check whether this was caused by a local reset or by a received RESET message from the peer. In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to the node FSM, so that it is ready to re-establish contact. If this is not done, the peer node will sometimes have to go through a second establish cycle before the link becomes stable. We fix this in this commit by conditionally issuing the mentioned event in the function tipc_node_link_down(). We also move LINK_RESET FSM even away from the link_reset() function and into the caller function, partially because it is easier to follow the code when state changes are gathered at a limited number of locations, partially because there will be cases in future commits where we don't want the link to go RESET mode when link_reset() is called. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:23 -07:00
Jon Paul Maloy	282b3a0562	tipc: send out RESET immediately when link goes down When a link is taken down because of a node local event, such as disabling of a bearer or an interface, we currently leave it to the peer node to discover the broken communication. The default time for such failure discovery is 1.5-2 seconds. If we instead allow the terminating link endpoint to send out a RESET message at the moment it is reset, we can achieve the impression that both endpoints are going down instantly. Since this is a very common scenario, we find it worthwhile to make this small modification. Apart from letting the link produce the said message, we also have to ensure that the interface is able to transmit it before TIPC is detached. We do this by performing the disabling of a bearer in three steps: 1) Disable reception of TIPC packets from the interface in question. 2) Take down the links, while allowing them so send out a RESET message. 3) Disable transmission of TIPC packets on the interface. Apart from this, we now have to react on the NETDEV_GOING_DOWN event, instead of as currently the NEDEV_DOWN event, to ensure that such transmission is possible during the teardown phase. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:22 -07:00
Jon Paul Maloy	73f646cec3	tipc: delay ESTABLISH state event when link is established Link establishing, just like link teardown, is a non-atomic action, in the sense that discovering that conditions are right to establish a link, and the actual adding of the link to one of the node's send slots is done in two different lock contexts. The link FSM is designed to help bridging the gap between the two contexts in a safe manner. We have now discovered a weakness in the implementaton of this FSM. Because we directly let the link go from state LINK_ESTABLISHING to state LINK_ESTABLISHED already in the first lock context, we are unable to distinguish between a fully established link, i.e., a link that has been added to its slot, and a link that has not yet reached the second lock context. It may hence happen that a manual intervention, e.g., when disabling an interface, causes the function tipc_node_link_down() to try removing the link from the node slots, decrementing its active link counter etc, although the link was never added there in the first place. We solve this by delaying the actual state change until we reach the second lock context, inside the function tipc_node_link_up(). This makes it possible for potentail callers of __tipc_node_link_down() to know if they should proceed or not, and the problem is solved. Unforunately, the situation described above also has a second problem. Since there by necessity is a tipc_node_link_up() call pending once the node lock has been released, we must defuse that call by setting the link back from LINK_ESTABLISHING to LINK_RESET state. This forces us to make a slight modification to the link FSM, which will now look as follows. +------------------------------------+ \|RESET_EVT \| \| \| \| +--------------+ \| +-----------------\| SYNCHING \|-----------------+ \| \|FAILURE_EVT +--------------+ PEER_RESET_EVT\| \| \| A \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|SYNCH_ \|SYNCH_ \| \| \| \|BEGIN_EVT \|END_EVT \| \| \| \| \| \| \| V \| V V \| +-------------+ +--------------+ +------------+ \| \| RESETTING \|<---------\| ESTABLISHED \|--------->\| PEER_RESET \| \| +-------------+ FAILURE_ +--------------+ PEER_ +------------+ \| \| EVT \| A RESET_EVT \| \| \| \| \| \| \| \| +----------------+ \| \| \| RESET_EVT\| \|RESET_EVT \| \| \| \| \| \| \| \| \| \| \|ESTABLISH_EVT \| \| \| \| +-------------+ \| \| \| \| \| \| RESET_EVT \| \| \| \| \| \| \| \| \| \| \| V V V \| \| \| \| +-------------+ +--------------+ RESET_EVT\| +--->\| RESET \|--------->\| ESTABLISHING \|<----------------+ +-------------+ PEER_ +--------------+ \| A RESET_EVT \| \| \| \| \| \| \| \|FAILOVER_ \|FAILOVER_ \|FAILOVER_ \|BEGIN_EVT \|END_EVT \|BEGIN_EVT \| \| \| V \| \| +-------------+ \| \| FAILINGOVER \|<----------------+ +-------------+ Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 23:55:21 -07:00
Jon Paul Maloy	0f8b8e28fb	tipc: eliminate risk of stalled link synchronization In commit `6e498158a8` ("tipc: move link synch and failover to link aggregation level") we introduced a new mechanism for performing link failover and synchronization. We have now detected a bug in this mechanism. During link synchronization we use the arrival of any packet on the tunnel link to trig a check for whether it has reached the synchronization point or not. This has turned out to be too permissive, since it may cause an arriving non-last SYNCH packet to end the synch state, just to see the next SYNCH packet initiate a new synch state with a new, higher synch point. This is not fatal, but should be avoided, because it may significantly extend the synchronization period, while at the same time we are not allowed to send NACKs if packets are lost. In the worst case, a low-traffic user may see its traffic stall until a LINK_PROTOCOL state message trigs the link to leave synchronization state. At the same time, LINK_PROTOCOL packets which happen to have a (non- valid) sequence number lower than the tunnel link's rcv_nxt value will be consistently dropped, and will never be able to resolve the situation described above. We fix this by exempting LINK_PROTOCOL packets from the sequence number check, as they should be. We also reduce (but don't completely eliminate) the risk of entering multiple synchronization states by only allowing the (logically) first SYNCH packet to initiate a synchronization state. This works independently of actual packet arrival order. Fixes: commit `6e498158a8` ("tipc: move link synch and failover to link aggregation level") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 06:06:40 -07:00
Jon Paul Maloy	2be80c2d87	tipc: fix stale link problem during synchronization Recent changes to the link synchronization means that we can now just drop packets arriving on the synchronizing link before the synch point is reached. This has lead to significant simplifications to the implementation, but also turns out to have a flip side that we need to consider. Under unlucky circumstances, the two endpoints may end up repeatedly dropping each other's packets, while immediately asking for retransmission of the same packets, just to drop them once more. This pattern will eventually be broken when the synch point is reached on the other link, but before that, the endpoints may have arrived at the retransmission limit (stale counter) that indicates that the link should be broken. We see this happen at rare occasions. The fix for this is to not ask for retransmissions when a link is in state LINK_SYNCHING. The fact that the link has reached this state means that it has already received the first SYNCH packet, and that it knows the synch point. Hence, it doesn't need any more packets until the other link has reached the synch point, whereafter it can go ahead and ask for the missing packets. However, because of the reduced traffic on the synching link that follows this change, it may now take longer to discover that the synch point has been reached. We compensate for this by letting all packets, on any of the links, trig a check for synchronization termination. This is possible because the packets themselves don't contain any information that is needed for discovering this condition. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-23 16:14:45 -07:00
Jon Paul Maloy	5ae2f8e685	tipc: interrupt link synchronization when a link goes down When we introduced the new link failover/synch mechanism in commit `6e498158a8` ("tipc: move link synch and failover to link aggregation level"), we missed the case when the non-tunnel link goes down during the link synchronization period. In this case the tunnel link will remain in state LINK_SYNCHING, something leading to unpredictable behavior when the failover procedure is initiated. In this commit, we ensure that the node and remaining link goes back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED) when one of the parallel links goes down. We also ensure that we don't re-enter synch mode if subsequent SYNCH packets arrive on the remaining link. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-23 16:14:45 -07:00
Jon Paul Maloy	17b2063077	tipc: eliminate risk of premature link setup during failover When a link goes down, and there is still a working link towards its destination node, a failover is initiated, and the failed link is not allowed to re-establish until that procedure is finished. To ensure this, the concerned link endpoints are set to state LINK_FAILINGOVER, and the node endpoints to NODE_FAILINGOVER during the failover period. However, if the link reset is due to a disabled bearer, the corres- ponding link endpoint is deleted, and only the node endpoint knows about the ongoing failover. Now, if the disabled bearer is re-enabled during the failover period, the discovery mechanism may create a new link endpoint that is ready to be established, despite that this is not permitted. This situation may cause both the ongoing failover and any subsequent link synchronization to fail. In this commit, we ensure that a newly created link goes directly to state LINK_FAILINGOVER if the corresponding node state is NODE_FAILINGOVER. This eliminates the problem described above. Furthermore, we tighten the criteria for which packets are allowed to end a failover state in the function tipc_node_check_state(). By checking that the receiving link is up and running, instead of just checking that it is not in failover mode, we eliminate the risk that protocol packets from the re-created link may cause the failover to be prematurely terminated. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-23 16:14:45 -07:00
Jon Paul Maloy	440d8963cd	tipc: clean up link creation We simplify the link creation function tipc_link_create() and the way the link struct it is connected to the node struct. In particular, we remove the duplicate initialization of some fields which are anyway set in tipc_link_reset(). Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:15 -07:00
Jon Paul Maloy	23d8335d78	tipc: remove implicit message delivery in node_unlock() After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	598411d70f	tipc: make resetting of links non-atomic In order to facilitate future improvements to the locking structure, we want to make resetting and establishing of links non-atomic. I.e., the functions tipc_node_link_up() and tipc_node_link_down() should be called from outside the node lock context, and grab/release the node lock themselves. This requires that we can freeze the link state from the moment it is set to RESETTING or PEER_RESET in one lock context until it is set to RESET or ESTABLISHING in a later context. The recently introduced link FSM makes this possible, so we are now ready to introduce the above change. This commit implements this. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	cf148816ac	tipc: move received discovery data evaluation inside node.c The node lock is currently grabbed and and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	662921cd0a	tipc: merge link->exec_mode and link->state into one FSM Until now, we have been handling link failover and synchronization by using an additional link state variable, "exec_mode". This variable is not independent of the link FSM state, something causing a risk of inconsistencies, apart from the fact that it clutters the code. The conditions are now in place to define a new link FSM that covers all existing use cases, including failover and synchronization, and eliminate the "exec_mode" field altogether. The FSM must also support non-atomic resetting of links, which will be introduced later. The new link FSM is shown below, with 7 states and 8 events. Only events leading to state change are shown as edges. +------------------------------------+ \|RESET_EVT \| \| \| \| +--------------+ \| +-----------------\| SYNCHING \|-----------------+ \| \|FAILURE_EVT +--------------+ PEER_RESET_EVT\| \| \| A \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|SYNCH_ \|SYNCH_ \| \| \| \|BEGIN_EVT \|END_EVT \| \| \| \| \| \| \| V \| V V \| +-------------+ +--------------+ +------------+ \| \| RESETTING \|<---------\| ESTABLISHED \|--------->\| PEER_RESET \| \| +-------------+ FAILURE_ +--------------+ PEER_ +------------+ \| \| EVT \| A RESET_EVT \| \| \| \| \| \| \| \| \| \| \| \| \| +--------------+ \| \| \| RESET_EVT\| \|RESET_EVT \|ESTABLISH_EVT \| \| \| \| \| \| \| \| \| \| \| \| V V \| \| \| +-------------+ +--------------+ RESET_EVT\| +--->\| RESET \|--------->\| ESTABLISHING \|<----------------+ +-------------+ PEER_ +--------------+ \| A RESET_EVT \| \| \| \| \| \| \| \|FAILOVER_ \|FAILOVER_ \|FAILOVER_ \|BEGIN_EVT \|END_EVT \|BEGIN_EVT \| \| \| V \| \| +-------------+ \| \| FAILINGOVER \|<----------------+ +-------------+ These changes are fully backwards compatible. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	5045f7b900	tipc: move protocol message sending away from link FSM The implementation of the link FSM currently takes decisions about and sends out link protocol messages. This is unnecessary, since such actions are not the result of any link state change, and are even decided based on non-FSM state information ("silent_intv_cnt"). We now move the sending of unicast link protocol messages to the function tipc_link_timeout(), and the initial broadcast synchronization message to tipc_node_link_up(). The latter is done because a link instance should not need to know whether it is the first or second link to a destination. Such information is now restricted to and handled by the link aggregation layer in node.c Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	6e498158a8	tipc: move link synch and failover to link aggregation level Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	66996b6c47	tipc: extend node FSM In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it. The node FSM now looks as follows: +-----------------------------------------+ \| PEER_DOWN_EVT\| \| \| +------------------------+----------------+ \| \|SELF_DOWN_EVT \| \| \| \| \| \| \| \| +-----------+ +-----------+ \| \| \|NODE_ \| \|NODE_ \| \| \| +----------\|FAILINGOVER\|<---------\|SYNCHING \|------------+ \| \| \|SELF_ +-----------+ FAILOVER_+-----------+ PEER_ \| \| \| \|DOWN_EVT \| A BEGIN_EVT A \| DOWN_EVT\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|FAILOVER_\|FAILOVER_ \|SYNCH_ \|SYNCH_ \| \| \| \| \|END_EVT \|BEGIN_EVT \|BEGIN_EVT\|END_EVT \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| +--------------+ \| \| \| \| \| +------->\| SELF_UP_ \|<-------+ \| \| \| \| +----------------\| PEER_UP \|------------------+ \| \| \| \| \|SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT\| \| \| \| \| \| A A \| \| \| \| \| \| \| \| \| \| \| \| \| \| PEER_UP_EVT\| \|SELF_UP_EVT \| \| \| \| \| \| \| \| \| \| \| V V V \| \| V V V +------------+ +-----------+ +-----------+ +------------+ \|SELF_DOWN_ \| \|SELF_UP_ \| \|PEER_UP_ \| \|PEER_DOWN \| \|PEER_LEAVING\|<------\|PEER_COMING\| \|SELF_COMING\|------>\|SELF_LEAVING\| +------------+ SELF_ +-----------+ +-----------+ PEER_ +------------+ \| DOWN_EVT A A DOWN_EVT \| \| \| \| \| \| \| \| \| \| SELF_UP_EVT\| \|PEER_UP_EVT \| \| \| \| \| \| \| \| \| \|PEER_DOWN_EVT +--------------+ SELF_DOWN_EVT\| +------------------->\| SELF_DOWN_ \|<--------------------+ \| PEER_DOWN \| +--------------+ Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	655fb243b8	tipc: reverse call order for link_reset()->node_link_down() In many cases the call order when a link is reset goes as follows: tipc_node_xx()->tipc_link_reset()->tipc_node_link_down() This is not the right order if we want the node to be in control, so in this commit we change the order to: tipc_node_xx()->tipc_node_link_down()->tipc_link_reset() The fact that tipc_link_reset() now is called from only one location with a well-defined state will also facilitate later simplifications of tipc_link_reset() and the link FSM. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	6144a996a6	tipc: move all link_reset() calls to link aggregation level In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	cbeb83ca68	tipc: eliminate function tipc_link_activate() The function tipc_link_activate() is redundant, since it mostly performs settings that have already been done in a preceding tipc_link_reset(). There are three exceptions to this: - The actual state change to TIPC_LINK_WORKING. This should anyway be done in the FSM, and not in a separate function. - Registration of the link with the bearer. This should be done by the node, since we don't want the link to have any knowledge about its specific bearer. - Call to tipc_node_link_up() for user access registration. With the new role distribution between link aggregation and link level this becomes the wrong call order; tipc_node_link_up() should instead be called directly as a result of a TIPC_LINK_UP event, hence by the node itself. This commit implements those changes. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	d999297c3d	tipc: reduce locking scope during packet reception We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	1a20cc254e	tipc: introduce node contact FSM The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ \| SELF_UP/ \| +---------------->\| PEER_COMING \|-----------------+ SELF_ \| +--------------+ \|PEER_ ESTBL_ \| \| \|ESTBL_ CONTACT\| SELF_LOST_CONTACT \| \|CONTACT \| v \| \| +--------------+ \| \| PEER_ \| SELF_DOWN/ \| SELF_ \| \| LOST_ +--\| PEER_LEAVING \|<--+ LOST_ v +-------------+ CONTACT \| +--------------+ \| CONTACT +-----------+ \| SELF_DOWN/ \|<----------+ +----------\| SELF_UP/ \| \| PEER_DOWN \|<----------+ +----------\| PEER_UP \| +-------------+ SELF_ \| +--------------+ \| PEER_ +-----------+ \| LOST_ +--\| SELF_LEAVING/\|<--+ LOST_ A \| CONTACT \| PEER_DOWN \| CONTACT \| \| +--------------+ \| \| A \| PEER_ \| PEER_LOST_CONTACT \| \|SELF_ ESTBL_ \| \| \|ESTBL_ CONTACT\| +--------------+ \|CONTACT +---------------->\| PEER_UP/ \|-----------------+ \| SELF_COMING \| +--------------+ Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	8a1577c96f	tipc: move link supervision timer to node level In our effort to move control of the links to the link aggregation layer, we move the perodic link supervision timer to struct tipc_node. The new timer is shared between all links belonging to the node, thus saving resources, while still kicking the FSM on both its pertaining links at each expiration. The current link timer and corresponding functions are removed. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	d3504c3449	tipc: clean up definitions and usage of link flags The status flag LINK_STOPPED is not needed any more, since the mechanism for delayed deletion of links has been removed. Likewise, LINK_STARTED and LINK_START_EVT are unnecessary, because we can just as well start the link timer directly from inside tipc_link_create(). We eliminate these flags in this commit. Instead of the above flags, we now introduce three new link modes, TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages the link is allowed to receive in this state. TIPC_LINK_BLOCKED also blocks timer-driven protocol messages to be sent out, and any change to the link FSM. Since the modes are mutually exclusive, we convert them to state values, and rename the 'flags' field in struct tipc_link to 'exec_mode'. Finally, we move the #defines for link FSM states and events from link.h into enums inside the file link.c, which is the real usage scope of these definitions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	af9b028e27	tipc: make media xmit call outside node spinlock context Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	36e78a463b	tipc: use bearer index when looking up active links struct tipc_node currently holds two arrays of link pointers; one, indexed by bearer identity, which contains all links irrespective of current state, and one two-slot array for the currently active link or links. The latter array contains direct pointers into the elements of the former. This has the effect that we cannot know the bearer id of a link when accessing it via the "active_links[]" array without actually dereferencing the pointer, something we want to avoid in some cases. In this commit, we do instead store the bearer identity in the "active_links" array, and use this as an index to find the right element in the overall link entry array. This change should be seen as a preparation for the later commits in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	d39bbd445d	tipc: move link input queue to tipc_node At present, the link input queue and the name distributor receive queues are fields aggregated in struct tipc_link. This is a hazard, because a link might be deleted while a receiving socket still keeps reference to one of the queues. This commit fixes this bug. However, rather than adding yet another reference counter to the critical data path, we move the two queues to safe ground inside struct tipc_node, which is already protected, and let the link code only handle references to the queues. This is also in line with planned later changes in this area. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	d3a43b907a	tipc: move link creation from neighbor discoverer to node As a step towards turning links into node internal entities, we move the creation of links from the neighbor discovery logics to the node's link control logics. We also create an additional entry for the link's media address in the newly introduced struct tipc_link_entry, since this is where it is needed in the upcoming commits. The current copy in struct tipc_link is kept for now, but will be removed later. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	9d13ec65ed	tipc: introduce link entry structure to struct tipc_node struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	dd3f9e70f5	tipc: add packet sequence number at instant of transmission Currently, the packet sequence number is updated and added to each packet at the moment a packet is added to the link backlog queue. This is wasteful, since it forces the code to traverse the send packet list packet by packet when adding them to the backlog queue. It would be better to just splice the whole packet list into the backlog queue when that is the right action to do. In this commit, we do this change. Also, since the sequence numbers cannot now be assigned to the packets at the moment they are added the backlog queue, we do instead calculate and add them at the moment of transmission, when the backlog queue has to be traversed anyway. We do this in the function tipc_link_push_packet(). Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-14 12:24:46 -04:00
Jon Paul Maloy	a6bf70f792	tipc: simplify include dependencies When we try to add new inline functions in the code, we sometimes run into circular include dependencies. The main problem is that the file core.h, which really should be at the root of the dependency chain, instead is a leaf. I.e., core.h includes a number of header files that themselves should be allowed to include core.h. In reality this is unnecessary, because core.h does not need to know the full signature of any of the structs it refers to, only their type declaration. In this commit, we remove all dependencies from core.h towards any other tipc header file. As a consequence of this change, we can now move the function tipc_own_addr(net) from addr.c to addr.h, and make it inline. There are no functional changes in this commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-14 12:24:45 -04:00
Jon Paul Maloy	ed193ece26	tipc: simplify link mtu negotiation When a link is being established, the two endpoints advertise their respective interface MTU in the transmitted RESET and ACTIVATE messages. If there is any difference, the lower of the two MTUs will be selected for use by both endpoints. However, as a remnant of earlier attempts to introduce TIPC level routing. there also exists an MTU discovery mechanism. If an intermediate node has a lower MTU than the two endpoints, they will discover this through a bisectional approach, and finally adopt this MTU for common use. Since there is no TIPC level routing, and probably never will be, this mechanism doesn't make any sense, and only serves to make the link level protocol unecessarily complex. In this commit, we eliminate the MTU discovery algorithm,and fall back to the simple MTU advertising approach. This change is fully backwards compatible. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:27:12 -04:00
Jon Paul Maloy	dff29b1a88	tipc: eliminate delayed link deletion at link failover When a bearer is disabled manually, all its links have to be reset and deleted. However, if there is a remaining, parallel link ready to take over a deleted link's traffic, we currently delay the delete of the removed link until the failover procedure is finished. This is because the remaining link needs to access state from the reset link, such as the last received packet number, and any partially reassembled buffer, in order to perform a successful failover. In this commit, we do instead move the state data over to the new link, so that it can fulfill the procedure autonomously, without accessing any data on the old link. This means that we can now proceed and delete all pertaining links immediately when a bearer is disabled. This saves us from some unnecessary complexity in such situations. We also choose to change the confusing definitions CHANGEOVER_PROTOCOL, ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL, FAILOVER_MSG and SYNCH_MSG respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:27:12 -04:00
Ying Xue	8a0f6ebe84	tipc: involve reference counter for node structure TIPC node hash node table is protected with rcu lock on read side. tipc_node_find() is used to look for a node object with node address through iterating the hash node table. As the entire process of what tipc_node_find() traverses the table is guarded with rcu read lock, it's safe for us. However, when callers use the node object returned by tipc_node_find(), there is no rcu read lock applied. Therefore, this is absolutely unsafe for callers of tipc_node_find(). Now we introduce a reference counter for node structure. Before tipc_node_find() returns node object to its caller, it first increases the reference counter. Accordingly, after its caller used it up, it decreases the counter again. This can prevent a node being used by one thread from being freed by another thread. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-29 12:40:28 -07:00
Ying Xue	b952b2befb	tipc: fix potential deadlock when all links are reset [ 60.988363] ====================================================== [ 60.988754] [ INFO: possible circular locking dependency detected ] [ 60.989152] 3.19.0+ #194 Not tainted [ 60.989377] ------------------------------------------------------- [ 60.989781] swapper/3/0 is trying to acquire lock: [ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.990743] [ 60.990743] but task is already holding lock: [ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.991738] [ 60.991738] which lock already depends on the new lock. [ 60.991738] [ 60.992174] [ 60.992174] the existing dependency chain (in reverse order) is: [ 60.992174] -> #1 (&(&bclink->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc] [ 60.992174] [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc] [ 60.992174] [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc] [ 60.992174] [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc] [ 60.992174] [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] -> #0 (&(&n_ptr->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0 [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.992174] [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc] [ 60.992174] [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] [ 60.992174] other info that might help us debug this: [ 60.992174] [ 60.992174] Possible unsafe locking scenario: [ 60.992174] [ 60.992174] CPU0 CPU1 [ 60.992174] ---- ---- [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] [ 60.992174] * DEADLOCK * [ 60.992174] [ 60.992174] 3 locks held by swapper/3/0: [ 60.992174] #0: (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980 [ 60.992174] #1: (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc] [ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] The correct the sequence of grabbing n_ptr->lock and bclink->lock should be that the former is first held and the latter is then taken, which exactly happened on CPU1. But especially when the retransmission of broadcast link is failed, bclink->lock is first held in tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure() called by tipc_link_retransmit() subsequently, which is demonstrated on CPU0. As a result, deadlock occurs. If the order of holding the two locks happening on CPU0 is reversed, the deadlock risk will be relieved. Therefore, the node lock taken in link_retransmit_failure() originally is moved to tipc_bclink_rcv() so that it's obtained before bclink lock. But the precondition of the adjustment of node lock is that responding to bclink reset event must be moved from tipc_bclink_unlock() to tipc_node_unlock(). Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-29 12:40:27 -07:00

1 2 3 4

173 Commits