linux

Commit Graph

Author	SHA1	Message	Date
David S. Miller	fe2d55d295	Merge branch 'ibmvnic-Update-TX-pool-and-TX-routines' Thomas Falcon says: ==================== ibmvnic: Update TX pool and TX routines This patch restructures the TX pool data structure and provides a separate TX pool array for TSO transmissions. This is already used in some way due to our unique DMA situation, namely that we cannot use single DMA mappings for packet data. Previously, both buffer arrays used the same pool entry. This restructuring allows for some additional cleanup in the driver code, especially in some places in the device transmit routine. In addition, it allows us to more easily track the consumer and producer indexes of a particular pool. This has been further improved by better tracking of in-use buffers to prevent possible data corruption in case an invalid buffer entry is used. v5: Fix bisectability mistake in the first patch. Removed TSO-specific data in a later patch when it is no longer used. v4: Fix error in 7th patch that causes an oops by using the older fixed value for number of buffers instead of the respective field in the tx pool data structure v3: Forgot to update TX pool cleaning function to handle new data structures. Included 7th patch for that. v2: Fix typo in 3/6 commit subject line ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:40 -04:00
Thomas Falcon	76c15c911b	ibmvnic: Remove unused TSO resources in TX pool structure Finally, remove the TSO-specific fields in the TX pool strcutures. These are no longer needed with the introduction of separate buffer pools for TSO transmissions. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	e9e1e97884	ibmvnic: Update TX pool cleaning routine Update routine that cleans up any outstanding transmits that have not received completions when the device needs to close. Introduces a helper function that cleans one TX pool to make code more readable. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	86b61a5f2e	ibmvnic: Improve TX buffer accounting Improve TX pool buffer accounting to prevent the producer index from overruning the consumer. First, set the next free index to an invalid value if it is in use. If next buffer to be consumed is in use, drop the packet. Finally, if the transmit fails for some other reason, roll back the consumer index and set the free map entry to its original value. This should also be done if the DMA map fails. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	06b3e35788	ibmvnic: Update TX and TX completion routines Update TX and TX completion routines to account for TX pool restructuring. TX routine first chooses the pool depending on whether a packet is GSO or not, then uses it accordingly. For the completion routine to know which pool it needs to use, set the most significant bit of the correlator index to one if the packet uses the TSO pool. On completion, unset the bit and use the correlator index to release the buffer pool entry. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	3205306c6b	ibmvnic: Update TX pool initialization routine Introduce function that initializes one TX pool. Use that to create each pool entry in both the standard TX pool and TSO pool arrays. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	fb79421c3c	ibmvnic: Update release TX pool routine Introduce function that frees one TX pool. Use that to release each pool in both the standard TX pool and TSO pool arrays. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	e26dc25bc0	ibmvnic: Update and clean up reset TX pool routine Update TX pool reset routine to accommodate new TSO pool array. Introduce a function that resets one TX pool, and use that function to initialize each pool in both pool arrays. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Thomas Falcon	4bd95a51b6	ibmvnic: Generalize TX pool structure Remove some unused fields in the structure and include values describing the individual buffer size and number of buffers in a TX pool. This allows us to use these fields for TX pool buffer accounting as opposed to using hard coded values. Include a new pool array for TSO transmissions. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:12:39 -04:00
Al Viro	d47d08c8ca	sctp: use proc_remove_subtree() use proc_remove_subtree() for subtree removal, both on setup failure halfway through and on teardown. No need to make simple things complex... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:11:22 -04:00
David S. Miller	90e2c7a124	Merge branch 'hv_netvsc-minor-enhancements' Stephen Hemminger says: ==================== hv_netvsc: minor enhancements A couple of small things for net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:10:27 -04:00
Stephen Hemminger	ec9663812f	hv_netvsc: add trace points This adds tracepoints to the driver which has proved useful in debugging startup and shutdown race conditions. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:10:27 -04:00
Stephen Hemminger	0e96460e62	hv_netvsc: pass netvsc_device to rndis halt The caller has a valid pointer, pass it to rndis_filter_halt_device and avoid any possible RCU races here. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 20:10:27 -04:00
Grygorii Strashko	a3a41d2f94	net: ethernet: ti: cpsw: enable vlan rx vlan offload In VLAN_AWARE mode CPSW can insert VLAN header encapsulation word on Host port 0 egress (RX) before the packet data if RX_VLAN_ENCAP bit is set in CPSW_CONTROL register. VLAN header encapsulation word has following format: HDR_PKT_Priority bits 29-31 - Header Packet VLAN prio (Highest prio: 7) HDR_PKT_CFI bits 28 - Header Packet VLAN CFI bit. HDR_PKT_Vid bits 27-16 - Header Packet VLAN ID PKT_Type bits 8-9 - Packet Type. Indicates whether the packet is VLAN-tagged, priority-tagged, or non-tagged. 00: VLAN-tagged packet 01: Reserved 10: Priority-tagged packet 11: Non-tagged packet This feature can be used to implement TX VLAN offload in case of VLAN-tagged packets and to insert VLAN tag in case Non-tagged packet was received on port with PVID set. As per documentation, CPSW never modifies packet data on Host egress (RX) and as result, without this feature enabled, Host port will not be able to receive properly packets which entered switch non-tagged through external Port with PVID set (when non-tagged packet forwarded from external Port with PVID set to another external Port - packet will be VLAN tagged properly). Implementation details: - on RX driver will check CPDMA status bit RX_VLAN_ENCAP BIT(19) in CPPI descriptor to identify when VLAN header encapsulation word is present. - PKT_Type = 0x01 or 0x02 then ignore VLAN header encapsulation word and pass packet as is; - if HDR_PKT_Vid = 0 then ignore VLAN header encapsulation word and pass packet as is; - In dual mac mode traffic is separated between ports using default port vlans, which are not be visible to Host and so should not be reported. Hence, check for default port vlans in dual mac mode and ignore VLAN header encapsulation word; - otherwise fill SKB with VLAN info using __vlan_hwaccel_put_tag(); - PKT_Type = 0x00 (VLAN-tagged) then strip out VLAN header from SKB. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 19:51:01 -04:00
Arjun Vynipadath	d7cb44496a	cxgb4: Fix queue free path of ULD drivers Setting sge_uld_rxq_info to NULL in free_queues_uld(). We are referencing sge_uld_rxq_info in cxgb_up(). This will fix a panic when interface is brought up after a ULDq creation failure. Fixes: `94cdb8bb99` (cxgb4: Add support for dynamic allocation of resources for ULD) Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudhar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:20:03 -04:00
Sowmini Varadhan	53d0e83f93	rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock rds_tcp_connection allocation/free management has the potential to be called from __rds_conn_create after IRQs have been disabled, so spin_[un]lock_bh cannot be used with rds_tcp_conn_lock. Bottom-halves that need to synchronize for critical sections protected by rds_tcp_conn_lock should instead use rds_destroy_pending() correctly. Reported-by: syzbot+c68e51bb5e699d3f8d91@syzkaller.appspotmail.com Fixes: `ebeeb1ad9b` ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:18:54 -04:00
David S. Miller	3008ba5faa	Merge branch 'tipc-obsolete-zone-concept' Jon Maloy says: ==================== tipc: obsolete zone concept Functionality related to the 'zone' concept was never implemented in TIPC. In this series we eliminate the remaining traces of it in the code, and can hence take a first step in reducing the footprint and complexity of the binding table. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:11:47 -04:00
Jon Maloy	e50e73e107	tipc: some name changes We rename some lists and fields in struct publication both to make the naming more consistent and to better reflect their roles. We also update the descriptions of those lists. node_list -> local_publ cluster_list -> all_publ pport_list -> binding_sock ref -> port There are no functional changes in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:11:46 -04:00
Jon Maloy	935439cc48	tipc: merge two lists in struct publication The size of struct publication can be reduced further. Membership in lists 'nodesub_list' and 'local_list' is mutually exlusive, in that remote publications use the former and local publications the latter. We replace the two lists with one single, named 'binding_node' which reflects what it really is. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:11:46 -04:00
Jon Maloy	ba765ec637	tipc: remove zone_list member in struct publication As a further consequence of the previous commits, we can also remove the member 'zone_list 'in struct name_info and struct publication. Instead, we now let the member cluster_list take over the role a container of all publications of a given <type,lower, upper>. We also remove the counters for the size of those lists, since they don't serve any purpose. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:11:46 -04:00
Jon Maloy	64a52b26d5	tipc: remove zone publication list in name table As a consequence of the previous commit we nan now eliminate zone scope related lists in the name table. We start with name_table::publ_list[3], which can now be replaced with two lists, one for node scope publications and one for cluster scope publications. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:11:46 -04:00
Jon Maloy	928df1880e	tipc: obsolete TIPC_ZONE_SCOPE Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all aspects handled the same way, both on the publishing node and on the receiving nodes. Despite previous ambitions to the contrary, this is never going to change, so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we remain compatible with users and remote nodes still using ZONE_SCOPE. Furthermore, the non-formalized scope value 0 has always been permitted for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE. We now permit it even as binding scope, but for compatibility reasons we choose to not change the value of TIPC_CLUSTER_SCOPE. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:11:46 -04:00
David S. Miller	4f1aec01fc	Merge branch 'pernet-convert-part8' Kirill Tkhai says: ==================== Converting pernet_operations (part #8) this series continues to review and to convert pernet_operations to make them possible to be executed in parallel for several net namespaces at the same time. There are different operations over the tree, mostly are ipvs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:40 -04:00
Kirill Tkhai	6c77e79557	net: Convert ip_vs_ftp_ops These pernet_operations register and unregister ipvs app. register_ip_vs_app(), unregister_ip_vs_app() and register_ip_vs_app_inc() modify per-net structures, and there are no global structures touched. So, this looks safe to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:39 -04:00
Kirill Tkhai	d0edfbb4ba	net: Convert ipvs_core_dev_ops Exit method stops two per-net threads and cancels delayed work. Everything looks nicely per-net divided. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:39 -04:00
Kirill Tkhai	554855ccdf	net: Convert ipvs_core_ops These pernet_operations register and unregister nf hooks, /proc entries, sysctl, percpu statistics. There are several global lists, and the only list modified without exclusive locks is ip_vs_conn_tab in ip_vs_conn_flush(). We iterate the list and force the timers expire at the moment. Since there were possible several timer expirations before this patch, and since they are safe, the patch does not invent new parallelism of their destruction. These pernet_operations look safe to be converted. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:39 -04:00
Kirill Tkhai	ec716650a7	net: Convert ovs_net_ops These pernet_operations initialize and destroy net_generic() data pointed by ovs_net_id. Exit method destroys vports from alive net to exiting net. Since they are only pernet_operations interested in this data, and exit method is executed under exclusive global lock (ovs_mutex), they are safe to be executed in parallel. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:39 -04:00
Kirill Tkhai	8cec2f49dc	net: Convert mpls_net_ops These pernet_operations register and unregister sysctl table. Exit methods frees platform_labels from net::mpls::platform_label. Everything is per-net, and they looks safe to be marked async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:39 -04:00
Kirill Tkhai	489b30b53f	net: Convert l2tp_net_ops Init method is rather simple. Exit method queues del_work for every tunnel from per-net list. This seems to be safe to be marked async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-17 17:07:39 -04:00
Yousuk Seung	5379457004	net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit Set tp->snd_ssthresh to BDP upon STARTUP exit. This allows us to check if a BBR flow exited STARTUP and the BDP at the time of STARTUP exit with SCM_TIMESTAMPING_OPT_STATS. Since BBR does not use snd_ssthresh this fix has no impact on BBR's behavior. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 15:07:48 -04:00
Yousuk Seung	7156d194a0	tcp: add snd_ssthresh stat in SCM_TIMESTAMPING_OPT_STATS This patch adds TCP_NLA_SND_SSTHRESH stat into SCM_TIMESTAMPING_OPT_STATS that reports tcp_sock.snd_ssthresh. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 15:07:48 -04:00
Vinicius Costa Gomes	5dce837944	selftests/txtimestamp: Add more configurable parameters Add a way to configure if poll() should wait forever for an event, the number of packets that should be sent for each and if there should be any delay between packets. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 15:05:57 -04:00
Intiyaz Basha	5eb297a9a5	liquidio: Simplified napi poll 1) Moved interrupt enable related code from octeon_process_droq_poll_cmd() to separate function octeon_enable_irq(). 2) Removed wrapper function octeon_process_droq_poll_cmd(), and directlyi using octeon_droq_process_poll_pkts(). 3) Removed unused macros POLL_EVENT_XXX. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 15:03:26 -04:00
David S. Miller	b8124b53d1	Merge branch 'net-smc-IPv6-support' Ursula Braun says: ==================== net/smc: IPv6 support these smc patches for the net-next tree add IPv6 support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 14:57:26 -04:00
Karsten Graul	aaa4d33f6d	net/smc: enable ipv6 support for smc Add ipv6 support to the smc socket layer functions. Make use of the updated clc layer functions to retrieve and match ipv6 information. The indicator for ipv4 or ipv6 is the protocol constant that is provided in the socket() call with address family AF_SMC. Based-on-patch-by: Takanori Ueda <tkueda@jp.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 14:57:26 -04:00
Karsten Graul	1a26d0201d	net/smc: add ipv6 support to CLC layer The CLC layer is updated to support ipv6 proposal messages from peers and to match incoming proposal messages against the ipv6 addresses of the net device. struct smc_clc_ipv6_prefix is updated to provide the space for an ipv6 address (struct was not used before). SMC_CLC_MAX_LEN is updated to include the size of the proposal prefix. Existing code in net is not affected, the previous SMC_CLC_MAX_LEN value is large enough to hold ipv4 proposal messages. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 14:57:25 -04:00
Karsten Graul	c246d942ea	net/smc: restructure netinfo for CLC proposal msgs Introduce functions smc_clc_prfx_set to retrieve IP information for the CLC proposal msg and smc_clc_prfx_match to match the contents of a proposal message against the IP addresses of the net device. The new functions replace the functionality provided by smc_clc_netinfo_by_tcpsk, which is removed by this patch. The match functionality is extended to scan all ipv4 addresses of the net device for a match against the ipv4 subnet from the proposal msg. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 14:57:25 -04:00
Ganesh Goudar	8b7372c101	cxgb4: notify fatal error to uld drivers notify uld drivers if the adapter encounters fatal error. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 14:46:57 -04:00
David S. Miller	ce627a1b10	Merge branch 'rtnl_lock_killable' Kirill Tkhai says: ==================== Introduce rtnl_lock_killable() rtnl_lock() is widely used mutex in kernel. Some of kernel code does memory allocations under it. In case of memory deficit this may invoke OOM killer, but the problem is a killed task can't exit if it's waiting for the mutex. This may be a reason of deadlock and panic. This patchset adds a new primitive, which responds on SIGKILL, and it allows to use it in the places, where we don't want to sleep forever. Also, the first place is made to use it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 12:31:19 -04:00
Kirill Tkhai	b0f3debc9a	net: Use rtnl_lock_killable() in register_netdev() This patch adds rtnl_lock_killable() to one of hot path using rtnl_lock(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 12:31:19 -04:00
Kirill Tkhai	79ffdfc652	net: Add rtnl_lock_killable() rtnl_lock() is widely used mutex in kernel. Some of kernel code does memory allocations under it. In case of memory deficit this may invoke OOM killer, but the problem is a killed task can't exit if it's waiting for the mutex. This may be a reason of deadlock and panic. This patch adds a new primitive, which responds on SIGKILL, and it allows to use it in the places, where we don't want to sleep forever. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 12:31:19 -04:00
Tonghao Zhang	320bd6de79	doc: Change the udp/sctp rmem/wmem default value. The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 12:03:30 -04:00
Tonghao Zhang	1e80295158	udp: Move the udp sysctl to namespace. This patch moves the udp_rmem_min, udp_wmem_min to namespace and init the udp_l3mdev_accept explicitly. The udp_rmem_min/udp_wmem_min affect udp rx/tx queue, with this patch namespaces can set them differently. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 12:03:30 -04:00
David S. Miller	859844e5c2	Merge branch 'net-ipv6-Address-checks-need-to-consider-the-L3-domain' David Ahern says: ==================== net/ipv6: Address checks need to consider the L3 domain IPv6 prohibits a local address from being used as a gateway for a route. However, it is ok for the gateway to be a local address in a different L3 domain (e.g., VRF). This allows, for example, veth pairs to connect VRFs. ip6_route_info_create calls ipv6_chk_addr_and_flags for gateway addresses to determine if the address is a local one, but ipv6_chk_addr_and_flags does not currently consider L3 domains. As a result routes can not be added in one VRF with a nexthop that points to a local address in a second VRF. Resolve by comparing the l3mdev for the passed in device and requiring an l3mdev match with the device containing an address. The intent of checking for an address on the specified device versus any device in the domain is mantained by a new argument to skip the check between the passed in device and the device with the address. Patch 1 moves the gateway validation from ip6_route_info_create into a helper; the function is long enough and refactoring drops the indent level. Patch 2 adds a skip_dev_check argument to ipv6_chk_addr_and_flags to allow a device to always be passed yet skip the device check when looking at addresses and fixes up a few ipv6_chk_addr callers that pass a NULL device. Patch 3 adds l3mdev checks to ipv6_chk_addr_and_flags. Patches 4 and 5 do some refactoring to the fib_tests script and then patch 6 adds nexthop validation tests. v4 - separated l3mdev check into a separate patch (patch 3 of this set) as suggested by Kirill - consolidated dev and ipv6_chk_addr_and_flags call into 1 if (Kirill) - added a temp variable for gw type (Kirill) v3 - set skip_dev_check in ipv6_chk_addr based on dev == NULL (per comment from Ido) v2 - handle 2 variations of route spec with sane error path - add test cases ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:40 -04:00
David Ahern	654d3a7821	selftests: fib_tests: Add IPv6 nexthop spec tests Add series of tests for valid and invalid nexthop specs for IPv6. $ TEST=fib_nexthop_test ./fib_tests.sh ... IPv6 nexthop tests TEST: Directly connected nexthop, unicast address [ OK ] TEST: Directly connected nexthop, unicast address with device [ OK ] TEST: Gateway is linklocal address [ OK ] TEST: Gateway is linklocal address, no device [ OK ] TEST: Gateway can not be local unicast address [ OK ] TEST: Gateway can not be local unicast address, with device [ OK ] TEST: Gateway can not be a local linklocal address [ OK ] TEST: Gateway can be local address in a VRF [ OK ] TEST: Gateway can be local address in a VRF, with device [ OK ] TEST: Gateway can be local linklocal address in a VRF [ OK ] TEST: Redirect to VRF lookup [ OK ] TEST: VRF route, gateway can be local address in default VRF [ OK ] TEST: VRF route, gateway can not be a local address [ OK ] TEST: VRF route, gateway can not be a local addr with device [ OK ] Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:39 -04:00
David Ahern	a511858c75	selftests: fib_tests: Allow user to run a specific test Allow a user to run just a specific fib test by setting the TEST environment variable. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:39 -04:00
David Ahern	171a48717b	selftests: fib_tests: Use an alias for ip command Replace 'ip -netns testns' with the alias IP. Shortens the line lengths and makes running the commands manually a bit easier. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:39 -04:00
David Ahern	1893ff2027	net/ipv6: Add l3mdev check to ipv6_chk_addr_and_flags Lookup the L3 master device for the passed in device. Only consider addresses on netdev's with the same master device. If the device is not enslaved or is NULL, then the l3mdev is NULL which means only devices not enslaved (ie, in the default domain) are considered. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:38 -04:00
David Ahern	232378e8db	net/ipv6: Change address check to always take a device argument ipv6_chk_addr_and_flags determines if an address is a local address and optionally if it is an address on a specific device. For example, it is called by ip6_route_info_create to determine if a given gateway address is a local address. The address check currently does not consider L3 domains and as a result does not allow a route to be added in one VRF if the nexthop points to an address in a second VRF. e.g., $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23 Error: Invalid gateway address. where 2001:db8:102::23 is an address on an interface in vrf r1. ipv6_chk_addr_and_flags needs to allow callers to always pass in a device with a separate argument to not limit the address to the specific device. The device is used used to determine the L3 domain of interest. To that end add an argument to skip the device check and update callers to always pass a device where possible and use the new argument to mean any address in the domain. Update a handful of users of ipv6_chk_addr with a NULL dev argument. This patch handles the change to these callers without adding the domain check. ip6_validate_gw needs to handle 2 cases - one where the device is given as part of the nexthop spec and the other where the device is resolved. There is at least 1 VRF case where deferring the check to only after the route lookup has resolved the device fails with an unintuitive error "RTNETLINK answers: No route to host" as opposed to the preferred "Error: Gateway can not be a local address." The 'no route to host' error is because of the fallback to a full lookup. The check is done twice to avoid this error. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:38 -04:00
David Ahern	9fbb704c33	net/ipv6: Refactor gateway validation on route add Move gateway validation code from ip6_route_info_create into ip6_validate_gw. Code move plus adjustments to handle the potential reset of dev and idev and to make checkpatch happy. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-16 11:28:38 -04:00

1 2 3 4 5 ...

739241 Commits All Branches Search

739241 Commits

All Branches