linux

Commit Graph

Author	SHA1	Message	Date
David S. Miller	e3e37e7017	Merge branch 'vhost_net-batching' Jason Wang says: ==================== vhost_net tx batching This series tries to implement tx batching support for vhost. This was done by using MSG_MORE as a hint for under layer socket. The backend (e.g tap) can then batch the packets temporarily in a list and submit it all once the number of bacthed exceeds a limitation. Tests shows obvious improvement on guest pktgen over over mlx4(noqueue) on host: Mpps -+% rx-frames = 0 0.91 +0% rx-frames = 4 1.00 +9.8% rx-frames = 8 1.00 +9.8% rx-frames = 16 1.01 +10.9% rx-frames = 32 1.07 +17.5% rx-frames = 48 1.07 +17.5% rx-frames = 64 1.08 +18.6% rx-frames = 64 (no MSG_MORE) 0.91 +0% Changes from V4: - stick to NAPI_POLL_WEIGHT for rx-frames is user specify a value greater than it. Changes from V3: - use ethtool instead of module parameter to control the maximum number of batched packets - avoid overhead when MSG_MORE were not set and no packet queued Changes from V2: - remove uselss queue limitation check (and we don't drop any packet now) Changes from V1: - drop NAPI handler since we don't use NAPI now - fix the issues that may exceeds max pending of zerocopy - more improvement on available buffer detection - move the limitation of batched pacekts from vhost to tuntap ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:30 -05:00
Jason Wang	5503fcecd4	tun: rx batching We can only process 1 packet at one time during sendmsg(). This often lead bad cache utilization under heavy load. So this patch tries to do some batching during rx before submitting them to host network stack. This is done through accepting MSG_MORE as a hint from sendmsg() caller, if it was set, batch the packet temporarily in a linked list and submit them all once MSG_MORE were cleared. Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host: Mpps -+% rx-frames = 0 0.91 +0% rx-frames = 4 1.00 +9.8% rx-frames = 8 1.00 +9.8% rx-frames = 16 1.01 +10.9% rx-frames = 32 1.07 +17.5% rx-frames = 48 1.07 +17.5% rx-frames = 64 1.08 +18.6% rx-frames = 64 (no MSG_MORE) 0.91 +0% User were allowed to change per device batched packets through ethtool -C rx-frames. NAPI_POLL_WEIGHT were used as upper limitation to prevent bh from being disabled too long. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:30 -05:00
Jason Wang	0ed005ce02	vhost_net: tx batching This patch tries to utilize tuntap rx batching by peeking the tx virtqueue during transmission, if there's more available buffers in the virtqueue, set MSG_MORE flag for a hint for backend (e.g tuntap) to batch the packets. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:30 -05:00
Jason Wang	275bf960ac	vhost: better detection of available buffers This patch tries to do several tweaks on vhost_vq_avail_empty() for a better performance: - check cached avail index first which could avoid userspace memory access. - using unlikely() for the failure of userspace access - check vq->last_avail_idx instead of cached avail index as the last step. This patch is need for batching supports which needs to peek whether or not there's still available buffers in the ring. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:29 -05:00
Mao Wenan	1a8b6d76dc	net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering Relax ordering(RO) is one feature of 82599 NIC, to enable this feature can enhance the performance for some cpu architecure, such as SPARC and so on. Currently it only supports one special cpu architecture(SPARC) in 82599 driver to enable RO feature, this is not very common for other cpu architecture which really needs RO feature. This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to set RO feature, and should define CONFIG_ARCH_WANT_RELAX_ORDER in sparc Kconfig firstly. Signed-off-by: Mao Wenan <maowenan@huawei.com> Reviewed-by: Alexander Duyck <alexander.duyck@gmail.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:33:00 -05:00
David S. Miller	1e48aac14c	Merge branch 'ipv6-simplify-rt6_fill_node' David Ahern says: ==================== net: ipv6: simplify rt6_fill_node Remove a couple of unnecessary input arguments to rt6_fill_node. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 15:44:00 -05:00
David Ahern	f8cfe2ceb1	net: ipv6: remove prefix arg to rt6_fill_node The prefix arg to rt6_fill_node is non-0 in only 1 path - rt6_dump_route where a user is requesting a prefix only dump. Simplify rt6_fill_node by removing the prefix arg and moving the prefix check to rt6_dump_route. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 15:43:59 -05:00
David Ahern	fd61c6ba31	net: ipv6: remove nowait arg to rt6_fill_node All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and simplify rt6_fill_node accordingly. rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the nowait arg from it as well. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 15:43:59 -05:00
David S. Miller	1ce463dd75	Merge branch 'sctp-sender-side-stream-reconf-ssn-reset-request-chunk' Xin Long says: ==================== sctp: add sender-side procedures for stream reconf ssn reset request chunk Patch 6/6 is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3 Patches 1-5/6 are ahead of it to define some apis and asoc members for it. Note that with this patchset, asoc->reconf_enable has no chance yet to be set, until the patch "sctp: add get and set sockopt for reconf_enable" is applied in the future. As we can not just enable it when sctp is not capable of processing reconf chunk yet. v1->v2: - put these into a smaller group. - rename some temporary variables in the codes. - rename the titles of the commits and improve some changelogs. v2->v3: - re-split the patchset and make sure it has no dead codes for review. v3->v4: - move sctp_make_reconf() into patch 1/6 to avoid kbuild warning. - drop unused struct sctp_strreset_req. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:11 -05:00
Xin Long	7f9d68ac94	sctp: implement sender-side procedures for SSN Reset Request Parameter This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:11 -05:00
Xin Long	9fb657aec0	sctp: add sockopt SCTP_ENABLE_STREAM_RESET This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:10 -05:00
Xin Long	c28445c3cb	sctp: add reconf_enable in asoc ep and netns This patch is to add reconf_enable field in all of asoc ep and netns to indicate if they support stream reset. When initializing, asoc reconf_enable get the default value from ep reconf_enable which is from netns netns reconf_enable by default. It is also to add reconf_capable in asoc peer part to know if peer supports reconf_enable, the value is set if ext params have reconf chunk support when processing init chunk, just as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:10 -05:00
Xin Long	7a090b0452	sctp: add stream reconf primitive This patch is to add a primitive based on sctp primitive frame for sending stream reconf request. It works as the other primitives, and create a SCTP_CMD_REPLY command to send the request chunk out. sctp_primitive_RECONF would be the api to send a reconf request chunk. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:10 -05:00
Xin Long	7b9438de0c	sctp: add stream reconf timer This patch is to add a per transport timer based on sctp timer frame for stream reconf chunk retransmission. It would start after sending a reconf request chunk, and stop after receiving the response chunk. If the timer expires, besides retransmitting the reconf request chunk, it would also do the same thing with data RTO timer. like to increase the appropriate error counts, and perform threshold management, possibly destroying the asoc if sctp retransmission thresholds are exceeded, just as section 5.1.1 describes. This patch is also to add asoc strreset_chunk, it is used to save the reconf request chunk, so that it can be retransmitted, and to check if the response is really for this request by comparing the information inside with the response chunk as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:10 -05:00
Xin Long	cc16f00f65	sctp: add support for generating stream reconf ssn reset request chunk This patch is to add asoc strreset_outseq and strreset_inseq for saving the reconf request sequence, initialize them when create assoc and process init, and also to define Incoming and Outgoing SSN Reset Request Parameter described in rfc6525 section 4.1 and 4.2, As they can be in one same chunk as section rfc6525 3.1-3 describes, it makes them in one function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:09 -05:00
David S. Miller	b16ed2b1d0	Merge branch 'rework-inet_csk_get_port' Josef Bacik says: ==================== Rework inet_csk_get_port V3->V4: -Removed the random include of addrconf.h that is no longer needed. V2->V3: -Dropped the fastsock from the tb and instead just carry the saddrs, family, and ipv6 only flag. -Reworked the helper functions to deal with this change so I could still use them when checking the fast path. -Killed tb->num_owners as per Eric's request. -Attached a reproducer to the bottom of this email. V1->V2: -Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one' at Hannes' suggestion. -Dropped ->bind_conflict and just use the new helper. -Fixed a compile bug from the original ->bind_conflict patch. The original description of the series follows: At some point recently the guys working on our load balancer added the ability to use SO_REUSEPORT. When they restarted their app with this option enabled they immediately hit a softlockup on what appeared to be the inet_bind_bucket->lock. Eventually what all of our debugging and discussion led us to was the fact that the application comes up without SO_REUSEPORT, shuts down which creates around 100k twsk's, and then comes up and tries to open a bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket owners list under the lock. Since this lock is needed for dealing with the twsk's and basically anything else related to connections we would softlockup, and sometimes not ever recover. To solve this problem I did what you see in Path 5/5. Once we have a SO_REUSEPORT socket on the tb->owners list we know that the socket has no conflicts with any of the other sockets on that list. So we can add a copy of the sock_common (really all we need is the recv_saddr but it seemed ugly to copy just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've copied the whole common) in order to check subsequent SO_REUSEPORT sockets. If they match the previous one then we can skip the expensive inet_csk_bind_conflict check. This is what eliminated the soft lockup that we were seeing. Patches 1-4 are cleanups and re-workings. For instance when we specify port == 0 we need to find an open port, but we would do two passes through inet_csk_bind_conflict every time we found a possible port. We would also keep track of the smallest_port value in order to try and use it if we found no port our first run through. This however made no sense as it would have had to fail the first pass through inet_csk_bind_conflict, so would not actually pass the second pass through either. Finally I split the function into two functions in order to make it easier to read and to distinguish between the two behaviors. I have tested this on one of our load balancing boxes during peak traffic and it hasn't fallen over. But this is not my area, so obviously feel free to point out where I'm being stupid and I'll get it fixed up and retested. Thanks, ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:30 -05:00
Josef Bacik	637bc8bbe6	inet: reset tb->fastreuseport when adding a reuseport sk If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and never set it again. Which means that in the future if we end up adding a bunch of reuseport sk's to that tb we'll have to do the expensive scan every time. Instead add the ipv4/ipv6 saddr fields to the bind bucket, as well as the family so we know what comparison to make, and the ipv6 only setting so we can make sure to compare with new sockets appropriately. Once one sk has made it onto the list we know that there are no potential bind conflicts on the owners list that match that sk's rcv_addr. So copy the sk's information into our bind bucket and set tb->fastruseport to FASTREUSESOCK_STRICT so we know we have to do an extra check for subsequent reuseport sockets and skip the expensive bind conflict check. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:29 -05:00
Josef Bacik	289141b768	inet: split inet_csk_get_port into two functions inet_csk_get_port does two different things, it either scans for an open port, or it tries to see if the specified port is available for use. Since these two operations have different rules and are basically independent lets split them into two different functions to make them both more readable. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:29 -05:00
Josef Bacik	6cd6661683	inet: don't check for bind conflicts twice when searching for a port This is just wasted time, we've already found a tb that doesn't have a bind conflict, and we don't drop the head lock so scanning again isn't going to give us a different answer. Instead move the tb->reuse setting logic outside of the found_tb path and put it in the success: path. Then make it so that we don't goto again if we find a bind conflict in the found_tb path as we won't reach this anymore when we are scanning for an ephemeral port. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:29 -05:00
Josef Bacik	b9470c2760	inet: kill smallest_size and smallest_port In inet_csk_get_port we seem to be using smallest_port to figure out where the best place to look for a SO_REUSEPORT sk that matches with an existing set of SO_REUSEPORT's. However if we get to the logic if (smallest_size != -1) { port = smallest_port; goto have_port; } we will do a useless search, because we would have already done the inet_csk_bind_conflict for that port and it would have returned 1, otherwise we would have gone to found_tb and succeeded. Since this logic makes us do yet another trip through inet_csk_bind_conflict for a port we know won't work just delete this code and save us the time. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:29 -05:00
Josef Bacik	aa078842b7	inet: drop ->bind_conflict The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict is how they check the rcv_saddr, so delete this call back and simply change inet_csk_bind_conflict to call inet_rcv_saddr_equal. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:28 -05:00
Josef Bacik	fe38d2a1c8	inet: collapse ipv4/v6 rcv_saddr_equal functions into one We pass these per-protocol equal functions around in various places, but we can just have one function that checks the sk->sk_family and then do the right comparison function. I've also changed the ipv4 version to not cast to inet_sock since it is unneeded. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 13:04:28 -05:00
jpinto	ab70e58626	stmicro: add more information to Kconfig This patch adds more info to stmicro' Kconfig files in order to be clearer that the driver can be used by ethernet cards based on 10/100/1000/EQOS Synopsys IP Cores. EQOS was also added stmmac/Kconfig Kconfig, since dwmac4 is in fact EQoS, one of Synopsys Ethernet IPs. More info at: https://www.synopsys.com/dw/ipdir.php?ds=dwc_ether_qos Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 12:12:04 -05:00
Martin KaFai Lau	d8bec2b29a	net/mlx5e: Support bpf_xdp_adjust_head() This patch adds bpf_xdp_adjust_head() support to mlx5e. 1. rx_headroom is added to struct mlx5e_rq. It uses an existing 4 byte hole in the struct. 2. The adjusted data length is checked against MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu). 3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h. MLX5E_HW2SW_MTU is also moved to en.h for symmetric reason but it is not a must. v2: - Keep the xdp specific logic in mlx5e_xdp_handle() - Update dma_len after the sanity checks in mlx5e_xmit_xdp_frame() Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 11:31:13 -05:00
Jason Baron	0e40f4c959	tcp: accept RST for rcv_nxt - 1 after receiving a FIN Using a Mac OSX box as a client connecting to a Linux server, we have found that when certain applications (such as 'ab'), are abruptly terminated (via ^C), a FIN is sent followed by a RST packet on tcp connections. The FIN is accepted by the Linux stack but the RST is sent with the same sequence number as the FIN, and Linux responds with a challenge ACK per RFC 5961. The OSX client then sometimes (they are rate-limited) does not reply with any RST as would be expected on a closed socket. This results in sockets accumulating on the Linux server left mostly in the CLOSE_WAIT state, although LAST_ACK and CLOSING are also possible. This sequence of events can tie up a lot of resources on the Linux server since there may be a lot of data in write buffers at the time of the RST. Accepting a RST equal to rcv_nxt - 1, after we have already successfully processed a FIN, has made a significant difference for us in practice, by freeing up unneeded resources in a more expedient fashion. A packetdrill test demonstrating the behavior: // testing mac osx rst behavior // Establish a connection 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32768 <mss 1460,nop,wscale 10> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 5> 0.200 < . 1:1(0) ack 1 win 32768 0.200 accept(3, ..., ...) = 4 // Client closes the connection 0.300 < F. 1:1(0) ack 1 win 32768 // now send rst with same sequence 0.300 < R. 1:1(0) ack 1 win 32768 // make sure we are in TCP_CLOSE 0.400 %{ assert tcpi_state == 7 }% Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:51:55 -05:00
Tobias Klauser	a870a97757	net: ethoc: Make needlessly global struct ethtool_ops static Make the needlessly global struct ethtool_ops ethoc_ethtool_ops static to fix a sparse warning. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:50:56 -05:00
David S. Miller	49a67b5db7	Merge branch 'sfc-RX-hash-config' Edward Cree says: ==================== sfc: RX hash configuration This series improves support for getting and setting RX hashing configuration on Solarflare adapters through ethtool. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:49:52 -05:00
Edward Cree	a707d18851	sfc: read back RX hash config from the NIC when querying it with ethtool -x Ensures that we report the key and indirection table the NIC is using, rather than (if setting them failed earlier) what we wanted it to use. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:49:52 -05:00
Edward Cree	f74d199519	sfc: support setting RSS hash key through ethtool API Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:49:51 -05:00
Ganesh Goudar	96fe11f27b	cxgb4: Implement ndo_get_phys_port_id for mgmt dev Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:48:13 -05:00
Gao Feng	a7ef6715c4	net: ping: Use right format specifier to avoid type casting The inet_num is u16, so use %hu instead of casting it to int. And the sk_bound_dev_if is int actually, so it needn't cast to int. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:25:55 -05:00
Shyam Saini	0ee28e3155	qed: Replace memset with eth_zero_addr Use eth_zero_addr to assign zero address to the given address array instead of memset when the second argument in memset is address of zero. Also, it makes the code clearer Signed-off-by: Shyam Saini <mayhs11saini@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:24:47 -05:00
Lance Richardson	53631a5f9c	bridge: sparse fixes in br_ip6_multicast_alloc_query() Changed type of csum field in struct igmpv3_query from __be16 to __sum16 to eliminate type warning, made same change in struct igmpv3_report for consistency. Fixed up an ntohs() where htons() should have been used instead. Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 15:22:05 -05:00
David S. Miller	580bdf5650	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-01-17 15:19:37 -05:00
David S. Miller	e60a42635b	Merge branch 'mpls-packet-stats' Robert Shearman says: ==================== mpls: Packet stats This patchset records per-interface packet stats in the MPLS forwarding path and exports them using a nest of attributes root at a new IFLA_STATS_AF_SPEC attribute as part of RTM_GETSTATS messages: [IFLA_STATS_AF_SPEC] -> [AF_MPLS] -> [MPLS_STATS_LINK] -> struct mpls_link_stats The first patch adds the rtnl infrastructure for this, including a new callbacks to per-AF ops of fill_stats_af and get_stats_af_size. The second patch records MPLS stats and makes use of the infrastructure to export them. The rtnl infrastructure could also be used to export IPv6 stats in the future. Changes in v2: - make incrementing IPv6 stats in mpls_stats_inc_outucastpkts conditional on CONFIG_IPV6 to fix build with CONFIG_IPV6=n ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 14:38:44 -05:00
Robert Shearman	27d691056b	mpls: Packet stats Having MPLS packet stats is useful for observing network operation and for diagnosing network problems. In the absence of anything better, RFC2863 and RFC3813 are used for guidance for which stats to expose and the semantics of them. In particular rx_noroutes maps to in unknown protos in RFC2863. The stats are exposed to userspace via AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages. All the introduced fields are 64-bit, even error ones, to ensure no overflow with long uptimes. Per-CPU counters are used to avoid cache-line contention on the commonly used fields. The other fields have also been made per-CPU for code to avoid performance problems in error conditions on the assumption that on some platforms the cost of atomic operations could be more expensive than sending the packet (which is what would be done in the success case). If that's not the case, we could instead not use per-CPU counters for these fields. Only unicast and non-fragment are exposed at the moment, but other counters can be exposed in the future either by adding to the end of struct mpls_link_stats or by additional netlink attributes in the AF_MPLS IFLA_STATS_AF_SPEC nested attribute. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 14:38:43 -05:00
Robert Shearman	aefb4d4ad8	net: AF-specific RTM_GETSTATS attributes Add the functionality for including address-family-specific per-link stats in RTM_GETSTATS messages. This is done through adding a new IFLA_STATS_AF_SPEC attribute under which address family attributes are nested and then the AF-specific attributes can be further nested. This follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the advantage of presenting an easily extended hierarchy. The rtnl_af_ops structure is extended to provide AFs with the opportunity to fill and provide the size of their stats attributes. One alternative would have been to provide AFs with the ability to add attributes directly into the RTM_GETSTATS message without a nested hierarchy. I discounted this approach as it increases the rate at which the 32 attribute number space is used up and it makes implementation a little more tricky for stats dump resuming (at the moment the order in which attributes are added to the message has to match the numeric order of the attributes). Another alternative would have been to register per-AF RTM_GETSTATS handlers. I discounted this approach as I perceived a common use-case to be getting all the stats for an interface and this approach would necessitate multiple requests/dumps to retrieve them all. Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 14:38:43 -05:00
Julia Lawall	a249708bc2	stmmac: add missing of_node_put The function stmmac_dt_phy provides several possibilities for initializing plat->mdio_node, all of which have the effect of increasing the reference count of the assigned value. This field is not updated elsewhere, so the value is live until the end of the lifetime of plat (devm_allocated), just after the end of stmmac_remove_config_dt. Thus, add an of_node_put on plat->mdio_node in stmmac_remove_config_dt. It is possible that the field mdio_node is never initialized, but of_node_put is NULL-safe, so it is also safe to call of_node_put in that case. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 14:10:35 -05:00
Rolf Neugebauer	501db51139	virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit This patch part reverts `fd2a0437dc` and `e858fae2b0` which introduced a subtle change in how the virtio_net flags are derived from the SKBs ip_summed field. With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to ip_summed == CHECKSUM_NONE, which should be the same. Further, the virtio spec 1.0 / CS04 explicitly says that VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver. Fixes: `fd2a0437dc` ("virtio_net: introduce virtio_net_hdr_{from,to}_skb") Fixes: `e858fae2b0` (" virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 14:07:31 -05:00
Linus Torvalds	4b19a9e20b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Handle multicast packets properly in fast-RX path of mac80211, from Johannes Berg. 2) Because of a logic bug, the user can't actually force SW checksumming on r8152 devices. This makes diagnosis of hw checksumming bugs really annoying. Fix from Hayes Wang. 3) VXLAN route lookup does not take the source and destination ports into account, which means IPSEC policies cannot be matched properly. Fix from Martynas Pumputis. 4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger. 5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky. 6) If lwtunnel_fill_encap() fails, we do not abort the netlink message construction properly in fib_dump_info(), from David Ahern. 7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan Schmidt. 8) Openvswitch conntack actions need to maintain a correct checksum, fix from Lance Richardson. 9) ax25_disconnect() is missing a check for ax25->sk being NULL, in fact it already checks this, but not in all of the necessary spots. Fix from Basil Gunn. 10) Action GET operations in the packet scheduler can erroneously bump the reference count of the entry, making it unreleasable. Fix from Jamal Hadi Salim. Jamal gives a great set of example command lines that trigger this in the commit message. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) net sched actions: fix refcnt when GETing of action after bind net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions net/mlx4_core: Fix racy CQ (Completion Queue) free net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered net/mlx5e: Fix a -Wmaybe-uninitialized warning ax25: Fix segfault after sock connection timeout bpf: rework prog_digest into prog_tag tipc: allocate user memory with GFP_KERNEL flag net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types ip6_tunnel: Account for tunnel header in tunnel MTU mld: do not remove mld souce list info when set link down be2net: fix MAC addr setting on privileged BE3 VFs be2net: don't delete MAC on close on unprivileged BE3 VFs be2net: fix status check in be_cmd_pmac_add() cpmac: remove hopeless #warning ravb: do not use zero-length alignment DMA descriptor mlx4: do not call napi_schedule() without care openvswitch: maintain correct checksum state in conntrack actions tcp: fix tcp_fastopen unaligned access complaints on sparc ...	2017-01-17 09:33:10 -08:00
Linus Torvalds	203f80f1c4	Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fix from Konrad Rzeszutek Wilk: "A tiny fix to make sure that page-sized mappings are page-aligned (and not say straddle two pages). This is important for some drivers (such as NVME)" * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: ensure that page-sized mappings are page-aligned	2017-01-17 09:27:50 -08:00
Linus Torvalds	7e84b30355	MMC core: - Fix regressions detecting HS/HS DDR eMMC cards related to CMD6 MMC host: - mmc: mxs-mmc: Fix additional cycles after transmission stop - sdhci-acpi: Only powered up enabled acpi child devices - meson: avoid possible NULL dereference -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYfjT3AAoJEP4mhCVzWIwpA5AQAI46BefvYn72YjrXygGDly2d NTHTvSE6TK+2M1v1fv0w8xlHTTJqfGydzPzpiOgPkOHF6jNp1zGbK0VK60HFstwy YZ6hxJxRvRcZc5tGOFw+UoN2Bfxxt8p7lFZFwWLOMS8b4b95rZrJoCIS3Xd0ED3t lAwJXDfUse0pi0gpR/WFqn1TnKh1lHqapoXe4G/9RWfbQdsRiW9UBBAG4Gk0M/aJ FzCSJJyD4OXgfwZZ2UbUMMWi40kwjC1ayrv+Ejw8g55lnbNZvVRqSg9zHRE5W1Xw pzMD5cbhFsDKOXakH/Gzqzk1HCUfBilgxZbOQ6Q9ysGMjxS/1U74NuvyI5NQEvah W+rV6Q6kKB1ztwbJ47Rn7zpP6bQ6fn73Z9aHYJgrP2hwuU5wG2CCqRQ5w1HIdjYc Fkfo+UHOeKlSVMAvC92t39chyisTVmvHvejyDejuAPrOWWn2aXHfMm92/bn/fvnz jXGoBIQ/D91XIAr/VkkSx2RiS1d8f1qFJ02AFx/mhUD0GiakYjSCgXauVLNy6C4b IKQIJi+SsE5LdSga9secIkItlYRKujAkG8iILRPRIdCALlV0UbUrxnTewK3Kg2NM Eh7wEEa0Sm6VLU1QypIrE9/hKDmLRQdscEb0xTcUozTfjmzRaIkWpFrNELr8Eram PfXBMfb4Bk7VhqQ1VJBq =X9oZ -----END PGP SIGNATURE----- Merge tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - fix regressions detecting HS/HS DDR eMMC cards related to CMD6 MMC host: - mmc: mxs-mmc: Fix additional cycles after transmission stop - sdhci-acpi: Only powered up enabled acpi child devices - meson: avoid possible NULL dereference" * tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Restore parts of the polling policy when switch to HS/HS DDR mmc: mxs-mmc: Fix additional cycles after transmission stop mmc: sdhci-acpi: Only powered up enabled acpi child devices MMC: meson: avoid possible NULL dereference	2017-01-17 09:08:19 -08:00
Linus Torvalds	7d8b8c09d7	MTD fixes for 4.10 Just NAND updates from Boris: " - Forbid compiling xway NAND controller driver as a module - Fix tango NAND DT binding and make sure the controller is in a clean state at probe time - Add dependency on HAS_IOMEM to the oxnas NAND driver - Fix irq number validity check in the lpc32xx driver " -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYfUdiAAoJEFySrpd9RFgt1ewP/RbFupKNH9+tAZPiieMyLdYW 1wmWN9bmvaw4sUhM/QxVdPOQIUtcau2+PlGD8Cfnz3HYZAPic8cnd8+ZSbzLcXcs zDb8HqNWs7vMVcgH3R6dutWaT9RNjJwdFT/QUAgVGdny7Mr0IzenmDkW+GYorXyS ePnJGjZtJCXvGzsdOLa9DqTFDDRYxsrxeBoiP1f0jrDcPdFJFDuhO+dlepznZ5M2 BX80CdBYMnL6k4MVIdHovSiSOvaDaXuTFM8uDCqLFkKJC87ZH4lkBt/KqBRUVoIG tC7q1VWdMr+PoFzJO4jMX5X6PXyYjmoRhxvKxo7cCYMuc9DI3114zT/IIYIW85ps CTN7vojF/iCk6CptoQVF0qq96WaQSDvwaorEMkGYgF321SUoJQ54F3OvuwDU3+pC jSHWoBgTSnljjgxJiy6ZWke+E+bPdGFkcioELMowBdZlw93ZeSv9ieax9WrUuml6 fliy865O9NV02KaZmKG/EsDXNGsFEU/UIhc17YRy6t/g4djoBiQL4o9HsCggiCfh k1GSrB+5pRWVdgz9enZMJ8JoEGvWL+P2u8nQjqL7+4WZqNosNq5VvqOneKkhDIeo E1tbpkk6wBEnzBTdnvJJWTARa5S5Q+uGebMY5ZcM5KbWHJzZDxSaHfaON6os3Edd 6fnVW4tGy30zXkN9AGC3 =AjRR -----END PGP SIGNATURE----- Merge tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd Pull MTD fixes from Brian Norris: "Just NAND updates from Boris: - avoid compiling xway NAND controller driver as a module (which didn't work) - fix tango NAND DT binding and make sure the controller is in a clean state at probe time - add dependency on HAS_IOMEM to the oxnas NAND driver - fix irq number validity check in the lpc32xx driver" * tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd: mtd: nand: lpc32xx: fix invalid error handling of a requested irq mtd: nand: tango: Reset pbus to raw mode in probe mtd: nand: tango: Update DT binding description mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM mtd: nand: xway: fix build because of module functions mtd: nand: xway: disable module support	2017-01-17 08:50:59 -08:00
Philippe Reynes	55f78fcdd8	net: marvell: sky2: use new api ethtool_{get\|set}_link_ksettings The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 11:44:28 -05:00
Philippe Reynes	0f826385a6	net: marvell: skge: use new api ethtool_{get\|set}_link_ksettings The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. The callback set_link_ksettings no longer update the value of advertising, as the struct ethtool_link_ksettings is defined as const. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 11:44:27 -05:00
Philippe Reynes	c523838c85	net: jme: use new api ethtool_{get\|set}_link_ksettings The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 11:44:27 -05:00
Philippe Reynes	af473688ce	net: korina: use new api ethtool_{get\|set}_link_ksettings The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-17 11:44:27 -05:00
David S. Miller	b8128c4231	Merge branch 'mvneta-xmit_more-bql' Marcin Wojtas says: ==================== mvneta xmit_more and bql support This is a delayed v2 of short patchset, which introduces xmit_more and BQL to mvneta driver. The only one change was added in xmit_more support - condition check preventing excessive descriptors concatenation before flushing in HW. Any comments or feedback would be welcome. Changelog: v1 -> v2: * Add checking condition that ensures too much descriptors are not concatenated before flushing in HW. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-16 20:07:29 -05:00
Marcin Wojtas	a29b623556	net: mvneta: add BQL support Tests showed that when whole bandwidth is consumed, the latency for various kind of traffic can reach high values. With saturated link (e.g. with iperf from target to host) simple ping could take significant amount of time. BQL proved to improve this situation when implemented in mvneta driver. Measurements of ping latency for 3 link speeds: Speed \| Latency w/o BQL \| Latency with BQL 10 \| 7-14 ms \| 3.5 ms 100 \| 2-12 ms \| 0.6 ms 1000 \| often timeout \| up to 2ms Decreasing latency as above result in sligt performance cost - 4kpps (-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x. For 1500B packets in the same setup, the mpstat tool showed +8% of CPU occupation (default affinity, second CPU idle). Even though this cost seems reasonable to take, considering other improvements. This commit adds byte queue limit mechanism for the mvneta driver. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-16 20:07:29 -05:00
Simon Guinot	2a90f7e1d5	net: mvneta: add xmit_more support Basing on xmit_more flag of the skb, TX descriptors can be concatenated before flushing. This commit delay Tx descriptor flush if the queue is running and if there is more skb's to send. A maximum allowed number of descriptors for flushing at once due to MVNETA_TXQ_UPDATE_REG(q) reqisters limitation, is 255. Because of that a new macro was added (MVNETA_TXQ_DEC_SENT_MASK) in order to ensure that concatenated amount of descriptor does not exceed that value. Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-16 20:07:29 -05:00

1 2 3 4 5 ...

649251 Commits All Branches Search

649251 Commits

All Branches