linux_old1

Commit Graph

Author	SHA1	Message	Date
Krzysztof Kozlowski	6979b9cf58	net: Drop owner assignment from platform_driver platform_driver does not need to set an owner because platform_driver_register() will set it. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 23:18:29 -07:00
David Thomson	239aa55b94	net: phy: Support setting polarity in marvell phy driver Support manually setting the polarity to mdi or mdix Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 23:17:32 -07:00
David Thomson	634ec36cc0	net: phy: Pass mdix ethtool setting through to phy driver Pass the mdix setting from ethtool down to the phy driver, to allow driver specific implementations of manually setting the polarity. Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 23:17:32 -07:00
Eric Dumazet	a4e2405cc5	tcp: do not export tcp_init_xmit_timers() After commit `900f65d361` ("tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()"), we no longer need to export tcp_init_xmit_timers() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:44:38 -07:00
Nikolay Aleksandrov	09cf0211f9	bridge: mdb: fill state in br_mdb_notify Fill also the port group state when sending notifications. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:41:11 -07:00
Masatake YAMATO	cb1c61680d	route: remove unsed variable in __mkroute_input flags local variable in __mkroute_input is not used as a variable. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:34:34 -07:00
Tom Herbert	35a256fee5	ipv6: Nonlocal bind Add support to allow non-local binds similar to how this was done for IPv4. Non-local binds are very useful in emulating the Internet in a box, etc. This add the ip_nonlocal_bind sysctl under ipv6. Testing: Set up nonlocal binding and receive routing on a host, e.g.: ip -6 rule add from ::/0 iif eth0 lookup 200 ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200 sysctl -w net.ipv6.ip_nonlocal_bind=1 Set up routing to 2001:0:0:1::/64 on peer to go to first host ping6 -I 2001:0:0:1::1 peer-address -- to verify Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:09:10 -07:00
David S. Miller	5a10ececc6	Merge branch 'tw_cleanups' Eric Dumazet says: ==================== inet: timewait cleanups Another round of patches to make tw handling simpler. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:12:21 -07:00
Eric Dumazet	dbe7faa404	inet: inet_twsk_deschedule factorization inet_twsk_deschedule() calls are followed by inet_twsk_put(). Only particular case is in inet_twsk_purge() but there is no point to defer the inet_twsk_put() after re-enabling BH. Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put() and move the inet_twsk_put() inside. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:12:20 -07:00
Eric Dumazet	fc01538f9f	inet: simplify timewait refcounting timewait sockets have a complex refcounting logic. Once we realize it should be similar to established and syn_recv sockets, we can use sk_nulls_del_node_init_rcu() and remove inet_twsk_unhash() In particular, deferred inet_twsk_put() added in commit `13475a30b6` ("tcp: connect() race with timewait reuse") looks unecessary : When removing a timewait socket from ehash or bhash, caller must own a reference on the socket anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:12:20 -07:00
Eric Dumazet	3fd2f1b9d9	inet: remove BUG_ON() in twsk_destructor() Kernel will crash the same if one of the pointer is NULL anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:12:20 -07:00
Florian Westphal	8b58a39846	ipv6: use flag instead of u16 for hop in inet6_skb_parm Hop was always either 0 or sizeof(struct ipv6hdr). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:06:59 -07:00
Aleksey S. Kazantsev	7c3d0d67d5	dsa: mv88e6352/mv88e6xxx: Add support for Marvell 88E6320 and 88E6321 MV88E6320 and MV88E6321 are largely compatible to MV886352, but are members of a different chip family. Signed-off-by: Aleksey S. Kazantsev <ioctl@yandex.ru> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:34:23 -07:00
David S. Miller	986ca37eae	Merge branch 'tcp-in-slow-start' Yuchung Cheng says: ==================== tcp: fixes some congestion control corner cases This patch series fixes corner cases of TCP congestion control. First issue is to avoid continuing slow start when cwnd reaches ssthresh. Second issue is incorrectly processing order of congestion state and cwnd update when entering fast recovery or undoing cwnd. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:53 -07:00
Yuchung Cheng	b20a3fa30a	tcp: update congestion state first before raising cwnd The congestion state and cwnd can be updated in the wrong order. For example, upon receiving a dubious ACK, we incorrectly raise the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because the state is still Open, then enter recovery state to reduce cwnd. For another example, if the ACK indicates spurious timeout or retransmits, we first revert the cwnd reduction and congestion state back to Open state. But we don't raise the cwnd even though the ACK does not indicate any congestion. To fix this problem we should first call tcp_fastretrans_alert() to process the dubious ACK and update the congestion state, then call tcp_may_raise_cwnd() that raises cwnd based on the current state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:52 -07:00
Yuchung Cheng	76174004a0	tcp: do not slow start when cwnd equals ssthresh In the original design slow start is only used to raise cwnd when cwnd is stricly below ssthresh. It makes little sense to slow start when cwnd == ssthresh: especially when hystart has set ssthresh in the initial ramp, or after recovery when cwnd resets to ssthresh. Not doing so will also help reduce the buffer bloat slightly. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:52 -07:00
Yuchung Cheng	071d5080e3	tcp: add tcp_in_slow_start helper Add a helper to test the slow start condition in various congestion control modules and other places. This is to prepare a slight improvement in policy as to exactly when to slow start. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:52 -07:00
Alexander Duyck	1007f59dce	net: skb_defer_rx_timestamp should check for phydev before setting up classify This change makes it so that the call skb_defer_rx_timestamp will first check for a phydev before going in and manipulating the skb->data and skb->len values. By doing this we can avoid unnecessary work on network devices that don't support phydev. As a result we reduce the total instruction count needed to process this on most devices. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:17:15 -07:00
Jon Maxwell	2251ae46af	tcp: v1 always send a quick ack when quickacks are enabled V1 of this patch contains Eric Dumazet's suggestion to move the per dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric. I ran some tests and after setting the "ip route change quickack 1" knob there were still many delayed ACKs sent. This occured because when icsk_ack.quick=0 the !icsk_ack.pingpong value is subsequently ignored as tcp_in_quickack_mode() checks both these values. The condition for a quick ack to trigger requires that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently only icsk_ack.pingpong is controlled by the knob. But the icsk_ack.quick value changes dynamically depending on heuristics. The crux of the matter is that delayed acks still cannot be entirely disabled even with the RTAX_QUICKACK per dst knob enabled. This patch ensures that a quick ack is always sent when the RTAX_QUICKACK per dst knob is turned on. The "ip route change quickack 1" knob was recently added to enable quickacks. It was modeled around the TCP_QUICKACK setsockopt() option. This issue is that even with "ip route change quickack 1" enabled we still see delayed ACKs under some conditions. It would be nice to be able to completely disable delayed ACKs. Here is an example: # netstat -s\|grep dela 3 delayed acks sent For all routes enable the knob # ip route change quickack 1 Generate some traffic across a slow link and we still see the delayed acks. # netstat -s\|grep dela 106 delayed acks sent 1 delayed acks further delayed because of locked socket The issue is that both the "ip route change quickack 1" knob and the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0. However at the business end in the __tcp_ack_snd_check() routine, tcp_in_quickack_mode() checks that both icsk_ack.quick != 0 and icsk_ack.pingpong=0 in order to trigger a quickack. As icsk_ack.quick is determined by heuristics it can be 0. When that occurs the icsk_ack.pingpong value is ignored and a delayed ACK is sent regardless. This patch moves the RTAX_QUICKACK per dst check into the tcp_in_quickack_mode() routine which ensures that a quickack is always sent when the quickack knob is enabled for that dst. Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:15:44 -07:00
Scott Feldman	77a58c741d	rocker: add change MTU support Implement ndo_change_mtu: on MTU change, reallocate Rx ring bufs and signal HW of new port MTU value. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 00:31:14 -07:00
Vaishali Thakkar	910be1abbe	neterion: s2io: Use module_pci_driver Use module_pci_driver for drivers whose init and exit functions only register and unregister, respectively. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @a@ identifier f, x; @@ -static f(...) { return pci_register_driver(&x); } @b depends on a@ identifier e, a.x; statement S; @@ -static e(...) { -pci_unregister_driver(&x); -DBG_PRINT(INIT_DBG,"S"); - } @c depends on a && b@ identifier a.f; declarer name module_init; @@ -module_init(f); @d depends on a && b && c@ identifier b.e, a.x; declarer name module_exit; declarer name module_pci_driver; @@ -module_exit(e); +module_pci_driver(x); Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 00:19:00 -07:00
Hariprasad Shenai	71d3c0b49a	cxgb4vf: Fix check to use new User Doorbell mechanism If we don't have access to the new User GTS (T5+), use the old doorbell mechanism; otherwise use the new BAR2 mechanism. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 00:02:01 -07:00
Xi Wang	ba29becd77	test_bpf: extend tests for 32-bit endianness conversion Currently "ALU_END_FROM_BE 32" and "ALU_END_FROM_LE 32" do not test if the upper bits of the result are zeros (the arm64 JIT had such bugs). Extend the two tests to catch this. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:19:53 -07:00
David S. Miller	ca661a2888	Merge branch 'cxgb4-t6' Hariprasad Shenai says: ==================== Cleanup, T6 changes and register range update This patch series adds the following: Don't use entire L2T table, update register ranges for T6 adapter, read stats for only available channels for T6 and enable cim_la dump for T6 adapter also. This patch series has been created against net-next tree and includes patches on cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:13:55 -07:00
Hariprasad Shenai	b7660642b7	cxgb4: Enable cim_la dump to support T6 Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:13:55 -07:00
Hariprasad Shenai	df459ebc33	cxgb4: Read stats for only available channels Updating the driver to read the stats of only available channels. T6 and later has only 2 channels Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:13:54 -07:00
Hariprasad Shenai	5b4e83e133	cxgb4: Update register ranges for T6 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:13:54 -07:00
Hariprasad Shenai	5be9ed8d49	cxgb4: Don't use entire L2T table, use only its slice The driver was retrieving the parameters for the bounds of its slice of the L2T from the firmware and then throwing those away and using the entire table. This corrects that problem. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:13:54 -07:00
Vaishali Thakkar	b11b6ed0f9	net: ec_bhf: Use module_pci_driver Use module_pci_driver for drivers whose init and exit functions only register and unregister, respectively. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @a@ identifier f, x; @@ -static f(...) { return pci_register_driver(&x); } @b depends on a@ identifier e, a.x; @@ -static e(...) { pci_unregister_driver(&x); } @c depends on a && b@ identifier a.f; declarer name module_init; @@ -module_init(f); @d depends on a && b && c@ identifier b.e, a.x; declarer name module_exit; declarer name module_pci_driver; @@ -module_exit(e); +module_pci_driver(x); Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:04:06 -07:00
Haiyang Zhang	f9cbce34c3	hv_netvsc: Add support to set MTU reservation from guest side When packet encapsulation is in use, the MTU needs to be reduced for headroom reservation. The existing code takes the updated MTU value only from the host side. But vSwitch extensions, such as Open vSwitch, require the flexibility to change the MTU to different values from within a guest during the lifecycle of a vNIC, when the encapsulation protocol is changed. The patch supports this kind of MTU changes. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:00:56 -07:00
Eric Dumazet	9e29e21a9b	ifb: add multiqueue operation Add multiqueue capabilities to ifb netdevice. This removes last bottleneck for ingress when mq qdisc can be used to shard load from multiple RX queues on physical device. Tested: # netem based setup, installed at receiver side ETH=eth0 IFB=ifb10 EST="est 1sec 4sec" # Optional rate estimator RTT_HALF=2ms #REORDER=20us #LOSS="loss 1" TXQ=8 ip link add ifb10 numtxqueues $TXQ type ifb ip link set dev $IFB up tc qdisc add dev $ETH ingress 2>/dev/null tc filter add dev $ETH parent ffff: \ protocol ip u32 match u32 0 0 flowid 1:1 \ action mirred egress redirect dev $IFB tc qdisc del dev $IFB root 2>/dev/null tc qdisc add dev $IFB root handle 1: mq for i in `seq 1 $TXQ` do slot=$( printf %x $(( i )) ) tc qd add dev $IFB parent 1:$slot $EST netem \ limit 100000 delay $RTT_HALF $REORDER $LOSS done lpaa24:~# tc -s -d qd sh dev ifb10 qdisc mq 1: root Sent 316544766 bytes 5265927 pkt (dropped 0, overlimits 0 requeues 0) backlog 98880b 1648p requeues 0 qdisc netem 8002: parent 1:1 limit 100000 delay 2.0ms Sent 39601416 bytes 658721 pkt (dropped 0, overlimits 0 requeues 0) rate 38235Kbit 79657pps backlog 12240b 204p requeues 0 qdisc netem 8003: parent 1:2 limit 100000 delay 2.0ms Sent 39472866 bytes 657227 pkt (dropped 0, overlimits 0 requeues 0) rate 38234Kbit 79655pps backlog 10620b 176p requeues 0 qdisc netem 8004: parent 1:3 limit 100000 delay 2.0ms Sent 39703417 bytes 659699 pkt (dropped 0, overlimits 0 requeues 0) rate 38320Kbit 79831pps backlog 12780b 213p requeues 0 qdisc netem 8005: parent 1:4 limit 100000 delay 2.0ms Sent 39565149 bytes 658011 pkt (dropped 0, overlimits 0 requeues 0) rate 38174Kbit 79530pps backlog 11880b 198p requeues 0 qdisc netem 8006: parent 1:5 limit 100000 delay 2.0ms Sent 39506078 bytes 657354 pkt (dropped 0, overlimits 0 requeues 0) rate 38195Kbit 79571pps backlog 12480b 208p requeues 0 qdisc netem 8007: parent 1:6 limit 100000 delay 2.0ms Sent 39675994 bytes 658849 pkt (dropped 0, overlimits 0 requeues 0) rate 38323Kbit 79838pps backlog 12600b 210p requeues 0 qdisc netem 8008: parent 1:7 limit 100000 delay 2.0ms Sent 39532042 bytes 658367 pkt (dropped 0, overlimits 0 requeues 0) rate 38177Kbit 79536pps backlog 13140b 219p requeues 0 qdisc netem 8009: parent 1:8 limit 100000 delay 2.0ms Sent 39488164 bytes 657705 pkt (dropped 0, overlimits 0 requeues 0) rate 38192Kbit 79568pps backlog 13Kb 222p requeues 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:00:09 -07:00
Hariprasad Shenai	81aa507981	cxgb4: Add PCI device ids for few more T5 and T6 adapters Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 15:53:11 -07:00
Carol Soto	0beb44b065	net/mlx4_core: Add extra check for total vfs for SRIOV Add extra check for total vfs for SRIOV to check if that value is bigger than total vfs in pci SRIOV capabalities. Fix a check and print of the number of maximum vfs that hw can handle. Fix a check and print of the number of maximum vfs per port that driver can handle. Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 15:19:54 -07:00
Michael Holzheu	d912557b34	samples: bpf: enable trace samples for s390x The trace bpf samples do not compile on s390x because they use x86 specific fields from the "pt_regs" structure. Fix this and access the fields via new PT_REGS macros. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 15:17:45 -07:00
Punnaiah Choudary Kalluri	7baaa9092d	net: macb: Add SG support for Zynq SOC family Enable SG support for Zynq SOC family devices. Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 14:42:46 -07:00
Li, Liang Z	6ab13b2769	xen-netback: remove duplicated function definition There are two duplicated xenvif_zerocopy_callback() definitions. Remove one of them. Signed-off-by: Liang Li <liang.z.li@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 14:12:53 -07:00
David S. Miller	8685255ec5	Merge branch 'sch_act_lockless' Eric Dumazet says: ==================== net_sched: act: lockless operation As mentioned by Alexei last week in Budapest, it is a bit weird to take a spinlock in order to drop a packet in a tc filter... Lets add percpu infra for tc actions and use it for gact & mirred. Before changes, my host with 8 RX queues was handling 5 Mpps with gact, and more than 11 Mpps after. Mirred change is not yet visible if ifb+qdisc is used, as ifb is not yet multi queue enabled, but is a step forward. ==================== Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	2ee22a90c7	net_sched: act_mirred: remove spinlock in fast path Like act_gact, act_mirred can be lockless in packet processing 1) Use percpu stats 2) update lastuse only every clock tick to avoid false sharing 3) use rcu to protect tcfm_dev 4) Remove spinlock usage, as it is no longer needed. Next step : add multi queue capability to ifb device Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	56e5d1ca18	net_sched: act_gact: remove spinlock in fast path Final step for gact RCU operation : 1) Use percpu stats 2) update lastuse only every clock tick to avoid false sharing 3) Remove spinlock acquisition, as it is no longer needed. Since this is the last contended lock in packet RX when tc gact is used, this gives impressive gain. My host with 8 RX queues was handling 5 Mpps before the patch, and more than 11 Mpps after patch. Tested: On receiver : dev=eth0 tc qdisc del dev $dev ingress 2>/dev/null tc qdisc add dev $dev ingress tc filter del dev $dev root pref 10 2>/dev/null tc filter del dev $dev pref 10 2>/dev/null tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \ u32 match ip src 7.0.0.0/8 flowid 1:15 action drop Sender sends packets flood from 7/8 network Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	8f2ae965b7	net_sched: act_gact: read tcfg_ptype once Third step for gact RCU operation : Following patch will get rid of spinlock protection, so we need to read tcfg_ptype once. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	cc6510a950	net_sched: act_gact: use a separate packet counters for gact_determ() Second step for gact RCU operation : We want to get rid of the spinlock protecting gact operations. Stats (packets/bytes) will soon be per cpu. gact_determ() would not work without a central packet counter, so lets add it for this mode. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	cef5ecf96b	net_sched: act_gact: make tcfg_pval non zero First step for gact RCU operation : Instead of testing if tcfg_pval is zero or not, just make it 1. No change in behavior, but slightly faster code. The smp_rmb()/smp_wmb() barriers, while not strictly needed at this stage are added for upcoming spinlock removal. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	519c818e8f	net: sched: add percpu stats to actions Reuse existing percpu infrastructure John Fastabend added for qdisc. This patch adds a new cpustats parameter to tcf_hash_create() and all actions pass false, meaning this patch should have no effect yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:41 -07:00
Eric Dumazet	24ea591d22	net: sched: extend percpu stats helpers qdisc_bstats_update_cpu() and other helpers were added to support percpu stats for qdisc. We want to add percpu stats for tc action, so this patch add common helpers. qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update() qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:41 -07:00
Eric Dumazet	0a6d424569	mlx4: TCP/UDP packets have L4 hash Mellanox driver has the knowledge if rxhash is a L4 hash, if it receives a non fragmented TCP or UDP frame and NETIF_F_RXCSUM is enabled on netdev. ip_summed value is CHECKSUM_UNNECESSARY in this case. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Ido Shamay <idos@mellanox.com> Acked-by: Ido Shamay <idos@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:45:22 -07:00
David S. Miller	71c3353775	Merge branch 'tcp-policer-drops' Yuchung Cheng says: ==================== tcp: reducing lost retransmits in recovery This patch series reduces lost retransmits in recovery, in particular when dealing with traffic policers. The main problem is that slow start in recovery under policing can cause massive lost and retransmit storms: any excess sending rate turns into drops. The solution is to avoid doing slow start when lost retransmit is detected and use packet conservation instead. On networks with traffic policers the patches have lowered the TCP loss rates by ~20% from Google servers without latency regressions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:29:46 -07:00
Yuchung Cheng	3759824da8	tcp: PRR uses CRB mode by default and SS mode conditionally PRR slow start is often too aggressive especially when drops are caused by traffic policers. The policers mainly use token bucket to enforce the rate so sending (twice) faster than the delivery rate causes excessive drops. This patch changes PRR to the conservative reduction bound (CRB) mode in RFC 6937 by default. CRB follows the packet conservation rule to send at most the delivery rate by default. But if many packets are lost and the pipe is empty, CRB may take N round trips to repair N losses. We conditionally turn on slow start mode if all these conditions are made to speed up the recovery: 1) on the second round or later in recovery 2) retransmission sent in the previous round is delivered on this ACK 3) no retransmission is marked lost on this ACK By using packet conservation by default, this change reduces the loss retransmits signicantly on networks that deploy traffic policers, up to 20% reduction of overall loss rate. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:29:46 -07:00
Yuchung Cheng	291a00d1a7	tcp: reduce cwnd if retransmit is lost in CA_Loss If the retransmission in CA_Loss is lost again, we should not continue to slow start or raise cwnd in congestion avoidance mode. Instead we should enter fast recovery and use PRR to reduce cwnd, following the principle in RFC5681: "... or the loss of a retransmission, should be taken as two indications of congestion and, therefore, cwnd (and ssthresh) MUST be lowered twice in this case." This is especially important to reduce loss when the CA_Loss state was caused by a traffic policer dropping the entire inflight. The CA_Loss state has a problem where a loss of L packets causes the sender to send a burst of L packets. So a policer that's dropping most packets in a given RTT can cause a huge retransmit storm. By contrast, PRR includes logic to bound the number of outbound packets that result from a given ACK. So switching to CA_Recovery on lost retransmits in CA_Loss avoids this retransmit storm problem when in CA_Loss. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:29:45 -07:00
Hariprasad Shenai	fda8b18c51	cxgb4: Fix incorrect sequence numbers shown in devlog Part of commit 49aa284fe64c4c1 ("cxgb4: Add support for devlog") change introduced a real bug where the Device Log Sequence Numbers are no longer being converted from firmware Big-Endian to local CPU-Endian format. This patch moves all of the translation into the devlog_show() routine. The only endianness code now in devlog_open() is the small loop to find the earliest (lowest Sequence Number) Device Log entry in the circular buffer. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:54:02 -07:00
Angga	4c938d22c8	ipv6: Make MLD packets to only be processed locally Before commit `daad151263` ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") MLD packets were only processed locally. After the change, a copy of MLD packet goes through ip6_mr_input, causing MRT6MSG_NOCACHE message to be generated to user space. Make MLD packet only processed locally. Fixes: `daad151263` ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:52:38 -07:00

1 2 3 4 5 ...

532601 Commits All Branches Search

532601 Commits

All Branches