linux

Commit Graph

Author	SHA1	Message	Date
David S. Miller	f3591fd4c9	Merge branch 'inet_csums' Tom Herbert says: ==================== net: Checksum offload changes - Part IV I am working on overhauling RX checksum offload. Goals of this effort are: - Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY - Preserve CHECKSUM_COMPLETE through encapsulation layers - Don't do skb_checksum more than once per packet - Unify GRO and non-GRO csum verification as much as possible - Unify the checksum functions (checksum_init) - Simply code What is in this fourth patch set: - Preserve CHECKSUM_COMPLETE instead of changing it to CHECKSUM_UNNECESSARY. This allows correct reuse in validating multiple csums in a packet. - When SW needs to compute the packet checksum, save it as CHECKSUM_COMPLETE. Also mark that checksum was compute by SW. - Add skb_gro_postpull_rcsum to udp and vxlan to make GRO work with CHECKSUM_COMPLETE. v2: Removed patch setting skb_encapsulation when validating checksum in tcp_gro_receive Please review carefully and test if possible, mucking with basic checksum functions is always a little precarious :-) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:46:17 -07:00
Tom Herbert	6bae1d4cc3	net: Add skb_gro_postpull_rcsum to udp and vxlan Need to gro_postpull_rcsum for GRO to work with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:46:13 -07:00
Tom Herbert	7e3cead517	net: Save software checksum complete In skb_checksum complete, if we need to compute the checksum for the packet (via skb_checksum) save the result as CHECKSUM_COMPLETE. Subsequent checksum verification can use this. Also, added csum_complete_sw flag to distinguish between software and hardware generated checksum complete, we should always be able to trust the software computation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:46:13 -07:00
Tom Herbert	5d0c2b95bc	net: Preserve CHECKSUM_COMPLETE at validation Currently when the first checksum in a packet is validated using CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY so that any subsequent checksums in the packet are not correctly validated. This patch adds csum_valid flag in sk_buff and uses that to indicate validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit is set accordingly in the skb_checksum_validate_* functions. The flag is checked in skb_checksum_complete, so that validation is communicated between checksum_init and checksum_complete sequence in TCP and UDP. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:46:12 -07:00
David S. Miller	1054cc150c	Merge branch 'qlcnic-next' Shahed Shaikh says: ==================== This series contains an enhancement in the area of firmware minidump collection and optimization of ring count validation function. Please apply this series to net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:44:42 -07:00
Shahed Shaikh	038782d6d0	qlcnic: Update version to 5.3.60 Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:44:29 -07:00
Shahed Shaikh	18e0d62533	qlcnic: Optimize ring count validations - Check interrupt mode at the start of qlcnic_set_channels(). - Do not validate ring count if they are not going to change. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:44:29 -07:00
Shahed Shaikh	4da005cf1e	qlcnic: Pre-allocate DMA buffer used for minidump collection Pre-allocate the physically contiguous DMA buffer used for minidump collection at driver load time, rather than at run time, to minimize allocation failures. Driver will allocate the buffer at load time if PEX DMA support capability is indicated by the adapter. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:44:29 -07:00
Dmitry Popov	efd0f11d85	ip_vti: fix sparse warnings for VTI_ISVTI This patch fixes the following sparse warnings: net/ipv4/ip_tunnel.c:245:53: warning: restricted __be16 degrades to integer net/ipv4/ip_vti.c:321:19: warning: incorrect type in assignment (different base types) net/ipv4/ip_vti.c:321:19: expected restricted __be16 [addressable] [assigned] [usertype] i_flags net/ipv4/ip_vti.c:321:19: got int net/ipv4/ip_vti.c:447:24: warning: incorrect type in assignment (different base types) net/ipv4/ip_vti.c:447:24: expected restricted __be16 [usertype] i_flags net/ipv4/ip_vti.c:447:24: got int Since VTI_ISVTI is always used with ip_tunnel_parm->i_flags (which is __be16), we can __force cast VTI_ISVTI to __be16 in header file. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
Dan Carpenter	2f87208efb	drivers: net: davinci_cpdma: double free on error We recently change the kzalloc() to devm_kzalloc() so freeing "ctlr" here could lead to a double free. Fixes: `e194312854` ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
Dan Carpenter	8fc908c3c3	amd-xgbe: unwind on error in xgbe_mdio_register() There is a typo here so we return directly instead of unwinding. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
Varka Bhadram	0aaf43f51d	mrf24j40: add device managed APIs adds the device managed APIs so that no need worry about freeing the resources. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
stephen hemminger	f647944995	ceph: remove bogus extern Sparse complained about this bogus extern on definition of a function. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
Alexei Starovoitov	783e327b69	net: filter: document internal instruction encoding This patch adds a description of eBPFs instruction encoding in order to bring the documentation in line with the implementation. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:18 -07:00
Alexei Starovoitov	e4ad403269	net: filter: mention eBPF terminology as well Since the term eBPF is used anyway on mailing list discussions, lets also document that in the main BPF documentation file and replace a couple of occurrences with eBPF terminology to be more clear. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:18 -07:00
Eric Dumazet	9709674e68	ipv4: fix a race in ip4_datagram_release_cb() Alexey gave a AddressSanitizer[1] report that finally gave a good hint at where was the origin of various problems already reported by Dormando in the past [2] Problem comes from the fact that UDP can have a lockless TX path, and concurrent threads can manipulate sk_dst_cache, while another thread, is holding socket lock and calls __sk_dst_set() in ip4_datagram_release_cb() (this was added in linux-3.8) It seems that all we need to do is to use sk_dst_check() and sk_dst_set() so that all the writers hold same spinlock (sk->sk_dst_lock) to prevent corruptions. TCP stack do not need this protection, as all sk_dst_cache writers hold the socket lock. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel AddressSanitizer: heap-use-after-free in ipv4_dst_check Read of size 2 by thread T15453: [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Freed by thread T15455: [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 Allocated by thread T15453: [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406 [< inlined >] __ip_route_output_key+0x3e8/0xf70 __mkroute_output ./net/ipv4/route.c:1939 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b ./arch/x86/kernel/entry_64.S:629 [2] <4>[196727.311203] general protection fault: 0000 [#1] SMP <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1 <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013 <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000 <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>] [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.311399] RSP: 0018:ffff885effd23a70 EFLAGS: 00010282 <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040 <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200 <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800 <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce <4>[196727.311510] FS: 0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000 <4>[196727.311554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0 <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>[196727.311713] Stack: <4>[196727.311733] ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42 <4>[196727.311784] ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0 <4>[196727.311834] ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0 <4>[196727.311885] Call Trace: <4>[196727.311907] <IRQ> <4>[196727.311912] [<ffffffff815b7f42>] dst_destroy+0x32/0xe0 <4>[196727.311959] [<ffffffff815b86c6>] dst_release+0x56/0x80 <4>[196727.311986] [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0 <4>[196727.312013] [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820 <4>[196727.312041] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312070] [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150 <4>[196727.312097] [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360 <4>[196727.312125] [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230 <4>[196727.312154] [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90 <4>[196727.312183] [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360 <4>[196727.312212] [<ffffffff815fe00b>] ip_rcv+0x22b/0x340 <4>[196727.312242] [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan] <4>[196727.312275] [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640 <4>[196727.312308] [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150 <4>[196727.312338] [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70 <4>[196727.312368] [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0 <4>[196727.312397] [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140 <4>[196727.312433] [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe] <4>[196727.312463] [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340 <4>[196727.312491] [<ffffffff815b1691>] net_rx_action+0x111/0x210 <4>[196727.312521] [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70 <4>[196727.312552] [<ffffffff810519d0>] __do_softirq+0xd0/0x270 <4>[196727.312583] [<ffffffff816cef3c>] call_softirq+0x1c/0x30 <4>[196727.312613] [<ffffffff81004205>] do_softirq+0x55/0x90 <4>[196727.312640] [<ffffffff81051c85>] irq_exit+0x55/0x60 <4>[196727.312668] [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0 <4>[196727.312696] [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a <4>[196727.312722] <EOI> <1>[196727.313071] RIP [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80 <4>[196727.313100] RSP <ffff885effd23a70> <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]--- <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt Reported-by: Alexey Preobrazhensky <preobr@google.com> Reported-by: dormando <dormando@rydia.ne> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `8141ed9fce` ("ipv4: Add a socket release callback for datagram sockets") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:18 -07:00
Daniel Borkmann	a101ccd141	net: filter: add test_bpf module under MAINTAINERS' networking section Add lib/test_bpf.c entry to maintainers file under networking. All changes were posted via netdev for review, so make sure other people Cc it as well when they call get_maintainer.pl. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:18 -07:00
Octavian Purdila	bad93e9d4e	net: add __pskb_copy_fclone and pskb_copy_for_clone There are several instances where a pskb_copy or __pskb_copy is immediately followed by an skb_clone. Add a couple of new functions to allow the copy skb to be allocated from the fclone cache and thus speed up subsequent skb_clone calls. Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Arvid Brodin <arvid.brodin@alten.se> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org> Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:38:02 -07:00
Jon Cooper	daf37b556e	sfc: PIO:Restrict to 64bit arch and use 64-bit writes. Fixes:ee45fd92c739 ("sfc: Use TX PIO for sufficiently small packets") The linux net driver uses memcpy_toio() in order to copy into the PIO buffers. Even on a 64bit machine this causes 32bit accesses to a write- combined memory region. There are hardware limitations that mean that only 64bit naturally aligned accesses are safe in all cases. Due to being write-combined memory region two 32bit accesses may be coalesced to form a 64bit non 64bit aligned access. Solution was to open-code the memory copy routines using pointers and to only enable PIO for x86_64 machines. Not tested on platforms other than x86_64 because this patch disables the PIO feature on other platforms. Compile-tested on x86 to ensure that works. The WARN_ON_ONCE() code in the previous version of this patch has been moved into the internal sfc debug driver as the assertion was unnecessary in the upstream kernel code. This bug fix applies to v3.13 and v3.14 stable branches. Signed-off-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:36:21 -07:00
David S. Miller	1a0b20b257	Merge branch 'bridge-next' Toshiaki Makita says: ==================== bridge: 802.1ad vlan protocol support Currently bridge vlan filtering doesn't work fine with 802.1ad protocol. Only if a bridge is configured without pvid, the bridge receives only 802.1ad tagged frames and no STP is used, it will work. Otherwise: - If pvid is configured, it can put only 802.1Q tags but cannot put 802.1ad tags. - If 802.1Q and 802.1ad tagged frames arrive in mixture, it applies filtering regardless of their protocols. - While an 802.1ad bridge should use another mac address for STP BPDU and should forward customer's BPDU frames, it can't. Thus, we can't properly handle frames once 802.1ad is used. Handling 802.1ad is useful if we want to allow stacked vlans to be used, e.g., guest VMs wants to use vlan tags and the host also wants to segregate guest's traffic from other guests' by vlan tags. Here is the image describing how to configure a bridge to filter VMs traffic. +-------+p/u +-----+ +---------+ +----+ \| \|------\|vnet0\|--\|User A VM\| \|eth0\|--\|802.1ad\| +-----+ +---------+ +----+ \|bridge \|p/u +-----+ +---------+ \| \|------\|vnet1\|--\|User B VM\| +-------+ +-----+ +---------+ p/u: pvid/untagged This patch set enables us to set vlan protocols per bridge. This tries to implement a bridge like S-VLAN component in IEEE 802.1Q-2011 spec. Note that there is another possible implementation that sets vlan protocols per port. Some HW switches seem to take that approach. However, I think per-bridge approach is better, because; - I think the typical usage of an 802.1ad bridge is segregating 802.1Q tagged traffic (like what is described above), and this doesn't need the ability to be set protocols per port. Also, If a bridge has many ports and it supports per-port setting, we might have to make much more extra configurations to change protocols of all ports. - I assume that the main perpose to set protocol per port is to assign S-VID according to C-VID, or to realize two logical bridges (one is an 802.1Q filtering bridge and the other is an 802.1ad filtering bridge) in one bridge. The former usually needs additional features such as vlan id mapping, and is likely to make bridge's code complicated. If a user wants, such enhanced features can be accomplished by a combination of multiple bridges, so it is not absolutely necessary to implement these features in a bridge itself. The latter is simply unnecessary because we can easily make two bridges of which one is an 802.1Q bridge and the other is an 802.1ad bridge. Here is an example of the enhanced feature that we can realize by using multiple bridges and veth interfaces. This way is documented in IEEE 802.1Q-2011 clause 15.4 (C-tagged service interface). +----+ +-------+p/u +------+ +----+ +--+ \|eth0\|--\|802.1ad\|----veth----\|802.1Q\|--\|vnet\|--\|VM\| +----+ \|bridge \|----veth----\|bridge\| +----+ +--+ +-------+p/u +------+ p/u: pvid/untagged In this configuration, we can map C-VIDs to any S-VID. For example; C-VID 10 and 20 to S-VID 100 C-VID 30 to S-VID 110 This is achieved through the 802.1Q bridge that forwards C-tagged frames to proper ports of the 802.1ad bridge. Changes: v1 -> v2: - Make the way to forward bridge group addresses more generic by introducing new mask, group_fwd_mask_required. RFC -> v1: - Add S-TAG tx offload. - Remove a fix around stacked vlan which has already been fixed. - Take into account Bridge Group Addresses. - Separate handling of protocol-mismatch from br_vlan_get_tag(). - Change the way to set vlan_proto from netlink to sysfs because no other existing configuration per bridge can be set by netlink. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:23:03 -07:00
Toshiaki Makita	204177f3f3	bridge: Support 802.1ad vlan filtering This enables us to change the vlan protocol for vlan filtering. We come to be able to filter frames on the basis of 802.1ad vlan tags through a bridge. This also changes br->group_addr if it has not been set by user. This is needed for an 802.1ad bridge. (See IEEE 802.1Q-2011 8.13.5.) Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad bridge can forward the Nearest Customer Bridge group addresses except for br->group_addr, which should be passed to higher layer. To change the vlan protocol, write a protocol in sysfs: # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	f2808d226f	bridge: Prepare for forwarding another bridge group addresses If a bridge is an 802.1ad bridge, it must forward another bridge group addresses (the Nearest Customer Bridge group addresses). (For details, see IEEE 802.1Q-2011 8.6.3.) As user might not want group_fwd_mask to be modified by enabling 802.1ad, introduce a new mask, group_fwd_mask_required, which indicates addresses the bridge wants to forward. This will be set by enabling 802.1ad. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	8580e2117c	bridge: Prepare for 802.1ad vlan filtering support This enables a bridge to have vlan protocol informantion and allows vlan tag manipulation (retrieve, insert and remove tags) according to the vlan protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	1c5abb6c77	bridge: Add 802.1ad tx vlan acceleration Bridge device doesn't need to embed S-tag into skb->data. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Arnd Bergmann	e7b599d7c1	net: xen-netback: include linux/vmalloc.h again commit `e9ce7cb6b1` ("xen-netback: Factor queue-specific data into queue struct") added a use of vzalloc/vfree to interface.c, but removed the #include <linux/vmalloc.h> statement at the same time, which causes this build error: drivers/net/xen-netback/interface.c: In function 'xenvif_free': drivers/net/xen-netback/interface.c:754:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(vif->queues); ^ cc1: some warnings being treated as errors Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Andrew J. Bennieston <andrew.bennieston@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:19:28 -07:00
Jongsung Kim	71b9c4a838	net: phy: realtek: register/unregister multiple drivers properly Using phy_drivers_register/_unregister functions is proper way to handle multiple PHY drivers registration. For Realtek PHY drivers module, it fixes incomplete current error-handlings up and adds missed unregistration for the RTL8201CP driver. Signed-off-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:17:27 -07:00
Yoshihiro Shimoda	1b72a0fc9c	net: sh_eth: Fix timing of RACT setting in sh_eth_rx() This patch fixes an issue that we cannot use nfs rootfs correctly on r8a7790 when the command below runs on a host PC. $ sudo ping -f -l 8 $BOARD_IP_ADDR Since the driver sets the RACT to 1 in the first while loop of sh_eth_rx(), the controller accepts a next frame into the next RX descriptor during the while loop. But, in the first while loop doesn't allocate a next skb. So, this patch removes the RACT setting in the first while loop of sh_eth_rx(). Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:15:31 -07:00
Yoshihiro Shimoda	4f809cea61	net: sh_eth: Fix receive packet "exceeded" condition in sh_eth_rx() This patch fixes the packet "exceeded" condition in sh_eth_rx() when RACT in an RX descriptor is not set and the "quota" is 0. Otherwise, kernel panic happens because the "&n->poll_list" is deleted twice in sh_eth_poll() which calls napi_complete() and net_rx_action(). Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:15:31 -07:00
Alexei Starovoitov	61f83d0d57	net: filter: fix warning on 32-bit arch fix compiler warning on 32-bit architectures: net/core/filter.c: In function '__sk_run_filter': net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:12:27 -07:00
Jon Paul Maloy	02c00c2ab0	tipc: fix potential bug in function tipc_backlog_rcv In commit `4f4482dcd9` ("tipc: compensate for double accounting in socket rcv buffer") we access 'truesize' of a received buffer after it might have been released by the function filter_rcv(). In this commit we correct this by reading the value of 'truesize' to the stack before delivering the buffer to filter_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:01:30 -07:00
Dan Carpenter	cf97b8ff22	net: sxgbe: remove duplicate SXGBE_CORE_L34_ADDCTL_REG define The SXGBE_CORE_L34_ADDCTL_REG define is cut and pasted twice so we can delete the second instance. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:01:30 -07:00
Dan Carpenter	5e3ec11b64	qlcnic: remove duplicate QLC_83XX_GET_LSO_CAPABILITY define The QLC_83XX_GET_LSO_CAPABILITY define is cut and pasted twice so we can delete the second instance. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Sony Chacko <sony.chacko@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:01:30 -07:00
David S. Miller	9b07d735c0	Merge branch 'mlx4' Amir Vadai says: ==================== cpumask,net: affinity hint helper function This patchset will set affinity hint to influence IRQs to be allocated on the same NUMA node as the one where the card resides. As discussed in http://www.spinics.net/lists/netdev/msg271497.html If number of IRQs allocated is greater than the number of local NUMA cores, all local cores will be used first, and the rest of the IRQs will be on a remote NUMA node. If no NUMA support - IRQ's and cores will be mapped 1:1 Since the utility function to calculate the mapping could be useful in other mq drivers in the kernel, it was added to cpumask.[ch] This patchset was tested and applied on top of net-next since the first consumer is a network device (mlx4_en). Over commit `fff1f59` "mac802154: llsec: add forgotten list_del_rcu in key removal" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 14:59:21 -07:00
Yuval Atias	9e311e77a8	net/mlx4_en: Use affinity hint The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 14:58:16 -07:00
Amir Vadai	da91309e0a	cpumask: Utility function to set n'th cpu - local cpu first This function sets the n'th cpu - local cpu's first. For example: in a 16 cores server with even cpu's local, will get the following values: cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set ... cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set ... cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set Curently this function will be used by multi queue networking devices to calculate the irq affinity mask, such that as many local cpu's as possible will be utilized to handle the mq device irq's. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 14:58:16 -07:00
Linus Torvalds	c31c24b825	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management update from Zhang Rui: "Specifics: - fix a bug in Exynos thermal driver, which overwrites the hardware trip point threshold when updating software trigger levels and results in emergency shutdown. From: Tushar Behera. - add thermal sensor support for Armada 375 and 38x SoCs. From Ezequiel Garcia. - add TMU (Thermal Management Unit) support for Exynos5260 and Exynos5420 SoCs. From Naveen Krishna Chatradhi. - add support for the additional digital temperature sensors in the Intel SoCs like Bay Trail. From: Srinivas Pandruvada. - a couple of cleanups and small fixes from Jingoo Han, Bartlomiej Zolnierkiewicz, Geert Uytterhoeven, Jacob Pan, Paul Walmsley and Lan,Tianyu" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (21 commits) thermal: spear: remove unnecessary OOM messages thermal: exynos: remove unnecessary OOM messages thermal: rcar: remove unnecessary OOM messages thermal: armada: Support Armada 380 SoC thermal: armada: Support Armada 375 SoC thermal: armada: Allow to specify an 'inverted readout' sensor thermal: armada: Pass the platform_device to init_sensor() thermal: armada: Add generic infrastructure to handle the sensor thermal: armada: Add infrastructure to support generic formulas thermal: armada: Rename armada_thermal_ops struct thermal/intel_powerclamp: add newer cpu ids thermal: rcar: Use pm_runtime_put() i.s.o. pm_runtime_put_sync() thermal: samsung: Only update available threshold limits Thermal/int3403: Fix thermal hysteresis unit conversion thermal: Intel SoC DTS thermal thermal: samsung: Add TMU support for Exynos5260 SoCs thermal: samsung: Add TMU support for Exynos5420 SoCs thermal: samsung: change base_common to more meaningful base_second thermal: samsung: replace inten_ bit fields with intclr_ thermal: offer Samsung thermal support only when ARCH_EXYNOS is defined ...	2014-06-11 14:26:21 -07:00
Linus Torvalds	7f33e7241d	pwm: Changes for v3.16-rc1 The majority of these changes are cleanups and fixes across all drivers. Redundant error messages are removed and more PWM controllers set the .can_sleep flag to signal that they can't be used in atomic context. Support is added for the Broadcom Kona family of SoCs and the Intel LPSS driver can now probe PCI devices in addition to ACPI devices. Upon shut- down, the pwm-backlight driver will now power off the backlight. It also uses the new descriptor-based GPIO API for more concise GPIO handling. A large chunk of these changes also converts platforms to use the lookup mechanism rather than relying on the global number space to reference PWM devices. This is largely in preparation for more unification and cleanups in future patches. Eventually it will allow the legacy PWM API to be removed. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTl/AqAAoJEN0jrNd/PrOhoZsP/1yLaSK3NuBXWg3VdpH9i8so GXBeh3dbKAmC5MYQlhh5XTvuNBbfOoSp6dGdL3pV9GjcffbqzTynn5YszrbanezX +fqBF1NvW+jb2sUfQmedh9y30O1ADZM0p+FXW/R7e2khiE+8VF2ox35Hc3LLBqk8 SiZoy1UEzIo0BAHgtgCw2VXUYUSYX/KYGoF/t8TCCObKVC3wQ7pW5tN3Ekj14yNL NspM0Q8OsITCQO0PdOfHw1gBmy4iLSuoNpPKP12BQVx5seZ4LBaIz9Wh0jFu89hq zI1gFpGptMsxsaAn/zk6Nr9lHDkqxkhnuYA+dgkA6k0KI9jS1Me20WQEmvM9H9xs BJ8QOfMQP7AHCZeW61J+iPTtCyMwFejRSPMtPjNMfaOQduWJw7+o0GaA30F39dw0 3Cki1C44o9KfwCdC9OcmLignHt5TC1FEJgJL4OY695x0za7XcVgEN6nTg70AQfaz pcm4PeCqtM9jvXdJQdDGDI7gVzT33kpBnGatqQ2bUqMDx8HeHIkdEXehLwsYP46m FX0RJb5ue40esbVWZDGYWJqkdInpHt6deahTW+Jq9Exo4ZMr5/DVkMQCl8oF3/em Y5ED67dnAQ4au1MhElnDTPKk4Uh28aWTYwo8HSO6rt+8jcguH1KvXvLa+z2BcaMv ZVN0ZPy2813ix6Q0yO3D =BDxR -----END PGP SIGNATURE----- Merge tag 'pwm/for-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm changes from Thierry Reding: "The majority of these changes are cleanups and fixes across all drivers. Redundant error messages are removed and more PWM controllers set the .can_sleep flag to signal that they can't be used in atomic context. Support is added for the Broadcom Kona family of SoCs and the Intel LPSS driver can now probe PCI devices in addition to ACPI devices. Upon shutdown, the pwm-backlight driver will now power off the backlight. It also uses the new descriptor-based GPIO API for more concise GPIO handling. A large chunk of these changes also converts platforms to use the lookup mechanism rather than relying on the global number space to reference PWM devices. This is largely in preparation for more unification and cleanups in future patches. Eventually it will allow the legacy PWM API to be removed" * tag 'pwm/for-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (38 commits) pwm: fsl-ftm: set pwm_chip can_sleep flag pwm: ab8500: Fix wrong value shift for disable/enable PWM pwm: samsung: do not set manual update bit in pwm_samsung_config pwm: lp3943: Set pwm_chip can_sleep flag pwm: atmel: set pwm_chip can_sleep flag pwm: mxs: set pwm_chip can_sleep flag pwm: tiehrpwm: inline accessor functions pwm: tiehrpwm: don't build PM related functions when not needed pwm-backlight: retrieve configured PWM period leds: leds-pwm: retrieve configured PWM period ARM: pxa: hx4700: use PWM_LOOKUP to initialize struct pwm_lookup ARM: shmobile: armadillo: use PWM_LOOKUP to initialize struct pwm_lookup ARM: OMAP3: Beagle: use PWM_LOOKUP to initialize struct pwm_lookup pwm: modify PWM_LOOKUP to initialize all struct pwm_lookup members ARM: pxa: hx4700: initialize all the struct pwm_lookup members ARM: OMAP3: Beagle: initialize all the struct pwm_lookup members pwm: renesas-tpu: remove unused struct tpu_pwm_platform_data ARM: shmobile: armadillo: initialize all struct pwm_lookup members pwm: add period and polarity to struct pwm_lookup pwm: twl: Really disable twl6030 PWMs ...	2014-06-11 14:06:55 -07:00
Lukas Czerner	09869de57e	dm thin: update discard_granularity to reflect the thin-pool blocksize DM thinp already checks whether the discard_granularity of the data device is a factor of the thin-pool block size. But when using the dm-thin-pool's discard passdown support, DM thinp was not selecting the max of the underlying data device's discard_granularity and the thin-pool's block size. Update set_discard_limits() to set discard_granularity to the max of these values. This enables blkdev_issue_discard() to properly align the discards that are sent to the DM thin device on a full block boundary. As such each discard will now cover an entire DM thin-pool block and the block will be reclaimed. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-06-11 16:56:12 -04:00
Heinz Mauelshagen	adcc44472b	dm bio prison: implement per bucket locking in the dm_bio_prison hash table Split the single per bio-prison lock by using per bucket locking. Per bucket locking benefits both dm-thin and dm-cache targets by reducing bio-prison lock contention. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-11 16:48:54 -04:00
Bjorn Helgaas	38a6148248	Merge branches 'pci/msi', 'pci/iommu' and 'pci/cleanup' into next * pci/msi: PCI/MSI: Fix memory leak in free_msi_irqs() * pci/iommu: PCI: Add function 1 DMA alias quirk for HighPoint RocketRaid 642L PCI: Add bridge DMA alias quirk for ITE bridge * pci/cleanup: PCI: Merge multi-line quoted strings PCI: Whitespace cleanup PCI: Move EXPORT_SYMBOL so it immediately follows function/variable	2014-06-11 14:38:25 -06:00
Nicholas Bellinger	9f977ef7b6	vhost-scsi: Include prot_bytes into expected data transfer length This patch updates vhost_scsi_get_tag() to accept the combined expected data transfer length + T10 PI bytes as the value passed into target_submit_cmd(). This is required now that target-core logic in commit 14ef9200 expects to subtract se_cmd->prot_length from se_cmd->data_length. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-06-11 13:06:50 -07:00
Sagi Grimberg	e2a4f55c64	TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire In various areas of the code, it is assumed that se_cmd->data_length describes pure data. In case that protection information exists over the wire (protect bits is are on) the target core re-calculates the data length from the CDB and the backed device block size (instead of each transport peeking in the cdb). Modify loopback device to include protection information in the transferred data length (like other scsi transports). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-06-11 13:06:50 -07:00
Sagi Grimberg	d77e65350f	libiscsi, iser: Adjust data_length to include protection information In case protection information exists over the wire iscsi header data length is required to include it. Use protection information aware scsi helpers to set the correct transfer length. In order to avoid breakage, remove iser transfer length checks for each task as they are not always true and somewhat redundant anyway. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-06-11 13:06:45 -07:00
Sagi Grimberg	8846bab180	scsi_cmnd: Introduce scsi_transfer_length helper In case protection information exists on the wire scsi transports should include it in the transfer byte count (even if protection information does not exist in the host memory space). This helper will compute the total transfer length from the scsi command data length and protection attributes. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-06-11 13:06:42 -07:00
Jérôme Carretero	c2e0fb966a	PCI: Add function 1 DMA alias quirk for HighPoint RocketRaid 642L This device uses function 1 as the PCIe requester ID. This vendor has similar boards based on the same Marvell 88SE9235 chipset, but this patch was only tested with the 642L. Tested on ASUS Sabertooth 990FX (AMD). Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679 Signed-off-by: Jérôme Carretero <cJ-ko@zougloub.eu> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2014-06-11 13:45:19 -06:00
David S. Miller	d4f3862017	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-06-11 This series contains updates to igb, i40e and i40evf. Todd makes a change to igb to un-hide invariant returns by getting rid of the E1000_SUCCESS define and converting those returns to return 0. Jacob separates the hardware logic from the set function, so that we can re-use it during a ptp_reset in igb. This enables the reset to return functionality to the last know timestamp mode, rather than resetting the value. Ashish implements context flags for headwb and headwb_addr so that we do not have to keep them always enabled. Shannon updates the admin queue API for the new firmware, which adds set_pf_content, nvm_config_read/write, replaces set_phy_reset with set_phy_debug and removes nvm_read/write_reg_se. Cleans up the driver to use the stored base_queue value since there is no need to read the PCI register for the PF's base queue on every single transmit queue enable and disable as we already have the value stored from reading the capability features at startup. Anjali changes the notion of source and destination for FD_SB in ethtool to align i40e with other drivers. Adds flow director statistics to the PF stats. Fixes a bug in ethtool for flow director drop packet filter where the drop action comes down as a ring_cookie value, so allow it as a special value that can be used to configure destination control. Mitch fixes the i40evf to keep the driver from going down when it is already in a down state. This prevents a CPU soft lock in napi_disable(). Also change the i40evf to check the admin queue error bits since the firmware can indicate any admin queue error states to the driver via some bits in the length registers. Neerav separates out the DCB capability and enabled flags because currently if the firmware reports DCB capability the driver enables I40E_FLAG_DCB_ENABLED flag. When this flag is enabled the driver inserts a tag when transmitting a packet from the port even if there are no DCB traffic classes configured at the port. So by adding the additional flag, I40E_FLAG_DCB_CAPABLE, that will be set when the DCB capability is present and the existing enabled flag will only be set if there are more than one traffic classes configured at the port. Greg fixes the i40e driver to not automatically accept tagged packets by default so that the system must request a VLAN tag packet filter to get packets with that tag. Greg also converts i40e to use the in-kernel ether_addr_copy() instead of mempcy(). Jesse removes the FTYPE field from the receive descriptor to match the hardware implementation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:25:12 -07:00
David S. Miller	813ebbbf8e	Merge branch 'sctp-next' Daniel Borkmann says: ==================== SCTP update This set contains transport path selection improvements in SCTP. Please see individual patches for details. ==================== Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:30 -07:00
Daniel Borkmann	9b87d46510	net: sctp: fix incorrect type in gfp initializer This fixes the following sparse warning: net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types) net/sctp/associola.c:1556:29: expected bool [unsigned] [usertype] preload net/sctp/associola.c:1556:29: got restricted gfp_t Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Daniel Borkmann	a7288c4dd5	net: sctp: improve sctp_select_active_and_retran_path selection In function sctp_select_active_and_retran_path(), we walk the transport list in order to look for the two most recently used ACTIVE transports (trans_pri, trans_sec). In case we didn't find anything ACTIVE, we currently just camp on a possibly PF or INACTIVE transport that is primary path; this behavior actually dates back to linux-history tree of the very early days of lksctp, and can yield a behavior that chooses suboptimal transport paths. Instead, be a bit more clever by reusing and extending the recently introduced sctp_trans_elect_best() handler. In case both transports are evaluated to have the same score resulting from their states, break the tie by looking at: 1) transport patch error count 2) last_time_heard value from each transport. This is analogous to Nishida's Quick Failover draft [1], section 5.1, 3: The sender SHOULD avoid data transmission to PF destinations. When all destinations are in either PF or Inactive state, the sender MAY either move the destination from PF to active state (and transmit data to the active destination) or the sender MAY transmit data to a PF destination. In the former scenario, (i) the sender MUST NOT notify the ULP about the state transition, and (ii) MUST NOT clear the destination's error counter. It is recommended that the sender picks the PF destination with least error count (fewest consecutive timeouts) for data transmission. In case of a tie (multiple PF destinations with same error count), the sender MAY choose the last active destination. Thus for sctp_select_active_and_retran_path(), we keep track of the best, if any, transport that is in PF state and in case no ACTIVE transport has been found (hence trans_{pri,sec} is NULL), we select the best out of the three: current primary_path and retran_path as well as a possible PF transport. The secondary may still camp on the original primary_path as before. The change in sctp_trans_elect_best() with a more fine grained tie selection also improves at the same time path selection for sctp_assoc_update_retran_path() in case of non-ACTIVE states. [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00
Daniel Borkmann	e575235fc6	net: sctp: migrate most recently used transport to ktime Be more precise in transport path selection and use ktime helpers instead of jiffies to compare and pick the better primary and secondary recently used transports. This also avoids any side-effects during a possible roll-over, and could lead to better path decision-making. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 12:23:17 -07:00

... 2 3 4 5 6 ...

454902 Commits All Branches Search

454902 Commits

All Branches