linux

Commit Graph

Author	SHA1	Message	Date
Andrey Utkin	36beddc272	appletalk: Fix socket referencing in skb Setting just skb->sk without taking its reference and setting a destructor is invalid. However, in the places where this was done, skb is used in a way not requiring skb->sk setting. So dropping the setting of skb->sk. Thanks to Eric Dumazet <eric.dumazet@gmail.com> for correct solution. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79441 Reported-by: Ed Martin <edman007@edman007.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 19:39:43 -07:00
Dmitry Popov	e0056593b6	ip_tunnel: fix ip_tunnel_lookup This patch fixes 3 similar bugs where incoming packets might be routed into wrong non-wildcard tunnels: 1) Consider the following setup: ip address add 1.1.1.1/24 dev eth0 ip address add 1.1.1.2/24 dev eth0 ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if they had dst = 1.1.1.2. Moreover, even if there was a wildcard tunnel like ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0 but it was created before the explicit one (with local 1.1.1.1), incoming ipip packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1. The same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti). 2) ip address add 1.1.1.1/24 dev eth0 ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0 ip link set ipip1 up Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what the src address was. Any remote ip address which has ip_tunnel_hash = 0 raised this issue; 2.2.146.85 is just an example, there are more than 4 million of them. And again, a wildcard tunnel like ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0 would never be matched if it was created before an explicit tunnel like the one above. Gre & vti tunnels had the same issue. 3) ip address add 1.1.1.1/24 dev eth0 ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0 ip link set gre1 up Any incoming gre packets with key = 1 were routed into gre1, no matter what the src/dst addresses were. Any remote ip address which has ip_tunnel_hash = 0 raised the issue; 2.2.146.84 is just an example, there are more than 4 million of them. A wildcard tunnel like ip tunnel add gre2 remote any local any key 1 mode gre dev eth0 would never be matched if it was created before an explicit tunnel like the one above.
All this stuff happened because while looking for a wildcard tunnel we didn't check that matched tunnel is a wildcard one. Fixed. Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 19:35:09 -07:00
Rickard Strandqvist	85b722d760	isdn: hisax: l3ni1.c: Fix for possible null pointer dereference There is otherwise a risk of a possible null pointer dereference. Was largely found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>	2014-07-08 16:06:21 -07:00
Jon Paul Maloy	29322d0db9	tipc: fix bug in multicast/broadcast message reassembly Since commit `37e22164a8` ("tipc: rename and move message reassembly function") reassembly of long broadcast messages has been broken. This is because we test for a non-NULL return value of the *buf parameter as the criterion for successful reassembly. However, this parameter is left defined even after reception of the first fragment, when reassembly is still incomplete. This leads to a kernel crash as soon as the first fragment of a long broadcast message is received. We fix this with this commit, by implementing a stricter behavior of the function and its return values. This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 15:55:09 -07:00
Alexander Aring	b6e195fd4f	MAINTAINERS: change IEEE 802.15.4 maintainer This patch changes the IEEE 802.15.4 subsystem maintainer to Alexander Aring. We discussed this change before via e-mail and I collected the acks from the current maintainers. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Alexander Smirnov <alex.bluesman.sminov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 14:23:17 -07:00
David S. Miller	ff883aeb42	Merge branch 'xen-netfront' David Vrabel says: ==================== xen-netfront: multi-queue related locking fixes Two fixes to the per-queue locking bugs in xen-netfront that were introduced in 3.16-rc1 with the multi-queue support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 11:21:10 -07:00
David Vrabel	f9feb1e6a2	xen-netfront: call netif_carrier_off() only once when disconnecting In xennet_disconnect_backend(), netif_carrier_off() was called once per queue when it needs to only be called once. The queue locking around the netif_carrier_off() call looked very odd. I think they were supposed to synchronize any NAPI instances with the expectation that no further NAPI instances would be scheduled because of the carrier being off (see the check in xennet_rx_interrupt()). But I can't easily tell if this works correctly. Instead, add a napi_synchronize() call after disabling the interrupts. This is obviously correct as with no Rx interrupts, no further NAPI instances will be scheduled. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 11:21:03 -07:00
David Vrabel	f50b407653	xen-netfront: don't nest queue locks in xennet_connect() The nesting of the per-queue rx_lock and tx_lock in xennet_connect() is confusing to both humans and lockdep. The locking is safe because this is the only place where the locks are nested in this way but lockdep still warns. Instead of adding the missing lockdep annotations, refactor the locking to avoid the confusing nesting. This is still safe, because the xenbus connection state changes are all serialized by the xenwatch thread. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 11:21:03 -07:00
Yuchung Cheng	6e08d5e3c8	tcp: fix false undo corner cases The undo code assumes that, upon entering loss recovery, TCP 1) always retransmits something and 2) the retransmission never fails locally (e.g., qdisc drop), so undo_marker is set in tcp_enter_recovery() and undo_retrans is incremented only when tcp_retransmit_skb() is successful. When the assumption is broken because TCP's cwnd is too small to retransmit or the retransmit fails locally, the next (DUP)ACK would incorrectly revert the cwnd and the congestion state in tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs may enter the recovery state. The sender repeatedly enters and (incorrectly) exits recovery states if the retransmits continue to fail locally while receiving (DUP)ACKs. The fix is to initialize undo_retrans to -1 and start counting on the first retransmission. Always increment undo_retrans even if the retransmissions fail locally because they couldn't cause DSACKs to undo the cwnd reduction. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 21:40:48 -07:00
Or Gerlitz	e326f2f13b	net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't set The add_vxlan_port ndo driver code was wrongly testing whether HW vxlan offloads are supported by the device instead of checking if they are currently enabled. This causes the driver to configure the HW parser to conduct matching for vxlan packets but since no steering rules were set, vxlan packets are dropped on RX. Fix that by doing the right test, as done in the del_vxlan_port ndo handler. Fixes: `1b136de` ('net/mlx4: Implement vxlan ndo calls') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 21:39:18 -07:00
dingtianhong	52ad353a53	igmp: fix the problem when mc leave group The problem was triggered by these steps: 1) Create a socket, bind it, and then call setsockopt to add the mc group. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("192.168.1.2"); setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); 2) Drop the mc group for this socket. mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37"); mreq.imr_interface.s_addr = inet_addr("0.0.0.0"); setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq)); 3) Then close the socket. I found the mc group was still used by the dev: netstat -g Interface RefCnt Group --------------- ------ --------------------- eth2 1 255.0.0.37 Normally, even if IP_DROP_MEMBERSHIP returns an error, the mc group still needs to be released for the netdev when the socket is closed, but this process was broken when there is no default route. The reason is that ip_mc_leave_group() chooses the in_dev by imr_interface.s_addr: if the input address is unset, the default route dev is chosen, the ifindex is taken from that dev, and walking inet->mc_list returns -ENODEV. But if there is no default route dev, the in_dev and ifindex are both unset; when walking inet->mc_list, the mc group is released from the mc_list, but the dev does not decrement the refcnt for this mc group, so when the socket is closed the mc_list is empty and the dev still keeps this group. v1->v2: Following Hideaki's suggestion, we should align with IPv6 (RFC3493) and the BSDs, so I add a check for the in_dev before walking the mc_list, making sure that when we remove the mc group, the refcnt is decremented on the real dev which was using the mc address. The problem should not happen again. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 21:30:55 -07:00
Loic Prylli	5495119465	net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush A bug was introduced in NETDEV_CHANGE notifier sequence causing the arp table to be sometimes spuriously cleared (including manual arp entries marked permanent), upon network link carrier changes. The changed argument for the notifier was applied only to a single caller of NETDEV_CHANGE, missing among others netdev_state_change(). So upon net_carrier events induced by the network, which are triggering a call to netdev_state_change(), arp_netdev_event() would decide whether to clear or not arp cache based on random/junk stack values (a kind of read buffer overflow). Fixes: `be9efd3653` ("net: pass changed flags along with NETDEV_CHANGE event") Fixes: `6c8b4e3ff8` ("arp: flush arp cache on IFF_NOARP change") Signed-off-by: Loic Prylli <loicp@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 21:20:01 -07:00
Bernd Wachter	8dcb4b1526	net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2 There's a new version of the Telewell 4G modem working with, but not recognized by this driver. Signed-off-by: Bernd Wachter <bernd.wachter@jolla.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 20:53:41 -07:00
David S. Miller	26a9ebca98	Revert "net: stmmac: add platform init/exit for Altera's ARM socfpga" This reverts commit `0acf167687`. Breaks the build due to missing reference to phy_resume in the resulting dwmac-socfpga.o object. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:53:45 -07:00
Zhao Qiang	8844a00626	powerpc/ucc_geth: deal with a compile warning Deal with a compile warning: comparison between 'enum qe_fltr_largest_external_tbl_lookup_key_size' and 'enum qe_fltr_tbl_lookup_key_size'. The code "if (ug_info->largestexternallookupkeysize == QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES)" triggers the warning because the two operands have different enum types, so modify it. "enum qe_fltr_largest_external_tbl_lookup_key_size largestexternallookupkeysize; enum qe_fltr_tbl_lookup_key_size { QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES = 0x3f, /* LookupKey parsed by the Generate LookupKey CMD is truncated to 8 bytes */ QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES = 0x5f, /* LookupKey parsed by the Generate LookupKey CMD is truncated to 16 bytes */ }; /* QE FLTR extended filtering Largest External Table Lookup Key Size */ enum qe_fltr_largest_external_tbl_lookup_key_size { QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_NONE = 0x0,/* not used */ QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_8_BYTES = QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES, /* 8 bytes */ QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_16_BYTES = QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES, /* 16 bytes */ };" Signed-off-by: Zhao Qiang <B45475@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:48:19 -07:00
David S. Miller	edc1bb0bd7	Merge branch 'net_ovs_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch Pravin B Shelar says: ==================== Open vSwitch A set of fixes for net. The first bug is related to flow-table management. The second one is in the sample action. The third is related to flow stats and the last one adds a gre-err handler for ovs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:39:34 -07:00
Tom Herbert	11ef7a8996	net: Performance fix for process_backlog In process_backlog the input_pkt_queue is only checked once for new packets and quota is artificially reduced to reflect precisely the number of packets on the input_pkt_queue so that the loop exits appropriately. This patch changes the behavior to be more straightforward and less convoluted. Packets are processed until either the quota is met or there are no more packets to process. This patch seems to provide a small, but noticeable performance improvement. The performance improvement is a result of staying in the process_backlog loop longer, which can reduce the number of IPIs. Performance data using super_netperf TCP_RR with 200 flows: Before fix: 88.06% CPU utilization 125/190/309 90/95/99% latencies 1.46808e+06 tps 1145382 intrs./sec. With fix: 87.73% CPU utilization 122/183/296 90/95/99% latencies 1.4921e+06 tps 1021674.30 intrs./sec. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 19:24:34 -07:00
Edward Allcutt	68b7107b62	ipv4: icmp: Fix pMTU handling for rare case Some older router implementations still send Fragmentation Needed errors with the Next-Hop MTU field set to zero. This is explicitly described as an eventuality that hosts must deal with by the standard (RFC 1191) since older standards specified that those bits must be zero. Linux had a generic (for all of IPv4) implementation of the algorithm described in the RFC for searching a list of MTU plateaus for a good value. Commit `46517008e1` ("ipv4: Kill ip_rt_frag_needed().") removed this as part of the changes to remove the routing cache. Subsequently any Fragmentation Needed packet with a zero Next-Hop MTU has been discarded without being passed to the per-protocol handlers or notifying userspace for raw sockets. When there is a router which does not implement RFC 1191 on an MTU limited path then this results in stalled connections since large packets are discarded and the local protocols are not notified so they never attempt to lower the pMTU. One example I have seen is an OpenBSD router terminating IPSec tunnels. It's worth pointing out that this case is distinct from the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU since the commit in question dismissed that as a valid concern. All of the per-protocols handlers implement the simple approach from RFC 1191 of immediately falling back to the minimum value. Although this is sub-optimal it is vastly preferable to connections hanging indefinitely. Remove the Next-Hop MTU != 0 check and allow such packets to follow the normal path. Fixes: `46517008e1` ("ipv4: Kill ip_rt_frag_needed().") Signed-off-by: Edward Allcutt <edward.allcutt@openmarket.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 17:22:57 -07:00
David S. Miller	a921e2a3c6	Merge branch 'stmmac' Vince Bridgers says: ==================== net: stmmac: Correct socfpga init/exit and This patch series adds platform specific init/exit code so that socfpga suspend/resume works as expected, and corrects a minor issue detected by cppcheck. V2: Address review comments by adding a line break at end of function and structure declaration. Add another trivial cppcheck patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:38:04 -07:00
Vince Bridgers	c8df8ce3ee	net: stmmac: Remove unneeded I/O read caught by cppcheck Cppcheck found a case where a local variable was being assigned a value, but not used. There seems to be no reason to read this register before assigning a new value, so the read is removed. cppcheck --force --enable=all --inline-suppr . shows ... Variable 'value' is reassigned a value before the old one has been used. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:37:54 -07:00
Vince Bridgers	43d24e4894	net: stmmac: Correct duplicate if/then/else case found by cppcheck Cppcheck found a duplicate if/then/else case where a receive descriptor was being processed. This patch corrects that issue. cppcheck --force --enable=all --inline-suppr . ... Checking enh_desc.c... [enh_desc.c:148] -> [enh_desc.c:144]: (style) Found duplicate if expressions. ... Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:37:54 -07:00
Vince Bridgers	0acf167687	net: stmmac: add platform init/exit for Altera's ARM socfpga This patch adds platform init/exit functions and modifications to support suspend/resume for the Altera Cyclone 5 SOC Ethernet controller. The platform exit function puts the controller into reset using the socfpga reset controller driver. The platform init function sets up the Synopsys mac by first making sure the Ethernet controller is held in reset, programming the phy mode through external support logic, then deasserts reset through the socfpga reset manager driver. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:37:54 -07:00
Alexander Aring	48bc03433c	ieee802154: reassembly: fix possible buffer overflow The max_dsize attribute in ctl_table for lowpan_frags_ns_ctl_table is configured with integer accessing methods. This patch changes the max_dsize attribute to int to avoid a possible buffer overflow. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:34:25 -07:00
David S. Miller	98846e698d	Merge branch 'mlx4' Amir Vadai says: ==================== Mellanox EN driver fixes 2014-06-23 Below are some fixes to patches submitted to 3.16. The first patch follows discussions with Ben [1] and Thomas [2]: do not use the affinity notifier, since it breaks RFS. Instead, detect changes in the IRQ affinity map by checking whether the current CPU is set in the affinity map on NAPI poll. The two other patches fix some bugs introduced in commit [3]. Patches were applied and tested over commit dba6311: ('powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:29:28 -07:00
Amir Vadai	bb273617a6	net/mlx4_en: IRQ affinity hint is not cleared on port down Need to remove affinity hint at mlx4_en_deactivate_cq() and not at mlx4_en_destroy_cq() - since affinity_mask might be free'd while still being used by procfs. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:29:23 -07:00
Amir Vadai	143b5ba21b	lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined When device is non numa aware (numa_node == -1), use all online cpu's. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:29:23 -07:00
Amir Vadai	35f6f45368	net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Hutchings <ben@decadent.org.uk> Fixes: `2eacc23` ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:29:23 -07:00
Maciej W. Rozycki	1b037474d0	defxx: Fix !DYNAMIC_BUFFERS compilation warnings This fixes compilation warnings: drivers/net/fddi/defxx.c:294: warning: 'dfx_rcv_flush' declared inline after being called drivers/net/fddi/defxx.c:294: warning: previous declaration of 'dfx_rcv_flush' was here drivers/net/fddi/defxx.c:2854: warning: 'my_skb_align' defined but not used triggered when the driver is built with DYNAMIC_BUFFERS undefined. Code tested to work just fine with these changes and a few DEFPA and DEFTA boards. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:26:29 -07:00
Maciej W. Rozycki	f46d53d0e9	defxx: Remove an incorrectly inverted preprocessor conditional The RX handler of the driver has two paths switched between, depending on the size of the frame received, as determined by SKBUFF_RX_COPYBREAK. When a small frame is received, a new skb allocated has data space large enough to hold the incoming frame only, and data is copied there from the original skb whose buffer is returned to the DMA RX ring; in that case `rx_in_place' is 0. When a large frame is received, a new skb allocated has data space large enough to hold the largest frame possible, including the overhead for alignment, the receive status and padding, over 4.5kiB overall, and its buffer is placed on the DMA RX ring while the original buffer is passed up to the network stack avoiding the need to copy data; in that case `rx_in_place' is 1. However the latter scenario is only possible when dynamic buffers are used, as determined by DYNAMIC_BUFFERS, because otherwise the buffers used for the DMA RX ring are fixed at the time the interface is brought up. That leads to an observation that the preprocessor conditional around the `rx_in_place' check is inverted: the check only really matters when dynamic buffers are in use. It has gone unnoticed for many years since support for using dynamic buffers on the DMA RX ring was introduced in 2.1.40 -- because the only problem that results is that, in the case where `rx_in_place' is 1, frame data received is unnecessarily copied to the newly-allocated buffer before the buffer is placed on the DMA RX ring and its contents ignored. Therefore the only symptom is some performance loss. Rather than flipping the condition though I decided to discard the conditional altogether -- in the case of static buffers `rx_in_place' is always 0 so GCC will optimise the C conditional away instead. Tested on a few DEFPA and DEFTA boards successfully using both small and large frames, both with DYNAMIC_BUFFERS defined and with the macro undefined.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:25:07 -07:00
Christoph Paasch	5924f17a8a	tcp: Fix divide by zero when pushing during tcp-repair When in repair-mode and TCP_RECV_QUEUE is set, we end up calling tcp_push with mss_now being 0. If data is in the send-queue and tcp_set_skb_tso_segs gets called, we crash because it will divide by mss_now: [ 347.151939] divide error: 0000 [#1] SMP [ 347.152907] Modules linked in: [ 347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4 [ 347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000 [ 347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1 [ 347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0 [ 347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000 [ 347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00 [ 347.152907] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0 [ 347.152907] Stack: [ 347.152907] c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080 [ 347.152907] f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85 [ 347.152907] c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8 [ 347.152907] Call Trace: [ 347.152907] [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34 [ 347.152907] [<c16013e6>] tcp_init_tso_segs+0x36/0x50 [ 347.152907] [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0 [ 347.152907] [<c10a0188>] ? up+0x28/0x40 [ 347.152907] [<c10acc85>] ? console_unlock+0x295/0x480 [ 347.152907] [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0 [ 347.152907] [<c1605716>] __tcp_push_pending_frames+0x36/0xd0 [ 347.152907] [<c15f4860>] tcp_push+0xf0/0x120 [ 347.152907] [<c15f7641>] tcp_sendmsg+0xf1/0xbf0 [ 347.152907] [<c116d920>] ? kmem_cache_free+0xf0/0x120 [ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40 [ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40 [ 347.152907] [<c114f0f0>] ? 
do_wp_page+0x3e0/0x850 [ 347.152907] [<c161c36a>] inet_sendmsg+0x4a/0xb0 [ 347.152907] [<c1150269>] ? handle_mm_fault+0x709/0xfb0 [ 347.152907] [<c15a006b>] sock_aio_write+0xbb/0xd0 [ 347.152907] [<c1180b79>] do_sync_write+0x69/0xa0 [ 347.152907] [<c1181023>] vfs_write+0x123/0x160 [ 347.152907] [<c1181d55>] SyS_write+0x55/0xb0 [ 347.152907] [<c167f0d8>] sysenter_do_call+0x12/0x28 This can easily be reproduced with the following packetdrill-script (the "magic" with netem, sk_pacing and limit_output_bytes is done to prevent the kernel from pushing all segments, because hitting the limit without doing this is not so easy with packetdrill): 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460> +0 > S. 0:0(0) ack 1 <mss 1460> +0.1 < . 1:1(0) ack 1 win 65000 +0 accept(3, ..., ...) = 4 // This forces that not all segments of the snd-queue will be pushed +0 `tc qdisc add dev tun0 root netem delay 10ms` +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2` +0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0 +0 write(4,...,10000) = 10000 +0 write(4,...,10000) = 10000 // Set tcp-repair stuff, particularly TCP_RECV_QUEUE +0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0 +0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0 // This now will make the write push the remaining segments +0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0 +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000` // Now we will crash +0 write(4,...,1000) = 1000 This happens since `ec34232575` (tcp: fix retransmission in repair mode). Prior to that, the call to tcp_push was prevented by a check for tp->repair. The patch fixes it, by adding the new goto-label out_nopush. When exiting tcp_sendmsg and a push is not required, which is the case for tp->repair, we go to this label. When repairing and calling send() with TCP_RECV_QUEUE, the data is actually put in the receive-queue. 
So, no push is required because no data has been added to the send-queue. Cc: Andrew Vagin <avagin@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Fixes: `ec34232575` (tcp: fix retransmission in repair mode) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Andrew Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:21:03 -07:00
Eric Dumazet	5925a0555b	net: fix sparse warning in sk_dst_set() sk_dst_cache has __rcu annotation, so we need a cast to avoid the following sparse error: include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces) include/net/sock.h:1774:19: expected struct dst_entry [noderef] <asn:4>*__ret include/net/sock.h:1774:19: got struct dst_entry *dst Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: `7f50236153` ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:03:06 -07:00
Eric Dumazet	a48e5fafec	vlan: free percpu stats in device destructor Madalin-Cristian reported crashes happening after a recent commit (`5a4ae5f6e7` "vlan: unnecessary to check if vlan_pcpu_stats is NULL") ----------------------------------------------------------------------- root@p5040ds:~# vconfig add eth8 1 root@p5040ds:~# vconfig rem eth8.1 Unable to handle kernel paging request for data at address 0x2bc88028 Faulting instruction address: 0xc058e950 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 CoreNet Generic Modules linked in: CPU: 3 PID: 2167 Comm: vconfig Tainted: G W 3.16.0-rc3-00346-g65e85bf #2 task: e7264d90 ti: e2c2c000 task.ti: e2c2c000 NIP: c058e950 LR: c058ea30 CTR: c058e900 REGS: e2c2db20 TRAP: 0300 Tainted: G W (3.16.0-rc3-00346-g65e85bf) MSR: 00029002 <CE,EE,ME> CR: 48000428 XER: 20000000 DEAR: 2bc88028 ESR: 00000000 GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000 GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000 GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000 GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48 NIP [c058e950] vlan_dev_get_stats64+0x50/0x170 LR [c058ea30] vlan_dev_get_stats64+0x130/0x170 Call Trace: [e2c2dbd0] [ffffffea] 0xffffffea (unreliable) [e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140 [e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960 [e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110 [e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0 [e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50 [e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0 [e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160 [e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550 [e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0 [e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0 [e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80 [e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c Fix this problem by freeing percpu stats from dev->destructor()
instead of ndo_uninit() Reported-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com> Fixes: `5a4ae5f6e7` ("vlan: unnecessary to check if vlan_pcpu_stats is NULL") Cc: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:02:05 -07:00
Daniel Mack	d9daa24720	net: fix circular dependency in of_mdio code Commit `86f6cf4127` (net: of_mdio: add of_mdiobus_link_phydev()) introduced a circular dependency between libphy and of_mdio. depmod: ERROR: <modroot>/kernel/drivers/net/phy/libphy.ko in dependency cycle! depmod: ERROR: <modroot>/kernel/drivers/of/of_mdio.ko in dependency cycle! The problem is that of_mdio.c references &mdio_bus_type and libphy now references of_mdiobus_link_phydev. Fix this by not exporting of_mdiobus_link_phydev() from of_mdio.ko. Make it a static function in mdio_bus.c instead. Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-by: Jeff Mahoney <jeffm@suse.com> Fixes: `86f6cf4127` (net: of_mdio: add of_mdiobus_link_phydev()) Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 00:24:14 -07:00
David S. Miller	eb608d2b99	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-06-27 Please pull the following batch of fixes for the 3.16 stream... For the mac80211 bits, Johannes says: "We have a fix from Eliad for a time calculation, a fix from Max for head/tailroom when sending authentication packets, a revert that Felix requested since the patch in question broke regulatory and a fix from myself for an issue with a new command that we advertised in the wrong place." For the bluetooth bits, Gustavo says: "A few fixes for 3.16. This pull request contains a NULL dereference fix, and some security/pairing fixes." For the iwlwifi bits, Emmanuel says: "I have here a fix from Eliad for scheduled scan: it fixes a firmware assertion. Arik reverts a patch I made that didn't take into account that 3160 doesn't have UAPSD and hence, we can't assume that all newer firmwares support the feature. Here too, the visible effect is a firmware assertion. Along with that, we have a few fixes and additions to the device list." For the ath10k bits, Kalle says: "Bartosz fixed an issue where we were not able to create 8 vdevs when using DFS. Michal removed a false warning which was just confusing people." On top of that... Arend van Spriel fixes a 'divide by zero' regression in brcmfmac. Amitkumar Karwar corrects a transmit timeout in mwifiex. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 23:47:33 -07:00
Florian Fainelli	b758858c5c	net: bcmgenet: do not set packet length for RX buffers Hardware will provide this information as soon as we will start processing incoming packets, so there is no need to set the RX buffer length during buffer allocation. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 17:25:03 -07:00
Florian Fainelli	219575eb63	net: bcmgenet: start with carrier off We use the PHY library which will determine the link state for us, make sure we start with a carrier off until libphy has completed the link training. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 17:25:02 -07:00
Florian Fainelli	0f50ce96b7	net: bcmgenet: disable clock before register_netdev As soon as register_netdev() is called, the network device notifiers are running which means that other parts of the kernel, or user-space programs can call the network device ndo_open() callback and use the interface. Disable the Ethernet device clock before we register the network device such that we do not create the following situation: CPU0 CPU1 register_netdev() bcmgenet_open() clk_prepare_enable() clk_disable_unprepare() and leave the hardware block gated off, while we think it should be gated on. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 17:25:02 -07:00
Florian Fainelli	16f62d9bed	net: systemport: fix TX NAPI work done return value Although we do not limit the number of packets the TX completion function bcm_sysport_tx_reclaim() is allowed to reclaim, we were still using its return value as-is. This means that we could hit the WARN() in net/core/dev.c where work_done >= budget. Make sure we do exit the NAPI context when the TX ring is empty, and pretend there was no work to do. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 17:10:17 -07:00
Florian Fainelli	412bce83ac	net: systemport: fix UniMAC reset logic The UniMAC CMD_SW_RESET bit is not a self-clearing bit, so we need to assert it, wait a bit and clear it manually. As a result, umac_reset() is updated not to return any value. The previous version of the code simply wrote 0 to the CMD register, which would make the busy-waiting loop exit immediately, having zero effect. By writing 0 to the CMD register, we were clearing all bits in the CMD register, and not using the hardware reset default values which are set on purpose. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 17:10:16 -07:00
Florian Fainelli	3b140a6788	net: systemport: do not clear IFF_MULTICAST flag The SYSTEMPORT Ethernet MAC supports multicast just fine, it just lacks any sort of Unicast/Broadcast/Multicasting filtering at the Ethernet MAC level since that is handled by the front end Ethernet switch, but that is properly handled by bcm_sysport_set_rx_mode(). Some user-space applications might be relying on the presence of this flag to prevent using multicast sockets, this also prevents that interface from joining the IPv6 all-router mcast group. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 17:10:16 -07:00
Eric Dumazet	07b0f00964	bnx2x: fix possible panic under memory stress While it is legal to kfree(NULL), it is not wise to use : put_page(virt_to_head_page(NULL)) BUG: unable to handle kernel paging request at ffffeba400000000 IP: [<ffffffffc01f5928>] virt_to_head_page+0x36/0x44 [bnx2x] Reported-by: Michel Lespinasse <walken@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ariel Elior <ariel.elior@qlogic.com> Fixes: `d46d132cc0` ("bnx2x: use netdev_alloc_frag()") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 12:20:20 -07:00
Eric Dumazet	7f50236153	ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix We have two different ways to handle changes to sk->sk_dst First way (used by TCP) assumes socket lock is owned by caller, and use no extra lock : __sk_dst_set() & __sk_dst_reset() Another way (used by UDP) uses sk_dst_lock because socket lock is not always taken. Note that sk_dst_lock is not softirq safe. These ways are not inter changeable for a given socket type. ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used the socket lock as synchronization, but users might be UDP sockets. Instead of converting sk_dst_lock to a softirq safe version, use xchg() as we did for sk_rx_dst in commit `e47eb5dfb2` ("udp: ipv4: do not use sk_dst_lock from softirq context") In a follow up patch, we probably can remove sk_dst_lock, as it is only used in IPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Fixes: `9cb3a50c5f` ("ipv4: Invalidate the socket cached route on pmtu events if possible") Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-30 23:40:58 -07:00
Alex Wang	4a46b24e14	openvswitch: Use exact lookup for flow_get and flow_del. Due to the race condition in userspace, there is chance that two overlapping megaflows could be installed in datapath. And this causes userspace unable to delete the less inclusive megaflow flow even after it timeout, since the flow_del logic will stop at the first match of masked flow. This commit fixes the bug by making the kernel flow_del and flow_get logic check all masks in that case. Introduced by `03f0d916a` (openvswitch: Mega flow implementation). Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-30 20:47:15 -07:00
Ben Pfaff	ad55200734	openvswitch: Fix tracking of flags seen in TCP flows. Flow statistics need to take into account the TCP flags from the packet currently being processed (in 'key'), not the TCP flags matched by the flow found in the kernel flow table (in 'flow'). This bug made the Open vSwitch userspace fin_timeout action have no effect in many cases. This bug is introduced by commit `88d73f6c41` (openvswitch: Use TCP flags in the flow key for stats.) Reported-by: Len Gao <leng@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-29 14:10:51 -07:00
Wei Zhang	e0bb8c44ed	openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash When use gre vport, openvswitch register a gre_cisco_protocol but does not supply a err_handler with it. The gre_cisco_err() in net/ipv4/gre_demux.c expect err_handler be provided with the gre_cisco_protocol implementation, and call ->err_handler() without existence check, cause the kernel crash. This patch provide a err_handler to fix this bug. This bug introduced by commit `aa310701e7` (openvswitch: Add gre tunnel support.) Signed-off-by: Wei Zhang <asuka.com@163.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-29 14:10:48 -07:00
Andy Zhou	fe984c08e2	openvswitch: Fix a double free bug for the sample action When sample action returns with an error, the skb has already been freed. This patch fix a bug to make sure we don't free it again. This bug introduced by commit `ccb1352e76` (net: Add Open vSwitch kernel components.) Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-29 14:10:43 -07:00
Denis Kirjanov	dba63115ce	powerpc: bpf: Fix the broken LD_VLAN_TAG_PRESENT test We have to return the boolean here if the tag presents or not, not just ANDing the TCI with the mask which results to: [ 709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT [ 709.412245] ret 4096 != 1 [ 709.412332] ret 4096 != 1 [ 709.412333] FAIL (2 times) Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 16:14:12 -07:00
Denis Kirjanov	3fc60aa097	powerpc: bpf: Use correct mask while accessing the VLAN tag To get a full tag (and not just a VID) we should access the TCI except the VLAN_TAG_PRESENT field (which means that 802.1q header is present). Also ensure that the VLAN_TAG_PRESENT stay on its place Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 16:14:12 -07:00
John W. Linville	f9fa39e9ac	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2014-06-27 13:35:56 -04:00
Hangbin Liu	e940f5d6ba	ipv6: Fix MLD Query message check Based on RFC3810 6.2, we also need to check the hop limit and router alert option besides source address. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-27 00:21:50 -07:00
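
The two powerpc bpf entries above (`dba63115ce` and `3fc60aa097`) both turn on how one bit of the VLAN TCI is handled. A minimal C sketch of that distinction follows; it is not the kernel's JIT code. The function names are hypothetical, and the mask value 0x1000 is an assumption inferred from the "ret 4096 != 1" test output quoted in the first commit message.

```c
#include <stdint.h>

/* Assumed value of the "tag present" bit, inferred from the
 * "ret 4096 != 1" failure quoted in commit dba63115ce. */
#define VLAN_TAG_PRESENT 0x1000u

/* Behaviour the commit describes as broken: ANDing the TCI with the
 * mask returns 0 or 4096, not a boolean. (Hypothetical name.) */
uint32_t vlan_tag_present_masked(uint16_t tci)
{
    return tci & VLAN_TAG_PRESENT;
}

/* Behaviour the commit asks for: return 1 if the bit is set, else 0.
 * (Hypothetical name.) */
uint32_t vlan_tag_present_bool(uint16_t tci)
{
    return (tci & VLAN_TAG_PRESENT) != 0;
}

/* Per commit 3fc60aa097: the full tag is the TCI with only the
 * "tag present" bit cleared, not just the VID. (Hypothetical name.) */
uint16_t vlan_tag(uint16_t tci)
{
    return tci & (uint16_t)~VLAN_TAG_PRESENT;
}
```

With a TCI of 0x1005, the masked variant yields 4096 while the boolean variant yields 1, which matches the mismatch reported by test_bpf case #18; vlan_tag() returns 0x0005 for the same input.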

455539 Commits All Branches Search
