linux_old1

Commit Graph

Author	SHA1	Message	Date
Bjørn Mork	cc6ba5fdaa	net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround) We normally trust and use the CDC functional descriptors provided by a number of devices. But some of these will erroneously list the address reserved for the device end of the link. Attempting to use this on both the device and host side will naturally not work. Work around this bug by ignoring the functional descriptor and assign a random address instead in this case. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:51:17 -04:00
Bjørn Mork	6483bdc9d7	net: qmi_wwan: fixup destination address (firmware bug workaround) Received packets are sometimes addressed to 00:a0:c6:00:00:00 instead of the address the device firmware should have learned from the host: 321.224126 77.16.85.204 -> 148.122.171.134 ICMP 98 Echo (ping) request id=0x4025, seq=64/16384, ttl=64 0000 82 c0 82 c9 f1 67 82 c0 82 c9 f1 67 08 00 45 00 .....g.....g..E. 0010 00 54 00 00 40 00 40 01 57 cc 4d 10 55 cc 94 7a .T..@.@.W.M.U..z 0020 ab 86 08 00 62 fc 40 25 00 40 b2 bc 6e 51 00 00 ....b.@%.@..nQ.. 0030 00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15 ..k............. 0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 .......... !"#$% 0050 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 &'()*+,-./012345 0060 36 37 67 321.240607 148.122.171.134 -> 77.16.85.204 ICMP 98 Echo (ping) reply id=0x4025, seq=64/16384, ttl=55 0000 00 a0 c6 00 00 00 02 50 f3 00 00 00 08 00 45 00 .......P......E. 0010 00 54 00 56 00 00 37 01 a0 76 94 7a ab 86 4d 10 .T.V..7..v.z..M. 0020 55 cc 00 00 6a fc 40 25 00 40 b2 bc 6e 51 00 00 U...j.@%.@..nQ.. 0030 00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15 ..k............. 0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 .......... !"#$% 0050 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 &'()*+,-./012345 0060 36 37 67 The bogus address is always the same, and matches the address suggested by many devices as a default address. It is likely a hardcoded firmware default. The circumstances where this bug has been observed indicate that the trigger is related to timing or some other factor the host cannot control. Repeating the exact same configuration sequence that caused it to trigger once will not necessarily cause it to trigger the next time. Reproducing the bug is therefore difficult. This opens up a possibility that the bug is more common than we can confirm, because affected devices often will work properly again after a reset, a procedure most users are likely to try out before reporting a bug. 
Unconditionally rewriting the destination address if the first digit of the received packet is 0, is considered an acceptable compromise since we already have to inspect this digit. The simplification will cause unnecessary rewrites if the real address starts with 0, but this is still better than adding additional tests for this particular case. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:51:17 -04:00
Bjørn Mork	6ff509af38	net: qmi_wwan: fixup missing ethernet header (firmware bug workaround) A number of LTE devices from different vendors all suffer from the same firmware bug: Most of the packets received from the device while it is attached to an LTE network will not have an ethernet header. The devices work as expected when attached to 2G or 3G networks, sending an ethernet header with all packets. This driver is not aware of which network the modem is attached to, and even if it were there are still some packet types which are always received with the header intact. All devices supported by this driver have severely limited networking capabilities: - can only transmit IPv4, IPv6 and possibly ARP - can only support a single host hardware address at any time - will only do point-to-point communication with the host Because of this, we are able to reliably identify any bogus raw IP packets by simply looking at the 4 IP version bits. All we need to do is to avoid 4 or 6 in the first digit of the mac address. This workaround ensures this, and fixes up the received packets as necessary. Given the distribution of the bug, it is believed that the source is the chipset vendor. The devices which are verified to be affected are: Huawei E392u-12 (Qualcomm MDM9200) Pantech UML290 (Qualcomm MDM9600) Novatel USB551L (Qualcomm MDM9600) Novatel E362 (Qualcomm MDM9600) It is believed that the bug depends on firmware revision, which means that possibly all devices based on the above mentioned chipsets may be affected if we consider all available firmware revisions. The information about affected devices and versions is likely incomplete. As the additional overhead for packets not needing this fixup is very small, it is considered acceptable to apply the workaround to all devices handled by this driver. Reported-by: Dan Williams <dcbw@redhat.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:51:16 -04:00
David S. Miller	0cb670eef5	Merge branch 'bonding' Nikolay Aleksandrov says: ==================== This patch-set fixes mainly bugs on enslave failure and one occasion of a needed locking. The patches are: 1. On enslave failure mc addresses are not flushed from the slave 2. On enslave failure vlans are not cleaned up from the slave 3. On enslave failure the bond's primary and curr_active_slave are not cleaned up (which might result in use of freed memory) 4. On enslave failure netpoll is not disabled which might result in a memory leak 5. In bond_mc_swap() the bond's mc addr list is walked without netif_addr_lock, since it can be called without rtnl, add it v2: patch 01 - fix log message and remove unnecessary code move ==================== Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:49:11 -04:00
nikolay@redhat.com	d632ce989c	bonding: in bond_mc_swap() bond's mc addr list is walked without lock Use netif_addr_lock_bh() to acquire the appropriate lock before walking. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:19 -04:00
nikolay@redhat.com	fc7a72ac86	bonding: disable netpoll on enslave failure slave_disable_netpoll() is not called upon enslave failure which would lead to a memory leak. Call slave_disable_netpoll() after err_detach as that's the first error path after enabling netpoll on that slave. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:19 -04:00
nikolay@redhat.com	3c5913b53f	bonding: primary_slave & curr_active_slave are not cleaned on enslave failure On enslave failure primary_slave can point to new_slave which is to be freed, and the same applies to curr_active_slave. So check if this is the case and clean up properly after err_detach because that's the first error code path after they're set. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:19 -04:00
nikolay@redhat.com	a506e7b479	bonding: vlans don't get deleted on enslave failure The main problem is with vid refcount which only gets bumped up. Delete the vlans after err_detach as that's the first error path after the vlans are added. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:18 -04:00
nikolay@redhat.com	25e40305d4	bonding: mc addresses don't get deleted on enslave failure Add bond_mc_list_flush() after err_detach as that's the first error path after the addresses are added. The main issue is the mc addresses' refcount which only gets bumped up. v2: update log message and don't move code unnecessarily Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:48:18 -04:00
Wei Yongjun	cb95ec6261	pkt_sched: fix error return code in fw_change_attrs() Fix to return -EINVAL when tb[TCA_FW_MASK] is set and head->mask != 0xFFFFFFFF instead of 0 (ifdef CONFIG_NET_CLS_IND and tb[TCA_FW_INDEV]), as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:34:53 -04:00
Dan Carpenter	e15465e180	irda: small read past the end of array in debug code The "reason" can come from skb->data[] and it hasn't been capped so it can be from 0-255 instead of just 0-6. For example in irlmp_state_dtr() the code does: reason = skb->data[3]; ... irlmp_disconnect_indication(self, reason, skb); Also LMREASON has a couple other values which don't have entries in the irlmp_reasons[] array. And 0xff is a valid reason as well which means "unknown". So far as I can see we don't actually care about "reason" except for in the debug code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 17:32:31 -04:00
David S. Miller	f36391d279	sparc64: Fix race in TLB batch processing. As reported by Dave Kleikamp, when we emit cross calls to do batched TLB flush processing we have a race because we do not synchronize on the sibling cpus completing the cross call. So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.) and either flushes are missed or flushes will flush the wrong addresses. Fix this by using generic infrastructure to synchronize on the completion of the cross call. This first required getting the flush_tlb_pending() call out from switch_to() which operates with locks held and interrupts disabled. The problem is that smp_call_function_many() cannot be invoked with IRQs disabled and this is explicitly checked for with WARN_ON_ONCE(). We get the batch processing outside of locked IRQ disabled sections by using some ideas from the powerpc port. Namely, we only batch inside of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a region, we flush TLBs synchronously. 1) Get rid of xcall_flush_tlb_pending and per-cpu type implementations. 2) Do TLB batch cross calls instead via: smp_call_function_many() tlb_pending_func() __flush_tlb_pending() 3) Batch only in lazy mmu sequences: a) Add 'active' member to struct tlb_batch b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE c) Set 'active' in arch_enter_lazy_mmu_mode() d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode() e) Check 'active' in tlb_batch_add_one() and do a synchronous flush if it's clear. 4) Add infrastructure for synchronous TLB page flushes. a) Implement __flush_tlb_page and per-cpu variants, patch as needed. b) Likewise for xcall_flush_tlb_page. c) Implement smp_flush_tlb_page() to invoke the cross-call. d) Wire up global_flush_tlb_page() to the right routine based upon CONFIG_SMP 5) It turns out that singleton batches are very common, 2 out of every 3 batch flushes have only a single entry in them. 
The batch flush waiting is very expensive, both because of the poll on sibling cpu completion, as well as because passing the tlb batch pointer to the sibling cpus invokes a shared memory dereference. Therefore, in flush_tlb_pending(), if there is only one entry in the batch perform a completely asynchronous global_flush_tlb_page() instead. Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2013-04-19 17:26:26 -04:00
Stephen Boyd	cea15092f0	ARM: 7699/1: sched_clock: Add more notrace to prevent recursion cyc_to_sched_clock() is called by sched_clock() and cyc_to_ns() is called by cyc_to_sched_clock(). I suspect that some compilers inline both of these functions into sched_clock() and so we've been getting away without having a notrace marking. It seems that my compiler isn't inlining cyc_to_sched_clock() though, so I'm hitting a recursion bug when I enable the function graph tracer, causing my system to crash. Marking these functions notrace fixes it. Technically cyc_to_ns() doesn't need the notrace because it's already marked inline, but let's just add it so that if we ever remove inline from that function it doesn't blow up. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2013-04-19 22:23:55 +01:00
Andy Gospodarek	bb5b052f75	bond: add support to read speed and duplex via ethtool This patch adds support for the get_settings ethtool op to the bonding driver. This was motivated by users who wanted to get the speed of the bond and compare that against throughput to understand utilization. The behavior before this patch was added was problematic when computing line utilization after trying to get link-speed and throughput via SNMP. Output from ethtool looks like this for a round-robin bond: Settings for bond0: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Speed: 11000Mb/s Duplex: Full Port: Other PHYAD: 0 Transceiver: internal Auto-negotiation: off MDI-X: Unknown Link detected: yes I tested this and verified it works as expected. A test was also done on a version backported to an older kernel and it worked well there. v2: Switch to using ethtool_cmd_speed_set to set speed, added check to SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations as they were not needed, and set port type to 'Other.' v3: Fix useless assignment and checkpatch warning. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:39:50 -04:00
Daniel Borkmann	4b457bdf1d	packet: move hw/sw timestamp extraction into a small helper This patch introduces a small, internal helper function, that is used by PF_PACKET. Based on the flags that are passed, it extracts the packet timestamp in the receive path. This is merely a refactoring to remove some duplicate code in tpacket_rcv(), to make it more readable, and to enable others to use this function in PF_PACKET as well, e.g. for TX. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:39:13 -04:00
Daniel Borkmann	6e94d1ef37	net: socket: move ktime2ts to ktime header api Currently, ktime2ts is a small helper function that is only used in net/socket.c. Move this helper into the ktime API as a small inline function, so that i) it's maintained together with ktime routines, and ii) also other files can make use of it. The function is named ktime_to_timespec_cond() and placed into the generic part of ktime, since we internally make use of ktime_to_timespec(). ktime_to_timespec() itself does not check the ktime variable for zero, hence, we name this function ktime_to_timespec_cond() for only a conditional conversion, and adapt its users to it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:39:13 -04:00
David S. Miller	cf27014866	net: Add .gitignore to networking selftests directory. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:36:12 -04:00
David S. Miller	2d6577f17b	net: Add missing netdev feature strings for NETIF_F_HW_VLAN_STAG_* Noticed by Ben Hutchings. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:16:50 -04:00
David S. Miller	92352df1db	Merge branch 'qlcnic' Rajesh Borundia says: ==================== * "qlcnic: Change 82xx adapter VLAN id endian type". - Adapter requires VLAN id in little endian. VLAN id was being converted to __le16 and then passed as a parameter. Pass VLAN id as u16 and then use cpu_to_le16 at appropriate places. It is appropriate for net-next as SR-IOV patches have a dependency on it. * "qlcnic: Fix loopback test for SR-IOV PF". - It is appropriate for net-next as change is needed for SRIOV PF only. * Remaining patches add enhancements to SR-IOV functionality like - FLR handling - Adapter reset recovery handling - iproute2 tool support for configuring MAC address, Tx rate and VLAN id. - Mailbox polling support for SR-IOV PF in case mailbox interrupts are disabled. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:16:42 -04:00
Rajesh Borundia	c637627820	qlcnic: Update version to 5.2.41 Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:14:54 -04:00
Rajesh Borundia	7ed3ce4800	qlcnic: Support polling for mailbox events. o When mailbox interrupt is disabled PF should be able to process request from VF. Enable polling for such cases. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:14:54 -04:00
Rajesh Borundia	d1a1105efd	qlcnic: Fix loopback test for SR-IOV PF. o Do not disable mailbox interrupts while running loopback test through SR-IOV PF. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:14:53 -04:00
Rajesh Borundia	91b7282b61	qlcnic: Support VLAN id config. o Add support for VLAN id configuration per VF using iproute2 tool. o VLAN id's 1-4094 are treated as PVID by the PF and Guest VLAN tagging is not allowed by default. o PVID is disabled when the VLAN id is set to 0 o Guest VLAN tagging is allowed when the VLAN id is set to 4095. o Only one Guest VLAN id is supported. o VLAN id can be changed only when the VF driver is not loaded. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:14:40 -04:00
Rajesh Borundia	4000e7a78d	qlcnic: Support MAC address, Tx rate config. o Add support for MAC address and Tx rate configuration per VF via iproute2 tool. o Tx rate change is allowed while the guest is running and the VF driver is loaded. o MAC address change is allowed only when VF driver is not loaded. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:02:38 -04:00
Rajesh Borundia	f036e4f44e	qlcnic: VF reset recovery implementation. o Implement recovery mechanism for VF to recover from adapter resets. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:02:38 -04:00
Rajesh Borundia	97d8105cf3	qlcnic: VF FLR implementation. o FLR from Hypervisor - When hypervisor issues a VF FLR request, adapter notifies the parent PF driver of the FLR request for PF driver to perform any cleanup on behalf of that VF. o FLR from VF Driver - VF driver may initiate a VF FLR request, if VF state needs to be cleaned up before a re-initialization. VF re-initialization during kdump is an example. o PF driver cleans up all resources allocated on behalf of a VF, on VF FLR notifications from the adapter or from the VF driver. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:02:38 -04:00
Rajesh Borundia	f80bc8fe6d	qlcnic: Change 82xx adapter VLAN id endian type. o 82xx adapter requires VLAN id in little endian format. Instead of passing vlan id parameter as __le16, pass the parameter as u16 and use cpu_to_le16 at appropriate places. Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 16:02:38 -04:00
David S. Miller	42bbcb7803	Merge branch 'netlink-mmap' Patrick McHardy says: ==================== The following patches contain an implementation of memory mapped I/O for netlink. The implementation is modelled after AF_PACKET memory mapped I/O with a few differences: - In order to perform memory mapped I/O to userspace, the kernel allocates skbs with the data area pointing to the data area of the mapped frames. All netlink subsystems assume a linear data area, so for the sake of simplicity, the mapped data area is not attached to the paged area but to skb->data. This requires introduction of a special skb allocation function that just allocates an skb head without the data area. Since this is a quite rare use case, I introduced a new function based on __alloc_skb instead of splitting it up into head and data allocation. The alternative would be to introduce an __alloc_skb_head and __alloc_skb_data function, which would actually be useful for a specific error case in memory mapped netlink, but would require a couple of extra instructions for the common skb allocation case, so it doesn't really seem worth it. In order to get the destination memory area for skb->data before message construction, memory mapped netlink I/O needs to look up the destination socket during allocation instead of during transmission because the ring is owned by the receiving socket/process. A special skb allocation function (netlink_alloc_skb) taking the destination pid as an argument is used for this, all subsystems that want to support memory mapped I/O need to use this function, automatic fallback to the receive queue happens for unconverted subsystems. Dumps automatically use memory mapped I/O if the receiving socket has enabled it. The visible effect of looking up the destination socket during allocation instead of transmission is that message ordering in userspace might change in case allocation and transmission aren't performed atomically. 
This usually doesn't matter since most subsystems have a BKL-like lock like the rtnl mutex, to my knowledge the currently only existing case where it might matter is nfnetlink_queue combined with the recently introduced batched verdicts, but a) that subsystem already includes sequence numbers which allow userspace to reorder messages in case it cares to, also the reordering window is quite small and b) with memory mapped transmission batching can be performed in a subsystem independent manner. - AF_NETLINK contains flow control for database dumps, with regular I/O dump continuations are triggered based on the socket's receive queue space and by recvmsg() calls. Since with memory mapped I/O there are no recvmsg() calls under normal operation, this is done in netlink_poll(), under the assumption that userspace has processed all pending frames before invoking poll(), thus the ring is expected to have room for new messages. Dumps currently don't benefit as much as they could from memory mapped I/O because each single continuation requires a poll() call. A more aggressive approach seems like a good idea to me, especially in case the socket is not subscribed to any multicast groups (IOW only receiving explicitly requested data). Besides that, the memory mapped netlink implementation extends the states defined by AF_PACKET between userspace and the kernel by a SKIP status, this is intended for the case that userspace wants to queue frames (specifically when using nfnetlink_queue, an IDS and stream reassembly, requested by Eric Leblond) for a longer period of time. The kernel skips over all frames marked with SKIP when looking for unused frames and only fails when not finding a free frame or when having skipped the entire ring. 
Also noteworthy is memory mapped sendmsg: the kernel performs validation of messages before accepting and processing them, in order to prevent userspace from changing the message contents after validation, the kernel checks that the ring is only mapped once and the file descriptor is not shared (in order to avoid having userspace set up another mapping after the first mentioned check). If either of these is not true, the message is copied to an allocated skb and processed as with regular I/O. I'd especially appreciate review of this part since I'm not really versed in memory, file and process management. The remaining interesting details are included in the changelogs of the individual patches and the documentation, so I won't repeat them here. As an example, nfnetlink_queue is converted to support memory mapped I/O. Other subsystems that would probably benefit are nfnetlink_log, audit and maybe ISCSI, not sure. Following are some numbers collected by Florian Westphal based on a slightly older version, which included an experimental patch for the nfnetlink_queue ordering issue. === Test hardware is a 12-core machine Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz ixgbe interfaces are used (i.e., multiqueue nics). irqs are distributed across the cpus. I've made several tests. The simple one consists of 3GBit UDP traffic, packets are 1500 bytes in size (i.e., no fragmentation), with a single nfqueue and the test client programs in libmnl examples directory. Packets are sent from one /24 net to another /24 net, i.e. there are a few hundred flows active at any given time. I've also tested with snort, but I disabled all rules. 6Gbit UDP traffic is generated in the snort case, and 6 nfqueues are used (i.e., 6 snorts run in parallel). I've tested with 3 different kernels, all based on 3.7.1. 
- 3.7.1, without the mmap patches - 3.7.1, with Patrick's mmap patches - 3.7.1, with mmap patches and extended spinlock to ensure packet ids are monotonically increasing and cannot be re-ordered. This is what we currently ship in our product. [ the spinlock that is extended is the per nfqueue spinlock, it will be held from the time the netlink skb is allocated until the netlink skb is sent to userspace: http://1984.lsi.us.es/git/nf-next/commit/?h=mmap-netlink3&id=b8eb19c46650fef4e9e4fe53f367f99bbf72afc9 ] snort is normally used in "batch mode", i.e., after processing 25 packets a single "batch verdict" is sent to accept the packets seen so far. "mmap snort" means RX_RING + sendmsg(), i.e. TX_RING is not used at this time (except where noted below). One reason is that snort has a reload thread, so kernel needs to copy; also in the snort case no payload rewrite takes place, so compared to the rx path the tx path is cheap. Results: 3.7.1, without mmap patches, i.e. recv()+sendmsg() for everyone nfq-queue: 1.7 gbit out snort-recv-batch-25 5.1 gbit out snort-recv-no-batch 3.1 gbit out 3.7.1 + mmap + without extended spinlocked section nfq-queue: 1.7 gbit out (recv/sendmsg) nfq-queue-mmap: 2.4 gbit out snort-mmap-batch-25 5.6 gbit out (warning: since ids can be re-ordered, this version is "broken"). snort-recv-batch-25 5.1 gbit out snort-mmap-no-batch 4.6 gbit out (i.e., one verdict per packet) Kernel 3.7.1 + mmap + extended spinlock section: nfq-queue: 1.4 gbit out nfq-queue-mmap: 2.3 gbit out snort: 5.6 gbit out Conclusions: - The "extended spinlocked section" hurts performance in the single queue case; with 6 snorts there is no measurable slowdown. 
- I tried to re-write the mmap-snort to work without batch verdicts, but results were not very encouraging: kernel 3.7.1 + mmap (without extended spinlocked section): snort-mmap-batch-25 5.6 gbit out (what we currently ship) snort-recv-batch-25 5.1 gbit out (without using mmap) snort-mmap-batch-1 4.6 gbit out (with mmap but without batch verdicts) snort-mmap-txring-25 5.2 gbit out (with mmap but without batch verdicts) snort-mmap-txring-1 4.6 gbit out (with mmap but without batch verdicts) The difference between the last two is that in the txring-25 case, we put a verdict into the tx ring after every packet, but will only invoke sendmsg(, NULL, 0) after processing 25 packets. So the only difference is the number of sendmsg calls/context switches. So, i.o.w, kernel 3.7.1 + mmap + the extra locking crap is faster than 3.7.1 + mmap-without-extra-locking and single-verdict-per packet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 15:37:09 -04:00
Patrick McHardy	3ab1f683bf	nfnetlink: add support for memory mapped netlink Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:58:36 -04:00
Patrick McHardy	ec464e5dc5	netfilter: rename netlink related "pid" variables to "portid" Get rid of the confusing mix of pid and portid and use portid consistently for all netlink related socket identities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:58:36 -04:00
Patrick McHardy	5683264c39	netlink: add documentation for memory mapped I/O Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:58:36 -04:00
Patrick McHardy	4ae9fbee16	netlink: add RX/TX-ring support to netlink diag Based on AF_PACKET. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:58 -04:00
Patrick McHardy	cd1df525da	netlink: add flow control for memory mapped I/O Add flow control for memory mapped RX. Since user-space usually doesn't invoke recvmsg() when using memory mapped I/O, flow control is performed in netlink_poll(). Dumps are allowed to continue if at least half of the ring frames are unused. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:58 -04:00
Patrick McHardy	f9c2288837	netlink: implement memory mapped recvmsg() Add support for mmap'ed recvmsg(). To allow the kernel to construct messages into the mapped area, a dataless skb is allocated and the data pointer is set to point into the ring frame. This means frames will be delivered to userspace in order of allocation instead of order of transmission. This usually doesn't matter since the order is either not determinable by userspace or message creation/transmission is serialized. The only case where this can have a visible difference is nfnetlink_queue. Userspace can't assume mmap'ed messages have ordered IDs anymore and needs to check this if using batched verdicts. For non-mapped sockets, nothing changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:58 -04:00
Patrick McHardy	5fd96123ee	netlink: implement memory mapped sendmsg() Add support for mmap'ed sendmsg() to netlink. Since the kernel validates received messages before processing them, the code makes sure userspace can't modify the message contents after invoking sendmsg(). To do that only a single mapping of the TX ring is allowed to exist and the socket must not be shared. If either of these two conditions does not hold, it falls back to copying. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	9652e931e7	netlink: add mmap'ed netlink helper functions Add helper functions for looking up mmap'ed frame headers, reading and writing their status, allocating skbs with mmap'ed data areas and a poll function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	ccdfcc3985	netlink: mmaped netlink: ring setup Add support for mmap'ed RX and TX ring setup and teardown based on the af_packet.c code. The following patches will use this to add the real mmap'ed receive and transmit functionality. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	cf0a018ac6	netlink: add netlink_skb_set_owner_r() For mmap'ed I/O a netlink specific skb destructor needs to be invoked after the final kfree_skb() to clean up state. This doesn't work currently since the skb's ownership is transferred to the receiving socket using skb_set_owner_r(), which orphans the skb, thereby invoking the destructor prematurely. Since netlink doesn't account skbs to the originating socket, there's no need to orphan the skb. Add a netlink specific skb_set_owner_r() variant that does not orphan the skb and use a netlink specific destructor to call sock_rfree(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	1298ca4671	netlink: don't orphan skb in netlink_trim() Netlink doesn't account skbs to the sending socket, so there's no need to orphan the skb before trimming it. Removing the skb_orphan() call is required for mmap'ed netlink, which uses a netlink specific skb destructor that must not be invoked before the final freeing of the skb. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	0ebd0ac5ff	net: add function to allocate sk_buff head without data area Add a function to allocate a sk_buff head without any data. This will be used by memory mapped netlink to attach data from the mmaped area to the skb. Additionally change skb_release_all() to check whether the skb has a data area to allow the skb destructor to clear the data pointer in case only a head has been allocated. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:57 -04:00
Patrick McHardy	e32123e598	netlink: rename ssk to sk in struct netlink_skb_params Memory mapped netlink needs to store the receiving userspace socket when sending from the kernel to userspace. Rename 'ssk' to 'sk' to avoid confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:56 -04:00
Patrick McHardy	cd967e0571	netlink: add symbolic value for congested state Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:57:56 -04:00
David S. Miller	447b816fe0	Merge branch '8021ad' Patrick McHardy says: ==================== The following patches add support for 802.1ad (provider tagging) to the VLAN driver. The patchset consists of the following parts: - renaming of the NETIF_F_HW_VLAN feature flags to indicate that they only operate on CTAGs - preparation for 802.1ad VLAN filtering offload by adding a proto argument to the rx_{add,kill}_vid net_device_ops callbacks - preparation of the VLAN code to support multiple protocols by making the protocol used for tagging a property of the VLAN device and converting the device lookup functions accordingly - second step of preparation of the VLAN code by making the packet tagging functions take a protocol argument - introduction of 802.1ad support in the VLAN code, consisting mainly of checking for ETH_P_8021AD in a couple of places and testing the netdevice offload feature checks to take the protocol into account - announcement of STAG offloading capabilities in a couple of drivers for virtual network devices The patchset is based on net-next.git and has been tested with single and double tagging with and without HW acceleration (for CTAGs). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:46:27 -04:00
Patrick McHardy	28d2b136ca	net: vlan: announce STAG offload capability in some drivers - macvlan: propagate STAG filtering capabilities from underlying device - ifb: announce STAG tagging support in addition to CTAG tagging support - veth: announce STAG tagging/stripping support in addition to CTAG support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:46:06 -04:00
Patrick McHardy	8ad227ff89	net: vlan: add 802.1ad support Add support for 802.1ad VLAN devices. This mainly consists of checking for ETH_P_8021AD in addition to ETH_P_8021Q in a couple of places and check offloading capabilities based on the used protocol. Configuration is done using "ip link": # ip link add link eth0 eth0.1000 \ type vlan proto 802.1ad id 1000 # ip link add link eth0.1000 eth0.1000.1000 \ type vlan proto 802.1q id 1000 52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84) 20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64 92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84) 20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:46:06 -04:00
Patrick McHardy	86a9bad3ab	net: vlan: add protocol argument to packet tagging functions Add a protocol argument to the VLAN packet tagging functions. In case of HW tagging, we need that protocol available in the ndo_start_xmit functions, so it is stored in a new field in the skb. The new field fits into a hole (on 64 bit) and doesn't increase the skb's size. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:46:06 -04:00
Patrick McHardy	1fd9b1fc31	net: vlan: prepare for 802.1ad support Make the encapsulation protocol value a property of VLAN devices and change the device lookup functions to take the protocol value into account. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:27 -04:00
Patrick McHardy	80d5c3689b	net: vlan: prepare for 802.1ad VLAN filtering offload Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:27 -04:00
Patrick McHardy	f646968f8f	net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow-up patches will introduce 802.1ad service provider tagging (STAGs) and require the distinction for hardware that does not support accelerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:45:26 -04:00
Linus Torvalds	f068f5e158	ARM: arm-soc fixes for 3.9 Only one remaining fix for arm-soc platforms at this time, a small bugfix for cpu hotplug on highbank platforms that has become much easier to hit as of late. Details in the patch description, but it's small and well-contained and definitely impacts users of the platform, so 3.9 seems appropriate. Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Only one remaining fix for arm-soc platforms at this time, a small bugfix for cpu hotplug on highbank platforms that has become much easier to hit as of late. Details in the patch description, but it's small and well-contained and definitely impacts users of the platform, so 3.9 seems appropriate." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: highbank: fix cache flush ordering for cpu hotplug	2013-04-19 11:38:36 -07:00

363995 Commits