linux

Commit Graph

Author	SHA1	Message	Date
Mahesh Bandewar	7451495755	bonding: Allow userspace to set actors' macaddr in an AD-system. In an AD system, the communication between actor and partner is the business between these two entities. In the current setup anyone on the same L2 can "guess" the LACPDU contents and then possibly send the spoofed LACPDUs and trick the partner causing connectivity issues for the AD system. This patch allows the use of a random mac-address, obscuring its identity and making it harder for someone on the same L2 to do the same thing. This patch allows user-space to choose the mac-address for the AD-system. This mac-address can not be NULL or a Multicast. If the mac-address is set from user-space, the kernel will honor it and will not overwrite it. In the absence of a value from user space, the logic will default to using the masters' mac as the mac-address for the AD-system. It can be set using example code below - # modprobe bonding mode=4 # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \ $(( (RANDOM & 0xFE) | 0x02 )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF ))) # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: fixed up style issues reported by checkpatch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:59:32 -04:00
Mahesh Bandewar	6791e4661c	bonding: Allow userspace to set actors' system_priority in AD system This patch allows user to randomize the system-priority in an ad-system. The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user does not specify this value, the system defaults to 0xFFFF, which is what it was before this patch. Following example code could set the value - # modprobe bonding mode=4 # sys_prio=$(( 1 + RANDOM + RANDOM )) # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * changed how the default value is set in bond_check_params(), this makes the default consistent between what gets set for a new bond and what the default is claimed to be in the bonding options.] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:59:31 -04:00
David S. Miller	0198e09c4b	Merge branch 'kernel_socket_netns' Eric W. Biederman says: ==================== Cleanup the kernel sockets. Right now the situation for allocating kernel sockets is a mess. - sock_create_kern does not take a namespace parameter. - kernel sockets must not reference count a network namespace and keep it alive or else we will have a reference counting loop. - The way we avoid the reference counting loop with sk_change_net and sk_release_kernel are major hacks. This patchset addresses this mess by fixing sock_create_kern to do everything necessary to create a kernel socket. None of the current users of kernel sockets need the network namespace reference counted. Either kernel sockets are network namespace aware (and using the current hacks) or kernel sockets are limited to the initial network namespace in which case it does not matter. This patchset starts by addressing tun which should be using normal userspace sockets like macvtap. Then sock_create_kern is fixed to take a network namespace. Then the in kernel status of sockets are passed through to sk_alloc. Then sk_alloc is fixed to not reference count the network namespace of kernel sockets. Then the callers of sock_create_kern are fixed up to stop using hacks. Then netlink which uses its own flavor of sock_create_kern is fixed. Finally the hacks that are sk_change_net and sk_release_kernel are removed. When it is all done the code is easier to follow, easier to use, easier to maintain and shorter by about 70 lines. ==================== Reported-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:19 -04:00
Eric W. Biederman	affb9792f1	net: kill sk_change_net and sk_release_kernel These functions are no longer needed and no longer used; kill them. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:18 -04:00
Eric W. Biederman	13d3078e22	netlink: Create kernel netlink sockets in the proper network namespace Utilize the new functionality of sk_alloc so that nothing needs to be done to suppress the reference counting on kernel sockets. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:18 -04:00
Eric W. Biederman	26abe14379	net: Modify sk_alloc to not reference count the netns of kernel sockets. Now that sk_alloc knows when a kernel socket is being allocated modify it to not reference count the network namespace of kernel sockets. Keep track of if a socket needs reference counting by adding a flag to struct sock called sk_net_refcnt. Update all of the callers of sock_create_kern to stop using sk_change_net and sk_release_kernel as those hacks are no longer needed, to avoid reference counting a kernel socket. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:18 -04:00
Eric W. Biederman	11aa9c28b4	net: Pass kern from net_proto_family.create to sk_alloc In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:17 -04:00
Eric W. Biederman	eeb1bd5c40	net: Add a struct net parameter to sock_create_kern This is long overdue, and is part of cleaning up how we allocate kernel sockets that don't reference count struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:17 -04:00
Eric W. Biederman	140e807da1	tun: Utilize the normal socket network namespace refcounting. There is no need for tun to do the weird network namespace refcounting. The existing network namespace refcounting in tfile has almost exactly the same lifetime. So rewrite the code to use the struct sock network namespace refcounting and remove the unnecessary hand rolled network namespace refcounting and the unnecessary tfile->net. This change allows the tun code to directly call sock_put bypassing sock_release and making SOCK_EXTERNALLY_ALLOCATED unnecessary. Remove the now unnecessary tun_release so that if anything tries to use the sock_release code path the kernel will oops, and let us know about the bug. The macvtap code already uses its internal socket this way. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-11 10:50:16 -04:00
Eric Dumazet	80ba92fa1a	codel: add ce_threshold attribute For DCTCP or similar ECN based deployments on fabrics with shallow buffers, hosts are responsible for a good part of the buffering. This patch adds an optional ce_threshold to codel & fq_codel qdiscs, so that DCTCP can have feedback from queuing in the host. A DCTCP enabled egress port simply have a queue occupancy threshold above which ECT packets get CE mark. In codel language this translates to a sojourn time, so that one doesn't have to worry about bytes or bandwidth but delays. This makes the host an active participant in the health of the whole network. This also helps experimenting DCTCP in a setup without DCTCP compliant fabric. On following example, ce_threshold is set to 1ms, and we can see from 'ldelay xxx us' that TCP is not trying to go around the 5ms codel target. Queue has more capacity to absorb inelastic bursts (say from UDP traffic), as queues are maintained to an optimal level. lpaa23:~# ./tc -s -d qd sh dev eth1 qdisc mq 1: dev eth1 root Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961) backlog 3108242b 364p requeues 42961 qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503) rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503 count 0 lastcount 0 ldelay 1.0ms drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384 qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186) rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186 count 0 lastcount 0 ldelay 694us drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873 qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554) rate 
3041Mbit 251096pps backlog 210446b 59p requeues 5554 count 0 lastcount 0 ldelay 889us drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780 ... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Glenn Judd <glenn.judd@morganstanley.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:50:20 -04:00
Bert Vermeulen	7c0c826828	net: mdio-gpio: Allow for unspecified bus id When the bus id was supplied via a struct platform_device, the driver wasn't handling -1 to mean an unspecified id of the only instance of this driver, as the platform spec requires. Signed-off-by: Bert Vermeulen <bert@biot.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:42:11 -04:00
Kretschmer, Mathias	fbf33a2802	af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT). This patch fixes an issue where the send(MSG_DONTWAIT) call on a TX_RING is not fully non-blocking in cases where the device's sndBuf is full. We pass nonblock=true to sock_alloc_send_skb() and return any possibly occurring error code (most likely EAGAIN) to the caller. As the fast-path stays as it is, we keep the unlikely() around skb == NULL. Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:40:08 -04:00
Michal Schmidt	cd9c399777	bnx2x: limit fw delay in kdump to 5s after boot Commit `12a8541d5c` "bnx2x: Delay during kdump load" added a 5 seconds delay to bnx2x's probe function in the kdump case to let the firmware realize the old driver is gone. The problem with the delay is that it is per-device, so if you have several bnx2x NICs in NPAR mode, the delays can accumulate to minutes. Fix it by adjusting the delay so that we do not wait more than necessary, i.e. no more delaying after 5 seconds of kernel boot time. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:23:22 -04:00
Nicolas Schichan	0b59d8806a	ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits. The ARM JIT code emits "ldr rX, [pc, #offset]" to access the literal pool. #offset maximum value is 4095 and if the generated code is too large, the #offset value can overflow and not point to the expected slot in the literal pool. Additionally, when overflow occurs, bits of the overflow can end up changing the destination register of the ldr instruction. Fix that by detecting the overflow in imm_offset() and setting a flag that is checked for each BPF instruction converted in build_body(). As of now it can only be detected in the second pass. As a result the second build_body() call can now fail, so add the corresponding cleanup code in that case. Using multiple literal pools in the JITed code is going to require lots of intrusive changes to the JIT code (which would better be done as a feature instead of a fix), so just delegating to the kernel BPF interpreter in that case is a more straightforward, minimal fix and easy to backport. Fixes: `ddecdfcea0` ("ARM: 7259/3: net: JIT compiler for packet filters") Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:21:49 -04:00
Nicolas Schichan	19fc99d0c6	ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K instruction. In that case, emit_udiv() will be called with rn == ARM_R0 (r_scratch) and loading rm first into ARM_R0 will result in jit_udiv() function being called with the same dividend and divisor. Fix that by loading rn first into ARM_R1 and then rm into ARM_R0. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Cc: <stable@vger.kernel.org> # v3.13+ Fixes: `aee636c480` (bpf: do not use reciprocal divide) Acked-by: Mircea Gherzan <mgherzan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 19:20:32 -04:00
Linus Torvalds	030bbdbf4c	Linux 4.1-rc3	2015-05-10 15:12:29 -07:00
Linus Torvalds	01d07351f2	Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "I really need to get back to sending these on my Friday, instead of my Monday morning, but nothing too amazing in here: a few amdkfd fixes, a few radeon fixes, i915 fixes, one tegra fix and one core fix" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm: Zero out invalid vblank timestamp in drm_update_vblank_count. drm/tegra: Don't use vblank_disable_immediate on incapable driver. drm/radeon: stop trying to suspend UVD sessions drm/radeon: more strictly validate the UVD codec drm/radeon: make UVD handle checking more strict drm/radeon: make VCE handle check more strict drm/radeon: fix userptr lockup drm/radeon: fix userptr BO unpin bug v3 drm/amdkfd: Initialize sdma vm when creating sdma queue drm/amdkfd: Don't report local memory size drm/amdkfd: allow unregister process with queues drm/i915: Drop PIPE-A quirk for 945GSE HP Mini drm/i915: Sink rate read should be saved in deca-kHz drm/i915/dp: there is no audio on port A drm/i915: Add missing MacBook Pro models with dual channel LVDS drm/i915: Assume dual channel LVDS if pixel clock necessitates it drm/radeon: don't setup audio on asics that don't support it drm/radeon: disable semaphores for UVD V1 (v2)	2015-05-10 14:58:53 -07:00
Varka Bhadram	cf9d0dcc5a	ethernet: qualcomm: use spi instead of spi_device All spi based drivers have an instance of struct spi_device as spi. This patch renames spi_device to spi to synchronize with all the drivers. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-10 17:52:54 -04:00
Dave Airlie	332545b301	Merge tag 'drm-intel-fixes-2015-05-08' of git://anongit.freedesktop.org/drm-intel into drm-fixes misc i915 fixes. * tag 'drm-intel-fixes-2015-05-08' of git://anongit.freedesktop.org/drm-intel: drm/i915: Drop PIPE-A quirk for 945GSE HP Mini drm/i915: Sink rate read should be saved in deca-kHz drm/i915/dp: there is no audio on port A drm/i915: Add missing MacBook Pro models with dual channel LVDS drm/i915: Assume dual channel LVDS if pixel clock necessitates it	2015-05-11 06:06:22 +10:00
Mario Kleiner	fdb68e09bb	drm: Zero out invalid vblank timestamp in drm_update_vblank_count. Since commit `844b03f277` we make sure that after vblank irq off, we return the last valid (vblank count, vblank timestamp) pair to clients, e.g., during modesets, which is good. An overlooked side effect of that commit for kms drivers without support for precise vblank timestamping is that at vblank irq enable, when we update the vblank counter from the hw counter, we can't update the corresponding vblank timestamp, so now we have a totally mismatched timestamp for the new count to confuse clients. Restore old client visible behaviour from before Linux 3.17, but zero out the timestamp at vblank counter update (instead of disable as in original implementation) if we can't generate a meaningful timestamp immediately for the new vblank counter. This will fix this regression, so callers know they need to retry again later if they need a valid timestamp, but at the same time preserves the improvements made in the commit mentioned above. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: <stable@vger.kernel.org> #v3.17+ Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-05-11 06:02:38 +10:00
Linus Torvalds	41f2a93cc6	Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm Pull ARM fixes from Russell King: "A set of ARM fixes: - fix an off-by-one error in the iommu DMA ops, which caused errors with a 4GiB size. - remove comments mentioning the non-existent CONFIG_CPU_ARM1020_CPU_IDLE macro. - remove useless CONFIG_CPU_ICACHE_STREAMING_DISABLE blocks, where this symbol never appeared in any Kconfig. - fix Feroceon code to cope with a previous change correctly (it incorrectly left an additional word in an assembly structure definition) - avoid a misleading IRQ affinity warning in the ARM PMU code for IRQs which are already affine to their CPUs. - fix the node name printed in the IRQ affinity warning" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8352/1: perf: Fix the pmu node name in warning message ARM: 8351/1: perf: don't warn about missing interrupt-affinity property for PPIs ARM: 8350/1: proc-feroceon: Fix feroceon_proc_info macro ARM: 8349/1: arch/arm/mm/proc-arm925.S: remove dead #ifdef block ARM: 8348/1: remove comments on CPU_ARM1020_CPU_IDLE ARM: 8347/1: dma-mapping: fix off-by-one check in arm_setup_iommu_dma_ops	2015-05-10 11:16:48 -07:00
Linus Torvalds	8425ac7a0d	Samsung fixes for v4.1 - fixes commit `ea08de16eb` ("ARM: dts: Add DISP1 power domain for exynos5420") which causes 'unhandled fault: imprecise external abort' error when PD turned off. : make DP a consumer of DISP1 power domain - fixes 's3c-rtc' probe failure on Odroid-X2/U2/U3 boards. : add 'rtc_src' clock to rtc node for source clock of rtc - fixes typo for 'cpu-crit-0' trip point on exynos5420/5440 - fixes S2R failure on exynos5250-snow due to card power of Marvell WiFi driver (suspend/resume) : add keep-power-in-suspend to WiFi SDIO node -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJVT23xAAoJEA0Cl+kVi2xqZIcP/0CAA25uvidVXdNVYlJbSvr3 4kesNDG/LGrVnv6xt132iJsXtIfWQxG6nJkhE1x5G9zSgsAjtQWcsCr068Itktsg CG1yl/8z6TB+wS0PhTXaa985V62euTws89YGJau6YCZVSXKKcGDjM5e2RJn80yOL IMipw8x5xTt0GsIKtC2AyewcNq05SSNtwvYe8CPJ9wGFQPy3gZ1t5WqSwW2mMG+K C6mibaN7gs9+sS2ncHglZtHKAR2VxJTNCkq/LOCYlDSftT01GhmhG1fl/tUxEqUD 1bFTTajA21CNnEvWCdkFkMHkEy7lzW8WCX3tAwDHGON/NdWERV4FSaLTqR0o1ekO vLeUSvgtRntBtUY3ojvyfoYq4vrdQF1uoL2r932iO9FILUBpwRYAyG152VFJyZRx Hx50yCgyljG3X8xUp5VgiuNwDCgatiFBCeb3YT0qrB9YbnLXqqAUAfMSng8a15dc rbD02YsYvYcJPf7RDnS9QQV+ZSSmZIkY7JmxkJ/UJ0SA7dAJBtKrXQyliLVlExHu Cz0ye5NHjC+jxwPU/OEFRSZi8bKJXe/q6bAXDRA0vkZWd0G6C+wOq8bnzSWkRM+D +/uzxajdDbfs7mr2mPFyc3H22MiwWSOFIRVsCXVKqTN0yVvlaLvHtolUayAD3RrR oo25jYh9CYGZhxd+7TVb =YBxq -----END PGP SIGNATURE----- Merge tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung Pull samsung fixes from Kukjin Kim: "Here are Samsung fixes for v4.1. Since I missed sending this via the arm-soc tree before v4.1-rc3, I'm sending this to you directly - fix commit `ea08de16eb` ("ARM: dts: Add DISP1 power domain for exynos5420") which causes 'unhandled fault: imprecise external abort' error when PD turned off. 
("make DP a consumer of DISP1 power domain") - fix 's3c-rtc' probe failure on Odroid-X2/U2/U3 boards ("add 'rtc_src' clock to rtc node for source clock of rtc") - fix typo for 'cpu-crit-0' trip point on exynos5420/5440 - fix S2R failure on exynos5250-snow due to card power of Marvell WiFi driver (suspend/resume) ("add keep-power-in-suspend to WiFi SDIO node")" * tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: dts: Add keep-power-in-suspend to WiFi SDIO node for exynos5250-snow ARM: dts: Fix typo in trip point temperature for exynos5420/5440 ARM: dts: add 'rtc_src' clock to rtc node for exynos4412-odroid boards ARM: dts: Make DP a consumer of DISP1 power domain on Exynos5420	2015-05-10 11:13:19 -07:00
Dan Williams	dbfe8ef559	ahci: avoton port-disable reset-quirk Avoton AHCI occasionally sees drive probe timeouts at driver load time. When this happens SCR_STATUS indicates device detected, but no D2H FIS reception. Reset the internal link state machines by bouncing port-enable in the PCS register when this occurs. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2015-05-10 11:39:16 -04:00
Tom Herbert	78f5b89919	mpls: Change reserved label names to be consistent with netbsd Since these are now visible to userspace it is nice to be consistent with BSD (sys/netmpls/mpls.h in netBSD). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:29:50 -04:00
David S. Miller	3e3b34685a	Merge branch 'pktgen-next' Jesper Dangaard Brouer says: ==================== The following series introduces some pktgen changes Patch01: Cleanup my own work when I introduced NO_TIMESTAMP. Patch02: Took over patch from Alexei, and addressed my own concerns, as Alexei is too busy with other work, and this will provide an easy tool for measuring ingress path performance, which is a hot topic ATM. Changes were primarily user interface related. Introduced a separate "xmit_mode" setting, instead of stealing one of the dev flags like Alexei did. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:26:06 -04:00
Alexei Starovoitov	62f64aed62	pktgen: introduce xmit_mode '<start_xmit|netif_receive>' Introduce xmit_mode 'netif_receive' for pktgen which generates the packets using familiar pktgen commands, but feeds them into netif_receive_skb() instead of ndo_start_xmit(). Default mode is called 'start_xmit'. It is designed to test netif_receive_skb and ingress qdisc performance only. Make sure to understand how it works before using it for other rx benchmarking. Sample script 'pktgen.sh': #!/bin/bash function pgset() { local result echo $1 > $PGDEV result=`cat $PGDEV | fgrep "Result: OK:"` if [ "$result" = "" ]; then cat $PGDEV | fgrep Result: fi } [ -z "$1" ] && echo "Usage: $0 DEV" && exit 1 ETH=$1 PGDEV=/proc/net/pktgen/kpktgend_0 pgset "rem_device_all" pgset "add_device $ETH" PGDEV=/proc/net/pktgen/$ETH pgset "xmit_mode netif_receive" pgset "pkt_size 60" pgset "dst 198.18.0.1" pgset "dst_mac 90:e2:ba:ff:ff:ff" pgset "count 10000000" pgset "burst 32" PGDEV=/proc/net/pktgen/pgctrl echo "Running... ctrl^C to stop" pgset "start" echo "Done" cat /proc/net/pktgen/$ETH Usage: $ sudo ./pktgen.sh eth2 ... Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags) 43033682pps 20656Mb/sec (20656167360bps) errors: 10000000 Raw netif_receive_skb speed should be ~43 million packets per second on 3.7GHz x86 and 'perf report' should look like: 37.69% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core 25.81% kpktgend_0 [kernel.vmlinux] [k] kfree_skb 7.22% kpktgend_0 [kernel.vmlinux] [k] ip_rcv 5.68% kpktgend_0 [pktgen] [k] pktgen_thread_worker If fib_table_lookup is seen on top, it means skb was processed by the stack. To benchmark netif_receive_skb only make sure that 'dst_mac' of your pktgen script is different from receiving device mac and it will be dropped by ip_rcv Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:26:06 -04:00
Jesper Dangaard Brouer	f1f00d8ff6	pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags, with a negation of the flag like !NO_TIMESTAMP. Also document the option flag NO_TIMESTAMP. Fixes: `afb84b6261` ("pktgen: add flag NO_TIMESTAMP to disable timestamping") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:26:06 -04:00
David S. Miller	7c0004d396	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-05-07 This series contains updates to igb only. Toshiaki provides two fixes for igb, first fixes an issue when changing the number of rings by ethtool which causes oops because of uninitialized pointers. The second fix resolves a typo where tx_ring was used instead of the desired rx_ring. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:23:59 -04:00
David S. Miller	4d95b72f61	Merge branch 'netns-scalability' Nicolas Dichtel says: ==================== netns: ease netlink use with a lot of netns This idea was informally discussed in Ottawa / netdev0.1. The goal is to ease the use/scalability of netns, from a userland point of view. Today, users need to open one netlink socket per family and per netns. Thus, when the number of netns increases (for example 5K or more), the number of sockets needed to manage them grows a lot. The goal of this series is to be able to monitor netlink events, for a specified family, for a set of netns, with only one netlink socket. For this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this socket will receive netlink notifications from all netns that have a nsid assigned into the netns where the socket has been opened. The nsid is sent to userland via ancillary data. Here is an example with a patched iproute2. vxlan10 is created in the current netns (netns0, nsid 0) and then moved to another netns (netns1, nsid 1): $ ip netns exec netns0 ip monitor all-nsid label [nsid 0][NSID]nsid 1 (iproute2 netns name: netns1) [nsid 0][NEIGH]??? 
lladdr 00:00:00:00:00:00 REACHABLE,PERMANENT [nsid 0][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff [nsid 0][LINK]Deleted 5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff [nsid 1][NSID]nsid 0 (iproute2 netns name: netns0) [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0 [nsid 1][ADDR]5: vxlan10 inet 192.168.0.249/24 brd 192.168.0.255 scope global vxlan10 valid_lft forever preferred_lft forever [nsid 1][ROUTE]local 192.168.0.249 dev vxlan10 table local proto kernel scope host src 192.168.0.249 [nsid 1][ROUTE]ff00::/8 dev vxlan10 table local metric 256 pref medium [nsid 1][ROUTE]2001:123::/64 dev vxlan10 proto kernel metric 256 pref medium [nsid 1][LINK]5: vxlan10@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default link/ether 92:33:17:e6:e7:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0 [nsid 1][ROUTE]broadcast 192.168.0.255 dev vxlan10 table local proto kernel scope link src 192.168.0.249 [nsid 1][ROUTE]192.168.0.0/24 dev vxlan10 proto kernel scope link src 192.168.0.249 [nsid 1][ROUTE]broadcast 192.168.0.0 dev vxlan10 table local proto kernel scope link src 192.168.0.249 [nsid 1][ROUTE]fe80::/64 dev vxlan10 proto kernel metric 256 pref medium ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:31 -04:00
Nicolas Dichtel	59324cf35a	netlink: allow to listen "all" netns More accurately, listen to all netns that have a nsid assigned into the netns where the netlink socket is opened. For this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this socket will receive netlink notifications from all netns that have a nsid assigned into the netns where the socket has been opened. The nsid is sent to userland via ancillary data. With this patch, a daemon needs only one socket to listen to many netns. This is useful when the number of netns is high. Because 0 is a valid value for a nsid, the field nsid_is_set indicates if the field nsid is valid or not. skb->cb is initialized to 0 on skb allocation, thus we are sure that we will never send a nsid 0 by error to the userland. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:31 -04:00
Nicolas Dichtel	cc3a572fe6	netlink: rename private flags and states These flags and states have the same prefix (NETLINK_) as netlink socket options. To avoid confusion and to be able to name a flag like a socket option, let's use another prefix: NETLINK_[S|F]_. Note: a comment has been fixed, it was talking about NETLINK_RECV_NO_ENOBUFS socket option instead of NETLINK_NO_ENOBUFS. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:31 -04:00
Nicolas Dichtel	95f38411df	netns: use a spin_lock to protect nsid management Before this patch, nsid were protected by the rtnl lock. The goal of this patch is to be able to find a nsid without needing to hold the rtnl lock. The next patch will introduce a netlink socket option to listen to all netns that have a nsid assigned into the netns where the socket is opened. Thus, it's important to call rtnl_net_notifyid() outside the spinlock, to avoid a recursive lock (nsid are notified via rtnl). This was the main reason of the previous patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:31 -04:00
Nicolas Dichtel	3138dbf881	netns: notify new nsid outside __peernet2id() There is no functional change with this patch. It will ease the refactoring of the locking system that protects nsids and the support of the netlink socket option NETLINK_LISTEN_ALL_NSID. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:31 -04:00
Nicolas Dichtel	7a0877d4b4	netns: rename peernet2id() to peernet2id_alloc() In a following commit, a new function will be introduced to only look up a nsid (no allocation if the nsid doesn't exist). To avoid confusion, the existing function is renamed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:30 -04:00
Nicolas Dichtel	cab3c8ec8d	netns: always provide the id to rtnl_net_fill() The goal of this commit is to prepare the rework of the locking of nsid protection. After this patch, rtnl_net_notifyid() will no longer call __peernet2id(), i.e. no idr_* operation in this function. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:30 -04:00
Nicolas Dichtel	109582af18	netns: returns always an id in __peernet2id() All callers of this function expect a nsid, not an error. Thus, returns NETNSA_NSID_NOT_ASSIGNED in case of error so that callers don't have to convert the error to NETNSA_NSID_NOT_ASSIGNED. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:15:30 -04:00
David S. Miller	43996fdd9b	linux-can-next-for-4.2-20150506 Merge tag 'linux-can-next-for-4.2-20150506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-05-06 this is a pull request of seven patches for net-next/master. Andreas Gröger contributes two patches for the janz-ican3 driver. In the first patch, the documentation for already existing sysfs entries is added, the second patch adds support for another module/firmware variant. A patch by Shawn Landden makes the padding in the struct can_frame explicit. The next 4 patches target the flexcan driver, the first one is by David Jander adding some documentation, the remaining three by me add more documentation and two small code cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 22:12:17 -04:00
Linus Torvalds	8f59ae0643	ARM: SoC fixes for 4.1-rc2 A few patches have come up since the merge window. The largest one is a rewrite of the PXA lubbock/mainstone IRQ handling. This was already broken in 2011 by a change to the GPIO code and only noticed now. The other changes contained here are: MAINTAINERS file updates: - Ray Jui and Scott Branden are now co-maintainers for some of the mach-bcm chips, while Christian Daudt and Marc Carino have stepped down. - Andrew Victor is no longer maintaining at91. Instead, Alexandre Belloni now becomes an official maintainer, after having done a bulk of the work for a while. - Baruch Siach, who added the mach-digicolor platform in 4.1 is now listed as maintainer - The git URL for mach-socfpga has changed Bug fixes: - Three bug fixes for new rockchip rk3288 code - A regression fix to make SD card support work on certain ux500 boards - multiple smaller dts fixes for imx, omap, mvebu, and shmobile - a regression fix for omap3 power consumption - a fix for regression in the ARM CCI bus driver Configuration changes: - more imx platforms are now enabled in multi_v7_defconfig Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann. * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (39 commits) MAINTAINERS: add Conexant Digicolor machines entry MAINTAINERS: socfpga: update the git repo for SoCFPGA ARM: multi_v7_defconfig: Select more FSL SoCs MAINTAINERS: replace an AT91 maintainer drivers: CCI: fix used_mask init in validate_group() bus: omap_l3_noc: Fix master id address decoding for OMAP5 bus: omap_l3_noc: Fix offset for DRA7 CLK1_HOST_CLK1_2 instance ARM: dts: dra7: Fix efuse register size for ABB ARM: dts: am57xx-beagle-x15: Switch GPIO fan number ARM: dts: am57xx-beagle-x15: Switch UART mux pins ARM: dts: am437x-sk: reduce col-scan-delay-us ARM: dts: am437x-sk: fix for new newhaven display module revision ARM: dts: am57xx-beagle-x15: Fix RTC aliases ARM: dts: am57xx-beagle-x15: Fix IRQ type for mcp7941x ARM: dts: omap3: Add #iommu-cells to isp and iva iommu ARM: omap2plus_defconfig: Enable EXTCON_USB_GPIO ARM: dts: OMAP3-N900: Add microphone bias voltages ARM: OMAP2+: Fix omap off idle power consumption creeping up MAINTAINERS: Update brcmstb entry MAINTAINERS: Remove Christian Daudt for mach-bcm ...	2015-05-09 16:13:38 -07:00
Linus Torvalds	51dfcb076d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user-namespace fix from Eric Biederman: "Eric Windish recently reported a bug that allows mounting fresh copies of proc and sysfs when it really should not be allowed. The code attempted to verify that proc and sysfs were fully visible but there is a test missing to ensure that the root of the filesystem is visible. Doh! The following patch fixes that. This fixes a containment issue that the docker folks are seeing" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: Fix fs_fully_visible to verify the root directory is visible	2015-05-09 16:07:14 -07:00
Linus Torvalds	9d88f22a81	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Two patches from the irq department: - a simple fix to make dummy_irq_chip usable for wakeup scenarios - removal of the gic arch_extn hackery. Now that all users are converted we really want to get rid of the interface so people won't come up with new use cases" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: gic: Drop support for gic_arch_extn genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip	2015-05-09 14:59:05 -07:00
Linus Torvalds	95f3b1f4b1	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A simple fix to actually shut down a detached device instead of keeping it active" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: Shutdown detached clockevent device	2015-05-09 14:57:49 -07:00
Harini Katakam	a5898ea09a	net: macb: Add change_mtu callback with jumbo support Add macb_change_mtu callback; if jumbo frame support is present, allow mtu size changes up to (jumbo max length allowed - headers). Signed-off-by: Harini Katakam <harinik@xilinx.com> Reviewed-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:41:54 -04:00
Harini Katakam	98b5a0f4a2	net: macb: Add support for jumbo frames Enable jumbo frame support for Zynq Ultrascale+ MPSoC. Update the NWCFG register and descriptor length masks accordingly. Jumbo max length register should be set according to support in SoC; it is set to 10240 for Zynq Ultrascale+ MPSoC. Signed-off-by: Harini Katakam <harinik@xilinx.com> Reviewed-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:41:54 -04:00
Harini Katakam	7b61f9c132	net: macb: Add compatible string for Zynq Ultrascale+ MPSoC Add compatible string and config structure for Zynq Ultrascale+ MPSoC Signed-off-by: Harini Katakam <harinik@xilinx.com> Reviewed-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:41:53 -04:00
Harini Katakam	988d6f07fc	devicetree: Add compatible string for Zynq Ultrascale+ MPSoC Add "cdns,zynqmp-gem" to be used for Zynq Ultrascale+ MPSoC. Signed-off-by: Harini Katakam <harinik@xilinx.com> Reviewed-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:41:53 -04:00
Jason Baron	790ba4566c	tcp: set SOCK_NOSPACE under memory pressure Under tcp memory pressure, calling epoll_wait() in edge triggered mode after -EAGAIN, can result in an indefinite hang in epoll_wait(), even when there is sufficient memory available to continue making progress. The problem is that when __sk_mem_schedule() returns 0 under memory pressure, we do not set the SOCK_NOSPACE flag in the tcp write paths (tcp_sendmsg() or do_tcp_sendpages()). Then, since SOCK_NOSPACE is used to trigger wakeups when incoming acks create sufficient new space in the write queue, all outstanding packets are acked, but we never wake up with the EPOLLOUT that we are expecting from epoll_wait(). This issue is currently limited to epoll() when used in edge trigger mode, since 'tcp_poll()', does in fact currently set SOCK_NOSPACE. This is sufficient for poll()/select() and epoll() in level trigger mode. However, in edge trigger mode, epoll() is relying on the write path to set SOCK_NOSPACE. EPOLL(7) says that in edge-trigger mode we can only call epoll_wait() after read/write return -EAGAIN. Thus, in the case of the socket write, we are relying on the fact that tcp_sendmsg()/network write paths are going to issue a wakeup for us at some point in the future when we get -EAGAIN. Normally, epoll() edge trigger works fine when we've exceeded the sk->sndbuf because in that case we do set SOCK_NOSPACE. However, when we return -EAGAIN from the write path b/c we are over the tcp memory limits and not b/c we are over the sndbuf, we are never going to get another wakeup. I can reproduce this issue, using SO_SNDBUF, since __sk_mem_schedule() will return 0, or failure more readily with SO_SNDBUF: 1) create socket and set SO_SNDBUF to N 2) add socket as edge trigger 3) write to socket and block in epoll on -EAGAIN 4) cause tcp mem pressure via: echo "<small val>" > net.ipv4.tcp_mem The fix here is simply to set SOCK_NOSPACE in sk_stream_wait_memory() when the socket is non-blocking. Note that SOCK_NOSPACE, in addition to waking up outstanding waiters is also used to expand the size of the sk->sndbuf. However, we will not expand it by setting it in this case because tcp_should_expand_sndbuf(), ensures that no expansion occurs when we are under tcp memory pressure. Note that we could still hang if sk->sk_wmem_queue is 0, when we get the -EAGAIN. In this case the SOCK_NOSPACE bit will not help, since we are waiting for an event that will never happen. I believe that this case is harder to hit (and did not hit in my testing), in that over the tcp 'soft' memory limits, we continue to guarantee a minimum write buffer size. Perhaps, we could return -ENOSPC in this case, or maybe we simply issue a wakeup in this case, such that we keep retrying the write. Note that this case is not specific to epoll() ET, but rather would affect blocking sockets as well. So I view this patch as bringing epoll() edge-trigger into sync with the current poll()/select()/epoll() level trigger and blocking sockets behavior. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:38:36 -04:00
Claudiu Manoil	3d23a05c75	gianfar: Enable changing mac addr when if up Use device flag IFF_LIVE_ADDR_CHANGE to signal that the device supports changing the hardware address when the device is running. This allows eth_mac_addr() to change the mac address also when the network device's interface is open. This capability is required by certain applications, like bonding mode 6 (Adaptive Load Balancing). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:37:46 -04:00
Claudiu Manoil	bc60228087	gianfar: Move TxFIFO underrun handling to reset path Handle TxFIFO underrun exceptions outside the fast path. A controller reset is more reliable in this exceptional case, as opposed to re-enabling on-the-fly the Tx DMA. As the controller reset is handled outside the fast path by the reset_gfar() workqueue handler, the locking scheme on the Tx path is significantly simplified. Because the Tx processing (xmit queues and tx napi) is disabled during controller reset, tstat access from xmit does not require locking. So the scope of the txlock on the processing path is now reduced to num_txbdfree, which is shared only between process context (xmit) and softirq (clean_tx_ring). As a result, the txlock must not guard against interrupt context, and the spin_lock_irqsave() from xmit can be replaced by spin_lock_bh(). Likewise, the locking has been downgraded for clean_tx_ring(). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:37:46 -04:00
David S. Miller	39d726b76c	Merge branch 'bpf_seccomp' Daniel Borkmann says: ==================== BPF updates This set gets rid of BPF special handling in seccomp filter preparation and provides generic infrastructure from BPF side, which eventually also allows for classic BPF JITs to add support for seccomp filters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:35:05 -04:00
Daniel Borkmann	ac67eb2c53	seccomp, filter: add and use bpf_prog_create_from_user from seccomp Seccomp has always been a special candidate when it comes to preparation of its filters in seccomp_prepare_filter(). Due to the extra checks and filter rewrite it partially duplicates code and has BPF internals exposed. This patch adds a generic API inside the BPF code that seccomp can use and thus keep its filter preparation code minimal and better maintainable. The other side-effect is that now classic JITs can add seccomp support as well by only providing a BPF_LDX | BPF_W | BPF_ABS translation. Tested with seccomp and BPF test suites. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-09 17:35:05 -04:00
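
Commit 790ba4566c (tcp: set SOCK_NOSPACE under memory pressure) relies on the edge-triggered epoll contract: after a write returns -EAGAIN, userspace only learns the socket is writable again if the kernel delivers a fresh EPOLLOUT edge. The following is a minimal userspace sketch of that contract, not a reproduction of the bug. It assumes Linux and uses an AF_UNIX socketpair as a stand-in for a TCP connection (triggering the actual hang would need TCP memory pressure, e.g. by lowering net.ipv4.tcp_mem as the commit describes). The function name edge_triggered_write_demo and its parameter are illustrative only.

```python
import select
import socket


def edge_triggered_write_demo(chunk_size=4096):
    """Fill a non-blocking socket until EAGAIN, then wait for the EPOLLOUT edge."""
    # Assumption: an AF_UNIX socketpair stands in for a TCP connection.
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    ep = select.epoll()
    try:
        ep.register(writer.fileno(), select.EPOLLOUT | select.EPOLLET)
        ep.poll(0)  # consume the initial "writable" edge reported after registration

        sent = 0
        chunk = b"x" * chunk_size
        while True:
            try:
                sent += writer.send(chunk)
            except BlockingIOError:  # EAGAIN: the send buffer is full
                break

        # Edge-triggered mode: nothing is reported until the state changes again.
        events_while_full = ep.poll(0)

        # Drain the peer so the kernel frees write space and wakes the writer.
        drained = 0
        while drained < sent:
            drained += len(reader.recv(65536))

        # This is the wakeup an application blocks on after seeing EAGAIN.
        events_after_drain = ep.poll(1.0)
        return sent, drained, events_while_full, events_after_drain
    finally:
        ep.close()
        writer.close()
        reader.close()
```

In the scenario the commit describes, the final poll is the call that would never return for a TCP socket under memory pressure, because the write path returned -EAGAIN without setting SOCK_NOSPACE and so no later wakeup was generated.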
