linux_old1

Commit Graph

Author	SHA1	Message	Date
Ondrej Zary	eb522bb4e0	tlan: Enable activity LED on Olicom OC-2325 and OC-2326 Olicom OC-2325 and OC-2326 ethernet cards have an activity LED but it does not work with tlan driver as it's not enabled. Enable it. Tested with OC-2326. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 17:06:51 -07:00
Himangi Saraogi	b98fe24ca7	batman-adv: Use kasprintf kasprintf combines kmalloc and sprintf, and takes care of the size calculation itself. The semantic patch that makes this change is as follows: // <smpl> @@ expression a,flag; expression list args; statement S; @@ a = - \(kmalloc\|kzalloc\)(...,flag) + kasprintf(flag,args) <... when != a if (a == NULL || ...) S ...> - sprintf(a,args); // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 17:00:31 -07:00
David S. Miller	0a7fdbde69	Merge branch 'ptp-vlan' Stefan Sørensen says: ==================== Add ptp vlan support This patch series adds functionality for running ptp/ieee1588 over vlan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 16:57:32 -07:00
Stefan Sørensen	a6111d3c93	vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device This allows applications to enable hardware timestamping without being aware of it being a vlan device and figuring out the real device. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 16:57:26 -07:00
Stefan Sørensen	ae5c6c6d7b	ptp: Classify ptp over ip over vlan packets This extends the ptp bpf to also match ptp over ip over vlan packets. The ptp classes are changed to orthogonal bitfields representing version, transport and vlan values to simplify matching. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 16:57:18 -07:00
Stefan Sørensen	b9c701edc7	net: Simplify ptp class checks Replace two switch statements enumerating all valid ptp classes with an if statement matching for not PTP_CLASS_NONE. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-07 16:57:09 -07:00
David S. Miller	239960d664	Merge branch 'sctp' Daniel Borkmann says: ==================== Misc SCTP updates Daniel Borkmann (2): net: sctp: improve timer slack calculation for transport HBs net: sctp: only warn in proc_sctp_do_alpha_beta if write ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com>	2014-07-02 18:44:14 -07:00
Daniel Borkmann	eaea2da728	net: sctp: only warn in proc_sctp_do_alpha_beta if write Only warn if the value is written to alpha or beta. We don't care emitting a one-time warning when only reading it. Reported-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:44:07 -07:00
Daniel Borkmann	8f61059a96	net: sctp: improve timer slack calculation for transport HBs RFC4960, section 8.3 says: On an idle destination address that is allowed to heartbeat, it is recommended that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value, and exponential backoff of the RTO if the previous HEARTBEAT is unanswered. Currently, we calculate jitter via sctp_jitter() function first, and then add its result to the current RTO for the new timeout: TMO = RTO + (RAND() % RTO) - (RTO / 2) `------------------------^-=> sctp_jitter() Instead, we can just simplify all this by directly calculating: TMO = (RTO / 2) + (RAND() % RTO) With the help of prandom_u32_max(), we don't need to open code our own global PRNG, but can instead just make use of the per CPU implementation of prandom with better quality numbers. Also, we can now spare us the conditional for divide by zero check since no div or mod operation needs to be used. Note that prandom_u32_max() won't emit the same result as a mod operation, but we really don't care here as we only want to have a random number scaled into RTO interval. Note, exponential RTO backoff is handeled elsewhere, namely in sctp_do_8_2_transport_strike(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:44:07 -07:00
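The two timeout formulas quoted in the commit above can be compared with a small sketch. It is only an illustration in Python, not the kernel code: the function names `scale_u32`, `hb_timeout_old` and `hb_timeout_new` are hypothetical, the random value is passed in as a plain 32-bit integer, and the multiply-and-shift scaling is assumed to model what prandom_u32_max() does. The HB.interval term and the exponential backoff mentioned in the message are left out.

```python
def scale_u32(rand32: int, ep_ro: int) -> int:
    """Scale a 32-bit random value into [0, ep_ro) by multiply-and-shift.

    No division or modulo is involved, so ep_ro == 0 simply yields 0.
    (Assumed model of prandom_u32_max(); hypothetical helper name.)
    """
    return (rand32 * ep_ro) >> 32


def hb_timeout_old(rto: int, rand32: int) -> int:
    """TMO = RTO + (RAND() % RTO) - (RTO / 2), with a guard for RTO == 0."""
    if rto == 0:
        return 0  # the modulo below would otherwise divide by zero
    return rto + (rand32 % rto) - (rto // 2)


def hb_timeout_new(rto: int, rand32: int) -> int:
    """TMO = (RTO / 2) + a random value scaled into [0, RTO)."""
    return (rto // 2) + scale_u32(rand32, rto)
```

For an RTO of 1000, both variants land in the range 500 to 1499, but only the old one needs the explicit zero check. As the commit notes, the two do not produce the same value for the same random input; only the resulting interval matches.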
David S. Miller	eb1ac820c6	Merge branch 'be2net' Sathya Perla says: ==================== be2net: patch set v2 change: merged 2 lines into one in patch 4 Patch 1 refactors be_cmd_get_profile_config() routine to reduce code duplication by using the be_cmd_notify_wait() routine, instead of using a separate variant of the code for MBOX and MCCQ. Patch 2 introduces the required FW-cmd code in the PF to query RSS support on a VF. This is in preparation for patch 3. Patch 3 adds support for the PF driver to re-configure the resource distribution in FW based on the number of VFs enabled by the user. When the user is not interested in enabling VFs, all resources of a port are set-aside for the PF. If less than maximum number of VFs are enabled, then each VF gets a better share of the resources and can now enable RSS (if the interface supports it.) Patch 4 is a minor fix to re-enable HW vlan filtering as soon as the number of vlans programmed is within the HW limit. Please consider applying to net-next tree. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:41:05 -07:00
Kalesh AP	9d4dfe4ae3	be2net: re-enable vlan filtering mode asap While adding vlans, when the HW limit of vlan filters is reached, the driver enables vlan promiscuous mode. Similarily, while removing vlans, the driver must re-enable HW filtering as soon as the number of vlan filters is within the HW limit. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:40:56 -07:00
Vasundhara Volam	bec84e6b21	be2net: create optimal number of queues on SR-IOV config If SR-IOV is enabled in the adapter, the FW distributes queue resources evenly across the PF and it's VFs. If the user is not interested in enabling VFs, the queues set aside for VFs are wasted. This patch adds support for the PF driver to re-configure the resource distribution in FW based on the number of VFs enabled by the user. This also allows for supporting RSS queues on VFs, when less number of VFs are enabled per PF. When maximum number of VFs are enabled, each VF typically gets only one RXQ. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:40:56 -07:00
Vasundhara Volam	10cccf60fb	be2net: read VF's capabilities from GET_PROFILE_CONFIG cmd The PF driver must query the FW for VF's interface capabilities to know if the VF is RSS capable or not. This patch is in preparation for enabling RSS on VFs on Skyhawk-R. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:40:56 -07:00
Vasundhara Volam	ba48c0c927	be2net: remove be_cmd_get_profile_config_mbox/mccq() variants Fix be_cmd_get_profile_cmd() to use be_cmd_notify_wait() routine, which uses MBOX if MCCQ has not been created. Doing this reduces code duplication; we don't need the _mbox/_mccq() variants anymore. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:40:56 -07:00
Fabian Frederick	bd4578bc84	drivers/net/hyperv/netvsc.c: remove unnecessary null test before kfree Fix checkpatch warning: WARNING: kfree(NULL) is safe this check is probably not required Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: netdev@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 18:22:25 -07:00
Sergei Shtylyov	179d80aff8	sh_eth: remove checks around dev_kfree_skb() calls Since consume_skb() (and hence dev_kfree_skb() macro) checks the passed pointer for NULL, there's no need to check for NULL before invoking dev_kfree_skb(). Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:37:46 -07:00
Prashant Sreedharan	236294774e	MAINTAINERS: Update tg3 maintainer Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:36:39 -07:00
David S. Miller	af7efaff26	Merge branch 'qlcnic-next' Harish Patil says: ==================== qlcnic: Enhance Tx timeout debug data collection. The following set of patches are for enhancing Tx timeout debug collection - Collect a firmware dump on first Tx timeout if netif_msg_tx_err() is set - Log Receive and Status ring info on Tx timeout, in addition to Tx ring info - Log additional Tx ring info if netif_msg_tx_err() is set - Update driver version to 5.3.61 Please apply this series to net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:11:00 -07:00
Harish Patil	28470572a6	qlcnic: Update version to 5.3.61 Signed-off-by: Harish Patil <harish.patil@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:10:29 -07:00
Harish Patil	665d1eca03	qlcnic: Enhance Tx timeout debug data collection. - Collect a firmware dump on first Tx timeout if netif_msg_tx_err() is set - Log Receive and Status ring info on Tx timeout, in addition to Tx ring info - Log additional Tx ring info if netif_msg_tx_err() is set Signed-off-by: Harish Patil <harish.patil@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:10:29 -07:00
Fabian Frederick	fb0d164cc1	net/caif/caif_socket.c: remove unnecessary null test before debugfs_remove_recursive based on checkpatch: "debugfs_remove_recursive(NULL) is safe this check is probably not required" Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:05:29 -07:00
Fabian Frederick	9f16dc2ec7	drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: remove unnecessary null test before debugfs_remove_recursive Fix checkpatch warning: "WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required" Cc: Hariprasad S <hariprasad@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-02 17:04:42 -07:00
Eric Dumazet	9fe516ba3f	inet: move ipv6only in sock_common When an UDP application switches from AF_INET to AF_INET6 sockets, we have a small performance degradation for IPv4 communications because of extra cache line misses to access ipv6only information. This can also be noticed for TCP listeners, as ipv6_only_sock() is also used from __inet_lookup_listener()->compute_score() This is magnified when SO_REUSEPORT is used. Move ipv6only into struct sock_common so that it is available at no extra cost in lookups. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 23:46:21 -07:00
David S. Miller	090cce4263	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-07-01 This series contains updates to i40e, i40evf, igb and ixgbe. Shannon adds the Base Address High and Low to the admin queue structure to simplify the logic in the configuration routines. Also adds code to clear all queues and interrupts to help clean up after a PXE or other early boot activity. Kevin fixes mask assignment value since -1 cannot be used for unsigned integer types. Mitch fixes an issue where in some circumstances the reply from the PF would come back before we were able to properly modify the admin queue pending and required flags. This would mess up the flags and put the driver in an indeterminate state, so fix this by simply setting the flags before sending the request to the admin queue. Also changes the branding string for i40evf to reduce confusion and to match up with our other marketing materials. Kamil adds a new variable defining admin send queue (ASQ) command write back timeout to allow for dynamic modification of this timeout. Anjali fix a bug in the flow director filter replay logic, so that we call a replay after a sideband reset correctly. Jesse adds code to initialize all members of the context descriptor to prevent possible stale data. Christopher fixes i40e to prevent writing to reserved bits, since the queue index is only 0-127. Jacob removes the unneeded header export.h from the i40e PTP code. Fixes ixgbe PTP code where the PPS signal was not correct, as it generates a one half HZ clock signal, it only generates one level change per second. To generate a full clock, we need two level changes per second. Todd provides a fix for igb to bring up link when the PHY has powered up, which was reported by Jeff Westfahl. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 23:09:32 -07:00
Jiri Pirko	763e0ecd72	bonding: allow to add vlans on top of empty bond This limitation maybe had some reason in the past, but now there is not one -> removing this. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:57:43 -07:00
David S. Miller	813f8e29c9	Merge branch 'cxgb4-next' Hariprasad Shenai says: ==================== cxgb4: Fix for PCI passthrough and some Misc. fixes This patch series fixes probe failure in VM when PF is exposed through PCI Passthrough. Adds support to use firmware interface to get BAR0 value. Replace the backdoor mechanism to access the HW memory with PCIe Window method which fixes memory I/O. Also adds device ID of few more adapters for cxgb4 and cxgb4vf driver. The patches series is created against 'net-next' tree. And includes patches on cxgb4, cxgb4vf and iw_cxgb4 driver. Since this patch-series contains mainly cxgb4 related changes, we would like to request this patch series to get merged via David Miller's 'net-next' tree. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:15 -07:00
Hariprasad Shenai	dde3aadf53	cxgb4vf: Adds device ID for few more Chelsio T4 Adapters Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
Hariprasad Shenai	fb1e933d3c	cxgb4: Adds device ID for few more Chelsio T4 Adapters Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
Hariprasad Shenai	fc5ab02096	cxgb4: Replaced the backdoor mechanism to access the HW memory with PCIe Window method Rip out a bunch of redundant PCI-E Memory Window Read/Write routines, collapse the more general purpose routines into a single routine thereby eliminating the need for a large stack frame (and extra data copying) in the outer routine, change everything to use the improved routine t4_memory_rw. Based on origninal work by Casey Leedom <leedom@chelsio.com> and Steve Wise <swise@opengridcomputing.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
Hariprasad Shenai	0abfd1524b	cxgb4: Use FW interface to get BAR0 value Use the firmware interface to get the BAR0 value since we really don't want to use the PCI-E Configuration Space Backdoor access which is owned by the firmware. Set up PCI-E Memory Window registers using the true values programmed into BAR registers. When the PF4 "Master Function" is exported to a Virtual Machine, the values returned by pci_resource_start() will be for the synthetic PCI-E Configuration Space and not the real addresses. But we need to program the PCI-E Memory Window address decoders with the real addresses that we're going to be using in order to have accesses through the Memory Windows work. Based on origninal work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
Hariprasad Shenai	35b1de5579	rdma/cxgb4: Fixes cxgb4 probe failure in VM when PF is exposed through PCI Passthrough Change logic which determines our Physical Function at PCI Probe time. Now we read the PL_WHOAMI register and get the Physical Function. Pass Physical Function to Upper Layer Drivers in lld_info structure in the new field "pf" added to lld_info. This is useful for the cases where the PF, say PF4, is attached to a Virtual Machine via some form of "PCI Pass Through" technology and the PCI Function shows up as PF0 in the VM. Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:56:10 -07:00
David S. Miller	2eb27a16b5	Merge branch 'dp83640-next' Stefan Sørensen says: ==================== dp83640: Increase support perout pins This patch series increases the number of periodic output pins supported on the dp83640 to 7, and allows for reprogramming the calibration pin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:53:01 -07:00
Stefan Sørensen	72df7a7244	ptp: Allow reassigning calibration pin function The ptp pin function programming does not allow calibration pin to change function. This is problematic on hardware that uses the default calibration pin for other purposes. Removing this limitation does not impact calibration if userspace does not reprogram the calibration pin. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:54 -07:00
Stefan Sørensen	e0155950f0	dp83640: Get calibration pin with ptp_find_pin For consistency, use the ptp_find_pin function to get the calibration pin, not gpio_tab. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:54 -07:00
Stefan Sørensen	6f39eb87de	dp83640: Verify calibration pin assignment This constraints the pin assignment to not allow the calibration function to be reassigned and only allow reassigning the calibratin pin if only one phy is connected. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:53 -07:00
Stefan Sørensen	ad01577aeb	dp83640: Increase supported perout pins to 7 This patch increases the number of supported periodic output pins from 1 to 7. The last pin is reserved for sync. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:53 -07:00
Stefan Sørensen	35e872ae63	dp83640: Program pulsewidth2 values of perout triggers 0 and 1 Periodic output triggers 0 and 1 of the dp83640 has a programmable duty-cycle which is controlled by the Pulsewidth2 field of the trigger data register. This field is not documented in the datasheet, but it is described in the "PHYTER Software Development Guide" section 3.1.4.1. Failing to set the field can result in unstable/no trigger output. Add programming of the Pulsewidth2 field, setting it to the same value as the Pulsewidth field for a 50% duty cycle. Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 18:52:53 -07:00
David S. Miller	b6fd8b7fa1	Merge branch 'bnx2x-next' Yuval Mintz says: ==================== bnx2x: Enhancement patch series This patch series introduces the ability to propagate link parameters to VFs as well as control the VF link via hypervisor. In addition, it contains 2 small improvements [one IOV-related and the other improves performance on machines with short cache lines]. Please consider applying these patches to `net-next'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:37 -07:00
Yuval Mintz	ebf457f931	bnx2x: Fail probe of VFs using an old incompatible driver There are linux distributions where the inbox bnx2x driver contains SRIOV support but doesn't contain the changes introduced in `b9871bcf` "bnx2x: VF RSS support - PF side". A VF in a VM running that distribution over a new hypervisor will access incorrect addresses when trying to transmit packets, causing an attention in the hypervisor and making that VF inactive until FLRed. The driver in the VM has to ne upgraded [no real way to overcome this], but due to the HW attention currently arising upgrading the driver in the VM would not suffice [since the VF needs also be FLRed if the previous driver was already loaded]. This patch causes the PF to fail the acquire message from a VF running an old problematic driver; The VF will then gracefully fail it's probe preventing the HW attention [and allow clean upgrade of driver in VM]. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:30 -07:00
Dmitry Kravkov	9927b51469	bnx2x: enlarge minimal alignemnt of data offset This improves the performance of driver on machine with L1_CACHE_SHIFT of at most 32 bytes [HW was planned for 64-byte aligned fastpath data]. Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:29 -07:00
Dmitry Kravkov	6495d15a7c	bnx2x: VF can report link speed Until now VFs were oblvious to the actual configured link parameters. This patch does 2 things: 1. It enables a PF to inform its VF using the bulletin board of the link configured, and allows the VF to present that information. 2. It adds support of `ndo_set_vf_link_state', allowing the hypervisor to set the VF link state. Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:52:29 -07:00
David S. Miller	edd79ca8b3	Merge branch 'pktgen' Jesper Dangaard Brouer says: ==================== Optimizing pktgen for single CPU performance This series focus on optimizing "pktgen" for single CPU performance. V2-series: - Removed some patches - Doc real reason for TX ring buffer filling up NIC tuning for pktgen: http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html General overload setup according to: http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html Hardware: System: CPU E5-2630 NIC: Intel ixgbe/82599 chip Testing done with net-next git tree on top of commit `6623b41944` ("Merge branch 'master' of...jkirsher/net-next") Pktgen script exercising race condition: https://github.com/netoptimizer/network-testing/blob/master/pktgen/unit_test01_race_add_rem_device_loop.sh Tool for measuring LOCK overhead: https://github.com/netoptimizer/network-testing/blob/master/src/overhead_cmpxchg.c ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:56 -07:00
Jesper Dangaard Brouer	8788370a1d	pktgen: RCU-ify "if_list" to remove lock in next_to_run() The if_lock()/if_unlock() in next_to_run() adds a significant overhead, because its called for every packet in busy loop of pktgen_thread_worker(). (Thomas Graf originally pointed me at this lock problem). Removing these two "LOCK" operations should in theory save us approx 16ns (8ns x 2), as illustrated below we do save 16ns when removing the locks and introducing RCU protection. Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30: (single CPU performance, ixgbe 10Gbit/s, E5-2630) * Prev : 5684009 pps --> 175.93ns (1/5684009*10^9) * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9) * Diff : +588195 pps --> -16.50ns To understand this RCU patch, I describe the pktgen thread model below. In pktgen there is several kernel threads, but there is only one CPU running each kernel thread. Communication with the kernel threads are done through some thread control flags. This allow the thread to change data structures at a know synchronization point, see main thread func pktgen_thread_worker(). Userspace changes are communicated through proc-file writes. There are three types of changes, general control changes "pgctrl" (func:pgctrl_write), thread changes "kpktgend_X" (func:pktgen_thread_write), and interface config changes "etcX@N" (func:pktgen_if_write). Userspace "pgctrl" and "thread" changes are synchronized via the mutex pktgen_thread_lock, thus only a single userspace instance can run. The mutex is taken while the packet generator is running, by pgctrl "start". Thus e.g. "add_device" cannot be invoked when pktgen is running/started. All "pgctrl" and all "thread" changes, except thread "add_device", communicate via the thread control flags. The main problem is the exception "add_device", that modifies threads "if_list" directly. Fortunately "add_device" cannot be invoked while pktgen is running. But there exists a race between "rem_device_all" and "add_device" (which normally don't occur, because "rem_device_all" waits 125ms before returning). Background'ing "rem_device_all" and running "add_device" immediately allow the race to occur. The race affects the threads (list of devices) "if_list". The if_lock is used for protecting this "if_list". Other readers are given lock-free access to the list under RCU read sections. Note, interface config changes (via proc) can occur while pktgen is running, which worries me a bit. I'm assuming proc_remove() takes appropriate locks, to assure no writers exists after proc_remove() finish. I've been running a script exercising the race condition (leading me to fix the proc_remove order), without any issues. The script also exercises concurrent proc writes, while the interface config is getting removed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:23 -07:00
Jesper Dangaard Brouer	baac167b70	pktgen: avoid expensive set_current_state() call in loop Avoid calling set_current_state() inside the busy-loop in pktgen_thread_worker(). In case of pkt_dev->delay, then it is still used/enabled in pktgen_xmit() via the spin() call. The set_current_state(TASK_INTERRUPTIBLE) uses a xchg, which implicit is LOCK prefixed. I've measured the asm LOCK operation to take approx 8ns on this E5-2630 CPU. Performance increase corrolate with this measurement. Performance data with CLONE_SKB==100000, rx-usecs=30: (single CPU performance, ixgbe 10Gbit/s, E5-2630) * Prev: 5454050 pps --> 183.35ns (1/5454050*10^9) * Now: 5684009 pps --> 175.93ns (1/5684009*10^9) * Diff: +229959 pps --> -7.42ns Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:23 -07:00
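The per-packet times quoted in the two pktgen commits above follow from the packet rates as 1/pps scaled to nanoseconds. A minimal sketch of that arithmetic, using only the pps figures from the commit messages (the helper names `ns_per_packet` and `delta_ns` are hypothetical):

```python
def ns_per_packet(pps: int) -> float:
    """Average time spent per packet, in nanoseconds, at a given packets-per-second rate."""
    return 1e9 / pps


def delta_ns(pps_before: int, pps_after: int) -> float:
    """Change in per-packet time between two measurements (negative means faster)."""
    return ns_per_packet(pps_after) - ns_per_packet(pps_before)


# Figures taken from the commit messages above.
baseline = 5454050      # before removing set_current_state() from the loop
no_set_state = 5684009  # after that change, before the RCU conversion
rcu = 6272204           # after the RCU conversion of if_list

print(round(ns_per_packet(baseline), 2))
print(round(delta_ns(baseline, no_set_state), 2))
print(round(delta_ns(no_set_state, rcu), 2))
```

Expressing the gain per packet rather than in pps is what lets the commits compare it directly against the measured cost of roughly 8ns per LOCK-prefixed operation: one such operation removed in the first change, two in the second.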
Jesper Dangaard Brouer	9ceb87fcea	pktgen: document tuning for max NIC performance Using pktgen I'm seeing the ixgbe driver "push-back", due TX ring running full. Thus, the TX ring is artificially limiting pktgen. (Diagnose via "ethtool -S", look for "tx_restart_queue" or "tx_busy" counters.) Using ixgbe, the real reason behind the TX ring running full, is due to TX ring not being cleaned up fast enough. The ixgbe driver combines TX+RX ring cleanups, and the cleanup interval is affected by the ethtool --coalesce setting of parameter "rx-usecs". Do not increase the default NIC TX ring buffer or default cleanup interval. Instead simply document that pktgen needs special NIC tuning for maximum packet per sec performance. Performance results with pktgen with clone_skb=100000. TX ring size 512 (default), adjusting "rx-usecs": (Single CPU performance, E5-2630, ixgbe) - 3935002 pps - rx-usecs: 1 (irqs: 9346) - 5132350 pps - rx-usecs: 10 (irqs: 99157) - 5375111 pps - rx-usecs: 20 (irqs: 50154) - 5454050 pps - rx-usecs: 30 (irqs: 33872) - 5496320 pps - rx-usecs: 40 (irqs: 26197) - 5502510 pps - rx-usecs: 50 (irqs: 21527) TX ring size adjusting (ethtool -G), "rx-usecs==1" (default): - 3935002 pps - tx-size: 512 - 5354401 pps - tx-size: 768 - 5356847 pps - tx-size: 1024 - 5327595 pps - tx-size: 1536 - 5356779 pps - tx-size: 2048 - 5353438 pps - tx-size: 4096 Notice after commit `6f25cd47d` (pktgen: fix xmit test for BQL enabled devices) pktgen uses netif_xmit_frozen_or_drv_stopped() and ignores the BQL "stack" pause (QUEUE_STATE_STACK_XOFF) flag. This allow us to put more pressure on the TX ring buffers. It is the ixgbe_maybe_stop_tx() call that stops the transmits, and pktgen respecting this in the call to netif_xmit_frozen_or_drv_stopped(txq). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 15:50:23 -07:00
Jiri Pirko	5b9e7e1607	openvswitch: introduce rtnl ops stub This stub now allows userspace to see IFLA_INFO_KIND for ovs master and IFLA_INFO_SLAVE_KIND for slave. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 14:40:17 -07:00
Jiri Pirko	b0ab2fabb5	rtnetlink: allow to register ops without ops->setup set So far, it is assumed that ops->setup is filled up. But there might be case that ops might make sense even without ->setup. In that case, forbid to newlink and dellink. This allows to register simple rtnl link ops containing only ->kind. That allows consistent way of passing device kind (either device-kind or slave-kind) to userspace. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 14:40:17 -07:00
Ying Xue	9bf2b8c280	net: fix some typos in comment In commit 371121057607e3127e19b3fa094330181b5b031e("net: QDISC_STATE_RUNNING dont need atomic bit ops") the __QDISC_STATE_RUNNING is renamed to __QDISC___STATE_RUNNING, but the old names existing in comment are not replaced with the new name completely. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 14:20:32 -07:00
Ben Greear	d933319657	ipv6: Allow accepting RA from local IP addresses. This can be used in virtual networking applications, and may have other uses as well. The option is disabled by default. A specific use case is setting up virtual routers, bridges, and hosts on a single OS without the use of network namespaces or virtual machines. With proper use of ip rules, routing tables, veth interface pairs and/or other virtual interfaces, and applications that can bind to interfaces and/or IP addresses, it is possibly to create one or more virtual routers with multiple hosts attached. The host interfaces can act as IPv6 systems, with radvd running on the ports in the virtual routers. With the option provided in this patch enabled, those hosts can now properly obtain IPv6 addresses from the radvd. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 12:16:24 -07:00
Ben Greear	f2a762d8a9	ipv6: Add more debugging around accept-ra logic. This is disabled by default, just like similar debug info already in this module. But, makes it easier to find out why RA is not being accepted when debugging strange behaviour. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-01 12:16:24 -07:00
