Commit Graph

590287 Commits

Author SHA1 Message Date
Shrikrishna Khare f0d437809d Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.

Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:28:05 -04:00
Ben Hutchings f43bfaeddc atl2: Disable unimplemented scatter/gather feature
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs.  This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.

Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working).  Therefore
this obscure bug has been designated CVE-2016-2117.

Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: ec5f061564 ("net: Kill link between CSUM and SG features.")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:12:23 -04:00
Alexander Duyck 7f348a6076 net: Add support for IP ID mangling TSO in cases that require encapsulation
This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports
NETIF_F_TSO.  This way if needed a device can then later enable the TSO
with IP ID mangling and the tunnels on top of that device can then also
make use of the IP ID mangling as well.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:11:07 -04:00
David S. Miller 1df845be65 Merge branch 'mlx5-next'
Saeed Mahameed says:

====================
Mellanox 100G mlx5 driver receive path optimizations

Changes from V2:
	- Rebased to 46e7b8d8d5 ("net: dsa: kill circular reference with slave priv")
	- Updated: ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
		* Per Eric Dumazet comment we changed the driver memory handling scheme to
		work with order-0 pages rather than order-5 via split_page().
		* This means that now a mlx5e rx skb can hold one or (more in case of HW LRO)
                skb frag each pointing to a 4K order-0 page rather than one frag with order-5 page.
	- Updated: ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
		* Code refactoring and code reuse due the split_page() mechanism,
		  now the MPWQE and fragmented MPWQE handling almost look the same,
		  and share most of the code.
	- In some cases we see 2%-3% packet rate degradation in comparison to the order-5 pages approach,
	  due to split_page() cpu consumption, but still we do see 3%-10% improvement in comparison to the
          current linear SKB approach.
	- We do believe that now the driver memory scheme is significantly less vulnerable
	  to the memory DOS attack Eric pointed at.

Changes from V1:
	- Rebased to efde611b0a ("Merge branch 'nfp-next'")
	- Dropped: ("net/mlx5: Refactor mlx5_core_mr to mkey")
                Already merged into 4.6 from rdma tree.
	- Dropped: ("net/mlx5_core: Add ConnectX-5 to list of supported devices")
                Will be pushed to net as we want it in 4.6 release.
	- Dropped: ("net/mlx5e: Change RX moderation period to be based on CQE")
                Will be pushed in a later series with full software based adaptive moderation.
	- Added: ("net/mlx5e: Delay skb->data access")
		Small trivial optimization.
	- Updated: ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
	 	Changed Striding RQ defaults to:
			> 	NUM WQEs = 16
			> 	Strides Per WQE = 1024
			> 	Stride Size = 128
	- Updated: ("net/mlx5e: Use napi_alloc_skb for RX SKB allocations")
		Consider the IP packet alignment already done in napi_alloc_skb.

Changes from V0:
	- Fixed a typo in commit message reported by Sergei
	- Align SKB fragments truesize to stride size
	- Use skb_add_rx_frag and remove the use of SKB_TRUESIZE
	- Fix: # MTTs alignment on Power PC
	- Fix: Free original (unaligned) pointer of MTT array
	- Use dev_alloc_pages and dev_alloc_page
	- Extend the stats.buff_alloc_err counter
	- Reform the copying of packet header into skb linear data
	- Add compiler hints for conditional statements
	- Prefetch skd->data prior to copying packet header into it
	- Rework: mlx5e_complete_rx_fragmented_mpwqe
	- Handle SKB fragments before linear data
	- Dropped ("net/mlx5e: Prefetch next RX CQE") for now
	- Added a small patch that Adds ConnectX-5 devices to the list of supported devices
	- Rebased to 1cdba55055 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next")

This series includes Some RX modifications and optimizations for
the mlx5 Ethernet driver.

From Rana, we have one patch that adds the support for Connectx-4
queue counters.

From Tariq, several patches that are centralized around improving
RX path message rate, CPU and Memory utilization, in each patch
commit message you will find the performance improvements numbers
related to that specific patch.

In the 2nd patch we used a queue counter to report "out of buffer"
dropped packet count, "Dropped packets due to lack of software resources"

3rd patch modifies the driver's to RSS default value to be spread along the
close NUMA node cores only for better out of the box experience.

In the 4th and 5th patches we utilized the use of RX multi-packet WQE
(Striding RQ) for better memory utilization especially in case of hardware
LRO is enabled and for better message rate for small packets.

In the 6th and 7th patches we added a fallback mechanism to use fragmented
memory when allocating large WQE strides fails, using UMR
(User Memory Registration) and ICO (Internal Control Operations) SQs.

In the 8th to 11th patches we did some small modification which show some small
extra improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:06 -04:00
Tariq Toukan 5498440756 net/mlx5e: Add ethtool counter for RX buffer allocation failures
Counts the number of RX buffer allocation failures and shows it
in ethtool statistics.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:06 -04:00
Saeed Mahameed e20a0db304 net/mlx5e: Delay skb->data access
Move mlx5e_handle_csum and eth_type_trans to the end of
mlx5e_build_rx_skb to gain some more time before accessing
skb->data, to reduce cache misses.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:06 -04:00
Tariq Toukan 1bfec31627 net/mlx5e: Remove redundant barrier
The bit-op operation one line before is an explicit barrier
by itself.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan c5adb96f6c net/mlx5e: Use napi_alloc_skb for RX SKB allocations
Instead of netdev_alloc_skb, we use the napi_alloc_skb function
which is designated to allocate skbuff's for RX in a
channel-specific NAPI instance, and implies the IP packet alignment.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan bc77b240b3 net/mlx5e: Add fragmented memory support for RX multi packet WQE
If the allocation of a linear (physically continuous) MPWQE fails,
we allocate a fragmented MPWQE.

This is implemented via device's UMR (User Memory Registration)
which allows to register multiple memory fragments into ConnectX
hardware as a continuous buffer.
UMR registration is an asynchronous operation and is done via
ICO SQs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan d3c9bc2743 net/mlx5e: Added ICO SQs
Added ICO (Internal Control Operations) SQ per channel to be used
for driver internal operations such as memory registration for
fragmented memory and nop requests upon ifconfig up.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan 461017cb00 net/mlx5e: Support RX multi-packet WQE (Striding RQ)
Introduce the feature of multi-packet WQE (RX Work Queue Element)
referred to as (MPWQE or Striding RQ), in which WQEs are larger
and serve multiple packets each.

Every WQE consists of many strides of the same size, every received
packet is aligned to a beginning of a stride and is written to
consecutive strides within a WQE.

In the regular approach, each regular WQE is big enough to be capable
of serving one received packet of any size up to MTU or 64K in case of
device LRO is enabled, making it very wasteful when dealing with
small packets or device LRO is enabled.

For its flexibility, MPWQE allows a better memory utilization
(implying improvements in CPU utilization and packet rate) as packets
consume strides according to their size, preserving the rest of
the WQE to be available for other packets.

MPWQE default configuration:
	Num of WQEs	= 16
	Strides Per WQE = 2048
	Stride Size	= 64 byte

The default WQEs memory footprint went from 1024*mtu (~1.5MB) to
16 * 2048 * 64 = 2MB per ring.
However, HW LRO can now be supported at no additional cost in memory
footprint, and hence we turn it on by default and get an even better
performance.

Performance tested on ConnectX4-Lx 50G.
To isolate the feature under test, the numbers below were measured with
HW LRO turned off. We verified that the performance just improves when
LRO is turned back on.

* Netperf single TCP stream:
- BW raised by 10-15% for representative packet sizes:
  default, 64B, 1024B, 1478B, 65536B.

* Netperf multi TCP stream:
- No degradation, line rate reached.

* Pktgen: packet rate raised by 2-10% for traffic of different message
sizes: 64B, 128B, 256B, 1024B, and 1500B.

* Pktgen: packet loss in bursts of small messages (64byte),
single stream:
- | num packets | packets loss before | packets loss after
  |     2K      |       ~ 1K          |       0
  |     8K      |       ~ 6K          |       0
  |     16K     |       ~13K          |       0
  |     32K     |       ~28K          |       0
  |     64K     |       ~57K          |     ~24K

As expected as the driver can receive as many small packets (<=64B) as
the number of total strides in the ring (default = 2048 * 16) vs. 1024
(default ring size regardless of packets size) before this feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan 2f48af128d net/mlx5e: Use function pointers for RX data path handling
In preparation for Striding RQ feature, which will need its own
RX handlers.
This patch does not change any functionality.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Tariq Toukan d8c9660dac net/mlx5e: Use only close NUMA node for default RSS
Distribute default RSS table uniformly over the rings of the
close NUMA node, instead of all available channels.
This way we enforce the preference of close rings over far ones.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Rana Shahout 593cf33829 net/mlx5e: Allocate set of queue counters per netdev
Connect all netdev RQs to this set of queue counters.
Also, add an "rx_out_of_buffer" counter to ethtool,
which indicates RX packet drops due to lack of receive
buffers.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Tariq Toukan 237cd21809 net/mlx5: Introduce device queue counters
A queue counter can collect several statistics for one or more
hardware queues (QPs, RQs, etc ..) that the counter is attached to.

For Ethernet it will provide an "out of buffer" counter which
collects the number of all packets that are dropped due to lack
of software buffers.

Here we add device commands to alloc/query/dealloc queue counters.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
David S. Miller b8fd789afb Merge branch 'bcmsysport-napi-updates'
Florian Fainelli says:

====================
net: bcmsysport: utilize newer NAPI APIs

These two patches are very analoguous to what was already submitted for
BCMGENET and switch the SYSTEMPORT driver to utilizing __napi_schedule_irqoff()
and napi_complete_done for the RX NAPI context.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:06:05 -04:00
Florian Fainelli c82f47efa0 net: bcmsysport: use napi_complete_done()
By using napi_complete_done(), we allow fine tuning of
/sys/class/net/ethX/gro_flush_timeout for higher GRO aggregation
efficiency for a Gbit NIC.

Check commit 24d2e4a507 ("tg3: use napi_complete_done()") for details.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:06:04 -04:00
Florian Fainelli ba90950c94 net: bcmsysport: use __napi_schedule_irqoff()
Both bcm_sysport_tx_isr() and bcm_sysport_rx_isr() run in hard irq
context, we do not need to block irq again.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:06:04 -04:00
David S. Miller 669c00c009 Merge branch 'mlx4-fixes'
Or Gerlitz says:

====================
Mellaox 40G driver fixes for 4.6-rc

With the fix for ARM bug being under the works, these are
few other fixes for mlx4 we have ready to go.

Eran addressed the problematic/wrong reporting of dropped packets, Daniel
fixed some matters related to PPC EEH's and Jenny's patch makes sure
VFs can't change the port's pause settings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:02:41 -04:00
Eran Ben Elisha d21ed3a311 net/mlx4_en: Split SW RX dropped counter per RX ring
Count SW packet drops per RX ring instead of a global counter. This
will allow monitoring the number of rx drops per ring.

In addition, SW rx_dropped counter was overwritten by HW rx_dropped
counter, sum both of them instead to show the accurate value.

Fixes: a3333b35da ('net/mlx4_en: Moderate ethtool callback to [...] ')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:02:40 -04:00
Eugenia Emantayev 2a500090a4 net/mlx4_core: Don't allow to VF change global pause settings
Currently changing global pause settings is done via SET_PORT
command with input modifier GENERAL. This command is allowed
for each VF since MTU setting is done via the same command.

Change the above to the following scheme: before passing the
request to the FW, the PF will check whether it was issued
by a slave. If yes, don't change global pause and warn,
otherwise change to the requested value and store for
further reference.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:02:40 -04:00
Daniel Jurgens 4bfd2e6e53 net/mlx4_core: Avoid repeated calls to pci enable/disable
Maintain the PCI status and provide wrappers for enabling and disabling
the PCI device.  Performing the actions more than once without doing
its opposite results in warning logs.

This occurred when EEH hotplugged the device causing a warning for
disabling an already disabled device.

Fixes: 2ba5fbd62b ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:02:40 -04:00
Daniel Jurgens c12833acff net/mlx4_core: Implement pci_resume callback
Move resume related activities to a new pci_resume function instead of
performing them in mlx4_pci_slot_reset.  This change is needed to avoid
a hotplug during EEH recovery due to commit f2da4ccf8b ("powerpc/eeh:
More relaxed hotplug criterion").

Fixes: 2ba5fbd62b ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:02:39 -04:00
Mark Brown a1459c1c9e net: phy: spi_ks8895: Don't leak references to SPI devices
The ks8895 driver is using spi_dev_get() apparently just to take a copy
of the SPI device used to instantiate it but never calls spi_dev_put()
to free it.  Since the device is guaranteed to exist between probe() and
remove() there should be no need for the driver to take an extra
reference to it so fix the leak by just using a straight assignment.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:00:27 -04:00
Neil Armstrong 210990b05a net: ethernet: davinci_emac: Fix platform_data overwrite
When the DaVinci emac driver is removed and re-probed, the actual
pdev->dev.platform_data is populated with an unwanted valid pointer saved by
the previous davinci_emac_of_get_pdata() call, causing a kernel crash when
calling priv->int_disable() in emac_int_disable().

Unable to handle kernel paging request at virtual address c8622a80
...
[<c0426fb4>] (emac_int_disable) from [<c0427700>] (emac_dev_open+0x290/0x5f8)
[<c0427700>] (emac_dev_open) from [<c04c00ec>] (__dev_open+0xb8/0x120)
[<c04c00ec>] (__dev_open) from [<c04c0370>] (__dev_change_flags+0x88/0x14c)
[<c04c0370>] (__dev_change_flags) from [<c04c044c>] (dev_change_flags+0x18/0x48)
[<c04c044c>] (dev_change_flags) from [<c052bafc>] (devinet_ioctl+0x6b4/0x7ac)
[<c052bafc>] (devinet_ioctl) from [<c04a1428>] (sock_ioctl+0x1d8/0x2c0)
[<c04a1428>] (sock_ioctl) from [<c014f054>] (do_vfs_ioctl+0x41c/0x600)
[<c014f054>] (do_vfs_ioctl) from [<c014f2a4>] (SyS_ioctl+0x6c/0x7c)
[<c014f2a4>] (SyS_ioctl) from [<c000ff60>] (ret_fast_syscall+0x0/0x1c)

Fixes: 42f59967a0 ("net: ethernet: davinci_emac: add OF support")
Cc: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:57:47 -04:00
Neil Armstrong 99164f9e62 net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
In order to avoid an Unbalanced pm_runtime_enable in the DaVinci
emac driver when the device is removed and re-probed, and a
pm_runtime_disable() call in davinci_emac_remove().

Actually, using unbind/bind on a TI DM8168 SoC gives :
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/unbind
net eth1: DaVinci EMAC: davinci_emac_remove()
$ echo 4a120000.ethernet > /sys/bus/platform/drivers/davinci_emac/bind
davinci_emac 4a120000.ethernet: Unbalanced pm_runtime_enable

Cc: Brian Hutchinson <b.hutchman@gmail.com>
Fixes: 3ba9738134 ("net: ethernet: davinci_emac: add pm_runtime support")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:57:47 -04:00
Rafael J. Wysocki 395da1259a Merge branch 'pm-cpufreq-fixes'
* pm-cpufreq-fixes:
  cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
  intel_pstate: Avoid getting stuck in high P-states when idle
2016-04-21 20:57:46 +02:00
David S. Miller 3ad977992f Merge branch 'qed-fixes'
Manish Chopra says:

====================
qede: Bug fixes

This series fixes -

* various memory allocation failure flows for fastpath
* issues with respect to driver GRO packets handling

V1->V2

* Send series against net instead of net-next.

Please consider applying this series to "net"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:51:29 -04:00
Manish Chopra ee2fa8e6b3 qede: Fix single MTU sized packet from firmware GRO flow
In firmware assisted GRO flow there could be a single MTU sized
segment arriving due to firmware aggregation timeout/last segment
in an aggregation flow, which is not expected to be an actual gro
packet. So If a skb has zero frags from the GRO flow then simply
push it in the stack as non gso skb.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:51:29 -04:00
Manish Chopra aad94c0408 qede: Fix setting Skb network header
Skb's network header needs to be set before extracting IPv4/IPv6
headers from it.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:51:28 -04:00
Manish Chopra f86af2dfde qede: Fix various memory allocation error flows for fastpath
This patch handles memory allocation failures for fastpath
gracefully in the driver.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:51:28 -04:00
David S. Miller 5bec11cf44 Merge branch 'tcp-coalesce-merge-timestamps'
Martin KaFai Lau says:

====================
tcp: Merge timestamp info when coalescing skbs

This series is separated from the RFC series related to
tcp_sendmsg(MSG_EOR) and it is targeting for the net branch.
This patchset is focusing on fixing cases where TCP
timestamp could be lost after coalescing skbs.

A BPF prog is used to kprobe to sock_queue_err_skb()
and print out the value of serr->ee.ee_data.  The BPF
prog (run-able from bcc) is attached here:

BPF prog used for testing:
~~~~~

from __future__ import print_function
from bcc import BPF

bpf_text = """

int trace_err_skb(struct pt_regs *ctx)
{
	struct sk_buff *skb = (struct sk_buff *)ctx->si;
	struct sock *sk = (struct sock *)ctx->di;
	struct sock_exterr_skb *serr;
	u32 ee_data = 0;

	if (!sk || !skb)
		return 0;

	serr = SKB_EXT_ERR(skb);
	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
	bpf_trace_printk("ee_data:%u\\n", ee_data);

	return 0;
};
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:40:56 -04:00
Martin KaFai Lau cfea5a688e tcp: Merge tx_flags and tskey in tcp_shifted_skb
After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible.  tx_flags and tskey also have to be
merged.

This patch reuses the tcp_skb_collapse_tstamp() to handle
them.

BPF Output Before:
~~~~~
<no-output-due-to-missing-tstamp-event>

BPF Output After:
~~~~~
<...>-2024  [007] d.s.    88.644374: : ee_data:14599

Packetdrill Script:
~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140

0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1

0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:40:55 -04:00
Martin KaFai Lau 082ac2d51d tcp: Merge tx_flags and tskey in tcp_collapse_retrans
If two skbs are merged/collapsed during retransmission, the current
logic does not merge the tx_flags and tskey.  The end result is
the SCM_TSTAMP_ACK timestamp could be missing for a packet.

The patch:
1. Merge the tx_flags
2. Overwrite the prev_skb's tskey with the next_skb's tskey

BPF Output Before:
~~~~~~
<no-output-due-to-missing-tstamp-event>

BPF Output After:
~~~~~~
packetdrill-2092  [001] d.s.   453.998486: : ee_data:1459

Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
0.200 write(4, ..., 11680) = 11680
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0

0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1

0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 13141 win 257

0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:40:55 -04:00
Grygorii Strashko 3fa88c51c7 drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
The cpsw_ndo_open() could try to access CPSW registers before
calling pm_runtime_get_sync(). This will trigger L3 error:

 WARNING: CPU: 0 PID: 21 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x220/0x34c()
 44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_FAST (Idle): Data Access in Supervisor mode during Functional access

and CPSW will stop functioning.

Hence, fix it by moving pm_runtime_get_sync() before the first access
to CPSW registers in cpsw_ndo_open().

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:30:39 -04:00
David S. Miller c57107c7f5 Merge branch 'nlattr_align'
Nicolas Dichtel says:

====================
libnl: enhance API to ease 64bit alignment for attribute

Here is a proposal to add more helpers in the libnetlink to manage 64-bit
alignment issues.
Note that this series was only tested on x86 by tweeking
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.

The first patch adds helpers for 64bit alignment and other patches
use them.

We could also add helpers for nla_put_u64() and its variants if needed.

v1 -> v2:
 - remove patch #1
 - split patch #2 (now #1 and #2)
 - add nla_need_padding_for_64bit()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:22:14 -04:00
Nicolas Dichtel 3d6b66c1d1 ip6mr: align RTA_MFC_STATS on 64-bit
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:22:13 -04:00
Nicolas Dichtel a9a080422e ipmr: align RTA_MFC_STATS on 64-bit
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:22:13 -04:00
Nicolas Dichtel 58414d32a3 rtnl: use the new API to align IFLA_STATS*
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:22:13 -04:00
Nicolas Dichtel 089bf1a6a9 libnl: add more helpers to align attributes on 64-bit
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:22:12 -04:00
Alexander Duyck 732912d727 veth: Update features to include all tunnel GSO types
This patch adds support for the checksum enabled versions of UDP and GRE
tunnels.  With this change we should be able to send and receive GSO frames
of these types over the veth pair without needing to segment the packets.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:14:59 -04:00
Alexander Duyck b1c20f0b97 netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE
This patch folds NETIF_F_ALL_TSO into the bitmask for NETIF_F_GSO_SOFTWARE.
The idea is to avoid duplication of defines since the only difference
between the two was the GSO_UDP bit.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:14:58 -04:00
Dan Carpenter 1ba64facae geneve: testing the wrong variable in geneve6_build_skb()
We intended to test "err" and not "skb".

Fixes: aed069df09 ('ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 14:08:00 -04:00
Peter Heise f937572925 NLA_BINARY misuse bug in HSR
Removed .type field from NLA to do proper length checking.
Reported by Daniel Borkmann and Julia Lawall.

Signed-off-by: Peter Heise <peter.heise@airbus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:59:08 -04:00
Xin Long b7de529c79 net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag
EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates
back to before jiffies_to_msecs() has been introduced.

Now we can remove it and use jiffies_to_msecs().

Suggested-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:55:33 -04:00
Alexei Starovoitov 85b67bcb7e perf, bpf: minimize the size of perf_trace_() tracepoint handler
move trace_call_bpf() into helper function to minimize the size
of perf_trace_*() tracepoint handlers.
    text	   data	    bss	    dec	 	   hex	filename
10541679	5526646	2945024	19013349	1221ee5	vmlinux_before
10509422	5526646	2945024	18981092	121a0e4	vmlinux_after

It may seem that perf_fetch_caller_regs() can also be moved,
but that is incorrect, since ip/sp will be wrong.

bpf+tracepoint performance is not affected, since
perf_swevent_put_recursion_context() is now inlined.
export_symbol_gpl can also be dropped.

No measurable change in normal perf tracepoints.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:48:20 -04:00
Martin KaFai Lau 479f85c366 tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
it could incorrectly think that a skb has already
been acked and queue a SCM_TSTAMP_ACK cmsg to the
sk->sk_error_queue.

In tcp_ack_tstamp(), it checks
'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
script, between() returns true but the tskey is actually not acked.
e.g. try between(3, 2, 1).

The fix is to replace between() with one before() and one !before().
By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
removed.

A packetdrill script is used to reproduce the dup ack scenario.
Due to the lacking cmsg support in packetdrill (may be I
cannot find it),  a BPF prog is used to kprobe to
sock_queue_err_skb() and print out the value of
serr->ee.ee_data.

Both the packetdrill and the bcc BPF script is attached at the end of
this commit message.

BPF Output Before Fix:
~~~~~~
      <...>-2056  [001] d.s.   433.927987: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   433.929563: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   433.930765: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   434.028177: : ee_data:1459
packetdrill-2056  [001] d.s.   434.029686: : ee_data:14599

BPF Output After Fix:
~~~~~~
      <...>-2049  [000] d.s.   113.517039: : ee_data:1459
      <...>-2049  [000] d.s.   113.517253: : ee_data:14599

BCC BPF Script:
~~~~~~
#!/usr/bin/env python

from __future__ import print_function
from bcc import BPF

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#include <linux/errqueue.h>

#ifdef memset
#undef memset
#endif

int trace_err_skb(struct pt_regs *ctx)
{
	struct sk_buff *skb = (struct sk_buff *)ctx->si;
	struct sock *sk = (struct sock *)ctx->di;
	struct sock_exterr_skb *serr;
	u32 ee_data = 0;

	if (!sk || !skb)
		return 0;

	serr = SKB_EXT_ERR(skb);
	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
	bpf_trace_printk("ee_data:%u\\n", ee_data);

	return 0;
};
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()

Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 1460) = 1460
0.200 write(4, ..., 13140) = 13140

0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1

0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:45:43 -04:00
Vivien Didelot c60c984042 net: dsa: remove tag_protocol from dsa_switch
Having the tag protocol in dsa_switch_driver for setup time and in
dsa_switch_tree for runtime is enough. Remove dsa_switch's one.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:43:11 -04:00
Joe Stringer 49e261a8a2 openvswitch: Orphan skbs before IPv6 defrag
This is the IPv6 counterpart to commit 8282f27449 ("inet: frag: Always
orphan skbs inside ip_defrag()").

Prior to commit 029f7f3b87 ("netfilter: ipv6: nf_defrag: avoid/free
clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
cloned (implicitly orphaning) prior to queueing for reassembly. As such,
when the IPv6 message is eventually reassembled, the skb->sk for all
fragments would be NULL. After that commit was introduced, rather than
cloning, the original skbs were queued directly without orphaning. The
end result is that all frags except for the first and last may have a
socket attached.

This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
prevent BUG_ON(skb->sk) during a later call to ip6_fragment().

kernel BUG at net/ipv6/ip6_output.c:631!
[...]
Call Trace:
 <IRQ>
 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
 [<ffffffffa042c7c0>] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
 [<ffffffff810bb8a2>] ? __lock_is_held+0x52/0x70
 [<ffffffffa042c587>] ovs_fragment+0x1f7/0x280 [openvswitch]
 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
 [<ffffffff81697ea0>] ? dst_discard_out+0x20/0x20
 [<ffffffff81697e80>] ? dst_ifdown+0x80/0x80
 [<ffffffffa042c703>] do_output.isra.28+0xf3/0x1b0 [openvswitch]
 [<ffffffffa042d279>] do_execute_actions+0x709/0x12c0 [openvswitch]
 [<ffffffffa04340a4>] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
 [<ffffffffa04340d1>] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
 [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
 [<ffffffffa042de75>] ovs_execute_actions+0x45/0x120 [openvswitch]
 [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
 [<ffffffff817be387>] ? _raw_spin_unlock+0x27/0x40
 [<ffffffffa042def4>] ovs_execute_actions+0xc4/0x120 [openvswitch]
 [<ffffffffa0432d65>] ovs_dp_process_packet+0x85/0x150 [openvswitch]
 [<ffffffffa04337f2>] ? key_extract+0x442/0xc10 [openvswitch]
 [<ffffffffa043b26d>] ovs_vport_receive+0x5d/0xb0 [openvswitch]
 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
 [<ffffffff810be8f7>] ? __lock_acquire+0x927/0x20a0
 [<ffffffff817be416>] ? _raw_spin_unlock_irqrestore+0x36/0x50
 [<ffffffffa043c11d>] internal_dev_xmit+0x6d/0x150 [openvswitch]
 [<ffffffffa043c0b5>] ? internal_dev_xmit+0x5/0x150 [openvswitch]
 [<ffffffff8168fb5f>] dev_hard_start_xmit+0x2df/0x660
 [<ffffffff8168f5ea>] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
 [<ffffffff81690925>] __dev_queue_xmit+0x8f5/0x950
 [<ffffffff81690080>] ? __dev_queue_xmit+0x50/0x950
 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
 [<ffffffff81690990>] dev_queue_xmit+0x10/0x20
 [<ffffffff8169a418>] neigh_resolve_output+0x178/0x220
 [<ffffffff81752759>] ? ip6_finish_output2+0x219/0x7b0
 [<ffffffff81752759>] ip6_finish_output2+0x219/0x7b0
 [<ffffffff817525a5>] ? ip6_finish_output2+0x65/0x7b0
 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
 [<ffffffff817796cc>] icmpv6_push_pending_frames+0xac/0xe0
 [<ffffffff8177a4be>] icmpv6_echo_reply+0x42e/0x500
 [<ffffffff8177acbf>] icmpv6_rcv+0x4cf/0x580
 [<ffffffff81755ac7>] ip6_input_finish+0x1a7/0x690
 [<ffffffff81755925>] ? ip6_input_finish+0x5/0x690
 [<ffffffff817567a0>] ip6_input+0x30/0xa0
 [<ffffffff81755920>] ? ip6_rcv_finish+0x1a0/0x1a0
 [<ffffffff817557ce>] ip6_rcv_finish+0x4e/0x1a0
 [<ffffffff8175640f>] ipv6_rcv+0x45f/0x7c0
 [<ffffffff81755fe6>] ? ipv6_rcv+0x36/0x7c0
 [<ffffffff81755780>] ? ip6_make_skb+0x1c0/0x1c0
 [<ffffffff8168b649>] __netif_receive_skb_core+0x229/0xb80
 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
 [<ffffffff8168c07f>] ? process_backlog+0x6f/0x230
 [<ffffffff8168bfb6>] __netif_receive_skb+0x16/0x70
 [<ffffffff8168c088>] process_backlog+0x78/0x230
 [<ffffffff8168c0ed>] ? process_backlog+0xdd/0x230
 [<ffffffff8168db43>] net_rx_action+0x203/0x480
 [<ffffffff810bdab5>] ? mark_held_locks+0x75/0xa0
 [<ffffffff817c156e>] __do_softirq+0xde/0x49f
 [<ffffffff81752768>] ? ip6_finish_output2+0x228/0x7b0
 [<ffffffff817c070c>] do_softirq_own_stack+0x1c/0x30
 <EOI>
 [<ffffffff8106f88b>] do_softirq.part.18+0x3b/0x40
 [<ffffffff8106f946>] __local_bh_enable_ip+0xb6/0xc0
 [<ffffffff81752791>] ip6_finish_output2+0x251/0x7b0
 [<ffffffff81754af1>] ? ip6_fragment+0xba1/0xc50
 [<ffffffff816cde2b>] ? ip_idents_reserve+0x6b/0x80
 [<ffffffff8175488f>] ? ip6_fragment+0x93f/0xc50
 [<ffffffff81754af1>] ip6_fragment+0xba1/0xc50
 [<ffffffff81752540>] ? ip6_flush_pending_frames+0x40/0x40
 [<ffffffff81754c6b>] ip6_finish_output+0xcb/0x1d0
 [<ffffffff81754dcf>] ip6_output+0x5f/0x1a0
 [<ffffffff81754ba0>] ? ip6_fragment+0xc50/0xc50
 [<ffffffff81797fbd>] ip6_local_out+0x3d/0x80
 [<ffffffff817554df>] ip6_send_skb+0x2f/0xc0
 [<ffffffff817555bd>] ip6_push_pending_frames+0x4d/0x50
 [<ffffffff81778558>] rawv6_sendmsg+0xa28/0xe30
 [<ffffffff81719097>] ? inet_sendmsg+0xc7/0x1d0
 [<ffffffff817190d6>] inet_sendmsg+0x106/0x1d0
 [<ffffffff81718fd5>] ? inet_sendmsg+0x5/0x1d0
 [<ffffffff8166d078>] sock_sendmsg+0x38/0x50
 [<ffffffff8166d4d6>] SYSC_sendto+0xf6/0x170
 [<ffffffff8100201b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
 [<ffffffff8166e38e>] SyS_sendto+0xe/0x10
 [<ffffffff817bebe5>] entry_SYSCALL_64_fastpath+0x18/0xa8
Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 00 00 49 8#
RIP  [<ffffffff8175468a>] ip6_fragment+0x73a/0xc50
 RSP <ffff880072803120>

Fixes: 029f7f3b87 ("netfilter: ipv6: nf_defrag: avoid/free clone
operations")
Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:42:05 -04:00
David S. Miller 2dabf0c4d0 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2016-04-20

This series contains updates to fm10k only.

Jacob provides majority of the changes in this series, starting with the
addition of helper functions to reduce code duplication and the amount
of code indentation.  Fixed the use or should we say abuse of the ethtool
stats API, which could result in corrupt memory or misleading statistic
output.  Added the appropriate rtnl_lock() and rtnl_unlock() to avoid
RCU warnings during AER events.  Come to find out, the PTP/1588 support
is not working with the current version of switch management software
and possibly never worked, so just remove support for PTP/1588 for now.
Fixed how error responses from the switch manager after a LPORT_MAP
request are handled, originally which were silently being ignored.
Fixed up code documentation to hopefully ease the code and comment
comprehension.  Fixed a possible NULL pointer dereference after a
kcalloc(), where when writing a new default redirection table, and we
needed to populate a new RSS table using ethtool_rxfh_indir_default().
We populate this table into a region of memory allocated using kcalloc()
but never check it for NULL.

Alex adds support for bulk transmit cleanup for fm10k, like he did for
all of our other drivers.

Ngai-Mint fixes a number of issues with the unicast and multicast address
syncs.  Where an issue would occur when the netdev is pre-configured to
either multicast mode and is enabled for the first time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 11:52:05 -04:00