Commit Graph

496712 Commits

Author SHA1 Message Date
Shaohui Xie ca43e58ca2 net/fsl: drop in_be32() & out_be32() in xgmac_mdio
Use ioread32be() & iowrite32be() instead.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 23:36:05 -08:00
Eric Dumazet 24f87d4ce1 bonding: handle more gso types
In commit 5a7baa7885 ("bonding: Advertize vxlan offload features when
supported"), Or Gerlitz added support conditional vxlan offload.

In this patch I also add support for all kind of tunnels,
but we allow a bonding device to not require segmentation,
as it is always better to make this segmentation at the very last stage,
if a particular slave device requires it.

Tested:

 Setup a GRE tunnel,
 on a physical NIC not having tx-gre-segmentation.
 Results on bnx2x are even better, as we no longer have to segment
 in software.

ethtool -K bond0 tx-gre-segmentation off

super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
7538.32

ethtool -K bond0 tx-gre-segmentation on

super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15
10200.5

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 23:34:23 -08:00
Dan Carpenter 1b846f9282 bridge: simplify br_getlink() a bit
Static checkers complain that we should maybe set "ret" before we do the
"goto out;".  They interpret the NULL return from br_port_get_rtnl() as
a failure and forgetting to set the error code is a common bug in this
situation.

The code is confusing but it's actually correct.  We are returning zero
deliberately.  Let's re-write it a bit to be more clear.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 23:32:49 -08:00
Martin KaFai Lau b0a1ba5992 ipv6: Fix __ip6_route_redirect
In my last commit (a3c00e4: ipv6: Remove BACKTRACK macro), the changes in
__ip6_route_redirect is incorrect.  The following case is missed:
1. The for loop tries to find a valid gateway rt. If it fails to find
   one, rt will be NULL.
2. When rt is NULL, it is set to the ip6_null_entry.
3. The newly added 'else if', from a3c00e4, will stop the backtrack from
   happening.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 22:09:51 -08:00
Linus Torvalds 26bc420b59 Linux 3.19-rc6 2015-01-25 20:04:41 -08:00
Linus Torvalds 14746306af Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Hopefully the last round of fixes for 3.19

   - regression fix for the LDT changes
   - regression fix for XEN interrupt handling caused by the APIC
     changes
   - regression fixes for the PAT changes
   - last minute fixes for new the MPX support
   - regression fix for 32bit UP
   - fix for a long standing relocation issue on 64bit tagged for stable
   - functional fix for the Hyper-V clocksource tagged for stable
   - downgrade of a pr_err which tends to confuse users

  Looks a bit on the large side, but almost half of it are valuable
  comments"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Change Fast TSC calibration failed from error to info
  x86/apic: Re-enable PCI_MSI support for non-SMP X86_32
  x86, mm: Change cachemode exports to non-gpl
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86, mpx: Strictly enforce empty prctl() args
  x86, mpx: Fix potential performance issue on unmaps
  x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels
  x86, hyperv: Mark the Hyper-V clocksource as being continuous
  x86: Don't rely on VMWare emulating PAT MSR correctly
  x86, irq: Properly tag virtualization entry in /proc/interrupts
  x86, boot: Skip relocs when load address unchanged
  x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi
  ACPI: pci: Do not clear pci_dev->irq in acpi_pci_irq_disable()
  x86/xen: Treat SCI interrupt as normal GSI interrupt
2015-01-25 18:11:17 -08:00
Linus Torvalds 4d2f0ef1c9 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "From the irqchip departement you get:

   - regression fix for omap-intc

   - regression fix for atmel-aic-common

   - functional correctness fix for hip04

   - type mismatch fix for gic-v3-its

   - proper error pointer check for mtd-sysirq

  Mostly one and two liners except for the omap regression fix which is
  slightly larger than desired"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: atmel-aic-common: Prevent clobbering of priority when changing IRQ type
  irqchip: omap-intc: Fix legacy DMA regression
  irqchip: gic-v3-its: Fix use of max with decimal constant
  irqchip: hip04: Initialize hip04_cpu_map to 0xffff
  irqchip: mtk-sysirq: Use IS_ERR() instead of NULL pointer check
2015-01-25 18:07:01 -08:00
Linus Torvalds b73f0c8f4b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A set of small fixes:

   - regression fix for exynos_mct clocksource

   - trivial build fix for kona clocksource

   - functional one liner fix for the sh_tmu clocksource

   - two validation fixes to prevent (root only) data corruption in the
     kernel via settimeofday and adjtimex.  Tagged for stable"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  clocksource: sh_tmu: Set cpu_possible_mask to fix SMP broadcast
  clocksource: kona: fix __iomem annotation
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
2015-01-25 17:47:34 -08:00
Linus Torvalds 71a59b1272 ARM: SoC fixes
A week's worth of fixes for various ARM platforms. Diff wise, the
 largest fix is for OMAP to deal with how GIC now registers interrupts
 (irq_domain_add_legacy() -> irq_domain_add_linear() changes).
 
 Besides this, a few more renesas platforms needed the GIC instatiation
 done for legacy boards. There's also a fix that disables coherency of
 mvebu due to issues, and a few other smaller fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUxA52AAoJEIwa5zzehBx3TL0P/ArOnYhDuaZIAQA+/tAKLt4Z
 CZJmngf7cOA42No1C6ZGvbnORZvptcYoAS/vbnVkGQUFb9H+48RHVFB2/9RYf7JR
 18SbFV594odtDfVQ4fA6ZQzx42h5rVnFPxE74Qir1LJiCO50h+Q+3+ufUBIzIrD9
 3JvUSDa/g/zkr4OEnscuZznaNzp9HH5i8pZs82PZLKn0IdOR5BuWGd0mwKul1aQt
 oR41ijskC4XTXGGLa5PvD9GFoVQ5rNaTkmwjKACRxzp+K36y21pOHDv+NPEqyEM1
 EtiXnZ0biBY4S1ICgO69NzEI3GSRTtya7z53tPxE7B4AhYkrGsweqPjNxGhvglon
 rOPxEdCNA4s1iUNgRCAkxiwEkCxXfPf4Gsl24qdcHkIaRbUhCOCrlb1QCMwDEi4v
 9uiCWEVElWhzqtj+nEpFC202w6sXUlufFMa5O97+N/qVuCAe3LCTdA8J1iD0CYmB
 rjz6bc6SclKyaSjZY2mQfwjZNFMoPNn9SItenZ4qvUi6Al/VSxpvg5DnpOwDBTnW
 eiO27Zl9B3lfys09LAKWP1q+XJOtuOt6x3GhtmFP4UaTQIHVmL0ZK3ekf6sB0P/y
 dt+Pko/NqeW7Eg7cYZcRn5mpf5DMeHHLWP13PF72SZGvOyunS/fTLDSZNPmexghT
 9ShEkFrXcX2BzAf12pxB
 =nrT2
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A week's worth of fixes for various ARM platforms.  Diff wise, the
  largest fix is for OMAP to deal with how GIC now registers interrupts
  (irq_domain_add_legacy() -> irq_domain_add_linear() changes).

  Besides this, a few more renesas platforms needed the GIC instatiation
  done for legacy boards.  There's also a fix that disables coherency of
  mvebu due to issues, and a few other smaller fixes"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: add baud rate to Juno stdout-path
  ARM: dts: imx25: Fix PWM "per" clocks
  bus: mvebu-mbus: fix support of MBus window 13
  Merge tag 'mvebu-fixes-3.19-3' of git://git.infradead.org/linux-mvebu into fixes
  ARM: mvebu: completely disable hardware I/O coherency
  ARM: OMAP: Work around hardcoded interrupts
  ARM: shmobile: r8a7779: Instantiate GIC from C board code in legacy builds
  ARM: shmobile: r8a7778: Instantiate GIC from C board code in legacy builds
  arm: boot: dts: dra7: enable dwc3 suspend PHY quirk
2015-01-25 17:29:06 -08:00
Linus Torvalds 80a755545d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A couple of fixes - deadlock in CIFS and build breakage in cris serial
  driver (resurfaced f_dentry in there)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Convert file->f_dentry->d_inode to file_inode()
  fix deadlock in cifs_ioctl_clone()
2015-01-25 17:27:18 -08:00
Linus Torvalds bfc835b571 2 stable fixes for dm-cache and 1 3.19 DM core fix:
- Fix potential for dm-cache metadata corruption via stale metadata
   buffers being used when switching an inactive cache table to active;
   this could occur due to each table having it's own bufio client rather
   than sharing the client between tables.
 
 - Fix dm-cache target to properly account for discard IO while
   suspending otherwise IO quiescing could complete prematurely.
 
 - Fix DM core's handling of multiple internal suspends by maintaining an
   'internal_suspend_count' and only resuming the device when this count
   drops to zero.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEbBAABAgAGBQJUxEXBAAoJEMUj8QotnQNaejYH+LQY5iTHaBQxvDq78pJIItAc
 s8Uh+L0NdHXiDbS+2sSnAfYVfbL+YQ7ivUaXv8I9zfW+mVDREYk91VTSYP8SaM7r
 n5ja/PMWe3DhbZo39rZCJzRZpt1TTzIGRPFXzflKbco5HVTHqXUEurGnF4Z5ivXB
 w1I89WqHPY1Y9W7AiahJk12474IhHMa9urb9l5y/SUMSA3kNos1yubRfAlxBaWN/
 2eiD5bZB50Qvy+DGJz00yBvP/uauyIxjI0C9LM8ELCBOyoENWPP8H8IqRY6fg6hW
 K5ke8nN3/G8W6UiAI7KCdUFP8tM+RrOjPSxL0JtZFj1xuE4oUnB6VqO9769v+A==
 =wHbY
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
 "Two stable fixes for dm-cache and one 3.19 DM core fix:

   - fix potential for dm-cache metadata corruption via stale metadata
     buffers being used when switching an inactive cache table to
     active; this could occur due to each table having it's own bufio
     client rather than sharing the client between tables.

   - fix dm-cache target to properly account for discard IO while
     suspending otherwise IO quiescing could complete prematurely.

   - fix DM core's handling of multiple internal suspends by maintaining
     an 'internal_suspend_count' and only resuming the device when this
     count drops to zero"

* tag 'dm-3.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix handling of multiple internal suspends
  dm cache: fix problematic dual use of a single migration count variable
  dm cache: share cache-metadata object across inactive and active DM tables
2015-01-25 17:25:01 -08:00
Linus Torvalds 8e908e9904 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull two block layer fixes from Jens Axboe:
 "Two small patches that should make it into 3.19:

   - a fixup from me for NVMe, making the cq_vector a signed variable.
     Otherwise our -1 comparison fails, and commit 2b25d98179 doesn't
     do what it was supposed to.

   - a fixup for the hotplug handling for blk-mq from Ming Lei, using
     the proper kobject referencing to ensure we release resources at
     the right time"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: fix hctx/ctx kobject use-after-free
  NVMe: cq_vector should be signed
2015-01-25 17:23:34 -08:00
David S. Miller 5c66cfe078 Merge branch 'phy_dsa'
Florian Fainelli says:

====================
net: phy and dsa random fixes/cleanups

These two patches were already present as part of my attempt to make
DSA modules work properly, these are the only two "valid" patches at
this point which should not need any further rework.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 16:02:33 -08:00
Florian Fainelli 691c9a8fdc net: dsa: bcm_sf2: factor interrupt disabling in a function
Factor the interrupt disabling in a function: bcm_sf2_intr_disable()
since we are doing the same thing in the setup and suspend paths.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 16:02:13 -08:00
Florian Fainelli 799d44442c net: phy: fixed: allow setting no update_link callback
fixed_phy_set_link_update() contains an early check against a NULL
callback pointer, which basically prevents us from removing any
previous callback we may have set. The users of the fp->link_update
callback deal with a NULL callback just fine, so we really want to allow
"removing" a link_update callback to avoid dangling callback pointers
during e.g: module removal.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 16:02:13 -08:00
Vivien Didelot 24df8986f3 net: dsa: set slave MII bus PHY mask
When registering a mdio bus, Linux assumes than every port has a PHY and tries
to scan it. If a switch port has no PHY registered, DSA will fail to register
the slave MII bus. To fix this, set the slave MII bus PHY mask to the switch
PHYs mask.

As an example, if we use a Marvell MV88E6352 (which is a 7-port switch with no
registered PHYs for port 5 and port 6), with the following declared names:

	static struct dsa_chip_data switch_cdata = {
		[...]
		.port_names[0] = "sw0",
		.port_names[1] = "sw1",
		.port_names[2] = "sw2",
		.port_names[3] = "sw3",
		.port_names[4] = "sw4",
		.port_names[5] = "cpu",
	};

DSA will fail to create the switch instance. With the PHY mask set for the
slave MII bus, only the PHY for ports 0-4 will be scanned and the instance will
be successfully created.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 16:00:54 -08:00
Christophe Ricard a968639bca NFC: st21nfca: Remove unreachable code
kfree_skb(skb) in st21nfca_hci_event_received is never reach.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-26 00:09:22 +01:00
Christophe Ricard 9446f248d2 NFC: st21nfcb: Fix "WARNING: invalid free of devm_ allocated data"
ndlc pointer got allocated with devm_kzalloc in ndlc_probe function.

This gives this error message:
drivers/nfc/st21nfcb/ndlc.c:296:1-6: WARNING: invalid free of devm_ allocated data.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-26 00:09:22 +01:00
Christophe Ricard 1a94cb6025 NFC: dts: st21nfcb: Fix compatible string spelling to follow other drivers
Other drivers are following the following compatible string format for dts:
s/_/-/

Because some devices may still use the previous string, the new corrected
string is added to the of_device_id table.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-26 00:09:22 +01:00
Christophe Ricard 6b5fba4eb4 NFC: dts: st21nfca: Fix compatible string spelling to follow other drivers
Other drivers are following the following compatible string format for dts:
s/_/-/

Because some devices may still use the previous string, the new corrected
string is added to the of_device_id table.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-26 00:09:22 +01:00
Harout Hedeshian c2943f1453 net: ipv6: Add sysctl entry to disable MTU updates from RA
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.

Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:54:41 -08:00
David S. Miller 46a93af29f Merge branch 'fib_trie_next'
Alexander Duyck says:

====================
Fixes and improvements for recent fib_trie updates

While performing testing and prepping the next round of patches I found a
few minor issues and improvements that could be made.

These changes should help to reduce the overall code size and improve the
performance slighlty as I noticed a 20ns or so improvement in my worst-case
testing which will likely only result in a 1ns difference with a standard
sized trie.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:25 -08:00
Alexander Duyck 64c6272351 fib_trie: Various clean-ups for handling slen
While doing further work on the fib_trie I noted a few items.

First I was using calls that were far more complicated than they needed to
be for determining when to push/pull the suffix length.  I have updated the
code to reflect the simplier logic.

The second issue is that I realised we weren't necessarily handling the
case of a leaf_info struct surviving a flush.  I have updated the logic so
that now we will call pull_suffix in the event of having a leaf info value
left in the leaf after flushing it.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck 02525368f4 fib_trie: Move fib_find_alias to file where it is used
The function fib_find_alias is only accessed by functions in fib_trie.c as
such it makes sense to relocate it and cast it as static so that the
compiler can take advantage of optimizations it can do to it as a local
function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck 30cfe7c9c8 fib_trie: Use empty_children instead of counting empty nodes in stats collection
It doesn't make much sense to count the pointers ourselves when
empty_children already has a count for the number of NULL pointers stored
in the tnode.  As such save ourselves the cycles and just use
empty_children.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck 95f60ea3e9 fib_trie: Add collapse() and should_collapse() to resize
This patch really does two things.

First it pulls the logic for determining if we should collapse one node out
of the tree and the actual code doing the collapse into a separate pair of
functions.  This helps to make the changes to these areas more readable.

Second it encodes the upper 32b of the empty_children value onto the
full_children value in the case of bits == KEYLENGTH.  By doing this we are
able to handle the case of a 32b node where empty_children would appear to
be 0 when it was actually 1ul << 32.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck a80e89d4c6 fib_trie: Fall back to slen update on inflate/halve failure
This change corrects an issue where if inflate or halve fails we were
exiting the resize function without at least updating the slen for the
node.  To correct this I have moved the update of max_size into the while
loop so that it is only decremented on a successful call to either inflate
or halve.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck 69fa57b1e4 fib_trie: Fix RCU bug and merge similar bits of inflate/halve
This patch addresses two issues.

The first issue is the fact that I believe I had the RCU freeing sequence
slightly out of order.  As a result we could get into an issue if a caller
went into a child of a child of the new node, then backtraced into the to be
freed parent, and then attempted to access a child of a child that may have
been consumed in a resize of one of the new nodes children.  To resolve this I
have moved the resize after we have freed the oldtnode.  The only side effect
of this is that we will now be calling resize on more nodes in the case of
inflate due to the fact that we don't have a good way to test to see if a
full_tnode on the new node was there before or after the allocation.  This
should have minimal impact however since the node should already be
correctly size so it is just the cost of calling should_inflate that we
will be taking on the node which is only a couple of cycles.

The second issue is the fact that inflate and halve were essentially doing
the same thing after the new node was added to the trie replacing the old
one.  As such it wasn't really necessary to keep the code in both functions
so I have split it out into two other functions, called replace and
update_children.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:15 -08:00
Alexander Duyck b3832117b4 fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits
In doing performance testing and analysis of the changes I recently found
that by shifting the index I had created an unnecessary dependency.

I have updated the code so that we instead shift a mask by bits and then
just test against that as that should save us about 2 CPU cycles since we
can generate the mask while the key and pos are being processed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:15 -08:00
David S. Miller bc579ae5f9 Merge branch 'mlx4-next'
Or Gerlitz says:

====================
mlx4: Fix and enhance the device reset flow

This series from Yishai Hadas fixes the device reset flow and adds SRIOV support.

Reset flows are required whenever a device experiences errors, is unresponsive,
or is not in a deterministic state. In such cases, the driver is expected to
reset the HW and continue operation. When SRIOV is enabled, these requirements
apply both to PF and VF devices.

Currently, the mlx4 reset flow doesn't work properly: when a fatal error is
detected on the FW internal buffer the chip is not reset and stays in its
bad state. There are cases that assumed to be fatal such as non-responsive FW,
errors via closing commands but are not handled today.

The AER mechanism should also be fixed:
- It should use mlx4_load_one instead of __mlx4_init_one which is done
  upon HCA probing.
- It must be aligned with concurrent catas flow, mark device to be in
  an error state, reset chip, etc.
- Port types should be restored to their original values before error occurred.

In addition, there the SRIOV use-case isn't supported.

In above cases when the device state becomes fatal we must act as follows:
1) Reset the chip and mark the HW device state as in fatal error.
2) Wake up any pending commands, preventing new ones to come in.
3) Restart the software stack.

We also address the SRIOV mode as follows: In case the PF detects a fatal error,
it lets VFs know about that, then both itself and VFs are restarted asynchronously.
However, in case only the VF encountered a fatal case or forced to be reset, they
reset the VF stuff and then restart software.

changes from V0:

No need to call pci_disable_device upon permanent PCI error. This will
be done as part of mlx4_remove_one which is called later once we
return PCI_ERS_RESULT_DISCONNECT from the pci error handler.

Initial toggle value should use only the T bit and not the whole byte value.
Not doing so sometimes broke SRIOV as of junky value seen by the VF as a
non-ready comm channel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:19 -08:00
Yishai Hadas 0cd9302734 net/mlx4_core: Reset flow activation upon SRIOV fatal command cases
When SRIOV commands are executed over the comm-channel and get
a fatal error (e.g. timeout, closing command failure) the VF enters
into error state and reset flow is activated.

To be able to recognize whether the failure was on a closing command, the
operational code for the given VHCR command is used. Once the device entered
into an error state we prevent redundant error messages from being printed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:15 -08:00
Yishai Hadas 55ad359225 net/mlx4_core: Enable device recovery flow with SRIOV
In SRIOV, both the PF and the VF may attempt device recovery whenever they
assume that the device is not functioning.  When the PF driver resets the
device, the VF should detect this and attempt to reinitialize itself.

The VF must be able to reset itself under all circumstances, even
if the PF is not responsive.

The VF shall reset itself in the following cases:

1. Commands are not processed within reasonable time over the communication channel.
This is done considering device state and the correct return code based on
the command as was done in the native mode, done in the next patch.

2. The VF driver receives an internal error event reported by the PF on the
communication channel. This occurs when the PF driver resets the device or
when VF is out of sync with the PF.

Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
PF is not responsive.

As PF and VF may run their reset flow simulantanisly, there are several cases
that are handled:
- Prevent freeing VF resources upon FLR, when PF is in its unloading stage.
- Prevent PF getting VF commands before it has finished initializing its resources.
- Upon VF startup, check that comm-channel is online before sending
  commands to the PF and getting timed-out.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas 2ba5fbd62b net/mlx4_core: Handle AER flow properly
Fix AER callbacks to work properly, it includes:
- Refractoring AER to be aligned with Reset flow support.
- Sync with concurrent catas flow.

In addition, fix the shutdown PCI callback to sync with
concurrent catas flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas c69453e294 net/mlx4_core: Manage interface state for Reset flow cases
We need to manage interface state to sync between reset flow and some other
relative cases such as remove_one. This has to be done to prevent certain
races. For example in case software stack is down as a result of unload call,
the remove_one should skip the unload phase.

Implement the remove_one case, handling AER and other cases comes next.

The interface can be up/down, upon remove_one, the state will include an extra
bit indicating that the device is cleaned-up, forcing other tasks to finish
before the final cleanup.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas f5aef5aa35 net/mlx4_core: Activate reset flow upon fatal command cases
We activate reset flow upon command fatal errors, when the device enters an
erroneous state, and must be reset.

The cases below are assumed to be fatal: FW command timed-out, an error from FW
on closing commands, pci is offline when posting/pending a command.

In those cases we place the device into an error state: chip is reset, pending
commands are awakened and completed immediately. Subsequent commands will
return immediately.

The return code in the above cases will depend on the command. Commands which
free and close resources will return success (because the chip was reset, so
callers may safely free their kernel resources). Other commands will return -EIO.

Since the device's state was marked as error, the catas poller will
detect this and restart the device's software stack (as is done when a FW
internal error is directly detected). The device state is protected by a
persistent mutex lives on its mlx4_dev, as such no need any more for the
hcr_mutex which is removed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas f6bc11e426 net/mlx4_core: Enhance the catas flow to support device reset
This includes:

- resetting the chip when a fatal error is detected (the current code
  does not do this).

- exposing the ability to enter error state from outside the catas code
  by calling its functionality. (E.g. FW Command timeout, AER error).

- managing a persistent device state. This is needed to sync between
  reset flow cases.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas ad9a0bf08f net/mlx4_core: Refactor the catas flow to work per device
Using a WQ per device instead of a single global WQ, this allows
independent reset handling per device even when SRIOV is used.

This comes as a pre-patch for supporting chip reset
for both native and SRIOV.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas dd0eefe3ab net/mlx4_core: Set device configuration data to be persistent across reset
When an HCA enters an internal error state, this is detected by the driver.
The driver then should reset the HCA and restart the software stack.

Keep ports information and some SRIOV configuration in a persistent area
to have it valid across reset.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:13 -08:00
Yishai Hadas 872bf2fb69 net/mlx4_core: Maintain a persistent memory for mlx4 device
Maintain a persistent memory that should survive reset flow/PCI error.
This comes as a preparation for coming series to support above flows.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:13 -08:00
Mahesh Bandewar 2aab9525c3 ipvlan: fix incorrect usage of IS_ERR() macro in IPv6 code path.
The ip6_route_output() always returns a valid dst pointer unlike in IPv4
case. So the validation has to be different from the IPv4 path. Correcting
that error in this patch.

This was picked up by a static checker with a following warning -

   drivers/net/ipvlan/ipvlan_core.c:380 ipvlan_process_v6_outbound()
        warn: 'dst' isn't an ERR_PTR

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 00:24:19 -08:00
Sasha Levin 6b8d9117cc net: llc: use correct size for sysctl timeout entries
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 00:23:21 -08:00
Eric Dumazet 6088beef3f netxen: fix netxen_nic_poll() logic
NAPI poll logic now enforces that a poller returns exactly the budget
when it wants to be called again.

If a driver limits TX completion, it has to return budget as well when
the limit is hit, not the number of received packets.

Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade56 ("net: less interrupt masking in NAPI")
Cc: Manish Chopra <manish.chopra@qlogic.com>
Acked-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 00:21:45 -08:00
Andy Shevchenko 7aee42c676 cxgb3: re-use native hex2bin()
Call hex2bin() library function instead of doing conversion here.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 00:09:41 -08:00
Andy Shevchenko 51487ae736 usbnet: re-use native hex2bin()
Call hex2bin() library function, instead of doing conversion here.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 00:09:41 -08:00
David S. Miller bc0247a4ab Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-01-22

This series contains updates to e1000, e1000e, igb, fm10k and virtio_net.

Asaf Vertz provides a fix for e1000 to future-proof the time comparisons
by using time_after_eq() instead of plain math.

Mathias Koehrer provides a fix for e1000e to add a check to e1000_xmit_frame()
to ensure a work queue will not be scheduled that has not been initialized.

Jacob adds the use of software timestamping via the virtio_net driver.

Alex Duyck cleans up page reuse code in igb and fm10k.  Cleans up the
page reuse code from getting into a state where all the workarounds
needed are in place as well as cleaning up oversights, such as using
__free_pages instead of put_page to drop a locally allocated page.

Richard Cochran provides 4 patches for igb dealing with time sync.
First provides a helper function since the code that handles the time
sync interrupt is repeated in three different places.  Then serializes
the access to the time sync interrupt since the registers may be
manipulated from different contexts.  Enables the use of i210 device
interrupt to generate an internal PPS event for adjusting the kernel
system time.  The i210 device offers a number of special PTP hardware
clock features on the Software Defined Pins (SDPs), so added support for
two of the possible functions (time stamping external events and
periodic output signals).

Or Gerlitz fixes fm10k from double setting of NETIF_F_SG since the
networking core does it for the driver during registration time.

Joe Stringer adds support for up to 104 bytes of inner+outer headers in
fm10k and adds an initial check to fail encapsulation offload if these
are too large.

Matthew increases the timeout for the data path reset based on feedback
from the hardware team, since 100us is too short of a time to wait for
the data path reset to complete.

Alexander Graf provides a fix for igb to indicate failure on VF reset
for an empty MAC address, to mirror the behavior of ixgbe.

Florian Westphal updates e1000 and e1000e to support txtd update delay
via xmit_more, this way we won't update the Tx tail descriptor if the
queue has not been stopped and we know at least one more skb will be
sent right away.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:24:36 -08:00
David S. Miller 86b368b4b4 Merge branch 'vxlan_tx'
Tom Herbert says:

====================
vxlan: Don't use UDP socket for transmit

UDP socket is not pertinent to transmit for UDP tunnels, checksum
enablement can be done without a socket. This patch set eliminates
reference to a socket in udp_tunnel_xmit functions and in VXLAN
transmit.

Also, make GBP, RCO, can CSUM6_RX flags visible to receive socket
and only match these for shareable socket.

v2: Fix geneve to call udp_tunnel_xmit with good arguments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:46 -08:00
Tom Herbert af33c1adae vxlan: Eliminate dependency on UDP socket in transmit path
In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminate references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:40 -08:00
Tom Herbert d998f8efa4 udp: Do not require sock in udp_tunnel_xmit_skb
The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP created for receive
side. The only thing that the socket is used for in the transmit
functions is to get the setting for checksum (enabled or zero).
This patch removes the argument and and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:40 -08:00
Govindarajulu Varadarajan f104fedc0d enic: fix rx napi poll return value
With the commit d75b1ade56 ("net: less interrupt masking in NAPI") napi repoll
is done only when work_done == budget. When we are in busy_poll we return 0 in
napi_poll. We should return budget.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 22:39:03 -08:00
David S. Miller 5d7b045b73 ath9k:
* fix an IRQ storm caused by commit 872b5d814f
 
 iwlwifi:
 
 * A fix for scan that fixes a firmware assertion
 
 * A fix that improves roaming behavior. Same fix has been tested for
   a while in iwldvm. This is a bit of a work around, but the real fix
   should be in mac80211 and will come later.
 
 * A fix for BARs that avoids a WARNING.
 
 * one fix for rfkill while scheduled scan is running.
   Linus's system hit this issue. WiFi would be unavailable
   after this has happpened because of bad state in cfg80211.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJUvhf2AAoJEG4XJFUm622blNIH/3JlfJZxjfFs5nCkLPRS03m1
 GNMhwHuoFDUSHNUCO57kROVcoryvd3D9kNA5bDGXkNHQIS1DQj4K44mZFKfF6L1K
 Kon/OP+pxJXpDV+5G42zF5QSRLg6uGb/cvxKEXyU9MhISXcWIIyncqAwZZWzoaFS
 ZcjqhvO0iUbdywrbU8nAAOH8+8zwL16A5nZxadeBF6yMf939EUsDzcDW9WoSNSsE
 vYZlHTRsymx2TANquoFBo8/mSeB0jcd+1eBr6mMetzUJLfjvxihbyP9Ci+C31ov4
 592s9dGQpxgri/qbRMt0XjwxAXRYnXluu2Rcf4jmmilQON7cZiZKZQnukpgIu3I=
 =cq2e
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2015-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

ath9k:

* fix an IRQ storm caused by commit 872b5d814f

iwlwifi:

* A fix for scan that fixes a firmware assertion

* A fix that improves roaming behavior. Same fix has been tested for
  a while in iwldvm. This is a bit of a work around, but the real fix
  should be in mac80211 and will come later.

* A fix for BARs that avoids a WARNING.

* one fix for rfkill while scheduled scan is running.
  Linus's system hit this issue. WiFi would be unavailable
  after this has happpened because of bad state in cfg80211.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 21:55:26 -08:00