Commit Graph

575623 Commits

Author SHA1 Message Date
Shuah Khan 64f0085001 MAINTAINERS: update Kselftest Framework mailing list
Kselftest Framework now has a dedicated mailing list linux-kselftest.
Update the entry in MAINTAINERS file.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Toshi Kani 9273a8bbf5 devm_memremap_release(): fix memremap'd addr handling
The pmem driver calls devm_memremap() to map a persistent memory range.
When the pmem driver is unloaded, this memremap'd range is not released
so the kernel will leak a vma.

Fix devm_memremap_release() to handle a given memremap'd address
properly.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Vaishali Thakkar f8b74815a4 mm/hugetlb.c: fix incorrect proc nr_hugepages value
Currently incorrect default hugepage pool size is reported by proc
nr_hugepages when number of pages for the default huge page size is
specified twice.

When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages
indicates the current number of pre-allocated huge pages of the default
size.  Basically /proc/sys/vm/nr_hugepages displays default_hstate->
max_huge_pages and after boot time pre-allocation, max_huge_pages should
equal the number of pre-allocated pages (nr_hugepages).

Test case:

Note that this is specific to x86 architecture.

Boot the kernel with command line option 'default_hugepagesz=1G
hugepages=X hugepagesz=2M hugepages=Y hugepagesz=1G hugepages=Z'.  After
boot, 'cat /proc/sys/vm/nr_hugepages' and 'sysctl -a | grep hugepages'
returns the value X.  However, dmesg output shows that Z huge pages were
pre-allocated.

So, the root cause of the problem here is that the global variable
default_hstate_max_huge_pages is set if a default huge page size is
specified (directly or indirectly) on the command line.  After the command
line processing in hugetlb_init, if default_hstate_max_huge_pages is set,
the value is assigned to default_hstae.max_huge_pages.  However,
default_hstate.max_huge_pages may have already been set based on the
number of pre-allocated huge pages of default_hstate size.

The solution to this problem is if hstate->max_huge_pages is already set
then it should not set as a result of global max_huge_pages value.
Basically if the value of the variable hugepages is set multiple times on
a command line for a specific supported hugepagesize then proc layer
should consider the last specified value.

Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Hugh Dickins 457a98b080 mm, x86: fix pte_page() crash in gup_pte_range()
Commit 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings") has
moved up the pte_page(pte) in x86's fast gup_pte_range(), for no
discernible reason: put it back where it belongs, after the pte_flags
check and the pfn_valid cross-check.

That may be the cause of the NULL pointer dereference in
gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a
qemu-kvm based VM.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Michael Long <Harn-Solo@gmx.de>
Tested-by: Michael Long <Harn-Solo@gmx.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Jeff Layton 0918f1c309 fsnotify: turn fsnotify reaper thread into a workqueue job
We don't require a dedicated thread for fsnotify cleanup.  Switch it
over to a workqueue job instead that runs on the system_unbound_wq.

In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Jeff Layton 13d34ac6e5 Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
This reverts commit c510eff6be ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").

Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.

The above commit changed the code to use call_srcu to clean up the
marks.  While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.

There's also another potential problem with using call_srcu here.  While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.

It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys.  While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.

This patch reverts the above patch.  A later patch will take a different
approach to eliminated the dedicated thread here.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Kirill A. Shutemov 48f7df3294 mm: fix regression in remap_file_pages() emulation
Grazvydas Ignotas has reported a regression in remap_file_pages()
emulation.

Testcase:
	#define _GNU_SOURCE
	#include <assert.h>
	#include <stdlib.h>
	#include <stdio.h>
	#include <sys/mman.h>

	#define SIZE    (4096 * 3)

	int main(int argc, char **argv)
	{
		unsigned long *p;
		long i;

		p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return -1;
		}

		for (i = 0; i < SIZE / 4096; i++)
			p[i * 4096 / sizeof(*p)] = i;

		if (remap_file_pages(p, 4096, 0, 1, 0)) {
			perror("remap_file_pages");
			return -1;
		}

		if (remap_file_pages(p, 4096 * 2, 0, 1, 0)) {
			perror("remap_file_pages");
			return -1;
		}

		assert(p[0] == 1);

		munmap(p, SIZE);

		return 0;
	}

The second remap_file_pages() fails with -EINVAL.

The reason is that remap_file_pages() emulation assumes that the target
vma covers whole area we want to over map.  That assumption is broken by
first remap_file_pages() call: it split the area into two vma.

The solution is to check next adjacent vmas, if they map the same file
with the same flags.

Fixes: c8d78c1823 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Kirill A. Shutemov 69a8ec2d81 thp, dax: do not try to withdraw pgtable from non-anon VMA
DAX doesn't deposit pgtables when it maps huge pages: nothing to
withdraw. It can lead to crash.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18 16:23:24 -08:00
Alexander Duyck ffcc55c0c2 i40e: Add support for ATR w/ IPv6 extension headers
This patch updates the code for determining the L4 protocol and L3 header
length so that when IPv6 extension headers are being used we can determine
the offset and type of the L4 protocol.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:57:26 -08:00
Alexander Duyck f608e6a60f i40evf: Update feature flags to reflect newly enabled features
Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums.  As such we can update the
feature flags to reflect that.

In addition we can clean-up the flags that aren't needed such as SCTP and
RXCSUM since having the bits there doesn't add any value.

I also found one spot where we were setting the same flag twice.  It looks
like it was probably a git merge error that resulted in the line being
duplicated.  As such I have dropped it in this patch.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:30:55 -08:00
Alexander Duyck bc5d252b36 i40e: Update feature flags to reflect newly enabled features
Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums.  As such we can update the
feature flags to reflect that.

In addition we can clean-up the flags that aren't needed such as SCTP and
RXCSUM since having the bits there doesn't add any value.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:26:28 -08:00
Alexander Duyck 84d5946d49 i40e: Do not drop support for IPv6 VXLAN or GENEVE tunnels
All of the documentation in the datasheets for the XL710 do not call out
any reason to exclude support for IPv6 based tunnels.  As such I am
dropping the code that was excluding these tunnel types from having their
port numbers recognized.  This way we can take advantage of things such as
checksum offload for inner headers over IPv6 based VXLAN or GENEVE
tunnels.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:14:59 -08:00
Alexander Duyck 6b037cd465 i40e: Fix ATR in relation to tunnels
This patch contains a number of fixes to make certain that we are using
the correct protocols when parsing both the inner and outer headers of a
frame that is mixed between IPv4 and IPv6 for inner and outer.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:01:35 -08:00
Alexander Duyck 5453205cd0 i40e/i40evf: Enable support for SKB_GSO_UDP_TUNNEL_CSUM
The XL722 has support for providing the outer UDP tunnel checksum on
transmits.  Make use of this feature to support segmenting UDP tunnels with
outer checksums enabled.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 14:46:23 -08:00
Alexander Duyck fad57330b6 i40e/i40evf: Clean-up Rx packet checksum handling
This is mostly a minor clean-up for the Rx checksum path in order to avoid
some of the unnecessary conditional checks that were being applied.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 14:38:56 -08:00
David S. Miller f58ee41e83 Merge branch 'qed-vlan-filtering'
Yuval Mintz says:

====================
qed{,e}: Add vlan filtering offload

This series adds vlan filtering offload to qede.
First patch introduces small additional infrastructure needed in
qed to support it, while second contains the main bulk of driver changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 16:07:45 -05:00
Sudarsana Reddy Kalluru 7c1bfcad9f qede: Add vlan filtering offload support
Device would start receiving only vlan-tagged traffic with tags matching
that of one of the configured vlan IDs, unless:
  - Device is expliicly placed in PROMISC mode.
  - Device exhausts its vlan filter credits.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 16:07:45 -05:00
Yuval Mintz 3f9b4a6972 qed: Lay infrastructure for vlan filtering offload
Today, interfaces are working in vlan-promisc mode; But once
vlan filtering offloaded would be supported, we'll need a method to
control it directly [e.g., when setting device to PROMISC, or when
running out of vlan credits].

This adds the necessary API for L2 client to manually choose whether to
accept all vlans or only those for which filters were configured.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 16:07:44 -05:00
Arnd Bergmann f3bb23764f USB: cdc_subset: only build when one driver is enabled
This avoids a harmless randconfig warning I get when USB_NET_CDC_SUBSET
is enabled, but all of the more specific drivers are not:

drivers/net/usb/cdc_subset.c:241:2: #warning You need to configure some hardware for this driver

The current behavior is clearly intentional, giving a warning when
a user picks a configuration that won't do anything good. The only
reason for even addressing this is that I'm getting close to
eliminating all 'randconfig' warnings on ARM, and this came up
a couple of times.

My workaround is to not even build the module when none of the
configurations are enable.

Alternatively we could simply remove the #warning (nothing wrong
for compile-testing), turn it into a runtime warning, or
change the Kconfig options into a menu to hide CONFIG_USB_NET_CDC_SUBSET.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 15:59:45 -05:00
Andrew F. Davis e12a285c9b net: phy: dp83848: Fix sysfs naming collision warning
Files in sysfs are created using the name from the phy_driver struct,
when two names are the same we may get a duplicate filename warning,
fix this.

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 15:53:10 -05:00
Alexander Duyck 9e74a6dadb net: Optimize local checksum offload
This patch takes advantage of several assumptions we can make about the
headers of the frame in order to reduce overall processing overhead for
computing the outer header checksum.

First we can assume the entire header is in the region pointed to by
skb->head as this is what csum_start is based on.

Second, as a result of our first assumption, we can just call csum_partial
instead of making a call to skb_checksum which would end up having to
configure things so that we could walk through the frags list.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 15:29:25 -05:00
Benjamin Poirier e550785c30 ipv6: Annotate change of locking mechanism for np->opt
follows up commit 45f6fad84c ("ipv6: add complete rcu protection around
np->opt") which added mixed rcu/refcount protection to np->opt.

Given the current implementation of rcu_pointer_handoff(), this has no
effect at runtime.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 15:27:25 -05:00
Jiri Benc f468a729a2 vxlan: do not use fdb in metadata mode
In metadata mode, the vxlan interface is not supposed to use the fdb control
plane but an external one (openvswitch or static routes). With the current
code, packets may leak into the fdb handling code which usually causes them
to be dropped anyway but may have strange side effects.

Just drop the packets directly when in metadata mode if the destination data
are not correctly provided on egress.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 15:01:14 -05:00
Anton Protopopov e60b13e4f5 mISDN: prevent possible NULL pointer dereference
A return value of the bchannel_get_rxbuf() function is compared with the
positive ENOMEM value instead of the negative -ENOMEM value to detect a
memory allocation problem. Thus, after a possible memory allocation
failure the bc->bch.rx_skb will be NULL which will lead to a NULL
pointer dereference.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:59:35 -05:00
Anton Protopopov 449f14f01f net: caif: fix erroneous return value
The cfrfml_receive() function might return positive value EPROTO

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:59:35 -05:00
Anton Protopopov 48bb230e87 appletalk: fix erroneous return value
The atalk_sendmsg() function might return wrong value ENETUNREACH
instead of -ENETUNREACH.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:59:34 -05:00
Amitoj Kaur Chawla a09f4af177 lance: Return correct error code
Failure of kzalloc should cause the enclosing function
to return -ENOMEM, not -ENODEV.

Additionally, removed the following checkpatch warnings:
ERROR: spaces required around that '==' (ctx:VxV)
ERROR: space required before the open parenthesis '('
CHECK: Comparison to NULL could be written "!lp"

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:58:47 -05:00
Phil Sutter a813104d92 IFF_NO_QUEUE: Fix for drivers not calling ether_setup()
My implementation around IFF_NO_QUEUE driver flag assumed that leaving
tx_queue_len untouched (specifically: not setting it to zero) by drivers
would make it possible to assign a regular qdisc to them without having
to worry about setting tx_queue_len to a useful value. This was only
partially true: I overlooked that some drivers don't call ether_setup()
and therefore not initialize tx_queue_len to the default value of 1000.
Consequently, removing the workarounds in place for that case in qdisc
implementations which cared about it (namely, pfifo, bfifo, gred, htb,
plug and sfb) leads to problems with these specific interface types and
qdiscs.

Luckily, there's already a sanitization point for drivers setting
tx_queue_len to zero, which can be reused to assign the fallback value
most qdisc implementations used, which is 1.

Fixes: 348e3435cb ("net: sched: drop all special handling of tx_queue_len == 0")
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:56:53 -05:00
Jiri Benc d13b161c2c gre: clear IFF_TX_SKB_SHARING
ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre
as it modifies the skb on xmit.

Also, clean up whitespace in ipgre_tap_setup when we're already touching it.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:43:48 -05:00
Jiri Benc fc41cdb322 geneve: clear IFF_TX_SKB_SHARING
ether_setup sets IFF_TX_SKB_SHARING but this is not supported by
geneve as it modifies the skb on xmit.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:43:47 -05:00
Jiri Benc 82a0f6b4ab vxlan: clear IFF_TX_SKB_SHARING
ether_setup sets IFF_TX_SKB_SHARING but this is not supported by vxlan
as it modifies the skb on xmit.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:43:47 -05:00
David S. Miller f0a404505c Merge branch 'iptunnel-pkt-scrub-consolidate'
Jiri Benc says:

====================
iptunnel: scrub packet in iptunnel_pull_header

As every IP tunnel has to scrub skb on decapsulation, iptunnel_pull_header
tried to do that and open coded part of skb_scrub_packet. Various tunneling
protocols (VXLAN, Geneve) then called full skb_scrub_packet on their own,
duplicating part of the scrubbing already done.

Consolidate the code, calling skb_scrub_packet from iptunnel_pull_header.
This will allow additional cleanups in VXLAN code, as the packet is scrubbed
early during rx processing after this patchset and VXLAN can start filling
out skb fields earlier.

The full picture of vxlan cleanup patches can be seen at:
https://github.com/jbenc/linux-vxlan/commits/master
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:35:02 -05:00
Jiri Benc 7f290c9435 iptunnel: scrub packet in iptunnel_pull_header
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:54 -05:00
Jiri Benc c9e78efb6f vxlan: move vxlan device lookup before iptunnel_pull_header
This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:54 -05:00
Jiri Benc 9fc4754582 geneve: move geneve device lookup before iptunnel_pull_header
This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:31 -05:00
Jiri Benc 1e9f12ec92 geneve: implement geneve_get_sk_family helper
Similarly to the existing vxlan_get_sk_family.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:31 -05:00
Vivien Didelot 7c25b16dbb net: bridge: log port STP state on change
Remove the shared br_log_state function and print the info directly in
br_set_state, where the net_bridge_port state is actually changed.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:20:08 -05:00
David S. Miller 37a6351a64 Merge branch 'cxgb4-addr-sync'
Hariprasad Shenai says:

====================
cxgb4: Use __dev_[um]c_[un]sync for MAC address syncing

This patch series adds support to use __dev_uc_sync/__dev_mc_sync to add
MAC address and __dev_uc_unsync/__dev_mc_unsync to delete MAC address.

This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:16:13 -05:00
Hariprasad Shenai fe5d2709b0 cxgb4vf: Use __dev_uc_sync/__dev_mc_sync to sync MAC address
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:16:12 -05:00
Hariprasad Shenai fc08a01a69 cxgb4: Use __dev_uc_sync/__dev_mc_sync to sync MAC address
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:16:12 -05:00
Alexander Duyck 529f1f652e i40e/i40evf: Add exception handling for Tx checksum
Add exception handling to the Tx checksum path so that we can handle cases
of TSO where the frame is bad, or Tx checksum where we didn't recognize a
protocol

Drop I40E_TX_FLAGS_CSUM as it is unused, move the CHECKSUM_PARTIAL check
into the function itself so that we can decrease indent.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 11:05:58 -08:00
Alexander Duyck 475b4205aa i40e/i40evf: Do not write to descriptor unless we complete
This patch defers writing to the Tx descriptor bits until we know we have
successfully completed a given operation.  So for example we defer updating
the tunnelling portion of the context descriptor until we have fully
identified the type.

The advantage to this approach is that we can assemble values as we go
instead of having to try and kludge everything together all at once.  As a
result we can significantly clean up the tunneling configuration for
instance as we can just do a pointer walk and do the math for the distance
between each set of points.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 11:00:56 -08:00
David Wragg aeee0e66c6 geneve: Refine MTU limit
Calculate the maximum MTU taking into account the size of headers
involved in GENEVE encapsulation, as for other tunnel types.

Changes in v3:
- Correct comment style
Changes in v2:
- Conform more closely to ip_tunnel_change_mtu
- Exclude GENEVE options from max MTU calculation

Signed-off-by: David Wragg <david@weave.works>
Acked-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 13:57:15 -05:00
Jiri Benc 07dabf20d9 vxlan: tun_id is 64bit, not 32bit
The tun_id field in struct ip_tunnel_key is __be64, not __be32. We need to
convert the vni to tun_id correctly.

Fixes: 54bfd872bf ("vxlan: keep flags and vni in network byte order")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 13:55:24 -05:00
Alexander Duyck a3fd9d8876 i40e/i40evf: Handle IPv6 extension headers in checksum offload
This patch adds support for IPv6 extension headers in setting up the Tx
checksum.  Without this patch extension headers would cause IPv6 traffic to
fail as the transport protocol could not be identified.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 10:49:42 -08:00
Alexander Duyck a0064728f8 i40e/i40evf: Add support for IPv4 encapsulated in IPv6
This patch fixes two issues.  First was the fact that iphdr(skb)->protocl
was being used to test for the outer transport protocol.  This completely
breaks IPv6 support.  Second was the fact that we cleared the flag for v4
going to v6, but we didn't take care of txflags going the other way.  As
such we would have the v6 flag still set even if the inner header was v4.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 10:45:12 -08:00
Alexander Duyck b96b78f2b7 i40e/i40evf: Replace header pointers with unions of pointers in Tx checksum path
The Tx checksum path was maintaining a set of 3 pointers and two lengths in
order to prepare the packet for being checksummed.  The thing is we only
really needed 2 pointers, and the lengths that were being maintained can
easily be computed.

As such we can replace the IPv4 and IPv6 header pointers with one single
union that represents both, or a generic pointer to the start of the
network header.  For the L4 headers we can do the same with TCP and a
generic pointer to the start of the transport header.  The length of the
TCP header is obtained by simply multiplying doff by 4, and the network
header length can be obtained by subtracting the network header pointer
from the transport header pointer.

While I was at it I renamed l4_hdr to l4_proto to make it a bit more clear
and less likely to be confused with l4.hdr which is the transport header
pointer.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 10:40:22 -08:00
Alexander Duyck c777019af1 i40e/i40evf: Consolidate all header changes into TSO function
This patch goes through and pulls all of the spots where we were updating
either the TCP or IP checksums in the TSO and checksum path into the TSO
function.  The general idea here is that we should only be updating the
header after we verify we have completed a skb_cow_head check to verify the
head is writable.

One other advantage to doing this is that it makes things much more
obvious.  For example, in the case of IPv6 there was one spot where the
offset of the IPv4 header checksum was being updated which is obviously
incorrect.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 10:37:15 -08:00
Alexander Duyck c49a7bc330 i40e/i40evf: Factor out L4 header and checksum from L3 bits in TSO path
This patch makes it so that the L4 header offsets and such can be ignored
when dealing with the L3 checksum and length update.  This is done making
use of two things.

First we can just use the offset from the L4 header to the start of the
packet to determine the L4 offset, and from that we can then make use of
the data offset to determine the full length of the headers.

As far as adjusting the checksum to remove the length we can simply add the
inverse of the length instead of having to recompute the entire
pseudo-header without the length.  In the case of an IPv6 header this
should be significantly cheaper since we can make use of a value we already
needed instead of having to read the source and destination address out of
the packet.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 10:34:05 -08:00
Alexander Duyck 03f9d6a59f i40e/i40evf: Use u64 values instead of casting them in TSO function
Instead of casing u32 values to u64 it makes more sense to just start out
with u64 values in the first place.  This way we don't need to create a
mess with all of the casts needed to populate a 64b value.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 10:30:55 -08:00