Commit Graph

51585 Commits

Author SHA1 Message Date
Stephen Hemminger 3a8963acc7 Revert "hv_netvsc: make inline functions static"
These functions are used by other code misc-next tree.

This reverts commit 30d1de08c8.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:23:00 -07:00
Mohamad Haj Yahia 737a234bb6 net/mlx5: Introduce attach/detach to interface API
Add attach/detach callbacks to interface API.
This is crucial for implementing seamless reset flow which releases the
hardware and it's resources upon detach while keeping software
structures and state (e.g netdev) then reset and reallocate the hardware
needed resources upon attach.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia 6b6adee3da net/mlx5: SRIOV core code refactoring
Simplify the code and makes it look modular and symmetric.
Split sriov enable/disable to two levels: device level and pci level.
When user enable/disable sriov (via sriov_configure driver callback) we
will enable/disable both device and pci sriov.
When driver load/unload we will enable/disable (on demand) only device
sriov while keeping the PCI sriov enabled for next driver load.
On internal/pci error, VFs will be kept enabled on PCI and the reset
is done only in device level.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Daniel Borkmann f3694e0012 bpf: add BPF_CALL_x macros for declaring helpers
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
that are used today. Motivation for this is to hide all the register handling
and all necessary casts from the user, so that it is done automatically in the
background when adding a BPF_CALL_<n>() call.

This makes current helpers easier to review, eases to write future helpers,
avoids getting the casting mess wrong, and allows for extending all helpers at
once (f.e. build time checks, etc). It also helps detecting more easily in
code reviews that unused registers are not instrumented in the code by accident,
breaking compatibility with existing programs.

BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
fundamental differences, for example, for generating the actual helper function
that carries all u64 regs, we need to fill unused regs, so that we always end up
with 5 u64 regs as an argument.

I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and
they look all as expected. No sparse issue spotted. We let this also sit for a
few days with Fengguang's kbuild test robot, and there were no issues seen. On
s390, it barked on the "uses dynamic stack allocation" notice, which is an old
one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
to the call wrapper, just telling that the perf raw record/frag sits on stack
(gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
and they were fine as well. All eBPF helpers are now converted to use these
macros, getting rid of a good chunk of all the raw castings.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Daniel Borkmann f035a51536 bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit
which otherwise often result in overly long bytes_to_bpf_size(sizeof())
and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
check in convert_bpf_extensions(), but we should rather make that generic
as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
users to detect any rewriter size issues at compile time. Note, there are
currently none, but we want to assert that it stays this way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 19:36:04 -07:00
Yaogong Wang 9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
Eric Garver fe19c4f971 vlan: Check for vlan ethernet types for 8021.q or 802.1ad
This is to simplify using double tagged vlans. This function allows all
valid vlan ethertypes to be checked in a single function call.
Also replace some instances that check for both ETH_P_8021Q and
ETH_P_8021AD.

Patch based on one originally by Thomas F Herbert.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:28 -07:00
Lorenzo Colitti d545caca82 net: inet: diag: expose the socket mark to privileged processes.
This adds the capability for a process that has CAP_NET_ADMIN on
a socket to see the socket mark in socket dumps.

Commit a52e95abf7 ("net: diag: allow socket bytecode filters to
match socket marks") recently gave privileged processes the
ability to filter socket dumps based on mark. This patch is
complementary: it ensures that the mark is also passed to
userspace in the socket's netlink attributes.  It is useful for
tools like ss which display information about sockets.

Tested: https://android-review.googlesource.com/270210
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:13:09 -07:00
Tomer Tayar e0971c832a qed*: Add support for the ethtool get_regs operation
Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-07 17:46:59 -07:00
Tomer Tayar c965db4446 qed: Add support for debug data collection
This patch adds the support for dumping and formatting the HW/FW debug data.

Signed-off-by: Tomer Tayar <Tomer.Tayar@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-07 17:46:59 -07:00
David S. Miller 60175ccdf4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree.  Most relevant updates are the removal of per-conntrack timers to
use a workqueue/garbage collection approach instead from Florian
Westphal, the hash and numgen expression for nf_tables from Laura
Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag,
removal of ip_conntrack sysctl and many other incremental updates on our
Netfilter codebase.

More specifically, they are:

1) Retrieve only 4 bytes to fetch ports in case of non-linear skb
   transport area in dccp, sctp, tcp, udp and udplite protocol
   conntrackers, from Gao Feng.

2) Missing whitespace on error message in physdev match, from Hangbin Liu.

3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang.

4) Add nf_ct_expires() helper function and use it, from Florian Westphal.

5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also
   from Florian.

6) Rename nf_tables set implementation to nft_set_{name}.c

7) Introduce the hash expression to allow arbitrary hashing of selector
   concatenations, from Laura Garcia Liebana.

8) Remove ip_conntrack sysctl backward compatibility code, this code has
   been around for long time already, and we have two interfaces to do
   this already: nf_conntrack sysctl and ctnetlink.

9) Use nf_conntrack_get_ht() helper function whenever possible, instead
   of opencoding fetch of hashtable pointer and size, patch from Liping Zhang.

10) Add quota expression for nf_tables.

11) Add number generator expression for nf_tables, this supports
    incremental and random generators that can be combined with maps,
    very useful for load balancing purpose, again from Laura Garcia Liebana.

12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King.

13) Introduce a nft_chain_parse_hook() helper function to parse chain hook
    configuration, this is used by a follow up patch to perform better chain
    update validation.

14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the
    nft_set_hash implementation to honor the NLM_F_EXCL flag.

15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(),
    patch from Florian Westphal.

16) Don't use the DYING bit to know if the conntrack event has been already
    delivered, instead a state variable to track event re-delivery
    states, also from Florian.

17) Remove the per-conntrack timer, use the workqueue approach that was
    discussed during the NFWS, from Florian Westphal.

18) Use the netlink conntrack table dump path to kill stale entries,
    again from Florian.

19) Add a garbage collector to get rid of stale conntracks, from
    Florian.

20) Reschedule garbage collector if eviction rate is high.

21) Get rid of the __nf_ct_kill_acct() helper.

22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger.

23) Make nf_log_set() interface assertive on unsupported families.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 12:45:26 -07:00
Alexei Starovoitov aa6a5f3cb2 perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs
Allow attaching BPF_PROG_TYPE_PERF_EVENT programs to sw and hw perf events
via overflow_handler mechanism.
When program is attached the overflow_handlers become stacked.
The program acts as a filter.
Returning zero from the program means that the normal perf_event_output handler
will not be called and sampling event won't be stored in the ring buffer.

The overflow_handler_context==NULL is an additional safety check
to make sure programs are not attached to hw breakpoints and watchdog
in case other checks (that prevent that now anyway) get accidentally
relaxed in the future.

The program refcnt is incremented in case perf_events are inhereted
when target task is forked.
Similar to kprobe and tracepoint programs there is no ioctl to
detach the program or swap already attached program. The user space
expected to close(perf_event_fd) like it does right now for kprobe+bpf.
That restriction simplifies the code quite a bit.

The invocation of overflow_handler in __perf_event_overflow() is now
done via READ_ONCE, since that pointer can be replaced when the program
is attached while perf_event itself could have been active already.
There is no need to do similar treatment for event->prog, since it's
assigned only once before it's accessed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:44 -07:00
Alexei Starovoitov 0515e5999a bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type
Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to
HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE
correspondingly in uapi/linux/perf_event.h)

The program visible context meta structure is
struct bpf_perf_event_data {
    struct pt_regs regs;
     __u64 sample_period;
};
which is accessible directly from the program:
int bpf_prog(struct bpf_perf_event_data *ctx)
{
  ... ctx->sample_period ...
  ... ctx->regs.ip ...
}

The bpf verifier rewrites the accesses into kernel internal
struct bpf_perf_event_data_kern which allows changing
struct perf_sample_data without affecting bpf programs.
New fields can be added to the end of struct bpf_perf_event_data
in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-02 10:46:44 -07:00
Nikolay Aleksandrov b6cb5ac833 net: bridge: add per-port multicast flood flag
Add a per-port flag to control the unknown multicast flood, similar to the
unknown unicast flood flag and break a few long lines in the netlink flag
exports.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 22:48:33 -07:00
Roopa Prabhu d297653dd6 rtnetlink: fdb dump: optimize by saving last interface markers
fdb dumps spanning multiple skb's currently restart from the first
interface again for every skb. This results in unnecessary
iterations on the already visited interfaces and their fdb
entries. In large scale setups, we have seen this to slow
down fdb dumps considerably. On a system with 30k macs we
see fdb dumps spanning across more than 300 skbs.

To fix the problem, this patch replaces the existing single fdb
marker with three markers: netdev hash entries, netdevs and fdb
index to continue where we left off instead of restarting from the
first netdev. This is consistent with link dumps.

In the process of fixing the performance issue, this patch also
re-implements fix done by
commit 472681d57a ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
(with an internal fix from Wilson Kok) in the following ways:
- change ndo_fdb_dump handlers to return error code instead
of the last fdb index
- use cb->args strictly for dump frag markers and not error codes.
This is consistent with other dump functions.

Below results were taken on a system with 1000 netdevs
and 35085 fdb entries:
before patch:
$time bridge fdb show | wc -l
15065

real    1m11.791s
user    0m0.070s
sys 1m8.395s

(existing code does not return all macs)

after patch:
$time bridge fdb show | wc -l
35085

real    0m2.017s
user    0m0.113s
sys 0m1.942s

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 16:56:15 -07:00
David S. Miller 6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Linus Torvalds 1f6a563ee0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Segregate namespaces properly in conntrack dumps, from Liping Zhang.

 2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet.

 3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz.

 4) Fix use-after-free in tcp_xmit_retransmit_queue().

 5) Userspace header fixups (use of __u32, missing includes, etc.) from
    Mikko Rapeli.

 6) Further refinements to fragmentation wrt gso and tunnels, from
    Shmulik Ladkani.

 7) Trigger poll correctly for zero length UDP packets, from Eric
    Dumazet.

 8) TCP window scaling fix, also from Eric Dumazet.

 9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets.

10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet.

11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng.

12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet.

13) Add new device ID to alx driver, from Owen Lin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  Add Killer E2500 device ID in alx driver.
  net: smc91x: fix SMC accesses
  Documentation: networking: dsa: Remove platform device TODO
  net/mlx5: Increase number of ethtool steering priorities
  net/mlx5: Add error prints when validate ETS failed
  net/mlx5e: Fix memory leak if refreshing TIRs fails
  net/mlx5e: Add ethtool counter for TX xmit_more
  net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
  net/mlx5e: Don't wait for SQ completions on close
  net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
  net/mlx5e: Don't wait for RQ completions on close
  net/mlx5e: Limit UMR length to the device's limitation
  rhashtable: fix a memory leak in alloc_bucket_locks()
  sfc: fix potential stack corruption from running past stat bitmask
  team: loadbalance: push lacpdus to exact delivery
  net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
  8139cp: Fix one possible deadloop in cp_rx_poll
  i40e: Change some init flow for the client
  Revert "phy: IRQ cannot be shared"
  net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
  ...
2016-08-29 12:29:13 -07:00
Russell King 2fb04fdf30 net: smc91x: fix SMC accesses
Commit b70661c708 ("net: smc91x: use run-time configuration on all ARM
machines") broke some ARM platforms through several mistakes.  Firstly,
the access size must correspond to the following rule:

(a) at least one of 16-bit or 8-bit access size must be supported
(b) 32-bit accesses are optional, and may be enabled in addition to
    the above.

Secondly, it provides no emulation of 16-bit accesses, instead blindly
making 16-bit accesses even when the platform specifies that only 8-bit
is supported.

Reorganise smc91x.h so we can make use of the existing 16-bit access
emulation already provided - if 16-bit accesses are supported, use
16-bit accesses directly, otherwise if 8-bit accesses are supported,
use the provided 16-bit access emulation.  If neither, BUG().  This
exactly reflects the driver behaviour prior to the commit being fixed.

Since the conversion incorrectly cut down the available access sizes on
several platforms, we also need to go through every platform and fix up
the overly-restrictive access size: Arnd assumed that if a platform can
perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
size needed to be specified - not so, all available access sizes must
be specified.

This likely fixes some performance regressions in doing this: if a
platform does not support 8-bit accesses, 8-bit accesses have been
emulated by performing a 16-bit read-modify-write access.

Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
accesses, which was broken by the original commit.

Fixes: b70661c708 ("net: smc91x: use run-time configuration on all ARM machines")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:44:55 -04:00
Tom Herbert 0294b625ad net: Add read_sock proto_op
Add new function in proto_ops structure. This includes moving the
typedef got sk_read_actor into net.h and removing the definition from
tcp.h.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:32:41 -04:00
Linus Torvalds 25d0d91af7 Merge tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "A bunch of fixes covering i915, amdgpu, one tegra and some core DRM
  ones.  Nothing too strange at this point"

* tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux: (21 commits)
  drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates.
  drm: Protect fb_defio in drivers with CONFIG_KMS_FBDEV_EMULATION
  drm/amdgpu: skip TV/CV in display parsing
  drm/amdgpu: avoid a possible array overflow
  drm/amdgpu: fix lru size grouping v2
  drm/tegra: dsi: Enhance runtime power management
  drm/i915: Fix botched merge that downgrades CSR versions.
  drm/i915/skl: Ensure pipes with changed wms get added to the state
  drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
  drm/i915/skl: Add support for the SAGV, fix underrun hangs
  drm/i915/gen6+: Interpret mailbox error flags
  drm/i915: Reattach comment, complete type specification
  drm/i915: Unconditionally flush any chipset buffers before execbuf
  drm/i915/gen9: Drop invalid WARN() during data rate calculation
  drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2)
  drm: Reject page_flip for !DRIVER_MODESET
  drm/amdgpu: fix timeout value check in amd_sched_job_recovery
  drm/amdgpu: fix sdma_v2_4_ring_test_ib
  drm/amdgpu: fix amdgpu_move_blit on 32bit systems
  drm/radeon: fix radeon_move_blit on 32bit systems
  ...
2016-08-28 14:31:36 -07:00
Linus Torvalds af56ff27eb * ARM fixes:
** fixes for ITS init issues, error handling, IRQ leakage, race conditions
 ** An erratum workaround for timers
 ** Some removal of misleading use of errors and comments
 ** A fix for GICv3 on 32-bit guests
 * MIPS fix where the guest could wrongly map the first page of physical memory
 * x86 nested virtualization fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXtyfVAAoJEL/70l94x66Dhe4IAIOGI/OYVWU5IfUQ01oeRgD3
 7wN222OmyC/K0/hSZc7ndRdcQfr5ombgM9XsS/EbkcRacWxAUHDX2FaYMpKgjT2M
 Dnh2tJHuPz/4VtByGQ2fZ4hziK7amn18/MtPFCee+mIj0ya2fcWZ4qHVU8pKC6Ps
 mVVZ0kxXsdV4pw9y6XgBLz/4bTLeASKvhFZrWOnjJoa+GeH2MFwocS0xaEI0HwxP
 HVwcgoRdGXJuKUB9jE9FDWmWOgdoLnCG1bNUOvXKPcE0ZaFQDT4I4dImkBys3rqz
 jbqnhLrpGEY2ZC3Rj+VyD2MOXbYOOSi59GRwYmCkqD96ZarHxSu3PdyCxmIFWzM=
 =+4WK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM:
   - fixes for ITS init issues, error handling, IRQ leakage, race
     conditions
   - an erratum workaround for timers
   - some removal of misleading use of errors and comments
   - a fix for GICv3 on 32-bit guests

  MIPS:
   - fix for where the guest could wrongly map the first page of
     physical memory

  x86:
   - nested virtualization fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  MIPS: KVM: Check for pfn noslot case
  kvm: nVMX: fix nested tsc scaling
  KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
  KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
  arm64: KVM: report configured SRE value to 32-bit world
  arm64: KVM: remove misleading comment on pmu status
  KVM: arm/arm64: timer: Workaround misconfigured timer interrupt
  arm64: Document workaround for Cortex-A72 erratum #853709
  KVM: arm/arm64: Change misleading use of is_error_pfn
  KVM: arm64: ITS: avoid re-mapping LPIs
  KVM: arm64: check for ITS device on MSI injection
  KVM: arm64: ITS: move ITS registration into first VCPU run
  KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic
  KVM: arm64: vgic-its: Plug race in vgic_put_irq
  KVM: arm64: vgic-its: Handle errors from vgic_add_lpi
  KVM: arm64: ITS: return 1 on successful MSI injection
2016-08-27 15:51:50 -07:00
Linus Torvalds 5e608a0270 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: silently skip readahead for DAX inodes
  dax: fix device-dax region base
  fs/seq_file: fix out-of-bounds read
  mm: memcontrol: avoid unused function warning
  mm: clarify COMPACTION Kconfig text
  treewide: replace config_enabled() with IS_ENABLED() (2nd round)
  printk: fix parsing of "brl=" option
  soft_dirty: fix soft_dirty during THP split
  sysctl: handle error writing UINT_MAX to u32 fields
  get_maintainer: quiet noisy implicit -f vcs_file_exists checking
  byteswap: don't use __builtin_bswap*() with sparse
2016-08-26 23:12:12 -07:00
Linus Torvalds fd1ae51452 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Here's a set of block fixes for the current 4.8-rc release.  This
  contains:

   - a fix for a secure erase regression, from Adrian.

   - a fix for an mmc use-after-free bug regression, also from Adrian.

   - potential zero pointer deference in bdev freezing, from Andrey.

   - a race fix for blk_set_queue_dying() from Bart.

   - a set of xen blkfront fixes from Bob Liu.

   - three small fixes for bcache, from Eric and Kent.

   - a fix for a potential invalid NVMe state transition, from Gabriel.

   - blk-mq CPU offline fix, preventing us from issuing and completing a
     request on the wrong queue.  From me.

   - revert two previous floppy changes, since they caused a user
     visibile regression.  A better fix is in the works.

   - ensure that we don't send down bios that have more than 256
     elements in them.  Fixes a crash with bcache, for example.  From
     Ming.

   - a fix for deferencing an error pointer with cgroup writeback.
     Fixes a regression.  From Vegard"

* 'for-linus' of git://git.kernel.dk/linux-block:
  mmc: fix use-after-free of struct request
  Revert "floppy: refactor open() flags handling"
  Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
  fs/block_dev: fix potential NULL ptr deref in freeze_bdev()
  blk-mq: improve warning for running a queue on the wrong CPU
  blk-mq: don't overwrite rq->mq_ctx
  block: make sure a big bio is split into at most 256 bvecs
  nvme: Fix nvme_get/set_features() with a NULL result pointer
  bdev: fix NULL pointer dereference
  xen-blkfront: free resources if xlvbd_alloc_gendisk fails
  xen-blkfront: introduce blkif_set_queue_limits()
  xen-blkfront: fix places not updated after introducing 64KB page granularity
  bcache: pr_err: more meaningful error message when nr_stripes is invalid
  bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
  bcache: register_bcache(): call blkdev_put() when cache_alloc() fails
  block: Fix race triggered by blk_set_queue_dying()
  block: Fix secure erase
  nvme: Prevent controller state invalid transition
2016-08-26 18:50:07 -07:00
Linus Torvalds 219c04cea3 PCI updates for v4.8:
Resource management
     Update "pci=resource_alignment" documentation (Mathias Koehrer)
 
   MSI
     Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig)
     Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig)
 
   Intel VMD host bridge driver
     Fix infinite loop executing irq's (Keith Busch)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXvvHgAAoJEFmIoMA60/r8y+MP/3cXG5FG3fdRB9Pn8q5BvdD4
 hmIgOpuEH859iYFOUJNxVB6haxG73Z+z3aFvNUDKTOh7GXQ7UHzNw1qj1OEmjXS0
 ex45ynjxP2DTBIO3eoyLpTKaRuOCAz03sDIucK9NzSBkrazBGoaxRRYY5gi+V8K9
 CNsbT8pPSGOrfRX6nk2QjiGMHVw+ExurvskbwuQmRYmHGU1Y1nnkmlnT5AMUJsn+
 BgSLOLFkv9UOIfLVY2+/KCnPFfi0nIxspCJHinosLl5BsgULgFPdRFKr6SiY+yhx
 W3d8MKnhDX5wsajtE7V/HurYmETiyimLL1ZBPJOCzGXwNGrwvUrt1rji+83S+1Zx
 LbrPGDRNBLW51prcrxOx1Pr/RDaHT0+/2UrSvaeUnTqniV5xW1daBPvB6h8jbKih
 C+GGqfiHURvJi1sd9QUMP95yVaxVgSBM9BrKfNYAmVYarnCpx3xbdeMv8gA0+QHx
 Ts0QKJ7ph80mcztdk2H0Ov7OK+OGjDhZPDijXiCTSPfKgK0HPsaytDCULt9LcqAv
 M8daeD42nwe36xOlNgcosj++gL+VxVYk/4BAiOWCgwF+k2EDh1ol8/eK6a+hXAqC
 qslLMNNucZP5BSsSIYc5KVGbkEXuHTw8viUNvBhLE2CWD3SY378IlzrF7QxjgKNl
 AawNqjfXY7obX3m5xr9g
 =qJG4
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Resource management:
   - Update "pci=resource_alignment" documentation (Mathias Koehrer)

  MSI:
   - Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig)
   - Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig)

  Intel VMD host bridge driver:
   - Fix infinite loop executing irq's (Keith Busch)"

* tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: VMD: Fix infinite loop executing irq's
  PCI: Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors()
  PCI: Use positive flags in pci_alloc_irq_vectors()
  PCI: Update "pci=resource_alignment" documentation
2016-08-26 18:26:07 -07:00
Subash Abhinov Kasiviswanathan e7d316a02f sysctl: handle error writing UINT_MAX to u32 fields
We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels.  An entry which we write to here is
xfrm_aevent_rseqth which is u32.

  echo 4294967295  > /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs.  However, this does not apply to unsigned integers.

Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX.  This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax.  Alternatively,
we would need to change the datatype of the entry to 64 bit.

  static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
  {
      i = (unsigned long *) data;   //This cast is causing to read beyond the size of data (u32)
      vleft = table->maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.

Introduce a new proc handler proc_douintvec.  Individual proc entries
will need to be updated to use the new handler.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 230633d109 ("kernel/sysctl.c:detect overflows when converting to int")
Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-26 17:39:35 -07:00
Johannes Berg 101b29a204 byteswap: don't use __builtin_bswap*() with sparse
Although sparse declares __builtin_bswap*(), it can't actually do
constant folding inside them (yet).  As such, things like

  switch (protocol) {
  case htons(ETH_P_IP):
          break;
  }

which we do all over the place cause sparse to warn that it expects a
constant instead of a function call.

Disable __HAVE_BUILTIN_BSWAP*__ if __CHECKER__ is defined to avoid this.

Fixes: 7322dd755e ("byteswap: try to avoid __builtin_constant_p gcc bug")
Link: http://lkml.kernel.org/r/1470914102-26389-1-git-send-email-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-26 17:39:34 -07:00
Ido Schimmel 6bc506b4fb bridge: switchdev: Add forward mark support for stacked devices
switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.

This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:36 -07:00
Pablo Neira Ayuso 5ca8cc5bf1 rhashtable: add rhashtable_lookup_get_insert_key()
This patch modifies __rhashtable_insert_fast() so it returns the
existing object that clashes with the one that you want to insert.
In case the object is successfully inserted, NULL is returned.
Otherwise, you get an error via ERR_PTR().

This patch adapts the existing callers of __rhashtable_insert_fast()
so they handle this new logic, and it adds a new
rhashtable_lookup_get_insert_key() interface to fetch this existing
object.

nf_tables needs this change to improve handling of EEXIST cases via
honoring the NLM_F_EXCL flag and by checking if the data part of the
mapping matches what we have.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-08-26 17:29:41 +02:00
Dave Airlie 30bffd1b44 drm/tegra: Fixes for v4.8-rc4
This contains one fix for DSI runtime power management support that was
 introduced in v4.8-rc1. This is slightly more elaborate than I would've
 wished, but there are a few corner cases that needed fixing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIwBAABCAAaBQJXvaiSExx0cmVkaW5nQG52aWRpYS5jb20ACgkQ3SOs138+s6Hv
 jg/6AnYkczQYraAb8Oy2YO3dIfz3r0TLGEEnY3AKudur867mNG0V59JcUXHQYKin
 FL7z4iIEx21RAtA9gUEc7xz9gitXP2W/GqRG6v1u0wLn+GS72TuNZCsqUVPQ6Glr
 yg0uLPoxsFyLa3rsTgif5jXxpsbSd6OV2NdkpImqt1EQLaOWYSNqLLq/YNvMsvCK
 QivRvvJEYznIyJayIN/PDekfYY3L5esMQNdDEI9jJKdJrTykQThx2eOIbsgY4MyC
 sdp/h9R23O1ZFvfGONrQHOOzDGumhDJtsRfew+q9JoBaOOgztk2LmPpRZFQsIrmV
 1rv1ahQh3boPWw3cSoQMCE8E3Eu2wD/zNrWYbKyhjChx/HTaSPyQGwk0v2WycizA
 KnNo3eQOTT/5Wb/KvhCaef4Q+yc4LbcAC8T3pkYyGVy+yhUOXr4SGOQLh4Za0Sag
 uaM20wk3TPYAiyUKZFE7RZ1WHtOp56gOIHbS5eV5ppuR2lmoCh4vK5yCojaz4gww
 DAHzldegReurS3du613KrV63lue1r3nwX1f0kKDsWd4yN0FH/74g/QC1jXKqJmmk
 ZevqKCQ1eoaPyOMQI2G0rvRL4B63s800oHO2L760e5SdbM1/Oraxzc+FobGZz9KW
 Y55SyNnn3QT5HcH6dntmisnuSbZVXVNonEJaPJWFLH/ChgI=
 =lyq1
 -----END PGP SIGNATURE-----

Merge tag 'drm/tegra/for-4.8-rc4' of git://anongit.freedesktop.org/tegra/linux into drm-fixes

drm/tegra: Fixes for v4.8-rc4

This contains one fix for DSI runtime power management support that was
introduced in v4.8-rc1. This is slightly more elaborate than I would've
wished, but there are a few corner cases that needed fixing.

* tag 'drm/tegra/for-4.8-rc4' of git://anongit.freedesktop.org/tegra/linux:
  drm/tegra: dsi: Enhance runtime power management
2016-08-25 12:49:22 +10:00
David S. Miller fff84d2a39 Mellanox ConnectX-4/Connect-IB shared code (SW part)
* net/mlx5: Add sniffer namespaces
 * net/mlx5: Introduce sniffer steering hardware capabilities
 * net/mlx5: Configure IB devices according to LAG state
 * net/mlx5: Vport LAG creation support
 * net/mlx5: Add LAG flow steering namespace
 * net/mlx5: LAG demux flow table support
 * net/mlx5: LAG and SRIOV cannot be used together
 * net/mlx5e: Avoid port remapping of mlx5e netdev TISes
 * net/mlx5: Get RoCE netdev
 * net/mlx5: Implement RoCE LAG feature
 * net/mlx5: Add HW interfaces used by LAG
 * net/mlx5: Separate query_port_proto_oper for IB and EN
 * net/mlx5: Expose mlx5e_link_mode
 * net/mlx5: Update struct mlx5_ifc_xrqc_bits
 * net/mlx5: Modify RQ bitmask from mlx5 ifc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXu20hAAoJEORje4g2cliniO8P/0nMxLemOxY63u7P6DqT+UZQ
 +LN62W+/iLicNayKkt8mtcjnDm768YcF3ADvx73vRvKEeUyyEqT5ChMA59eicf70
 rrumfNXB/kfBOaPh5rFWf4Tn8WWpKW+0559drm80NslFZF9jjF9pwv5QGg7xISb7
 fYLcDQWn+5fYDuZzYsSu8zZKUEyGN0AugdjfxT5OHfh4rw+6oqGDb2fhH6LdkD8q
 j3Qx1cPmdQQnjJ5veXJFJT5qHFDqJlNmy85s4l99ItdWD/bcU29ue3Q3vNf7+lHp
 XoJB4ZRWG7sf98yXYXnOUt3iGUMdSJzpLfZqh/Nx9U1LZpdJ8lmBf7pRuR1hpPIN
 yDitcz+CMcFVr2WxvwWaUPhRE7SJsZxxr6tQISgRicYcFVyy9e7mLjABMtkh9vEn
 CXXqiDGUb/27HqTi9ha5qRiLoeT8yFpOCkINL4omV2FJKoUEbC+Jbq5P0mjnPpS1
 ZdzTOzWCtkDQGtLbi+nCIF5SVTv7CCDU+6VpGZPmk6M4/ednwajhxGPsbw6bRpna
 ck5SglGO8dFAaUv1UVRq04PIt7Lj2FRakP7sHWx3tc9XEP8syLX0OEiVB+ZN3yRn
 y2TlpsREk7AqDdRulwM4qfuNd4AxaDklXyS3C79RiJtenYO4GUGrJ6J6ryesLg8u
 tGKVV3fXEr2Hve6cTkpu
 =+m21
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-24

This series contains some low level and API updates for mlx5 core
driver interface and mlx5_ifc.h, plus mlx5 LAG core driver support,
to be shared as base code for net-next and rdma mlx5 4.9 submissions.

From Alex and Artemy, Update mlx5_ifc for modify RQ and XRC bits.

From Noa, Expose mlx5 link modes so they can be used in RDMA tree for rdma tools.

From Aviv, LAG support needed for RDMA.
    - Add needed hardware structures, layouts and interface
    - mlx5 core driver LAG implementation
    - Introduce mlx5 core driver LAG API for mlx5_ib

From Maor, add two low level patches for mlx5 hardware sniffer QP
infrastructure bits and capabilities, plus added the namespace for sniffer
steering tables.  Needed for RDMA subtree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:35:35 -07:00
Thierry Reding 87904c3e82 drm/tegra: dsi: Enhance runtime power management
The MIPI DSI output on Tegra SoCs requires some external logic to
calibrate the MIPI pads before a video signal can be transmitted. This
MIPI calibration logic requires to be powered on while the MIPI pads are
being used, which is currently done as part of the DSI driver's probe
implementation.

This is suboptimal because it will leave the MIPI calibration logic
powered up even if the DSI output is never used.

On Tegra114 and earlier this behaviour also causes the driver to hang
while trying to power up the MIPI calibration logic because the power
partition that contains the MIPI calibration logic will be powered on
by the display controller at output pipeline configuration time. Thus
the power up sequence for the MIPI calibration logic happens before
it's power partition is guaranteed to be enabled.

Fix this by splitting up the API into a request/free pair of functions
that manage the runtime dependency between the DSI and the calibration
modules (no registers are accessed) and a set of enable, calibrate and
disable functions that program the MIPI calibration logic at points in
time where the power partition is really enabled.

While at it, make sure that the runtime power management also works in
ganged mode, which is currently also broken.

Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Tested-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2016-08-24 15:58:57 +02:00
Stephen Hemminger e3f74b841d hv_netvsc: report vmbus name in ethtool
Make netvsc on vmbus behave more like PCI.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 12:05:38 -07:00
Stephen Hemminger 30d1de08c8 hv_netvsc: make inline functions static
Several new functions were introduced into hyperv.h but only used in one file.
Move them and let compiler decide on inline.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 12:05:36 -07:00
David S. Miller 01555e6449 Mellanox ConnectX-4/Connect-IB shared code (HW part)
* net/mlx5: Introduce alloc_encap and dealloc_encap commands
 * net/mlx5: Update mlx5_ifc.h for vxlan encap/decap
 * net/mlx5: Enable setting minimum inline header mode for VFs
 * net/mlx5: Improve driver log messages
 * net/mlx5: Unify and improve command interface
 * {net,IB}/mlx5: Modify QP commands via mlx5 ifc
 * {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
 * {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
 * {net,IB}/mlx5: CQ commands via mlx5 ifc
 * net/mlx5: EQ commands via mlx5 ifc
 * net/mlx5: Pages management commands via mlx5 ifc
 * net/mlx5: MCG commands via mlx5 ifc
 * net/mlx5: PD and UAR commands via mlx5 ifc
 * net/mlx5: Access register and MAD IFC commands via mlx5 ifc
 * net/mlx5: Init/Teardown hca commands via mlx5 ifc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXu2zqAAoJEORje4g2clin0dQP/3SF9+4lxVaWRDnhutwIdaxd
 GDWDCEcp1x8oC1ylKEfQW57tTG8mk6pEFD5xEZSAJMGjGm5zR8QnaIS9eiPTdDkf
 QIReMP9XJUUVDqXZ8F207PwVbgB4IkHB2VPyl2Sar1HULe6Mn3nAS40A1QfYpVzs
 cYC3SFOPuLsTDZkIVQrZzKvX4WVHjcyj0tAkXkutWQ+K8cPXmpx49+ngrzVm6xnw
 j6THx3kOAEwozW5NxMC7V6DOD7KfLWzPi96BLZ2h4eQynpgJnSLOCar3zyBPH5g3
 KAk99tVjD1kp+HreSNzCd+oP8Zqrw+RBt3WlrGX2GvQ0V7XIJrpkRbLDgWhbBjej
 O1ln/xr5pqLSKgxz41LsFlrLWbOgG7r4N212iMNv3rArb9e11tqZCAbR0OzX5vZ6
 fl2W7moYRB2273Y+MnB/e1e8xf7PEIppWnyvyPrzCz1lSdzw1BzLqz5tWz2nc1dB
 yQWosVTf4xTa3OQHhUqw6CbhpRpywQZx1ZhmAzZ7+hQ90Z4hwPWWXIx7MNa4g2sJ
 toUamuonbnib3wBLQzzW2ktTbdJUx8OTF5aiVNC06QG8KAvXeUAP2Ho95Am3JpLJ
 XZ14ZP0NxOFaGgOSDRxEVKuhnUnXuIG57NSgQpMD5rjSieMl+msasrydP8X4+qny
 HlwA4nwt2bHf9k7Cg1iM
 =QIAc
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-20

This series contains several low level and API updates for mlx5 core
commands interface and mlx5_ifc.h to be shared as base code for net-next and
rdma mlx5 4.9 submissions.

From Saeed, ten patches that refactors old layouts of firmware commands which
were manually generated before we introduced the mlx5_ifc, now all of the firmware
commands inbox/outbox layouts moved to use mlx5_ifc and we remove the old
manually generated structures.  Plus to those ten patches, we add two patches
that unifies mlx5 commands execution interface and improve the driver log messages
in that area.

From Hadar and Ilya, added the needed hardware bits and infrastructure for
minimum inline headers setting and encap/decap commands and capabilities,
needed for E-Switch offloads.

This series applies on top latest net-next and rdma/master, and smoothly merges with
the latest "Mellanox 100G mlx5 fixes 2016-08-16" series already applied into net branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 11:08:23 -07:00
Yuval Mintz f1ff8666ed qed: Fix address macros
Last FW submission reverted various macros into an older form,
where they generate compilation warnings on some architectures.

Bring back the newer macros instead.

Fixes: 05fafbfb3d ("qed: utilize FW 8.10.10.0")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 22:41:32 -07:00
Yuval Mintz d8c2c7e340 qed*: Add support for VFs over legacy PFs
Modern VFs can't run on old non-compatible as the fastpath HSI is
slightly changed - but as the HSI is actually very close [basically,
a single bit whose meaning flipped] this can be supported with small
modifications.

The major differences would be in:
  - Recognizing that VF is running on top of a legacy PF.
  - Returning some slowpath configurations that are no longer needed
    on top of modern PFs, but would be required when working over
    the legacy ones.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 18:24:52 -07:00
Yuval Mintz 05fafbfb3d qed: utilize FW 8.10.10.0
This new firmware for the qed* adpaters fixes several issues:
 - Better blocking of malicious VFs.
 - After FLR, Tx-switching [internal routing] of packets might
   be incorrect.
 - Deletion of unicast MAC filters would sometime have side-effect
   of corrupting the MAC filters configred for a device.
It also contains fixes for future qed* drivers that *hopefully* would be
sent for review in the near future.

In addition, it would allow driver some new functionality, including:
 - Allowing PF/VF driver compaitibility with old drivers [running
   pre-8.10.5.0 firmware].
 - Better debug facilities.

This would also bump the qed* driver versions to 8.10.9.20.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 17:57:06 -07:00
Herbert Xu 246779dd09 rhashtable: Remove GFP flag from rhashtable_walk_init
The commit 8f6fd83c6c ("rhashtable:
accept GFP flags in rhashtable_walk_init") added a GFP flag argument
to rhashtable_walk_init because some users wish to use the walker
in an unsleepable context.

In fact we don't need to allocate memory in rhashtable_walk_init
at all.  The walker is always paired with an iterator so we could
just stash ourselves there.

This patch does that by introducing a new enter function to replace
the existing init function.  This way we don't have to churn all
the existing users again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 14:40:24 -07:00
Daniel Borkmann 5293efe62d bpf: add bpf_skb_change_tail helper
This work adds a bpf_skb_change_tail() helper for tc BPF programs. The
basic idea is to expand or shrink the skb in a controlled manner. The
eBPF program can then rewrite the rest via helpers like bpf_skb_store_bytes(),
bpf_lX_csum_replace() and others rather than passing a raw buffer for
writing here.

bpf_skb_change_tail() is really a slow path helper and intended for
replies with f.e. ICMP control messages. Concept is similar to other
helpers like bpf_skb_change_proto() helper to keep the helper without
protocol specifics and let the BPF program mangle the remaining parts.
A flags field has been added and is reserved for now should we extend
the helper in future.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 23:38:16 -07:00
Rafał Miłecki 1cb94db3d1 net: bgmac: support Ethernet core on BCM53573 SoCs
BCM53573 is a new series of Broadcom's SoCs. It's based on ARM and can
be found in two packages (versions): BCM53573 and BCM47189. It shares
some code with the Northstar family, but also requires some new quirks.

First of all there can be up to 2 Ethernet cores on this SoC. If that is
the case, they are connected to two different switch ports allowing some
more complex/optimized setups. It seems the second unit doesn't come
fully configured and requires some IRQ quirk.

Other than that only the first core is connected to the PHY. For the
second one we have to register fixed PHY (similarly to the Northstar),
otherwise generic PHY driver would get some invalid info.

This has been successfully tested on Tenda AC9 (BCM47189B0).

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 23:36:07 -07:00
Hadar Hen Zion f6a6692769 flow_dissector: Get vlan priority in addition to vlan id
Add vlan priority check to the flow dissector by adding new flow
dissector struct, flow_dissector_key_vlan which includes vlan tag
fields.

vlan_id and flow_label fields were under the same struct
(flow_dissector_key_tags). It was a convenient setting since struct
flow_dissector_key_tags is used by struct flow_keys and by setting
vlan_id and flow_label under the same struct, we get precisely 24 or 48
bytes in flow_keys from flow_dissector_key_basic.

Now, when adding vlan priority support, the code will be cleaner if
flow_label and vlan tag won't be under the same struct anymore.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 23:13:13 -07:00
Yuval Mintz d194fd265e qed*: Fix pause setting
When moving into using ethtool's link_ksetting, qed started
supplying its own bitmask of speed/capabilities, but qede
is still checking for the SUPPORTED value to determine whether
it supports pause.

Fixes: 054c67d1c8 ("qed*: Add support for ethtool link_ksettings callbacks")
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 23:04:40 -07:00
David S. Miller 53409afd3e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter updates for your net tree,
they are:

1) Dump only conntrack that belong to this namespace via /proc file.
   This is some fallout from the conversion to single conntrack table
   for all netns, patch from Liping Zhang.

2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents
   module autoloading, also from Liping Zhang.

3) Report overquota event to the right netnamespace, again from Liping.

4) Fix tproxy listener sk refcount that leads to crash, from
   Eric Dumazet.

5) Fix racy refcounting on object deletion from nfnetlink and rule
   removal both for nfacct and cttimeout, from Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18 18:45:34 -07:00
Linus Torvalds bd3fd451ff Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Two lockless_dereference() related fixes"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/barriers: Suppress sparse warnings in lockless_dereference()
  Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
2016-08-18 13:45:48 -07:00
Maor Gottlieb 87d22483ce net/mlx5: Add sniffer namespaces
Add sniffer TX and RX namespaces to receive ingoing and outgoing
traffic.

Each outgoing/incoming packet is duplicated and steered to the sniffer
TX/RX namespace in addition to the regular flow.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:59 +03:00
Maor Gottlieb cea824d416 net/mlx5: Introduce sniffer steering hardware capabilities
Define needed hardware capabilities for sniffer
RX and TX flow tables.

Add the following capabilities:
1. Sniffer RX flow table capabilities.
2. Sniffer TX flow table capabilities.
3. If same TIR can be used by multiple flow tables of different types.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:59 +03:00
Aviv Heller 3bc34f3bcb net/mlx5: Vport LAG creation support
Add interfaces for issuing CREATE_VPORT_LAG and
DESTROY_VPORT_LAG commands.

Used for receiving PF1's eth traffic on PF0's
root ft.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller 3e75d4ebaa net/mlx5: Add LAG flow steering namespace
This namespace is used for LAG demux flowtable.

The idea is to position the LAG demux ft between
bypass and kernel flowtables, allowing raw-eth
traffic from both ports to be received by the PF0
IB device.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller aaff1bea16 net/mlx5: LAG demux flow table support
Add interfaces to allow the creation and destruction of a
LAG demux flow table.

It is a special flow table used during LAG for redirecting
non user-mode packets from PF0 to PF1 root ft, if a packet was
received on phys port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller 6a32047a44 net/mlx5: Get RoCE netdev
Used by IB driver for determining the IB bond
device's netdev, when LAG is active.

Returns PF0's netdev if mode is not active-backup,
or the PF netdev of the active slave when mode is
active-backup.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00