While we're doing this, fix the error code for SIOCSHWTSTAMP ioctl on
non-timestamping hardware.
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Pull slave-dmaengine changes from Vinod Koul:
"This brings for slave dmaengine:
- Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
dmaengine can only transfer and not verify validaty of dma
transfers
- Bunch of fixes across drivers:
- cppi41 driver fixes from Daniel
- 8 channel freescale dma engine support and updated bindings from
Hongbo
- msx-dma fixes and cleanup by Markus
- DMAengine updates from Dan:
- Bartlomiej and Dan finalized a rework of the dma address unmap
implementation.
- In the course of testing 1/ a collection of enhancements to
dmatest fell out. Notably basic performance statistics, and
fixed / enhanced test control through new module parameters
'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and
Linus [Walleij] for their review.
- Testing the raid related corner cases of 1/ triggered bugs in
the recently added 16-source operation support in the ioatdma
driver.
- Some minor fixes / cleanups to mv_xor and ioatdma"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
dma: mv_xor: Remove unneeded NULL address check
ioat: fix ioat3_irq_reinit
ioat: kill msix_single_vector support
raid6test: add new corner case for ioatdma driver
ioatdma: clean up sed pool kmem_cache
ioatdma: fix selection of 16 vs 8 source path
ioatdma: fix sed pool selection
ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
dmatest: verbose mode
dmatest: convert to dmaengine_unmap_data
dmatest: add a 'wait' parameter
dmatest: add basic performance metrics
dmatest: add support for skipping verification and random data setup
dmatest: use pseudo random numbers
dmatest: support xor-only, or pq-only channels in tests
dmatest: restore ability to start test at module load and init
dmatest: cleanup redundant "dmatest: " prefixes
dmatest: replace stored results mechanism, with uniform messages
Revert "dmatest: append verify result to results"
...
Pull networking fixes from David Miller:
"Mostly these are fixes for fallout due to merge window changes, as
well as cures for problems that have been with us for a much longer
period of time"
1) Johannes Berg noticed two major deficiencies in our genetlink
registration. Some genetlink protocols we passing in constant
counts for their ops array rather than something like
ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
using fixed IDs for their multicast groups.
We have to retain these fixed IDs to keep existing userland tools
working, but reserve them so that other multicast groups used by
other protocols can not possibly conflict.
In dealing with these two problems, we actually now use less state
management for genetlink operations and multicast groups.
2) When configuring interface hardware timestamping, fix several
drivers that simply do not validate that the hwtstamp_config value
is one the driver actually supports. From Ben Hutchings.
3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.
4) In dev_forward_skb(), set the skb->protocol in the right order
relative to skb_scrub_packet(). From Alexei Starovoitov.
5) Bridge erroneously fails to use the proper wrapper functions to make
calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
Makita.
6) When detaching a bridge port, make sure to flush all VLAN IDs to
prevent them from leaking, also from Toshiaki Makita.
7) Put in a compromise for TCP Small Queues so that deep queued devices
that delay TX reclaim non-trivially don't have such a performance
decrease. One particularly problematic area is 802.11 AMPDU in
wireless. From Eric Dumazet.
8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
here. Fix from Eric Dumzaet, reported by Dave Jones.
9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.
10) When computing mergeable buffer sizes, virtio-net fails to take the
virtio-net header into account. From Michael Dalton.
11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
bumping, this one has been with us for a while. From Eric Dumazet.
12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
Hugne.
13) 6lowpan bit used for traffic classification was wrong, from Jukka
Rissanen.
14) macvlan has the same issue as normal vlans did wrt. propagating LRO
disabling down to the real device, fix it the same way. From Michal
Kubecek.
15) CPSW driver needs to soft reset all slaves during suspend, from
Daniel Mack.
16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.
17) The xen-netfront RX buffer refill timer isn't properly scheduled on
partial RX allocation success, from Ma JieYue.
18) When ipv6 ping protocol support was added, the AF_INET6 protocol
initialization cleanup path on failure was borked a little. Fix
from Vlad Yasevich.
19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
blocks we can do the wrong thing with the msg_name we write back to
userspace. From Hannes Frederic Sowa. There is another fix in the
works from Hannes which will prevent future problems of this nature.
20) Fix route leak in VTI tunnel transmit, from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
genetlink: make multicast groups const, prevent abuse
genetlink: pass family to functions using groups
genetlink: add and use genl_set_err()
genetlink: remove family pointer from genl_multicast_group
genetlink: remove genl_unregister_mc_group()
hsr: don't call genl_unregister_mc_group()
quota/genetlink: use proper genetlink multicast APIs
drop_monitor/genetlink: use proper genetlink multicast APIs
genetlink: only pass array to genl_register_family_with_ops()
tcp: don't update snd_nxt, when a socket is switched from repair mode
atm: idt77252: fix dev refcnt leak
xfrm: Release dst if this dst is improper for vti tunnel
netlink: fix documentation typo in netlink_set_err()
be2net: Delete secondary unicast MAC addresses during be_close
be2net: Fix unconditional enabling of Rx interface options
net, virtio_net: replace the magic value
ping: prevent NULL pointer dereference on write to msg_name
bnx2x: Prevent "timeout waiting for state X"
bnx2x: prevent CFC attention
bnx2x: Prevent panic during DMAE timeout
...
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
VDvT9DEy6zjSMCLpMcdo
=xxC+
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma updates from Roland Dreier:
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
IB/core: Re-enable create_flow/destroy_flow uverbs
IB/core: extended command: an improved infrastructure for uverbs commands
IB/core: Remove ib_uverbs_flow_spec structure from userspace
IB/core: Use a common header for uverbs flow_specs
IB/core: Make uverbs flow structure use names like verbs ones
IB/core: Rename 'flow' structs to match other uverbs structs
IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
IB/ucma: Convert use of typedef ctl_table to struct ctl_table
IB/cm: Convert to using idr_alloc_cyclic()
IB/mlx5: Fix page shift in create CQ for userspace
IB/mlx4: Fix device max capabilities check
IB/mlx5: Fix list_del of empty list
IB/mlx5: Remove dead code
IB/core: Encorce MR access rights rules on kernel consumers
IB/mlx4: Fix endless loop in resize CQ
RDMA/cma: Remove unused argument and minor dead code
RDMA/ucma: Discard events for IDs not yet claimed by user space
IB/core: Add Cisco usNIC rdma node and transport types
RDMA/nes: Remove self-assignment from nes_query_qp()
IB/srp: Report receive errors correctly
...
Secondary unicast MAC addresses will get deleted only when the interface is UP.
When the interface is DOWN, though these secondary MAC addresses are unusable
and awaiting to be deleted, cause the firmware to believe that they are being used.
If the user intends to set a MAC address as primary MAC from one of these
secondary MAC addresses, the firmware returns a MAC address Collision error.
Delete these secondary MAC addresses during be_close.
The secondary MAC addresses list will be refreshed during interface open anyway.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently requests the firmware to enable rx_interface options
without considering if the interface was created with that capability.
This could cause commands to firmware to fail.
To avoid this, enable only those options on an interface if the interface
was created with that capability.
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current driver release rtnl lock in between DCB re-configuration.
As a result, other flows (e.g., mtu config) may enter in between and fail
due to halted tx path for dcb configuration.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During VF load, prior to sending messages on HW channel to PF the VF
checks its bulletin board to see whether the PF indicated it has closed;
If a closed PF is encountered, the VF skips sending the message.
Due to incorrect return values, there's a possible scenario in which the VF
finishes loading "successfully", while the PF hasn't actually fully configured
FW/HW for the VFs supposed configuration.
Once VF tries to send Tx packets, HW will raise an attention (and FW possibly
will start treat the VF as malicious).
The patch fails the loading process in such a scenario.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If chip enters a recovery flow just after the driver issues a DMAE request
the DMAE will timeout. Current code will cause a bnx2x_panic() as a result,
which means interface will no longer be usable (regardless of the recovery
results), as bnx2x_panic() is irreversible for the driver.
As this is a possible flow, the panic should be reached only when driver
is compiled with STOP_ON_ERROR.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While unloading, bnx2x needs to clean the sp_rtnl_state to prevent
configuration made before the unload to be applied afterwards with
stale values.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull dmaengine changes from Dan
1/ Bartlomiej and Dan finalized a rework of the dma address unmap
implementation.
2/ In the course of testing 1/ a collection of enhancements to dmatest
fell out. Notably basic performance statistics, and fixed / enhanced
test control through new module parameters 'run', 'wait', 'noverify',
and 'verbose'. Thanks to Andriy and Linus for their review.
3/ Testing the raid related corner cases of 1/ triggered bugs in the
recently added 16-source operation support in the ioatdma driver.
4/ Some minor fixes / cleanups to mv_xor and ioatdma.
Conflicts:
drivers/dma/dmatest.c
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...
During resume, use for_each_slave to walk the slaves of the cpsw, and
soft-reset each of them. This prevents oopses if there is only one
slave configured.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For archs with pages size of 4K, when the chunk is freed, fwp is not in the
list so avoid attempting to delete it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes bug 62491 (https://bugzilla.kernel.org/show_bug.cgi?id=62491).
After resuming some users got the following error flooding the kernel log:
alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
Signed-off-by: Jonas Hahnfeld <linux@hahnjo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver fails to check the results of DMA mapping and results in
the following warning: (with kernel config "CONFIG_DMA_API_DEBUG" enable)
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:937 check_unmap+0x43c/0x7d8()
fec 2188000.ethernet: DMA-API: device driver failed to check map
error[device address=0x00000000383a8040] [size=2048 bytes] [mapped as single]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.17-16827-g9cdb0ba-dirty #188
[<80013c4c>] (unwind_backtrace+0x0/0xf8) from [<80011704>] (show_stack+0x10/0x14)
[<80011704>] (show_stack+0x10/0x14) from [<80025614>] (warn_slowpath_common+0x4c/0x6c)
[<80025614>] (warn_slowpath_common+0x4c/0x6c) from [<800256c8>] (warn_slowpath_fmt+0x30/0x40)
[<800256c8>] (warn_slowpath_fmt+0x30/0x40) from [<8026bfdc>] (check_unmap+0x43c/0x7d8)
[<8026bfdc>] (check_unmap+0x43c/0x7d8) from [<8026c584>] (debug_dma_unmap_page+0x6c/0x78)
[<8026c584>] (debug_dma_unmap_page+0x6c/0x78) from [<8038049c>] (fec_enet_rx_napi+0x254/0x8a8)
[<8038049c>] (fec_enet_rx_napi+0x254/0x8a8) from [<804dc8c0>] (net_rx_action+0x94/0x160)
[<804dc8c0>] (net_rx_action+0x94/0x160) from [<8002c758>] (__do_softirq+0xe8/0x1d0)
[<8002c758>] (__do_softirq+0xe8/0x1d0) from [<8002c8e8>] (do_softirq+0x4c/0x58)
[<8002c8e8>] (do_softirq+0x4c/0x58) from [<8002cb50>] (irq_exit+0x90/0xc8)
[<8002cb50>] (irq_exit+0x90/0xc8) from [<8000ea88>] (handle_IRQ+0x3c/0x94)
[<8000ea88>] (handle_IRQ+0x3c/0x94) from [<8000855c>] (gic_handle_irq+0x28/0x5c)
[<8000855c>] (gic_handle_irq+0x28/0x5c) from [<8000de00>] (__irq_svc+0x40/0x50)
Exception stack(0x815a5f38 to 0x815a5f80)
5f20: 815a5f80 3b9aca00
5f40: 0fe52383 00000002 0dd8950e 00000002 81e7b080 00000000 00000000 815ac4d8
5f60: 806032ec 00000000 00000017 815a5f80 80059028 8041fc4c 60000013 ffffffff
[<8000de00>] (__irq_svc+0x40/0x50) from [<8041fc4c>] (cpuidle_enter_state+0x50/0xf0)
[<8041fc4c>] (cpuidle_enter_state+0x50/0xf0) from [<8041fd94>] (cpuidle_idle_call+0xa8/0x14c)
[<8041fd94>] (cpuidle_idle_call+0xa8/0x14c) from [<8000edac>] (arch_cpu_idle+0x10/0x4c)
[<8000edac>] (arch_cpu_idle+0x10/0x4c) from [<800582f8>] (cpu_startup_entry+0x60/0x130)
[<800582f8>] (cpu_startup_entry+0x60/0x130) from [<80bc7a48>] (start_kernel+0x2d0/0x328)
[<80bc7a48>] (start_kernel+0x2d0/0x328) from [<10008074>] (0x10008074)
---[ end trace c6edec32436e0042 ]---
Because dma-debug add new interfaces to debug dma mapping errors, pls refer
to: http://lwn.net/Articles/516640/
After dma mapping, it must call dma_mapping_error() to check mapping error,
otherwise the map_err_type alway is MAP_ERR_NOT_CHECKED, check_unmap() define
the mapping is not checked and dump the error msg. So,add dma_mapping_error()
checking to fix the WARNING
And RX DMA buffers are used repeatedly and the driver copies it into an skb,
fec_enet_rx() should not map or unmap, use dma_sync_single_for_cpu()/dma_sync_single_for_device()
instead of dma_map_single()/dma_unmap_single().
There have another potential issue: fec_enet_rx() passes the DMA address to __va().
Physical and DMA addresses are *not* the same thing. They may differ if the device
is behind an IOMMU or bounce buffering was required, or just because there is a fixed
offset between the device and host physical addresses. Also fix it in this patch.
=============================================
V2: add net_ratelimit() to limit map err message.
use dma_sync_single_for_cpu() instead of dma_map_single().
fix the issue that pass DMA addresses to __va() to get virture address.
V1: initial send
=============================================
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hwtstamp_ioctl() should validate all fields of hwtstamp_config
before making any changes. Currently it sets the TX configuration
before validating the rx_filter field.
Untested as I don't have a cross-compiler to hand.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cpsw_hwtstamp_ioctl() should validate all fields of hwtstamp_config,
and the hardware version, before making any changes. Currently it
sets the TX configuration before validating the rx_filter field
or that the hardware supports timestamping.
Also correct the error code for hardware versions that don't
support timestamping. ENOTSUPP is used by the NFS implementation
and is not part of userland API; we want EOPNOTSUPP (which glibc
also calls ENOTSUP, with one 'P').
Untested as I don't have a cross-compiler to hand.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Mugunthan V N <mugunthanvnm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac_hwtstamp_ioctl() should validate all fields of hwtstamp_config
before making any changes. Currently it sets the TX configuration
before validating the rx_filter field.
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hwtstamp_ioctl() should validate all fields of hwtstamp_config
before making any changes. Currently it sets the TX configuration
before validating the rx_filter field.
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1000e_hwtstamp_ioctl() should validate all fields of hwtstamp_config
before making any changes. Currently it copies the configuration to
the e1000_adapter structure before validating it at all.
Change e1000e_config_hwtstamp() to take a pointer to the
hwstamp_config and to copy the config after validating it.
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3_hwtstamp_ioctl() should validate all fields of hwtstamp_config
before making any changes. Currently it sets the TX configuration
before validating the rx_filter field.
Compile-tested only.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We assume that "mp->phy" can be NULL a couple lines before the
dereference.
Fixes: 1cce16d37d ('net: mv643xx_eth: Add missing phy_addr_set in DT mode')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull core locking changes from Ingo Molnar:
"The biggest changes:
- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.
- move the various kernel locking primitives to the new
kernel/locking/ directory"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...
Pull DMA mask updates from Russell King:
"This series cleans up the handling of DMA masks in a lot of drivers,
fixing some bugs as we go.
Some of the more serious errors include:
- drivers which only set their coherent DMA mask if the attempt to
set the streaming mask fails.
- drivers which test for a NULL dma mask pointer, and then set the
dma mask pointer to a location in their module .data section -
which will cause problems if the module is reloaded.
To counter these, I have introduced two helper functions:
- dma_set_mask_and_coherent() takes care of setting both the
streaming and coherent masks at the same time, with the correct
error handling as specified by the API.
- dma_coerce_mask_and_coherent() which resolves the problem of
drivers forcefully setting DMA masks. This is more a marker for
future work to further clean these locations up - the code which
creates the devices really should be initialising these, but to fix
that in one go along with this change could potentially be very
disruptive.
The last thing this series does is prise away some of Linux's addition
to "DMA addresses are physical addresses and RAM always starts at
zero". We have ARM LPAE systems where all system memory is above 4GB
physical, hence having DMA masks interpreted by (eg) the block layers
as describing physical addresses in the range 0..DMAMASK fails on
these platforms. Santosh Shilimkar addresses this in this series; the
patches were copied to the appropriate people multiple times but were
ignored.
Fixing this also gets rid of some ARM weirdness in the setup of the
max*pfn variables, and brings ARM into line with every other Linux
architecture as far as those go"
* 'for-linus-dma-masks' of git://git.linaro.org/people/rmk/linux-arm: (52 commits)
ARM: 7805/1: mm: change max*pfn to include the physical offset of memory
ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations
ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations
ARM: 7795/1: mm: dma-mapping: Add dma_max_pfn(dev) helper function
ARM: 7794/1: block: Rename parameter dma_mask to max_addr for blk_queue_bounce_limit()
ARM: DMA-API: better handing of DMA masks for coherent allocations
ARM: 7857/1: dma: imx-sdma: setup dma mask
DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks
DMA-API: dcdbas: update DMA mask handing
DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
DMA-API: usb: musb: use platform_device_register_full() to avoid directly messing with dma masks
DMA-API: crypto: remove last references to 'static struct device *dev'
DMA-API: crypto: fix ixp4xx crypto platform device support
DMA-API: others: use dma_set_coherent_mask()
DMA-API: staging: use dma_set_coherent_mask()
DMA-API: usb: use new dma_coerce_mask_and_coherent()
DMA-API: usb: use dma_set_coherent_mask()
DMA-API: parport: parport_pc.c: use dma_coerce_mask_and_coherent()
DMA-API: net: octeon: use dma_coerce_mask_and_coherent()
DMA-API: net: nxp/lpc_eth: use dma_coerce_mask_and_coherent()
...
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
Add two trivial helpers list_next_entry() and list_prev_entry(), they
can have a lot of users including list.h itself. In fact the 1st one is
already defined in events/core.c and bnx2x_sp.c, so the patch simply
moves the definition to list.h.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for deferred
probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0
R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi
huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20
PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN
2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa
Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8=
=GCbY
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
not compiled for ages, and recent versions of gcc for it are broken.
Remove support for it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSev31AAoJEMsfJm/On5mBzSAQAKRBYLqtf3nJGm9pXGDhZPGG
7KSQ8S11pg/wnXYW6P/XJhFRBrYkOOCeqVKQHtmxG8MmXQkkOz95rsIvBbUzU/FT
yJAPKpOHdh1yLhBGgCj3WhGtjVwpbut1/y9n2M5SpGautUgxfLj9fJiswSJx0n7t
VRWKwfIpBFPLPs9w6hdDf94tIXhSX8Me2gd3LDCPBEQ2SZYd8rtBasYtDeC2+FLa
Xow4ZQrCU7hpYscSUFzJpok35hl7weGhJ9jjXwtic4byFHvdiyHUwCOaEWC0hqNi
fOLWFbvBogqjyAktfZhfyL9R9/7lGlLshLQNmJWR3bO+nCJ21h9ATw0R4gLBdT4/
lzLRnJ/4GdtbvmdqRxNjxxR4zHkZ+tE8HmaCmUzvqGfQyA5sJNBRrBDcWLUOVlO9
0iIZsJBZjSQXKXSk9P5xH4G0tlbAFEUnEHKsrt/mgsD9Z3SgbPKAIWSBAJA0AMQk
DXZaXrBRilXOPUCZASZfmK8AQFC1GYB0tz7nT4x1mjT2/JClgG2kHCAGhNmI+CbK
l9VRIgBydppLFPOGhZLSNGQp29xBhw9JgOVns4a1k7kJQEw9ht38h8Q2ckRYxXhP
/z53eZKMQk62quWlyLRgR9mWqZc2CIifLVdFjiOELMh7wKPwL6eGrrrGBDbPtctS
PX5K26geb0oA3ZMjpBLr
=V6n6
-----END PGP SIGNATURE-----
Merge tag 'h8300-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull h8300 platform removal from Guenter Roeck:
"The patch series has been in -next for more than one relase cycle. I
did get a number of Acks, and no objections.
H8/300 has been dead for several years, the kernel for it has not
compiled for ages, and recent versions of gcc for it are broken.
Remove support for it"
* tag 'h8300-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
CREDITS: Add Yoshinori Sato for h8300
fs/minix: Drop dependency on H8300
Drop remaining references to H8/300 architecture
Drop MAINTAINERS entry for H8/300
watchdog: Drop references to H8300 architecture
net/ethernet: Drop H8/300 Ethernet driver
net/ethernet: smsc9194: Drop conditional code for H8/300
ide: Drop H8/300 driver
Drop support for Renesas H8/300 (h8300) architecture
Add missing platform_set_drvdata() in xtsonic_probe(), otherwise
calling platform_get_drvdata() in xtsonic_device_remove() may
returns NULL.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing platform_set_drvdata() in mace_probe(), otherwise
calling platform_get_drvdata() in mac_mace_device_remove() may
returns NULL.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing platform_set_drvdata() in arc_emac_probe(), otherwise
calling platform_get_drvdata() in arc_emac_remove() may returns NULL.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The max_vfs parameter has a limit of 63 and silently fails (adding 0 vfs) when
it is out of range. This patch adds a warning so that the user knows something
went wrong. Also, this patch moves the warning in ixgbe_enable_sriov() to where
max_vfs is checked, so that even an out of range value will show the deprecated
warning. Previously, an out of range parameter didn't even warn the user to use
the new sysfs interface instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes multiple problems in the link modes display in ethtool.
Newer parts have more complicated methods to determine actual link
capabilities. Older parts cannot communicate with their SFP modules.
Finally, all the available defines are not displayed by ethtool. This
updates the link modes to be as accurate as possible depending on what data
is available to the driver at any given time.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Connect-IB adapter has an inherent page size which equals 4K.
Define an new enum that equals the page shift and use it instead of
using the value 12 throughout the code.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Change optimal_reclaimed_pages() to increase the output size of each
reclaim pages command. This change reduces significantly the amount of
reclaim pages commands issued to FW when the driver is unloaded which
reduces the overall driver unload time.
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Firmware spec requires reserved fields to be cleared when calling
set_hca_cap. Current code queries and copy to the set area, possibly
resulting in reserved bits not cleared. This patch copies only
writable fields to the set area.
Fix also typo - msx => max
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Connect-IB firmware requires 4K pages to be communicated with the
driver. This patch breaks larger pages to 4K units to enable support
for architectures utilizing larger page size, such as PowerPC. This
patch also fixes several places that referred to PAGE_SHIFT instead of
explicit 12 which is the inherent page shift on Connect-IB.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If DMA mapping fails, the driver cleared the object that holds the
previously DMA mapped pages. Fix this by allocating a new object for
the command that reports back to firmware that pages can't be
supplied.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use asynchronous commands to execute up to eight concurrent create MR
commands. This is to fill memory caches faster so we keep consuming
from there. Also, increase timeout for shrinking caches to five
minutes.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The number of stations in use is kept in the num_rx_pools counter
in the ixgbe_adapter structure. This is in turn used by the queue
allocation scheme to determine how many queues are needed to support
the number of pools in use with the current feature set.
This works as long as the pools are added and destroyed in order
because (num_rx_pools * queues_per_pool) is equal to the last
queue in use by a pool. But as soon as you delete a pool out of
order this is no longer the case. So the above multiplication
allocates to few queues and a pool may reference a ring that has
not been allocated/initialized.
To resolve use the bit mask of in use pools to determine the final
pool being used and allocate enough queues so that we don't
inadvertently remove its queues.
# ip link add link eth2 \
numtxqueues 4 numrxqueues 4 txqueuelen 50 type macvlan
# ip link set dev macvlan0 up
# ip link add link eth2 \
numtxqueues 4 numrxqueues 4 txqueuelen 50 type macvlan
# ip link set dev macvlan1 up
# for i in {0..100}; do
ip link set dev macvlan0 down; ip link set dev macvlan0 up;
done;
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the recent support for layer 2 hardware acceleration, I added a
few references to real_num_rx_queues and num_rx_queues which are
only available with CONFIG_RPS.
The fix is first to remove unnecessary references to num_rx_queues.
Because the hardware offload case is limited to cases where RX queues
and TX queues are equal we only need a single check. Then wrap the
single case in an ifdef.
The patch that introduce this is here,
commit a6cc0cfa72
Author: John Fastabend <john.r.fastabend@intel.com>
Date: Wed Nov 6 09:54:46 2013 -0800
net: Add layer 2 hardware acceleration operations for macvlan devices
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing MTU size of an xgmac network interface while it is active can
cause a panic like
skbuff: skb_over_panic: text:c03bc62c len:1090 put:1090 head:edfb6900 data:edfb6942 tail:0xedfb6d84 end:0xedfb6bc0 dev:eth0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
Internal error: Oops - BUG: 0 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 762 Comm: python Tainted: G W 3.10.0-00015-g3e33cd7 #309
task: edcfe000 ti: ed67e000 task.ti: ed67e000
PC is at skb_panic+0x64/0x70
LR is at wake_up_klogd+0x5c/0x68
This happens because xgmac_change_mtu modifies dev->mtu before the
network interface is quiesced. And thus there still might be buffers
in use which have a buffer size based on the old MTU.
To fix this I moved the change of dev->mtu after the call to
xgmac_stop.
Another modification is required (in xgmac_stop) to ensure that
xgmac_xmit is really not called anymore (xgmac_tx_complete might wake
up the queue again).
I've tested the fix by switching MTU size every second between 600 and
1500 while network traffic was going on. The test box survived a test
of several hours (until I've stopped it) whereas w/o this fix above
panic occurs after several minutes (at most).
Change since v1:
- remove call to netif_stop_queue at beginning of xgmac_stop
- use netif_tx_disable instead of locking+netif_stop_queue
Signed-off-by: Andreas Herrmann <andreas.herrmann@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each RX/TX ring and its CQ, allocation is done on a NUMA node that
corresponds to the core that the data structure should operate on.
The assumption is that the core number is reflected by the ring index.
The affected allocations are the ring/CQ data structures,
the TX/RX info and the shared HW/SW buffer.
For TX rings, each core has rings of all UPs.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done to optimize FW/HW access to host memory.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all TX/RX rings and completion queues are part of the
netdev priv structure and are allocated statically. This patch
will change the priv to hold only arrays of pointers and therefore
all TX/RX rings and completetion queues will be allocated
dynamically. This is in preparation for NUMA aware allocations.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow immediate activate of VGT->VST and VST->VGT transitions, without
the need of rebinding in mlx4_master_immediate_activate_vlan_qos().
Also in struct res_qp: add qp parameters (vlan_index,fvl,vlan_cntrol..)
to the saved set, in order to restore when move to VGT.
- Clear at mlx4_RST2INIT_QP_wrapper()
- Save at mlx4_INIT2RTR_QP_wrapper()
- Restore at mlx4_vf_immed_vlan_work_handler()
Update mlx4_vf_immed_vlan_work_handler() to support VGT.
Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To guarantee that all unused fields in all FW commands for both inboxes
and outboxes are zeroed out, initialize the mailbox buffer to all zeroes.
This is especially important for SRIOV comm-channel virtual commands
(such as QUERY_FUNC_CAP), where if new fields are added to support new
features, the driver can depend on older kernels passing zeroes in these
fields.
In addition to zeroing out the mailbox buffer at allocation time, all
(now unnecessary) calls to memset by the callers of
mlx4_alloc_cmd_mailbox() are removed.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify RFS code to support applying filters for incoming UDP streams.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that l2 acceleration ops are in place from the prior patch,
enable ixgbe to take advantage of these operations. Allow it to
allocate queues for a macvlan so that when we transmit a frame,
we can do the switching in hardware inside the ixgbe card, rather
than in software.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
timecounter_init() was was called only after first potential
timecounter_read().
Moved mlx4_en_init_timestamp() before mlx4_en_init_netdev()
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If build_skb fails the memory associated with the ring buffer is freed but
the ri->data member is not zeroed in this case. This causes a double-free
of this memory in tg3_free_rings->... path. The patch moves this block after
setting ri->data to NULL.
It would be nice to fix this bug also in stable >= v3.4 trees.
Cc: Nithin Nayak Sujir <nsujir@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS updates from Ralf Baechle:
- Some minor work bringing the Cobalt MIPS platforms in line with other
MIPS platforms
- Make vmlinux.32 and vmlinux.64 build messages less verbose
- Always register the R4k clocksource when selected, the clock source's
rating will decide if this or another clock source is actually going
to be used
- Drop support for the Cisco (formerly Scientific Atlanta) PowerTV
platform. There appears to be nobody left who cares and the USB
driver went stale while waiting for years to be merged
- Some cleanup of Loongson 2 related #ifdefery
- Various minor cleanups
- Major rework on all things related to tracing / ptrace on MIPS,
including switching the MIPS ELF core dumper to regsets, enabling the
entries for SIGSYS in struct siginfo for MIPS, enabling ftrace
syscall trace points
- Some more work to bring DECstation support code in line with other
more modern code
- Report the name of the detected CPU, not just its CP0 PrID value
- Some more BCM 47xx and atheros ath79xx work
- Support for compressed kernels using the XZ compression scheme
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (53 commits)
MIPS: remove duplicate define
MIPS: Random whitespace clean-ups
MIPS: traps: Reformat notify_die invocations to 80 columns.
MIPS: Print correct PC in trace dump after NMI exception
MIPS: kernel: cpu-probe: Report CPU id during probe
MIPS: Remove unused defines in piix4.h
MIPS: Get rid of hard-coded values for Malta PIIX4 fixups
MIPS: Always register R4K clock when selected
MIPS: Loongson: Get rid of Loongson 2 #ifdefery all over arch/mips.
MIPS: cacheops.h: Increase indentation by one tab.
MIPS: Remove bogus BUG_ON()
MIPS: PowerTV: Remove support code.
MIPS: ftrace: Add support for syscall tracepoints.
MIPS: ptrace: Switch syscall reporting to tracehook_report_syscall_entry().
MIPS: Move audit_arch() helper function to __syscall_get_arch().
MIPS: Enable HAVE_ARCH_TRACEHOOK.
MIPS: Switch ELF core dumper to use regsets.
MIPS: Implement task_user_regset_view.
MIPS: ptrace: Use tracehook helpers.
MIPS: O32 / 32-bit: Always copy 4 stack arguments.
...
This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a more standard logging style.
Convert smsc_<level> macros to use netif_<level>.
Remove unused #define PFX
Add pr_fmt and neaten pr_<level> uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.
The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.
This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.
Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.
Feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit cc9d4598 'net: mv643xx_eth: use of_phy_connect if phy_node
present' made the call to phy_scan optional, if the DT has a link to
the phy node.
However phy_scan has the side effect of calling phy_addr_set, which
writes the phy MDIO address to the ethernet controller. If phy_addr_set
is not called, and the bootloader has not set the correct address then
the driver will fail to function.
Tested on Kirkwood.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements resource quota grant decision when resources are requested,
for the following resources: QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs,
and Counters.
When granting a resource, the quota system increases the allocated-count
for that slave.
When the slave later frees the resource, its allocated-count is reduced.
A spinlock is used to protect the integrity of each resource's free-pool counter.
(One slave may be in the process of being granted a resource while another
slave has crashed, initiating cleanup of that slave's resource quotas).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In current kernels, the mlx4 driver running on a VM does not
differentiate between max resource numbers for the HCA and
max quotas -- it simply takes the quota values passed to it
as max-resource values.
However, the driver actually requires the VFs to be aware of
the actual number of resources that the HCA was initialized with,
for QPs, CQs, SRQs and MPTs.
For QPs, CQs and SRQs, the reason is that in completion handling
the driver must know which of the 24 bits are the actual resource
number, and which are "padding" bits.
For MPTs, also, the driver assumes knowledge of the number of MPTs
in the system.
The previous commit fixes the quota logic on the VM for the quota values
passed to it by QUERY_FUNC_CAPS.
For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
from QUERY_HCA (and not QUERY_FUNC_CAPS). The quotas passed
in QUERY_FUNC_CAPS are used to report max resource number values
in the response to ib_query_device.
However, the Hypervisor driver must consider that VMs
may be running previous kernels, and compatibility must be preserved.
To resolve the incompatibility with previous kernels running on VMs,
we deprecated the quota fields in mlx4_QUERY_FUNC_CAP. In the
deprecated fields, we pass the max-resource values from INIT_HCA
The quota fields are moved to a new location, and the current kernel
driver takes the proper values from that location. There is
also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
if this flag is set, the (VM) driver takes the quota values from the
new location.
VMs running previous kernels will work properly, except that the max resource
numbers reported in ib_query_device for these resources will be
too high. The Hypervisor driver will, however, enforce the quotas
for these VMs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is step #1 for implementing SRIOV resource quotas for VFs.
Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.
Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC,
VLAN, and Counters.
The quota system works as follows:
Each entity (VF or PF) is given a max number of a given resource (its quota),
and a guaranteed minimum number for each resource (starvation prevention).
For QPs, CQs, SRQs, MPTs and MTTs:
50% of the available quantity for the resource is divided equally among
the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
The other 50% is the "free pool", allocated on a first-come-first-serve basis.
For each VF/PF, resources are first allocated from its "guaranteed-minimum"
pool. When that pool is exhausted, the driver attempts to allocate from
the resource "free-pool".
The quota (i.e., max) for the VFs and the PF is:
The free-pool amount (50% of the real max) + the guaranteed minimum
For MACs:
Guarantee 2 MACs per VF/PF per port. As a result, since we have only
128 MACs per port, reduce the allowable number of VFs from 64 to 63.
Any remaining MACs are put into a free pool.
For VLANs:
For the PF, the per-port quota is 128 and guarantee is 64
(to allow the PF to register at least a VLAN per VF in VST mode).
For the VFs, the per-port quota is 64 and the guarantee is 0.
We assume that VGT VFs are trusted not to abuse the VLAN resource.
For Counters:
For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
In this patch, we define the needed structures, which are added to the
resource-tracker struct. In addition, we do initialization
for the resource quota, and adjust the query_device response to use quotas
rather than resource maxima.
As part of the implementation, we introduce a new field in
mlx4_dev: quotas. This field holds the resource quotas used
to report maxima to the upper layers (ib_core, via query_device).
The HCA maxima of these values are passed to the VFs (via
QUERY_HCA) so that they may continue to use these in handling
QPs, CQs, SRQs and MPTs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In procedure mlx4_init_mr_table(), slaves should do no processing,
but should return success. This initialization is hypervisor-only.
However, the check for num_mpts being a power-of-2 was performed
before the check to return immediately if the driver is for a slave.
This resulted in spurious failures.
The order of performing the checks is reversed, so that if the
driver is for a slave, no processing is done and success is returned.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In upstream kernels under SRIOV, the vlan register/unregister calls
were NOPs (doing nothing and returning OK). We detect these old
calls from guests (via the comm channel), since previously the
port number in mlx4_register_vlan was passed (improperly) in the
out_param. This has been corrected so that the port number is now
passed in bits 8..15 of the in_modifier field.
For old calls, these bits will be zero, so if the passed port
number is zero, we can still look at the out_param field to see
if it contains a valid port number. If yes, the VM is running
an old driver.
Since for old drivers, the register/unregister_vlan wrappers were
NOPs, we continue this policy -- the reason being that upstream
had an additional bug in eth driver running on guests (where
procedure mlx4_en_vlan_rx_kill_vid() had the following code:
if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx))
mlx4_unregister_vlan(mdev->dev, priv->port, idx);
else
en_err(priv, "could not find vid %d in cache\n", vid);
On a VM, mlx4_find_cached_vlan() will always fail, since the
vlan cache is located on the Hypervisor; on guests it is empty.
Therefore, if we allow upstream guests to register vlans, we will
have vlan leakage since the unregister will never be performed.
Leaving vlan reg/unreg for old guest drivers as a NOP is not a
feature regression, since in upstream the register/unregister
vlan wrapper is a NOP.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add resource tracker support for reg/unreg vlans calls done by VFs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of vlan_index created problems unregistering vlans on guests.
In addition, tools delete vlan by tag, not by index, lets follow that.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac,
mlx4_unregister_mac all made illegal use of the out_param in multifunc mode
to pass the port number. The firmware spec specifies that the port number
should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and
FREE_RES (sections 20.15.1 and 20.15.2).
For MAC register/unregister, this patch contains workarounds so that guests
running previous kernels continue to work on a new Hypervisor, and guests
running the new kernel will continue to work on old hypervisors.
Vlan registeration capability is still not operational in multifunction mode,
since the vlan wrapper functions are not implemented in this patch.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reg/unreg vlan code was broken:
1. a wrapped function called another wrapped function, causing a deadlock.
2. unregister_vlan called cmd_box instead of cmd_box_imm, leading to
incorrectly passed parameters.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the platform data pointer before dereferencing it and error out of the
probe() method if it's NULL.
This has additional effect of preventing kernel oops with outdated platform data
containing zero PHY address instead (such as on SolutionEngine7710).
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
o 83xx and 84xx firmware is capable of multiple Tx queues.
This patch will enable multiple Tx queues for 83xx/84xx
series adapters. Max number of Tx queues supported will be 8.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Current driver has duplicate code for validating user input
for changing Tx/SDS rings using set_channel ethtool interface.
This patch removes duplicate code and refactored Tx/SDS ring
validation for 82xx/83xx/84xx series adapter.
o Refactored code now calculates maximum Tx/Rx ring driver can
support based on Default, NPAR and SRIOV PF/VF mode of driver.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Enhance ethtool statistics to display multiple Tx queue stats for
all supported adapters.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Without failing probe, register netdev when device is in FAILED state.
o Device will come up with minimum functionality and allow diagnostics and
repair of the adapter.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/netconsole.c
net/bridge/br_private.h
Three mostly trivial conflicts.
The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.
In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".
Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
In function mlx4_master_deactivate_admin_state() __mlx4_unregister_mac was
called using the MAC index. It should be called with the value of the MAC itself.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also fixes an incorrect function comment (probably copy/paste).
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
This series contains updates to e1000, igb, ixgbe and ixgbevf.
Hong Zhiguo provides a fix for e1000 where tx_ring and adapter->tx_ring
are already of type "struct e1000_tx_ring" so no need to divide by
e1000_tx_ring size in the idx calculation.
Emil provides a fix for ixgbevf to remove a redundant workaround related
to header split and a fix for ixgbe to resolve an issue where the MTA table
can be cleared when the interface is reset while in promisc mode.
Todd provides a fix for igb to prevent ethtool from writing to the iNVM
in i210/i211 devices. This issue was reported by Marek Vasut <marex@denx.de>.
Anton Blanchard provides a fix for ixgbe to reduce memory consumption
with larger page sizes, seen on PPC.
Don provides a cleanup in ixgbe to replace the IXGBE_DESC_UNUSED macro with
the inline function ixgbevf_desc_unused() to make the logic a bit more
readable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch resolves an issue where the MTA table can be cleared when the
interface is reset while in promisc mode. As result IPv6 traffic between
VFs will be interrupted.
This patch makes the update of the MTA table unconditional to avoid the
inconsistent clearing on reset.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch just replaces the IXGBE_DESC_UNUSED macro with a like named
inline function ixgbevf_desc_unused. The inline function makes the logic
a bit more readable.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The ixgbe driver allocates pages for its receive rings. It currently
uses 512 pages, regardless of page size. During receive handling it
adds the unused part of the page back into the rx ring, avoiding the
need for a new allocation.
On a ppc64 box with 64 threads and 64kB pages, we end up with
512 entries * 64 rx queues * 64kB = 2GB memory used. Even more of a
concern is that we use up 2GB of IOMMU space in order to map all this
memory.
The driver makes a number of decisions based on if PAGE_SIZE is less
than 8kB, so use this as the breakpoint and only allocate 128 entries
on 8kB or larger page sizes.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>