* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
The mlx4 driver uses the same name for interrupts for every device in
the system. This can make it very confusing trying to work out exactly
which device the MSI-X interrupts are for. Change the driver to add the
PCI name of the device to the interrupt name.
Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On the error path of mlx4_init_hca(), mlx4_close_hca() is called,
followed by mlx4_free_icms() and mlx4_UNMAP_FA(). But both of those
functions are already called from mlx4_close_hca(), which leads to a
double free.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs. However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128-byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.
This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and sufficiently odd ACPI tables that it
shows on boot:

    SMP: Allowing 32 CPUs, 16 hotplug CPUs

so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs. This bug means that mlx4 can't even initialize
on this quite mainstream system.
Cc: <stable@kernel.org>
Reported-by: Eli Cohen <eli@mellanox.co.il>
Tested-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code used two calls to pci_request_region() to get the two BARs
for the mlx4 device, for no particularly good reason. Clean up the code
a little by converting this to a single call to pci_request_regions().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
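
A minimal sketch of the conversion described above, using a hypothetical
"foo" driver and the usual probe() error-handling pattern rather than the
mlx4 code itself:

#include <linux/pci.h>

static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        /* One call claims every BAR of the device, replacing the two
         * separate pci_request_region(pdev, <bar>, ...) calls.
         */
        err = pci_request_regions(pdev, "foo");
        if (err)
                goto err_disable;

        /* ... map BARs, initialize hardware ... */
        return 0;

err_disable:
        pci_disable_device(pdev);
        return err;
}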
In a couple of cases collapse some extra code like:
int retval = NETDEV_TX_OK;
...
return retval;
into
return NETDEV_TX_OK;
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NETPOLL API requires that interrupts remain disabled in
netpoll_send_skb(). Using the functions from set A (below) in the
NETPOLL API callbacks causes interrupts to get re-enabled and can lead
to kernel instability.
The solution is to use the functions from set B instead, to prevent
IRQs from being enabled while in netpoll_send_skb().
Set A:
local_irq_disable()/local_irq_enable()
spin_lock_irq()/spin_unlock_irq()
spin_trylock_irq()/spin_unlock_irq()
Set B:
local_irq_save()/local_irq_restore()
spin_lock_irqsave()/spin_unlock_irqrestore()
spin_trylock_irqsave()/spin_unlock_irqrestore()
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
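
A minimal sketch of the "set B" usage in a transmit handler, with
hypothetical names; netpoll_send_skb() may call ndo_start_xmit() with
interrupts already disabled, so the handler must not unconditionally
re-enable them:

#include <linux/netdevice.h>
#include <linux/spinlock.h>

struct foo_priv {
        spinlock_t tx_lock;
};

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct foo_priv *priv = netdev_priv(dev);
        unsigned long flags;

        /* Save and restore the previous IRQ state instead of using
         * spin_lock_irq()/spin_unlock_irq(), which would blindly
         * re-enable interrupts on unlock.
         */
        spin_lock_irqsave(&priv->tx_lock, flags);
        /* ... post the skb to the hardware queue ... */
        spin_unlock_irqrestore(&priv->tx_lock, flags);

        return NETDEV_TX_OK;
}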
If the length is less than or equal to frag_prefix_size in the first
iteration, we write to skb_frags_rx[-1] and read from priv->frag_info[-1].
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
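
A minimal sketch of the off-by-one pattern described above and one way to
guard it; the names here (complete_rx_frags(), struct frag_desc) are
illustrative, not the driver's actual code:

#include <linux/skbuff.h>

struct frag_desc {
        int frag_prefix_size;
};

static void complete_rx_frags(struct frag_desc *frag_info,
                              skb_frag_t *skb_frags_rx,
                              int num_frags, int length)
{
        int nr;

        for (nr = 0; nr < num_frags; nr++) {
                /* If this fires on the very first iteration, any later
                 * use of index nr - 1 underflows to -1.
                 */
                if (length <= frag_info[nr].frag_prefix_size)
                        break;
                /* ... attach fragment nr to the skb ... */
        }

        /* Only trim the last used fragment if at least one was consumed. */
        if (nr > 0)
                skb_frags_rx[nr - 1].size =
                        length - frag_info[nr - 1].frag_prefix_size;
}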
We use a 1:1 mapping between QPs and SRQs on the receive side, so an
additional indirection level is not required. Also allocate the receive
buffers for the RSS QPs.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in using more QPs than the actual number of receive
rings. If the RSS function for two streams gives the same result modulo
the number of rings, they will arrive at the same RX ring anyway.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the net device is identified as a "sender" (the number of sent
packets is higher than the number of received packets, and the incoming
packets are small), set the moderation time to its low limit.
We do this because the incoming packets are ACKs, and we don't want to
delay them.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
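
A rough sketch of the heuristic described above; the names and thresholds
below are illustrative, not the driver's exact auto-moderation code:

struct foo_moder_stats {
        unsigned long tx_packets;
        unsigned long rx_packets;
        unsigned long rx_bytes;
};

static unsigned int foo_pick_rx_usecs(const struct foo_moder_stats *delta,
                                      unsigned int usecs_low,
                                      unsigned int usecs_high)
{
        unsigned long avg_pkt = delta->rx_packets ?
                                delta->rx_bytes / delta->rx_packets : 0;

        /* Mostly transmitting and the incoming frames are small, i.e.
         * they are almost certainly ACKs: keep moderation at its low
         * limit so the ACKs are not delayed.
         */
        if (delta->tx_packets > delta->rx_packets && avg_pkt < 128)
                return usecs_low;

        return usecs_high;
}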
In the case of a fragmented skb with the data pointers wrapping around
the TX buffer, the completion handling code would not advance the data
pointer, so the first fragment was unmapped several times while the
other fragments were not unmapped at all.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
sky2: Avoid races in sky2_down
drivers/net/mlx4: Adjust constant
drivers/net: Move a dereference below a NULL test
drivers/net: Move a dereference below a NULL test
connector: maintainer/mail update.
USB host CDC Phonet network interface driver
macsonic, jazzsonic: fix oops on module unload
macsonic: move probe function to .devinit.text
can: switch carrier on if device was stopped while in bus-off state
can: restart device even if dev_alloc_skb() fails
can: sja1000: remove duplicated includes
New device ID for sc92031 [1088:2031]
3c589_cs: re-initialize the multicast in the tc589_reset
Fix error return for setsockopt(SO_TIMESTAMPING)
netxen: fix thermal check and shutdown
netxen: fix deadlock on dev close
netxen: fix context deletion sequence
net: Micrel KS8851 SPI network driver
tcp: Use correct peer adr when copying MD5 keys
tcp: Fix MD5 signature checking on IPv4 mapped sockets
...
The values in the advertising field are typically ADVERTISED_xxx, not
SUPPORTED_xxx. Both SUPPORTED_10000baseT_Full and
ADVERTISED_10000baseT_Full have the same value.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
struct ethtool_cmd E;
@@
*E.advertising = SUPPORTED_10000baseT_Full
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
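
A minimal sketch of the intended usage, for a hypothetical driver: the
supported mask uses SUPPORTED_* bits while the advertising mask uses the
matching ADVERTISED_* bits:

#include <linux/ethtool.h>
#include <linux/netdevice.h>

static int foo_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
{
        cmd->supported   = SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE;
        cmd->advertising = ADVERTISED_10000baseT_Full | ADVERTISED_FIBRE;
        cmd->speed       = SPEED_10000;
        cmd->duplex      = DUPLEX_FULL;
        cmd->port        = PORT_FIBRE;
        cmd->autoneg     = AUTONEG_DISABLE;
        return 0;
}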
MT26468 (PCI ID 0x6764) devices can expose multiple physical
functions. The current driver only handles the primary physical
function. For other functions, the QUERY_FW firmware command will
fail with the CMD_STAT_MULTI_FUNC_REQ error code. Don't try to drive
such devices, but print a message saying the driver is skipping those
devices rather than just "QUERY_FW command failed."
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
[ Rather than keeping unsupported devices bound to the driver, simply
print a more informative error message and exit - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch converts the remaining occurrences of raw return values to
their symbolic counterparts in ndo_start_xmit() functions that were
missed by the previous automatic conversion.
Additionally, code that assumed the symbolic value of NETDEV_TX_OK to be
zero is changed to explicitly use NETDEV_TX_OK.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
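
A minimal sketch of the convention in a hypothetical driver; the two
helpers are made-up stubs standing in for real ring management code:

#include <linux/netdevice.h>

static bool foo_ring_has_room(struct net_device *dev) { return true; }
static void foo_queue_to_hw(struct net_device *dev, struct sk_buff *skb) { }

static netdev_tx_t foo_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (!foo_ring_has_room(dev))
                return NETDEV_TX_BUSY;  /* instead of a raw "return 1;" */

        foo_queue_to_hw(dev, skb);

        return NETDEV_TX_OK;            /* instead of a raw "return 0;" */
}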
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA: Add __init/__exit macros to addr.c and cma.c
IB/ehca: Bump version number
mlx4_core: Fix dma_sync_single_for_cpu() with matching for_device() calls
IB/mthca: Replace dma_sync_single() use with proper functions
RDMA/nes: Fix FIN state handling under error conditions
RDMA/nes: Fix max_qp_init_rd_atom returned from query device
IB/ehca: Ensure that guid_entry index is not negative
IB/ehca: Tolerate dynamic memory operations before driver load
Commit 5d23a1d2 ("net: replace dma_sync_single with
dma_sync_single_for_cpu") replaced uses of the deprectated function
dma_sync_single() with calls to dma_sync_single_for_cpu(). However,
to be correct, the code should do a sync for_cpu() before touching the
memory and for_device() after it's done.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
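
A minimal sketch of the correct pairing for a streaming mapping; the
helper name is hypothetical:

#include <linux/dma-mapping.h>

static void foo_peek_rx_buffer(struct device *dev, dma_addr_t dma,
                               void *vaddr, size_t len)
{
        /* Give ownership of the buffer to the CPU before touching it. */
        dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);

        /* ... CPU reads/updates the data through vaddr here ... */

        /* Hand ownership back to the device once the CPU is done. */
        dma_sync_single_for_device(dev, dma, len, DMA_FROM_DEVICE);
}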
Our RX rings are always kept full, so there is no need to check whether
we need to fill them or not. If we fail to allocate a new socket
buffer, the incoming packet is dropped and the ring remains full.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check that verifies that the LSO header, together with the control
segment and the first data segment, does not cross 128 bytes is no
longer required.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When closing the port, we stop all transmit queues under the transmit
lock. This ensures that we will not attempt to transmit new packets
after the physical port has been closed.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we are a multi-queue device, we need to stop/start all of our
transmit queues.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
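
A minimal sketch of the multi-queue variant, with hypothetical open/stop
handlers; netif_tx_start_all_queues()/netif_tx_stop_all_queues() act on
every TX queue rather than just queue zero:

#include <linux/netdevice.h>

static int foo_open(struct net_device *dev)
{
        /* ... bring the hardware port up ... */
        netif_tx_start_all_queues(dev); /* was: netif_start_queue(dev) */
        return 0;
}

static int foo_stop(struct net_device *dev)
{
        netif_tx_stop_all_queues(dev);  /* was: netif_stop_queue(dev) */
        /* ... shut the hardware port down ... */
        return 0;
}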
We don't need this check in the transmit function.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reporting the counter's value through 'ethtool -S'
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx
IB/mlx4: Add strong ordering to local inval and fast reg work requests
IB/ehca: Remove superfluous bitmasks from QP control block
RDMA/cxgb3: Limit fast register size based on T3 limitations
RDMA/cxgb3: Report correct port state and MTU
mlx4_core: Add module parameter for number of MTTs per segment
IB/mthca: Add module parameter for number of MTTs per segment
RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()
infiniband: Remove void casts
IB/ehca: Increment version number
IB/ehca: Remove unnecessary memory operations for userspace queue pairs
IB/ehca: Fall back to vmalloc() for big allocations
IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
When both MSI-X and legacy INTx fail to generate an interrupt, the
driver frees the MSI-X interrupts twice. Fix this by clearing the
have_irq flag for the MSI-X interrupts when they are freed the first
time. This is the same bug that was reported in ib_mthca by Yinghai
Lu <yhlu.kernel@gmail.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
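
A minimal sketch of the fix pattern with hypothetical structures; the
point is that clearing the flag when a vector is freed makes a second
pass through the same teardown path a no-op:

#include <linux/interrupt.h>

struct foo_eq {
        int irq;
        int have_irq;
};

static void foo_free_eq_irqs(struct foo_eq *eq, int num_eqs)
{
        int i;

        for (i = 0; i < num_eqs; ++i) {
                if (!eq[i].have_irq)
                        continue;
                free_irq(eq[i].irq, &eq[i]);
                /* Mark the vector as freed so a later INTx fallback that
                 * runs this teardown again does not free it twice.
                 */
                eq[i].have_irq = 0;
        }
}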
If mlx4_create_eq() fails for one of the EQs assigned for completion
handling, the code would try to free the same EQ it had just failed to
create.
The crash was found by Christoph Lameter.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default the driver opens 8 TX queues (defined by MLX4_EN_NUM_TX_RINGS).
If the driver is configured to support Per Priority Flow Control, we open
8 additional TX rings.
dev->real_num_tx_queues is always set to be MLX4_EN_NUM_TX_RINGS.
The mlx4_en_select_queue() function uses standard hashing (skb_tx_hash)
when PPFC is not supported or the skb does not contain a vlan tag;
otherwise the queue is selected according to the vlan priority.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
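
A sketch of the described queue selection with approximate names; the
FOO_NUM_TX_RINGS define and foo_ppfc_enabled() stub are illustrative and
stand in for the driver's own constants and configuration check:

#include <linux/netdevice.h>
#include <linux/if_vlan.h>

#define FOO_NUM_TX_RINGS 8      /* stands in for MLX4_EN_NUM_TX_RINGS */

static bool foo_ppfc_enabled(struct net_device *dev) { return true; }

static u16 foo_select_queue(struct net_device *dev, struct sk_buff *skb)
{
        /* With per-priority flow control enabled and a tagged frame, map
         * the 3-bit vlan priority onto one of the 8 extra rings.
         */
        if (foo_ppfc_enabled(dev) && vlan_tx_tag_present(skb))
                return FOO_NUM_TX_RINGS + (vlan_tx_tag_get(skb) >> 13);

        /* Otherwise hash over the first real_num_tx_queues queues. */
        return skb_tx_hash(dev, skb);
}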
The interrupt moderation should not depend on the number of incoming
bytes but on the number of incoming packets.
The previous scheme caused a very high interrupt rate for small
messages when a big MTU was configured.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the initialization of one of the ports fails, there is no need to
fail the other one as well.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The en_params.c file now only handles ethtool functionality.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each debug message now shows the interface name if the net device is
registered, or the PCI bus ID and port number if it is not registered
yet. Messages that are not port/netdev specific stay in the old format.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the transmit queue gets full, we enable interrupts for TX
completions. There was a race in which we handled the TX queue both
from the interrupt context and from the transmit function. Using
spin_trylock_irq() ensures this doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
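
A minimal sketch of the idea with hypothetical names: the transmit path
only reaps completions if it can take the completion lock, and simply
skips the work if the interrupt handler already holds it:

#include <linux/spinlock.h>

struct foo_tx_ring {
        spinlock_t comp_lock;
};

static void foo_reap_tx_completions(struct foo_tx_ring *ring) { }

static void foo_try_reap_from_xmit(struct foo_tx_ring *ring)
{
        /* If the interrupt handler owns the lock, let it do the work
         * instead of racing with it from the transmit function.
         */
        if (!spin_trylock_irq(&ring->comp_lock))
                return;

        foo_reap_tx_completions(ring);
        spin_unlock_irq(&ring->comp_lock);
}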
This replaces dma_sync_single() with dma_sync_single_for_cpu() because
dma_sync_single() is an obsolete API; include/linux/dma-mapping.h says:
/* Backwards compat, remove in 2.7.x */
#define dma_sync_single dma_sync_single_for_cpu
#define dma_sync_sg dma_sync_sg_for_cpu
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up to commits 9d21493b4b ("net: tx scalability works :
trans_start") and 08baf56108 ("net: txq_trans_update() helper").
Now that the core network stack takes care of trans_start updates, don't
do it in the drivers themselves, if possible. Multi-queue drivers can
avoid one cache miss (on dev->trans_start) in their start_xmit()
handler.
Exceptions are NETIF_F_LLTX drivers (vxge & tehuti).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control. As a result, the size of memory that can be
registered is limited too. This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
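
A sketch of how such a knob is typically exposed; the parameter name and
default below are illustrative:

#include <linux/module.h>

static int log_mtts_per_seg = 3;        /* 2^3 = 8 MTT entries per segment */
module_param(log_mtts_per_seg, int, 0444);
MODULE_PARM_DESC(log_mtts_per_seg,
                 "Log2 of the number of MTT entries per memory segment");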
In case of allocation failure, the actual ring size is rounded down to
the nearest power of 2, and the remaining descriptors are freed.
The CQ and SRQ are allocated with the actual size, and the mask is
updated.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
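
A minimal sketch of the fallback, with hypothetical helpers and fields;
the mask at the end is why the ring size must stay a power of two:

#include <linux/log2.h>
#include <linux/errno.h>
#include <linux/types.h>

struct foo_ring {
        u32 size;
        u32 size_mask;
};

/* Stubs standing in for the real descriptor allocator, for illustration. */
static u32 foo_alloc_descs(struct foo_ring *ring, u32 want) { return want; }
static void foo_free_descs(struct foo_ring *ring, u32 first, u32 count) { }

static int foo_setup_ring(struct foo_ring *ring, u32 requested)
{
        u32 got = foo_alloc_descs(ring, requested);

        if (!got)
                return -ENOMEM;

        if (got < requested) {
                /* Shrink to the largest power of two that did succeed
                 * and free the surplus descriptors.
                 */
                u32 actual = rounddown_pow_of_two(got);

                foo_free_descs(ring, actual, got - actual);
                got = actual;
        }

        ring->size = got;
        ring->size_mask = got - 1;      /* valid only for power-of-two sizes */
        return 0;
}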