Several devices have multiple independent RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independent from the net
device itself.
The signature of the ->poll() callback changes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The callee returns to the caller the number of RX packets processed
(or the number of "NAPI credits" consumed, if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc., because all of that is handled in the
caller upon return.
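A user-space sketch of the new contract, for illustration only. The struct napi_sketch type and the run_poll() caller are hypothetical simplifications, not the kernel's napi_struct or its softirq code: the driver's poll routine handles at most budget packets and returns how many it handled, and only the caller adjusts the remaining budget.

```c
/* Hypothetical, simplified stand-in for struct napi_struct. */
struct napi_sketch {
	int pending;	/* RX packets waiting on this queue */
};

/* Driver side: process up to 'budget' packets and report the count.
 * It never touches any global quota or budget counter. */
static int foo_poll(struct napi_sketch *napi, int budget)
{
	int work = 0;

	while (work < budget && napi->pending > 0) {
		napi->pending--;	/* "receive" one packet */
		work++;
	}
	return work;
}

/* Core side (hypothetical): the caller charges the work to the budget. */
static int run_poll(struct napi_sketch *napi, int *budget)
{
	int work = foo_poll(napi, *budget);

	*budget -= work;
	return work;
}
```

Because foo_poll() only reports its work, a device with several RX queues can give each queue its own instance, and the caller can poll them independently.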
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in its ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on a patch from Peter Oruba, convert myri10ge to use pcie_get_readrq()
and pcie_set_readrq() instead of our own PCI calls and arithmetic.
These driver changes incorporate the proposed PCI-X / PCI-Express read byte
count interface. Reading and setting those values is no longer done
"manually"; instead, wrapper functions are called so that quirks for some
PCI bridges can be applied.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Based on work by Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Use the pause counter to avoid a needless device reset, and
print a message telling the admin that our link partner is
flow controlling us down to 0 pkts/sec.
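The watchdog decision can be sketched as below, assuming hypothetical transmit and pause counters sampled on each watchdog tick; struct wd_sample and watchdog_should_reset() are illustrative names, not the driver's. A transmit stall that coincides with newly received pause frames is reported instead of triggering a reset.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical snapshot of the counters sampled on each watchdog tick. */
struct wd_sample {
	unsigned int tx_req;		/* frames queued for transmit */
	unsigned int tx_done;		/* frames completed by the NIC */
	unsigned int pause_count;	/* pause frames received from the link partner */
};

/* Returns true only if the device looks hung and no pause frames explain it. */
static bool watchdog_should_reset(const struct wd_sample *prev,
				  const struct wd_sample *cur)
{
	/* Stuck: frames are outstanding and no completion since last tick. */
	bool stuck = cur->tx_req != cur->tx_done &&
		     cur->tx_done == prev->tx_done;

	if (!stuck)
		return false;
	if (cur->pause_count != prev->pause_count) {
		/* The link partner is throttling us; the NIC itself is fine. */
		printf("link partner is flow controlling us down to 0 pkts/sec\n");
		return false;
	}
	return true;
}
```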
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove a nonsensical limit in the tx done routine. Specifically,
the loop always terminates after processing at most one ring's worth
of frames, because the mcp index is not refetched, so the removed
conditional could never be true.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Use SET_NETDEV_DEV() in myri10ge to create the "/sys/class/net/<if>/device"
symlink.
Signed-off-by: Maik Hampel <m.hampel@gmx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Since Myri-10G boards may also run in Myrinet mode instead of Ethernet,
add a message when we detect that the link partner is not running in the
right mode.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Limit the number of recoveries from a NIC hw watchdog reset to 1 by default.
This allows defective NICs to be detected immediately, since these memory parity
errors are expected to be very rare (less than once per century per NIC).
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove the aligned-completion whitelist, and replace it by using the 1.4.16
firmware's auto-detection features to choose which firmware to load.
The driver now loads the aligned firmware, performs a MXGEFW_CMD_UNALIGNED_TEST,
and falls back to using the unaligned firmware if:
- The firmware is too old (i.e., MXGEFW_CMD_UNALIGNED_TEST is an unknown command).
- The MXGEFW_CMD_UNALIGNED_TEST returns MXGEFW_CMD_ERROR_UNALIGNED, meaning
that it has seen an unaligned completion during the DMA test.
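The selection rule above can be sketched as a small decision function. The enum values and choose_firmware() are hypothetical stand-ins (TEST_ERROR_UNALIGNED plays the role of MXGEFW_CMD_ERROR_UNALIGNED); only the two fallback conditions come from the description.

```c
/* Hypothetical outcomes of running the unaligned-completion test
 * after the aligned firmware has been loaded. */
enum test_result {
	TEST_OK,		/* no unaligned completion observed */
	TEST_UNKNOWN_COMMAND,	/* firmware too old to know the command */
	TEST_ERROR_UNALIGNED	/* unaligned completion seen during DMA test */
};

enum fw_choice { FW_ALIGNED, FW_UNALIGNED };

/* Keep the aligned firmware unless one of the two listed cases occurs. */
static enum fw_choice choose_firmware(enum test_result r)
{
	if (r == TEST_UNKNOWN_COMMAND || r == TEST_ERROR_UNALIGNED)
		return FW_UNALIGNED;
	return FW_ALIGNED;
}
```

Trying the aligned firmware first and testing at runtime replaces a static chipset whitelist with a per-system measurement.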
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don't count on whatever implementation artifact preserves the
multicast list across a reset cmd, and set up multicast filtering
as part of our reset routine.
The setting of allmulti when adopting firmware with the rx-filter
broadcast bug is also moved into the multicast setup routine where
it belongs.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add dropped_pause, dropped_bad_phy, dropped_bad_crc32,
dropped_unicast_filtered to the set of ethtool counters.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
To clearly state the intent of copying to linear sk_buffs. The _offset
variant is overly long, but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
avoid the longer, open-coded equivalent.
Ditched a no-op in bnx2 in the process.
I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()...
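The TCP data offset field (doff) counts 32-bit words, so the full header length is doff * 4 bytes and the option length is that minus the fixed 20-byte header. A user-space sketch with a hypothetical minimal struct (not the kernel's struct tcphdr or its helpers); instead of a BUG_ON(), it clamps a malformed doff < 5 to zero option bytes so the unsigned result cannot underflow.

```c
/* Hypothetical minimal view of a TCP header: only the data offset. */
struct tcp_sketch {
	unsigned int doff;	/* header length in 32-bit words (valid: 5..15) */
};

/* Full TCP header length in bytes, options included. */
static unsigned int tcp_hdrlen_sketch(const struct tcp_sketch *th)
{
	return th->doff * 4;
}

/* Option bytes beyond the fixed 20-byte header. */
static unsigned int tcp_optlen_sketch(const struct tcp_sketch *th)
{
	if (th->doff < 5)	/* malformed header: avoid unsigned underflow */
		return 0;
	return (th->doff - 5) * 4;
}
```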
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->h.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One less thing for driver writers to worry about.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the Intel 5000 southbridge (aka Intel 6310/6311/6321ESB) PCIe ports
and the Intel E30x0 chipsets to the whitelist of aligned PCIe completions.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Simpler way of dealing with the firmware 4KB boundary crossing
restriction for rx buffers. This fixes a variety of memory
corruption issues when using an "uncommon" MTU with a 16KB
page size.
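One way to honor the 4KB rule when carving RX buffers out of a larger page is sketched below: whenever the next buffer would straddle a 4KB boundary, skip ahead to the start of the next 4KB block. crosses_4k() and carve_page() are hypothetical helpers for illustration, not functions from the driver.

```c
#include <stddef.h>

#define FW_BOUNDARY 4096u	/* firmware DMA must not cross this boundary */

/* Nonzero if [off, off + len) spans two 4KB blocks (len must be > 0). */
static int crosses_4k(size_t off, size_t len)
{
	return (off / FW_BOUNDARY) != ((off + len - 1) / FW_BOUNDARY);
}

/* Carve buffers of 'len' bytes from a page of 'page_size' bytes,
 * storing up to 'max' offsets in 'offs'. Returns the buffer count. */
static size_t carve_page(size_t page_size, size_t len,
			 size_t *offs, size_t max)
{
	size_t off = 0, n = 0;

	while (n < max) {
		if (crosses_4k(off, len))
			/* bump to the start of the next 4KB block */
			off = (off / FW_BOUNDARY + 1) * FW_BOUNDARY;
		if (off + len > page_size)
			break;
		offs[n++] = off;
		off += len;
	}
	return n;
}
```

With a 16KB page and 3000-byte buffers, this yields offsets 0, 4096, 8192 and 12288: each buffer gets its own 4KB block, and the slack in each block is the price of never crossing a boundary.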
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Correctly detect when TSO should be used on transmit by looking at the
skb->gso_size rather than seeing if the frame was larger than our MTU.
The old method causes problems when a host with a large (jumbo) MTU is
sending to a host with a small (standard) MTU.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix management of allocated physical pages when the architecture
page size is not 4KB, since the firmware cannot cross a 4KB boundary.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a wc_enabled flag in the myri10ge_priv instead of relying
on mtrr >= 0.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Do not use 4KB RDMA requests on the SGI TIOCE chipset, since this
bridge does not support them.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Allocate a dedicated page and use pci_map_page() for the DMA test instead
of relying on another existing buffer.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix a missing error check in myri10ge_allocate_rings() and set status
to -ENOMEM before all actual allocations so that the error path returns
what it should.
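The pattern being fixed is the usual goto-based unwind. Below is a user-space sketch with a hypothetical two-array struct rings and calloc() standing in for the driver's allocations: status is set to -ENOMEM once, before any allocation, so every failure branch returns it, and each label frees only what was allocated before the failing step.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical ring bookkeeping; the real driver allocates more arrays. */
struct rings {
	void *tx_info;
	void *rx_info;
};

static int allocate_rings(struct rings *r, size_t tx_n, size_t rx_n)
{
	int status;

	r->tx_info = NULL;
	r->rx_info = NULL;

	/* Set before any allocation so every goto below reports -ENOMEM. */
	status = -ENOMEM;

	r->tx_info = calloc(tx_n, 16);	/* 16-byte entries: arbitrary here */
	if (r->tx_info == NULL)
		goto abort;

	r->rx_info = calloc(rx_n, 16);
	if (r->rx_info == NULL)		/* the kind of check that was missing */
		goto abort_with_tx;

	return 0;

abort_with_tx:
	free(r->tx_info);
	r->tx_info = NULL;
abort:
	return status;
}
```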
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix copyright and license ("regents" should not have ever been used).
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Work around a bug which occurs when adopting firmware versions
1.4.4 through 1.4.11 where broadcasts are filtered as if they
were multicasts.
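Detecting the affected range amounts to an inclusive version-range check. A sketch with a hypothetical struct fw_version and comparison helper (the driver's own version parsing is not shown here):

```c
/* Hypothetical firmware version triple, e.g. 1.4.11. */
struct fw_version {
	int major, minor, sub;
};

/* Lexicographic comparison: negative, zero or positive, like strcmp(). */
static int fw_cmp(struct fw_version a, struct fw_version b)
{
	if (a.major != b.major)
		return a.major - b.major;
	if (a.minor != b.minor)
		return a.minor - b.minor;
	return a.sub - b.sub;
}

/* True for 1.4.4 through 1.4.11 inclusive. */
static int has_bcast_filter_bug(struct fw_version v)
{
	const struct fw_version lo = { 1, 4, 4 }, hi = { 1, 4, 11 };

	return fw_cmp(v, lo) >= 0 && fw_cmp(v, hi) <= 0;
}
```

Comparing field by field avoids the trap of comparing version strings, where "1.4.11" would sort before "1.4.4".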
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove the NETIF_F_TSO #ifdef-ery in drivers/net; this was
for old-old-2.4 compat (even current 2.4 has NETIF_F_TSO)
but it's time to get rid of it by now.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Now that IRQ allocation is done in myri10ge_open(), we still want to
check at driver load time that IRQ allocation will be able to succeed later.
Additionally, we fix the initialization and printing of netdev->irq.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Under some circumstances, using WC without the WC fifo is faster.
So we make it possible to tune wc_fifo with a module parameter.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
On suspend, handle pci_set_power_state errors, and on resume
handle failures in pci_resume_state().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The PCI MSI and PCI Express state are already saved and restored by the
current versions of pci_save_state/pci_restore_state.
Therefore it is no longer necessary for the driver to do it.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Now that IRQs are requested in open() and freed in close(),
we can safely switch from/to MSI without unloading the module.
We are guaranteed to free the IRQ correctly even if the sysfs file was
written in the meantime, since whether MSI was initialized is recorded in
mgp->msi_enabled.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Request the IRQ in myri10ge_open() and free it in close() instead of probe()
and remove(), to eliminate a potential race between the watchdog and the
interrupt handler. Additionally, the interrupt handler will no longer be called
on a shared IRQ when the interface is down.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Since pci_save_state() pushes MSI and PCIe states on a kind of stack,
myri10ge saving the state in advance for parity recovery will push the
state onto the stack again on suspend. This leads to a memory leak.
We add a couple of additional calls to save_state and restore_state so
that we no longer leak.
For the future, we are thinking of a better way to recover from parity
error without using pci_save_state().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix sizing of big_bytes in the case of vlan frames. The 4
VLAN_HLEN bytes were omitted, leading to sizing the big buffer
4 bytes smaller than it should be. Due to how rx buffers are
carved from pages, this was harmless for the common (9000, 1500)
byte MTUs, but could lead to data corruption for some MTUs.
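The arithmetic can be sketched as follows, assuming for illustration that the buffer must hold the MTU plus the Ethernet header, plus the 802.1Q tag after the fix; any extra firmware padding the driver may add is deliberately left out, and both function names are hypothetical. ETH_HLEN and VLAN_HLEN use their standard values of 14 and 4 bytes.

```c
#define ETH_HLEN  14	/* destination MAC + source MAC + ethertype */
#define VLAN_HLEN 4	/* 802.1Q tag inserted after the source MAC */

/* Sizing that omits the VLAN tag (the bug). */
static unsigned int big_bytes_old(unsigned int mtu)
{
	return mtu + ETH_HLEN;
}

/* Sizing that leaves room for a tagged frame (the fix). */
static unsigned int big_bytes_fixed(unsigned int mtu)
{
	return mtu + ETH_HLEN + VLAN_HLEN;
}
```

A full-size tagged frame at a 1500-byte MTU is 1518 bytes, so a buffer sized by the old formula is 4 bytes short; whether that overruns into a neighboring buffer depends on how the buffers are carved from the page, which is why only some MTUs were affected.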
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Receive full vlan frames into smalls when running with a jumbo MTU.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Drop the old routines that used the physically contiguous skb now
that we use the physical pages. And rename myri10ge_page_rx_done()
to myri10ge_rx_done() as it was previously.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Switch to physical page skb, by calling the new page-based
allocation routines and using myri10ge_page_rx_done().
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add physical page skb allocation routines and page based rx_done,
to be used by upcoming patches.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Indentation cleanups to synchronize to our tree which is automatically
indent'ed.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>