Eric Dumazet a écrit :
> Ingo Molnar a écrit :
>>> The following changes since commit 52989765629e7d182b4f146050ebba0abf2cb0b7:
>>> Linus Torvalds (1):
>>> Merge git://git.kernel.org/.../davem/net-2.6
>>>
>>> are available in the git repository at:
>>>
>>> master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master
>> Hm, something in this lot quickly wrecked networking here - see the
>> tx timeout dump below. It starts with:
>>
>> [ 351.004596] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x10b/0x19c()
>> [ 351.011815] Hardware name: System Product Name
>> [ 351.016220] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
>>
>> Config attached. Unfortunately i've got no time to do bisection
>> today.
>
>
>
> forcedeth might have a problem, in its netif_wake_queue() logic, but
> I could not see why a recent patch could make this problem visible now.
>
> CPU0/1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
> is not a new cpu either :)
>
> forcedeth uses an internal tx_stop without appropriate barrier.
>
> Could you try following patch ?
>
> (random guess as I dont have much time right now)
We might have a race in napi_schedule(), leaving interrupts disabled forever.
I cannot test this patch, I dont have the hardware...
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
forcedeth doesnt use properly dma api in its tx completion path
and in nv_loopback_test()
pci_map_single() should be paired with pci_unmap_single()
pci_map_page() should be paired with pci_unmap_page()
forcedeth xmit path uses pci_map_single() & pci_map_page(),
but tx completion path only uses pci_unmap_single()
nv_loopback_test() uses pci_map_single() & pci_unmap_page()
Add a dma_single field in struct nv_skb_map, and
define a helper function nv_unmap_txskb
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new device id for mcp89 chipset.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the device id macros and instead uses the constants
directly.
The areas in which logic expressions where using the macros now instead
use feature/workaround flags.
No new functionality has been introduced in this patch, only clean up of
flags and macros.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a phy_power_down parameter to forcedeth: set to 1 to power down the
phy and disable the link when an interface goes down; set to 0 to always
leave the phy powered up.
The phy power state persists across reboots; Windows, some BIOSes, and
older versions of Linux don't bother to power up the phy again, forcing
users to remove all power to get the interface working (see
http://bugzilla.kernel.org/show_bug.cgi?id=13072). Leaving the phy
powered on is the safest default behavior. Users accustomed to seeing
the link state reflect the interface state and/or wanting to minimize
power consumption can set phy_power_down=1 if compatibility with other
OSes is not an issue.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds new logic to support a clock gating feature found on the
latest set of chipsets. The clock gating is performed on the tx/rx
engines when the link is disconnected. Clock gating helps in reducing
power consumption.
* modified based on comments from netdev
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the tx_timeout() to properly handle the clean up of the
tx ring. It also sets the tx put pointer back to the correct position to
be in sync with HW.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all DMA_39BIT_MASK macro with DMA_BIT_MASK(39)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reset phy state on resume, fixing a regression caused by powering down
the phy on hibernate.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch clears the irqstatus register with the exact same events it
has read from it. Since the read-write operation is not atomic, a new
irqstatus bit could have been set in between these operations and would
then be cleared accidentally.
Secondly, we now don't need any spin lock protection when
scheduling/completing napi poll as the isr will not execute anymore (as
we turn off all interrupts now).
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies the throughput mode poll settings to reduce the
number of interrupts. This is only used by older hardware that need a
timer irq in throughput mode.
Secondly, this patch increases the default rx ring from 128 to 512. This
drastically improves bandwidth utilization for small packets sizes i.e
512 bytes.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the logic to moderate the interrupts by changing the
mode between throughput and poll. If there has been a large amount of
time without any burst of network load, the code will transition to pure
throughput mode (where each tx/rx/other will cause an interrupt). If
bursts of network load occurs, it will transition to poll based mode to
help reduce cpu utilization (it will not interrupt on each packet) while
maintaining the optimum network bandwidth utilization.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is only a subset of changes so that it is easier to see the
modifications. This patch removes the isr 'for' loop and shifts all the
logic to account for new tab spacing.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new optimization mode called Dynamic has been added. This will be mode
where interrupt moderation logic will dynamically switch between pure
throughput mode and poll based (called 'cpu') mode.
Also, for newer chipsets, the timer irq is not needed for throughput
mode. Secondly, since we are modifying the irqmask to change between
modes, msix is not supported.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The napi poll routine has been modified to handle all interrupt events
and process them accordingly. Therefore, the ISR will now only schedule
the napi poll and disable all interrupts instead of just disabling rx
interrupt.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two tx_done routines to handle tx completion processing. Both
these functions now take in a limit value and return the amount of tx
completions. This will be used by a future patch to determine the total
amount of work done.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes unnecessary overhead code. Firstly, there is no nead
to mask off unwanted interrupts as we will be checking against the
irqmask field anyways. Secondly, there has been no value in last few
years from detecting error or unknown interrupts.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch will save the irq events in the driver's context so that the
napi routine knows which interrupts have occurred. Subsequent changes
will be moving all interrupt processing into the napi poll routine.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes support for msix running in conjunction with napi.
There has been reported issues regarding the behaviour of irqmask and
generation of interrupts by the HW when in MSIX mode. When running napi,
the driver is constantly turning off/on the irqmask. For the time being,
I am going to disable it until I can root cause the issue.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds missing napi enable/disable calls.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer versions of the stats feature would not encompass all older
versions. This would result in only retreiving a subset of all available
stats in HW.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f55c21fd9a ("forcedeth: call
restore mac addr in nv_shutdown path"), which was introduced to fix
the regression tracked at
http://bugzilla.kernel.org/show_bug.cgi?id=11358 causes the
wake-on-lan mac to be reversed in the shutdown path. Apparently the
forcedeth situation is rather messy in that the mac we need to
writeback for a subsequent modprobe to work is exactly the reverse of
what is needed for proper wake-on-lan.
The following patch explains the situation in the comments and
makes the call to nv_restore_mac_addr() conditional (only called if
we are not really going for poweroff).
Tobias Diedrich wrote:
> Hmm, I had not tried WOL for some time.
> With 2.6.29-rc3 is see the following behaviour:
>
> State WOL Behaviour
> ------------------------------
> shutdown reversed MAC
> disk/shutdown reversed MAC
> disk/platform OK
>
> Apparently nv_restore_mac_addr() restores the MAC in the wrong order
> for WOL (at least for my PCI_DEVICE_ID_NVIDIA_NVENET_15). platform
> works, because the MAC is not touched in the nv_suspend() path.
>
> A possible fix might be to only call nv_restore_mac_addr() if
> system_state != SYSTEM_POWER_OFF.
With the following patch:
shutdown OK
disk/shutdown OK
disk/platform OK
kexec OK
Signed-off-by: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Tested-by: Philipp Matthias Hahn <pmhahn@titan.lahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds another type of recoverable error to the driver. It also
modifies the sequence for recovery to include a mac reset and clearing
of interrupts.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the ethtool tx csum "set" command. A recent patch was
submitted to remove HW_CSUM and use IP_CSUM instead. Therefore, the
corresponding ethtool command should also be modified.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an issue with the suspend/resume cycle with msi
interrupts. See bugzilla number 10487 for more details. The fix is to
re-setup a private msi pci config offset field.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the logic used to communicate with the mgmt unit. It
also adds a version check for a newer mgmt unit firmware.
* Fixed udelay to schedule_timeout_uninterruptible
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: change default
msix and napic can work again
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: clean up
schedule it later after disable it.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: fix bug
for msix, we still need that flag to enable irq respectively
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: make /proc/interrupts could show more info which irq is rx or other for msi-x
add three name fields for rx, tx, other
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a potential race condition between scheduling napi and
completing napi poll. The call to netif_rx_schedule should be under
protection of the lock (as is the completion), otherwise, interrupts
could be masked off.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the feature flag for mgmt unit as it is not used for
this chipset.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch bumps up the version number and adds current year to copyright.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a potential race condition between xmit thread and xmit
completion thread. The calculation of empty tx descriptors is not
performed under the lock. This could cause it to set the stop flag while
the completion thread finishes all tx's. This will result in the tx
queue in stopped state and no one to wake it up.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Polling doesn't seem to be necessary on my hardware, at
least I haven't seen any bad effects testing it a while.
Remove the polling so the CPU doesn't have to wake up a
hundred times per second.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter. This patch cleans up that api by
properly removing it..
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last shoot of this series.
After I removing all directly reference of netdev->priv, I am killing
"priv" of "struct net_device" and fixing relative comments/docs.
Anyone will not be allowed to reference netdev->priv directly.
If you want to reference the memory of private data, use netdev_priv()
instead.
If the private data is not allocted when alloc_netdev(), use
netdev->ml_priv to point that memory after you creating that private
data.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bring the physical link down when the interface is down by placing the PHY
in power-down state, unless WOL is enabled. This mirrors the behavior of
other drivers including e1000 and tg3.
Without the patch, ifconfig down leaves the physical link up, which confuses
datacenter users who expect the link lights both on the NIC and the switch to
go out when they bring an interface down.
Furthermore, even though the phy is powered on, autonegotiation stops working,
so a normally gigabit link might suddenly become 100 Mbit half-duplex when the
interface goes down, and become gigabit when it comes up again.
Ayaz said:
I would not include this patch until further testing is performed. NVIDIA
MCP chips use 3rd party PHY vendors. By powering down the phy, it could
have adverse affects on certain phys.
Arthur Jones said:
I just ran across this patch. Tested on a Marvell 88E1121R (GigE PHY)
and works great. This is a very important feature for me.
Signed-off-by: Ed Swierk <eswierk@arastra.com>
Tested-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert this driver to network device ops. Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.
Drivers need not do it any more.
Some cases had to be skipped over because the drivers
were making use of the ->last_rx value themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>
This eliminates the following often-generated warning from my 64 bit
Opteron SMP test stand:
eth0: too many iterations (6) in nv_nic_irq
According to the web, the problem is that the forcedeth driver has a
too-low value for max_interrupt_work. Grepping the kernel I see that
forcedeth has the second lowest value of all ethernet drivers (ie, 6).
Most are in the 20-40 range. So this patch increases this a bit, from 6
to 15 (at 15 forcedeth becomes the driver with third-lowest
max_interrupt_work value).
My test stand, which used to print out the above warnings repetitively
whenever it was under heavy net load, no longer does so.
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Cc: Ayaz Abdulla <aabdulla@nvidia.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>