We need to ensure the loads from the descriptor are done after the
MMIO store clearing the interrupts has completed, otherwise we
might still miss work.
A read back from the MMIO register will "push" the posted store and
ioread32 has a barrier on weakly aordered architectures that will
order subsequent accesses.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This uses the standard phy-mode property
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just call the interrupt handler with interrupts locally disabled
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip supports HW vlan tag insertion and extraction. Add support
for it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the ndo_set_rx_mode() callback to configure the
multicast filters, promisc and allmulti options.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hopefully my understanding of how the hardware works is correct,
as the documentation isn't completely clear. So far I have seen
no obvious issue. Pause seem to also work with NC-SI.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A non-wired up implementation accidentally made its way in
a previous patch (Make ring sizes configurable via ethtool).
This removes it and wires up the generic phy_ethtool_nway_reset
instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I haven't seen any improvement above that size on the machines
I've tested with.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We set an arbitrary max at 1024 since we pre-allocate the actual
descriptor arrays and skb arrays to the full size to keep the
code a bit simpler and avoid allocation failures in the reset
task.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clear stale interrupts on entry, configure FIFO sizes, set FIFO
thresholds, configure interrupt mitigation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The helpers just take space but don't provide much value. Simple
one line comments are more explanatory.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To remove more confusion. This function is about obtaining the
initial MAC address at driver probe time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid confusion with the ndo callback and generally be
clearer about the purpose of that function
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So features can be turned on/off via ethtool
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We found out that HW checksum generation only works from AST2500
onward. This disables it on AST2400 and removes the "no-hw-checksum"
properties in the device-trees. The problem we had wasn't related
to NC-SI.
Also rework the logic testing for that property so it can be used
to disable HW checksum generation and checking regardless of whether
NC-SI is used or not in case other variants out there need this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We test for aspeed chips to handle a couple of special cases,
but we do that by checking the machine type which isn't right.
Instead check the actual device compatible property. This also
updates the dtsi files for the aspeed SoC to match.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation describes NETIF_F_IP_CSUM as deprecated
so let's switch to NETIF_F_HW_CSUM and use the helper to
handle unhandled protocols.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Directly access the fields when needed. The accessors add clutter
not clarity and in some cases cause unnecessary read-modify-write
type access on the slow (uncached) descriptor memory.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NETIF_F_SG and create multiple TX ring entries for skb fragments.
On reclaim, the skb is only freed on the segment marked as "last".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Those are non-cachable stores, let's avoid those we don't need. Remove
the helper, it's not particularly helpful and since it uses "priv"
I can't move it to the header file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves the packet freeing to a separate function
which is also used by ftgmac100_free_buffers() and will
be used more in the error path of fragmented sends.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'll use variants of this accessor without barriers when
building series of descriptors for fragmented sends
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a private lock which isn't terribly useful, and we maintain
a "tx_pending" counter for information that's already available
via a trivial arithmetic operation. Then we unconditionaly wake
the queue even when not stopped. Finally our code in tx isn't
really safe vs. a concurrent reclaim. The aspeed chips aren't SMP
today but I prefer the code being right and future proof.
So rip that out and replace it with more "standard" queue handling,
currently with a threshold of 1 queue element, which will be
increased when we implement fragmented sends.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than in the descriptor. The descriptor is mapped non-cachable
and rather slow to access.
Since to do that we need to keep track of the tx "pointer" we also
have no use of all the accesors to manipulate it, just open code
it, it's as clear and will help when adding fragmented sends.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than just transmitting garbage past the end of the small
packet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a simple goto to a drop path at the tail of the function,
it will be used in a few more cases soon
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will make subsequent rework of the tx path simpler
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move it below ftgmac100_xmit() and the rest of the tx path
No code change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a reset task to reset our chip, use it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW incorrectly calculates the frame size without the vlan
tag and compares that against 64. It will thus flag 64-bytes
frames with a vlan tag as 60-bytes frames "runt" packets
which we'll then drop. Thus we end up dropping ARP packets
on vlan's ...
It does that whether vlan tag stripping is enabled or not.
This works around it by ignoring the "runt" error bit of the
frame has been vlan tagged and is at least 60 bytes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Directly access the fields when needed. The accessors add clutter
not clarity and in some cases cause unnecessary read-modify-write
type access on the slow (uncached) descriptor memory.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current driver receive path allocates pages and stashes
them into SKB fragments. This is not particularly useful as
we don't support jumbo frames (which wouldn't be great with
the small FIFOs on all the known implementations) anyway.
It also makes us flush the caches and allocate more memory
for RX than necessary.
So set our RX buf to our max packet size instead (which we
bump to 1536 bytes to account for packets with vlan tags
etc...) like most other ethernet drivers.
Then allocate skbs when populating the receive ring and DMA
directly into them.
This simplifies the RX path further.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't handle fragmented RX packets, so the "looping"
helpers to locate the first segment of a packet or to
drop a packet aren't actually helping.
Take them out and simplify ftgmac100_rx_packet() further
as a result.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fast path has a single unlikely() test for any error bit,
calling into a helper that sets the appropriate statistics.
The various netdev_info aren't particularly interesting. If
we want to differentiate the various length errors later we
can introduce driver specific stats using ethtool.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Read the descriptor field only once and check for IP header
checksum errors as well
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can occasionally fail to allocate new RX buffers at
runtime or when starting the driver. At the moment the
latter just fails to open which is fine but the former
leaves stale DMA pointers in the ring.
Instead, use a scratch page and have all RX ring descriptors
point to it by default unless a proper buffer can be allocated.
It will help later on when re-initializing the whole ring
at runtime on link changes since there is no clean failure
path there unlike open().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't support jumbo frames, we will never receive a
fragmented packet, the RX buffer is always big enough,
if not then it's a runaway packet that can be dropped.
So take out the loop that handles such things in
ftgmac100_rx_packet() which will help with subsequent
simplifications and improvements to the RX path
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
First, don't look at the interrupt status in the poll loop
to decide what to poll. It's wrong. If we have run out of
budget, we may still have RX packets to unqueue but no more
RX interrupt pending.
So instead move the code looking at the interrupt status
into the interrupt handler where it belongs. That avoids a slow
MMIO read in the NAPI fast path. We keep the abnormal interrupts
enabled while NAPI is scheduled.
While at it, actually do something useful in the "error" cases:
On AHB bus error, trigger the new reset task, that's about all
we can do. On RX packet fifo or descriptor overflows, we need
to restart the MAC after having freed things up. So set a flag
that NAPI will see and use to perform that restart after
harvesting the RX ring.
Finally, we shouldn't complete NAPI if there are still outgoing
packets that will need harvesting. Waiting for more interrupts
is less efficient than letting NAPI run a while longer while
the queue drains.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt is neither enabled nor registered when the interface
isn't running (regardless of whether we use nc-si or not) so the
test isn't useful.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HW requires a full MAC reset when changing the speed.
Additionally the Aspeed documentation spells out that the
MAC needs to be reset twice with a 10us interval.
We thus move the speed setting and top level reset code
into a new ftgmac100_reset_and_config_mac() function which
handles both. Move the ring pointers initialization there
too in order to reflect the HW change.
Also reduce the timeout for the MAC reset as it shouldn't
take more than 300 clock cycles according to the doc.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Link speed changes require a full HW reset. This isn't done
properly at the moment. It will involve delays and thus isn't
suitable to do from the link poll callback.
So let's create a reset_task that we can queue up when the
link changes. It will be useful for various cases of error
handling as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link monitoring and error handling code will have to
redo the ring inits and HW setup so move the code out of
ftgmac100_open() into a dedicated function.
This forces a bit of re-ordering of ftgmac100_open() but
nothing dramatic.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt isn't shared, so this will keep it masked
until we have the HW in a known sane state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, a single function is used to allocate the rings
themselves, initialize them, populate the rx ring, and
allocate the rx buffers. The same happens on free.
This splits them into separate functions. This will be
useful when properly implementing re-initialization on
link changes and error handling when the rings will be
repopulated but not freed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep track of both the current speed and duplex settings
instead of only speed and properly apply the duplex setting
to the HW.
This reworks the adjust_link() function to also avoid trying
to reconfigure the HW when there is no link and to display
the link state to the user.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>