Jarek pointed pppoe can call back dev_queue_xmit(), and might need
skb->dst, so its safer to unset IFF_XMIT_DST_RELEASE on ppp devices.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).
CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.
It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.
David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().
List of devices that must clear this flag is :
- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function
- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sysfs files for a network device can not unconditionally take the
rtnl_lock as the bonding sysfs files do. If someone accesses those
sysfs files while the network device is being unregistered with the
rtnl_lock held we will deadlock.
So use trylock and restart_syscall to avoid this problem.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Network device sysfs files that grab the rtnl_lock unconditionally
will deadlock if accessed when the network device is being
unregistered. So use trylock and syscall_restart to avoid this
deadlock.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holding rtnl_lock when we are unregistering the sysfs files can
deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix
it with the now familiar patter of: rtnl_trylock and syscall_restart()
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctls are unregistered with the rntl_lock held making
it unsafe to unconditionally grab the the rtnl_lock. Instead
we need to call rtnl_trylock and restart the system call
if we can not grab it. Otherwise we could deadlock at unregistration
time.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just returning -ERESTARTSYS without a signal pending is not
good that will just leak it to userspace. We need return
-ERESTARTNOINTR so we always restart and set signal pending
so that we fall of the fast path of syscall return and setup
the system call restart.
So use restart_syscall() which does all of this for us.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The earlier patch to fix the deadlock between a network device going
away and writing to sysfs attributes was incomplete.
- It did not set signal_pending so we would leak ERSTARTSYS to user space.
- It used ERESTARTSYS which only restarts if sigaction configures it to.
- It did not cover store and show for ifalias.
So fix all of these up and use the new helper restart_syscall so we get
the details correct on what it takes.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when we have a signal pending we have the functionality
to restart that the current system call. There are other cases
such as nasty lock ordering issues where it makes sense to have
a simple fix that uses try lock and restarts the system call.
Buying time to figure out how to rework the locking strategy.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New packet socket feature that makes packet socket more efficient for
transmission.
- It reduces number of system call through a PACKET_TX_RING mechanism,
based on PACKET_RX_RING (Circular buffer allocated in kernel space
which is mmapped from user space).
- It minimizes CPU copy using fragmented SKB (almost zero copy).
Signed-off-by: Johann Baudy <johann.baudy@gnu-log.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pdev->irq was not saved in netxen_adapter, causing request_irq()
with invalid irq number.
This was broken in commit be339aee63
("netxen: fix irq tear down and msix leak.").
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit
(2^32 bytes)
Using 64 bit intermediate avbps/brate counters can allow us to reach
this theorical limit.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Per Dalen <per.dalen@cnw.se>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds support for the one or two channel CPC-PCI and CPC-PCIe
cards from EMS Dr. Thomas Wuensche (http://www.ems-wuensche.de).
Signed-off-by: Sebastian Haas <haas@ems-wuensche.com>
Signed-off-by: Markus Plessing <plessing@ems-wuensche.com>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver adds support for the SJA1000 chips connected to the
"platform bus", which can be found on various embedded systems.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the generic Socket-CAN driver for the Philips SJA1000
full CAN controller.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CAN network device driver interface provides a generic interface to
setup, configure and monitor CAN network devices. It exports a set of
common data structures and functions, which all real CAN network device
drivers should use. Please have a look to the SJA1000 or MSCAN driver
to understand how to use them. The name of the module is can-dev.ko.
Furthermore, it adds a Netlink interface allowing to configure the CAN
device using the program "ip" from the iproute2 utility suite.
For further information please check "Documentation/networking/can.txt"
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch documents the CAN netowrk device drivers interface, removes
obsolete documentation and adds some useful links to CAN resources.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for TI DaVinci EMAC driver.
TI DaVinci Ethernet Media Access Controller module is based upon
TI CPPI 3.0 DMA engine and supports 10/100 Mbps on all and Gigabit modes on
some TI devices. It supports MII/RMII and has up to 8Kbytes of internal
descriptor memory. This driver has been working on several TI devices including
DM644x, DM646x and DA830 platforms. The specs of this device are available at:
http://www.ti.com/litv/pdf/sprue24a
Signed-off-by: Anant Gole <anantgole@ti.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for net/icmp.h header in net/ipv4/fib_frontend.c.
This patch removes the #include net/icmp.h from it.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can update netdev_queue tx_bytes/tx_packets/tx_dropped counters instead
of dev->stats ones, to reduce number of cache lines dirtied in xmit path.
This fixes a performance problem on SMP when many different cpus take
vlan tx path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c
Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths)
We could move away stats field in net_device but it wont help that much.
(Two cache lines dirtied in tx path, we can do one only)
Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
netdev_queue because this structure is already touched in tx path and
counters updates will then be free (no increase in size)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is illegal to dereference a skb after a successful ndo_start_xmit()
call. We must store skb length in a local variable instead.
Bug was introduced in 2.6.27 by commit 0abf77e55a
(net_sched: Add accessor function for packet length for qdiscs)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marching along, let's bump the version number to indicate things actually
have happened to the driver.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the generic XAUI device support for 82599 controllers.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The performance of hardware RSC is greatly reduced if the total for max rsc
descriptors multiplied by the buffer size is greater than 65535. To
prevent this we need to adjust the max rsc descriptors appropriately.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.
I tried my best to deal with urg hole madness too which happens
here:
if (!sock_flag(sk, SOCK_URGINLINE)) {
++*seq;
...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Frans Pop <elendil@planet.nl>
Tested-by: Ian Zimmermann <itz@buug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
FIFO1_DMA_ERR is set twice, the second should be FIFO2_DMA_ERR.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Ram Vepa <ram.vepa@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After 2.6.29, PPC no more admits passing NULL to the dev parameter of
the DMA API. The result is a BUG followed by solid lock-up when the
mv643xx_eth driver brings an interface up. The following patch makes
the driver work on my Pegasos again; it is mostly a search and replace
of NULL by mp->dev->dev.parent in dma allocation/freeing/mapping/unmapping
functions.
Signed-off-by: Gabriel Paubert <paubert@iram.es>
Acked-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the purposes of bonding is to allow for redundant links, and failover
correctly if the cable is pulled. If all the members of a bonded device have
no carrier present, the bonded device itself needs to report no carrier present
to user space so management tools (like routing daemons) can respond.
Bonding in 802.3ad mode does not work correctly for this because it incorrectly
chooses a link that is down as a possible aggregator.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If bridge is configured with no STP and forwarding delay of 0 (which
is typical for virtualization) then when link starts it will flood all
packets for the first 20 seconds.
This bug was introduced by a combination of earlier changes:
* forwarding database uses hold time of zero to indicate
user wants to always flood packets
* optimzation of the case of forwarding delay of 0 avoids the initial
timer tick
The fix is to just skip all the topology change detection code if
kernel STP is not being used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge catches all STP packets; even if STP is turned
off. This prevents other systems (which do have STP turned on)
from being able to detect loops in the network.
With this patch, if STP is off, then any packet sent to the STP
multicast group address is forwarded to all ports.
Based on earlier patch by Joakim Tjernlund with changes
to go through forwarding (not local chain), and optimization
that only last octet needs to be checked.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running in DCB mode, switching between link flow control and priority
flow control shouldn't need to reset the hardware. This removes that
reset.
This also extends the set_all() dcbnl callback to return a value indicating
that the HW config changed, however a reset was not required.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethtool should report that link flow control is disabled when in priority
flow control mode.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
82599 supports using either link flow control or priority flow control when
in DCB mode. The dcbnl interface already supports sending down
configurations through rtnetlink that can enable LFC when DCB is enabled,
so the driver should take advantage of this.
82598 does not support using LFC when DCB is enabled, so explicitly disable
it when we're in DCB mode. This means we always run in PFC mode when DCB
is enabled.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sets the low water threshhold for priority flow control for 82598
and 82599 controllers in DCB mode.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable jumbo frame when FCoE feature is enabled in 82599. Use 3K
as the receive queue buffer size for receive queues used by FCoE
to address for max Fiber Channel frame size as 2148 bytes (with
max 2112 bytes of payload).
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable using FCoE redirection table feature in 82599. The FCoE
redirection table has maximum of eight entries, corresponding
to maximum of eight receive queues to be used for distributing
incoming FCoE packets. This patch sets up the FCoE redirection
table when multiple receive queues are available for FCoE.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ring feature for FCoE to make use of the FCoE redirection
table in 82599. The FCoE redirection table is a receive side
scaling feature for Fiber Channel over Ethernet feature in 82599,
enabling distributing FCoE packets to different receive queues
based on the exchange id.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we can find a type NETDEV_HW_ADDR_T_SAN mac address from the
corresponding netdev for a fcoe interface then sets up added the
fc->ctlr.spma flag and stores spma mode address in ctl_src_addr.
In case the spma flag is set then:-
1. Adds spma mode MAC address in ctl_src_addr as secondary
MAC address, the FLOGI for FIP and pre-FIP will go out
using this address.
2. Cleans up stored spma MAC address in ctl_src_addr in
fcoe_netdev_cleanup.
3. Sets up spma bit in fip_flags for FIP solicitations along
with exiting FPMA bit setting.
4. Initialize the FLOGI FIP MAC descriptor to stored spma
MAC address in ctl_src_addr. This is used as proposed
FCoE MAC address from initiator along with both SPMA
and FPMA bit set in FIP solicitation, in response the
switch may grant any FPMA or SPMA mode MAC address to
initiator.
Removes FIP descriptor type checking against ELS type
ELS_FLOGI in fcoe_ctlr_encaps to update a FIP MAC descriptor,
instead now checks against FIP_DT_FLOGI.
I've tested this with available FPMA-only FCoE switch but
since data_src_addr is updated using same old code for
both FPMA and SPMA modes with FIP or pre-FIP links, so added
SPMA mode will work with SPMA-only switch also provided that
switch grants a valid MAC address.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently fcoe_netdev_config adds netdev pkt handler for fcoe pkts,
fcoe_if_create adds netdev pkt handler for fip packets, a secondary
MAC address is added by fcoe_netdev_config and then later cleanup
for these netdev related config/adds is done only during
fcoe_if_destroy and no cleanup done on error during fcoe interface
creation after above netdev config calling in fcoe_if_create.
So this patch adds single func for above mentioned cleanup the
fcoe_netdev_cleanup and then calls this func on either fcoe interface
destroy or exiting from fcoe_if_create due to an error after fcoe/fip
related above netdev config is done.
Moved netdev pkt handler addition code blocks for fip pkts close to
similar code block for foce pkt in fcoe_netdev_config, so that added
fcoe_netdev_cleanup could be called on error from fcoe_netdev_config
to undo these both additions for fcoe/fip pkt handlers. This move
required reference to fcoe_fip_recv in fcoe_netdev_config, so moved
fip related functions fcoe_fip_recv, fcoe_fip_send and
fcoe_update_src_mac above fcoe_netdev_config.
This consolidation will enable spma mode support in next patch to
easily add or delete spma mode mac address beside fixing current
no cleanup issue during error.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>