Because bonding stats are usually the sum of slave stats, it was
not easy to account for TX drops at the bonding layer.
We can use dev->tx_dropped for this, as this counter is later
added to the device stats (in dev_get_stats()).
This extends the idea we had in commit ee63771474 ("bonding: Simplify
the xmit function for modes that use xmit_hash") for bond_3ad_xor_xmit()
to other bonding modes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
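A minimal user-space sketch of the accounting idea, assuming a simplified model: struct tx_counters, struct bond_model and the bond_model_* helpers are hypothetical and do not mirror the real bonding or net_device structures. Reported stats are the sum of the slave counters plus a drop counter kept on the master itself, which is the role dev->tx_dropped plays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SLAVES 4

/* Hypothetical, simplified per-device TX counters (not the kernel's
 * struct rtnl_link_stats64). */
struct tx_counters {
	uint64_t tx_packets;
	uint64_t tx_dropped;
};

/* Hypothetical bond: slave counters plus a drop counter owned by the
 * master device, playing the role of dev->tx_dropped. */
struct bond_model {
	struct tx_counters slaves[MODEL_MAX_SLAVES];
	size_t n_slaves;
	uint64_t master_tx_dropped;
};

/* Called when the bond cannot hand a packet to any slave. */
static void bond_model_drop(struct bond_model *bond)
{
	bond->master_tx_dropped++;
}

/* Sum the slave counters, then add the drops that happened at the
 * bonding layer, the way dev_get_stats() adds dev->tx_dropped on top of
 * what the driver reports. */
static struct tx_counters bond_model_get_stats(const struct bond_model *bond)
{
	struct tx_counters total = { 0, 0 };
	size_t i;

	assert(bond->n_slaves <= MODEL_MAX_SLAVES);
	for (i = 0; i < bond->n_slaves; i++) {
		total.tx_packets += bond->slaves[i].tx_packets;
		total.tx_dropped += bond->slaves[i].tx_dropped;
	}
	total.tx_dropped += bond->master_tx_dropped;
	return total;
}
```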
Sowmini Varadhan says:
====================
sunvnet: Use multiple Tx queues.
v2: moved tcp fix out of this series per David Miller's feedback
The primary objective of this patch-set is to address the suggestion from
http://marc.info/?l=linux-netdev&m=140790778931563&w=2
With the changes in Patch 2, every vnet_port will get packets from
a single tx-queue, and flow-control/head-of-line-blocking is
confined to the vnet_ports that share that tx queue (as opposed to
flow-controlling *all* peers).
Patch 1 is an optimization that resets the DATA_READY bit when
we re-enable Rx interrupts. This optimization lets us exit quickly
from vnet_event_napi() when new data has not triggered an interrupt.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use multiple Tx netdev queues for sunvnet by supporting a one-to-one
mapping between vnet_port and Tx queue. Provide an ndo_select_queue
indirection (vnet_select_queue()) which selects the queue based
on the peer that would be selected in vnet_start_xmit().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
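The port-to-queue mapping can be sketched as follows. This is an illustrative model, not the sunvnet code: struct port_model, port_model_bind(), select_queue_model() and MODEL_NUM_TX_QUEUES are hypothetical, the lookup by destination MAC stands in for whatever peer selection vnet_start_xmit() performs, and wrapping around when ports outnumber queues is an assumption (that is the case in which several ports share one Tx queue).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NUM_TX_QUEUES 4	/* assumed number of netdev Tx queues */

/* Hypothetical stand-in for a vnet_port: the peer's MAC address and the
 * Tx queue the port was bound to when it was added. */
struct port_model {
	uint8_t peer_mac[6];
	uint16_t q_index;
};

/* Give the nth port its own queue; ports beyond the queue count wrap
 * around and share a queue (an assumption of this sketch). */
static void port_model_bind(struct port_model *port, unsigned int nth_port)
{
	port->q_index = (uint16_t)(nth_port % MODEL_NUM_TX_QUEUES);
}

/* ndo_select_queue-style helper: return the queue of the peer that the
 * transmit path would pick for this destination MAC, or queue 0 when no
 * peer matches (the fallback is also an assumption). */
static uint16_t select_queue_model(const struct port_model *ports, size_t n,
				   const uint8_t dest_mac[6])
{
	size_t i;

	assert(ports != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (memcmp(ports[i].peer_mac, dest_mac, 6) == 0)
			return ports[i].q_index;
	return 0;
}
```

Because the queue is derived from the same peer lookup the transmit path uses, stopping one queue only flow-controls the ports bound to it.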
When vnet_event_napi re-enables interrupts, it should
reset LDC_EVENT_DATA_READY as an optimization.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Challenge ACK is described in RFC 5961, fix typo.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes checkpatch warning:
"WARNING: Prefer seq_puts to seq_printf"
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is often quite helpful to be able to know the state of a transport
outside of the application itself (for troubleshooting purposes or for
monitoring purposes). Add it under /proc/net/sctp/remaddr.
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set RTL8152_UNPLUG when -ENODEV is returned. This can speed up
unloading the driver when the device has been unplugged.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only count packets that failed cookie authentication.
Previously, SYNCOOKIESFAILED could be > 0 even though not a single cookie had ever been sent.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x_msix_fp_int() and bnx2x_interrupt() run from hard interrupt
context.
They can use napi_schedule_irqoff() instead of napi_schedule().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context.
They can use napi_schedule_irqoff() instead of napi_schedule().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows workload spreading via vRSS for IPv6.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fallback device is in ipv6 mode by default.
The mode cannot be changed at runtime, so there
is no way to decapsulate ip4in6 packets coming from
various sources without creating specific tunnel
interfaces for each peer.
This allows the fallback tunnel device to be updated, but only
its mode can be changed. The usual command should work for the
fallback device: `ip -6 tun change ip6tnl0 mode any`
The fallback device cannot be hidden from the packet receiver
like a regular tunnel, but there is no need for synchronization
as long as we do a single assignment.
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do the assignment before the if condition and test !skb, as in rawv6_recvmsg().
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove __inline__/inline and let the compiler decide what to do
with static functions.
Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply commit e0f36310f7
("ipx: remove unnecessary casting on ntohl")
to all seq_printf() calls using %08lX.
Inspired-by: "David S. Miller" <davem@davemloft.net>
Inspired-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
bpf: reduce verifier memory consumption and add tests
Small set of cleanups:
- reduce verifier memory consumption
- add verifier test to check register state propagation and state equivalence
- add JIT test reduced from recent nmap triggered crash
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
nmap generates classic BPF programs to filter ARP packets with a given target MAC,
which triggered a bug in the eBPF x64 JIT. The bug was fixed in
commit e0ee9c1215 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler").
This patch adds a test case in eBPF instructions (those that
were generated by the classic->eBPF converter) to be processed by the JIT.
The test is primarily targeting JIT compiler.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- add a test specifically targeting verifier state pruning.
It checks state propagation between registers, storing that
state into stack and state pruning algorithm recognizing
equivalent stack and register states.
- add summary line to spot failures more easily
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The verifier keeps track of register state spilled to the stack.
Registers are 8 bytes wide and always aligned, so instead of tracking them
in every byte-sized stack slot, use a MAX_BPF_STACK / 8 array to track
spilled register state.
Though the verifier runs in user context and its state is freed immediately
after verification, it makes sense to reduce its memory usage.
This optimization reduces sizeof(struct verifier_state)
from 12464 to 1712 bytes on 64-bit and from 6232 to 1112 bytes on 32-bit.
Note, this patch doesn't change existing limits, which are there to bound
time and memory during verification: 4k total number of insns in a program,
1k number of jumps (states to visit) and 32k number of processed insns
(since an insn may be visited multiple times). Theoretical worst case memory
during verification is 1712 * 1k, i.e. about 1.7 Mbyte. An out-of-memory
situation triggers cleanup and rejects the program.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
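The slot indexing implied by a MAX_BPF_STACK / 8 array can be sketched as below. MODEL_MAX_BPF_STACK mirrors the 512-byte BPF stack; struct reg_state_model, struct stack_model and spill_slot() are hypothetical stand-ins, not the verifier's actual types.

```c
#include <assert.h>

#define MODEL_MAX_BPF_STACK 512	/* BPF stack size in bytes */
#define MODEL_REG_SIZE 8		/* registers are 8 bytes wide */
#define MODEL_SLOTS (MODEL_MAX_BPF_STACK / MODEL_REG_SIZE)

/* Hypothetical stand-in for the verifier's per-register state. */
struct reg_state_model {
	int type;
	long long imm;
};

/* One entry per aligned 8-byte slot instead of one per stack byte. */
struct stack_model {
	struct reg_state_model spilled_regs[MODEL_SLOTS];
};

/* Map an aligned, negative frame-pointer offset (-512 .. -8) to a slot. */
static int spill_slot(int off)
{
	assert(off < 0 && off >= -MODEL_MAX_BPF_STACK);
	assert(off % MODEL_REG_SIZE == 0);
	return (MODEL_MAX_BPF_STACK + off) / MODEL_REG_SIZE;
}
```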
Guenter Roeck says:
====================
net: dsa: Fixes and enhancements
Patch 01/15 addresses a bug indicated by an annoying and unhelpful
log message.
Patches 02/15 and 03/15 are minor enhancements, adding support for
known switch revisions.
Patches 04/15 and 05/15 add support for MV88E6352 and MV88E6176.
Patch 06/15 adds support for hardware monitoring, specifically for
reporting the chip temperature, to the dsa subsystem.
Patches 07/15 and 08/15 implement hardware monitoring for MV88E6352,
MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
Patches 09/15 and 10/15 add support for EEPROM access to the DSA subsystem.
Patch 11/15 implements EEPROM access for MV88E6352 and MV88E6176.
Patch 12/15 adds support for reading switch registers to the DSA
subsystem.
Patches 13/15 and 14/15 implement support for reading switch registers
to the drivers for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
Patch 15/15 adds support for reading additional RMON registers to the drivers
for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
The series was tested on top of v3.18-rc2 in an x86 system with MV88E6352.
Testing in systems with 88E6131, 88E6060 and MV88E6165 was done earlier
(I don't have access to those systems right now). The series was also build
tested using my build system at http://server.roeck-us.net:8010/builders.
Look into the 'dsa' column for build results.
The series merges cleanly into net-next as of today (10/29).
v3:
- Fix bug in eeprom patches seen if devicetree is enabled:
eeprom-length property is attached to switch devicetree node,
not to dsa node, and there was a compile error.
v2:
- Made reporting chip temperatures through the hwmon subsystem optional
with new Kconfig option
- Changed the hwmon chip name to <network device name>_dsa<index>
- Made EEPROM presence and size configurable through platform and devicetree
data
- Various minor changes and fixes (see individual patches for details)
====================
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
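The hwmon chip name format from the v2 notes (<network device name>_dsa<index>) can be sketched with snprintf(); dsa_hwmon_name_model() and its truncation handling are assumptions for illustration, not the dsa code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build "<network device name>_dsa<index>" into buf. Returns the name
 * length, or -1 if it did not fit (truncation policy is an assumption). */
static int dsa_hwmon_name_model(char *buf, size_t len,
				const char *netdev_name, int index)
{
	int n;

	assert(buf != NULL && netdev_name != NULL && index >= 0);
	n = snprintf(buf, len, "%s_dsa%d", netdev_name, index);
	/* snprintf() returns the untruncated length, so compare with len. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```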
Display sw_in_discards, sw_in_filtered, and sw_out_filtered for chips
supported by mv88e6123_61_65 and mv88e6352 drivers.
The variables are provided in port registers, not the normal status registers.
Mark by adding 0x100 to the register offset and add special handling code
to mv88e6xxx_get_ethtool_stats.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The infrastructure can now report switch registers to ethtool.
Add support for it to the mv88e6123_61_65 driver.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading switch registers with 'ethtool -d'.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6352 supports read and write access to its configuration EEPROM.
There is no means to detect whether an EEPROM is connected to the switch.
Also, the switch supports EEPROMs with different sizes, but cannot detect
or report the type or size of connected EEPROMs. Therefore, do not implement
the get_eeprom_len callback but depend on platform or devicetree data to
provide information about EEPROM presence and size.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dsa core now supports reading from and writing to a switch EEPROM
if connected. Describe optional devicetree property indicating that
an EEPROM is present and its size.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some chips it is possible to access the switch EEPROM.
Add infrastructure support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6123 and compatible chips support reading the chip temperature
from PHY register 6:26.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6352 supports reading the chip temperature from two PHY registers,
6:26 and 6:27. Report it using the more accurate register 6:27.
Also report temperature limit and alarm.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6176 is mostly compatible with MV88E6352 and is documented
in the same functional specification. Add support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell 88E6352 is mostly compatible with MV88E6123/61/65,
but requires indirect PHY access. Also, its configuration
registers are a bit different.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report known silicon revisions when probing Marvell 88E6131 switches.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report known silicon revisions when probing Marvell 88E6060 switches.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting skb->protocol to a private protocol type may result in warning
messages such as
e1000e 0000:00:19.0 em1: checksum_partial proto=dada!
This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set
to CHECKSUM_PARTIAL. Looking through the code, it appears that changing
skb->protocol for transmitted packets is not necessary and may actually
be harmful. For example, it prevents purposely unmodified (from a DSA
perspective) network drivers from properly setting up their transmit
checksum offload pointers since they inspect skb->protocol to set up the
IPv4 header or IPv6 header pointers. So don't unnecessarily change the
protocol field.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In neigh_parms_release() we loop over all entries to find the entry passed
as argument so that it can be removed from the list. By using a doubly linked
list, we can avoid this loop.
Here are some numbers with 30 000 dummy interfaces configured:
Before the patch:
$ time rmmod dummy
real 2m0.118s
user 0m0.000s
sys 1m50.048s
After the patch:
$ time rmmod dummy
real 1m9.970s
user 0m0.000s
sys 0m47.976s
Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
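Why the loop disappears can be seen in a minimal, standalone doubly linked list (a re-implementation in the spirit of the kernel's struct list_head, not the kernel API itself): a node that knows both neighbours can be unlinked in O(1) without walking from the head to find its predecessor.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct dlist {
	struct dlist *prev, *next;
};

/* An empty list is a head pointing at itself in both directions. */
static void dlist_init(struct dlist *head)
{
	head->prev = head->next = head;
}

/* Append node just before head, i.e. at the tail. */
static void dlist_add_tail(struct dlist *node, struct dlist *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* O(1) removal: both neighbours are reachable from the node itself. */
static void dlist_del(struct dlist *node)
{
	assert(node->prev && node->next);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = NULL;
}
```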
napi_schedule() can be called from any context and has to mask hard
irqs.
Add a variant that can only be called from hard interrupt handlers
or when irqs are already masked.
Many NIC drivers can use it from their hard IRQ handler instead of
the generic variant.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
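A user-space model of the difference between the two variants. The globals simulate the CPU interrupt mask and a poll list, and the *_model functions are hypothetical; the real napi_schedule() also tests and sets NAPI_STATE_SCHED before queuing, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated CPU interrupt mask, standing in for local_irq_save/restore. */
static bool model_irqs_masked;
static unsigned int model_mask_ops;	/* mask/restore round trips */

/* Hypothetical poll list: just a count of scheduled NAPI contexts. */
static unsigned int model_poll_list_len;

/* irqoff variant: caller guarantees interrupts are already masked,
 * e.g. because it runs from a hard IRQ handler. */
static void napi_schedule_irqoff_model(void)
{
	assert(model_irqs_masked);
	model_poll_list_len++;
}

/* Generic variant: callable from any context, so it must mask and then
 * restore interrupts around the poll-list update. */
static void napi_schedule_model(void)
{
	bool was_masked = model_irqs_masked;

	model_irqs_masked = true;
	model_mask_ops++;
	napi_schedule_irqoff_model();
	model_irqs_masked = was_masked;
}
```

The saving is exactly the mask/restore pair, which a hard IRQ handler does not need because interrupts are already masked there.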
David Vrabel says:
====================
xen-netback: minor cleanups
Two minor xen-netback cleanups originally from Zoltan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This flag is unnecessary, it came from some old code.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the interrupt handler still calls napi_complete. Although it
won't schedule NAPI again as either NAPI_STATE_DISABLE or
NAPI_STATE_SCHED is set, it is just unnecessary, and it makes more
sense to do it this way.
Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s)), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try to use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMISTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
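One rule of the source address selection described above can be sketched as a score, larger being better. enum addr_state_model and src_score_model() are hypothetical, the real selection algorithm applies many more rules, and the use_optimistic parameter stands in for the new per-interface sysctl. In this sketch, with the sysctl off, optimistic addresses rank with deprecated ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified address states. */
enum addr_state_model {
	ADDR_PREFERRED,
	ADDR_OPTIMISTIC,	/* DAD still running, optimistic DAD enabled */
	ADDR_DEPRECATED,
};

/* Score a candidate source address for this one rule only. */
static int src_score_model(enum addr_state_model st, bool use_optimistic)
{
	assert(st == ADDR_PREFERRED || st == ADDR_OPTIMISTIC ||
	       st == ADDR_DEPRECATED);
	switch (st) {
	case ADDR_PREFERRED:
		return 2;
	case ADDR_OPTIMISTIC:
		/* Sysctl on: rank with non-deprecated addresses, yet still
		 * below preferred ones. Sysctl off: rank with deprecated. */
		return use_optimistic ? 1 : 0;
	case ADDR_DEPRECATED:
	default:
		return 0;
	}
}
```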
Hayes Wang says:
====================
r8152: support nway_reset
Fix the CHECK from checkpatch.pl and support nway_reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace tx_underun with tx_underrun for checkpatch.pl.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing an upcoming patch from Yaogong (converting the out-of-order queue
into an RB tree), I hit the max reordering level of the Linux TCP stack.
Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.
Allow a new max limit of 300, and add a sysctl so that admins can
set bigger (or lower) values if needed.
[1] Aggregation of links, per packet load balancing, fabrics not doing
deep packet inspection, alternative TCP congestion modules...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
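A minimal sketch of clamping the reordering estimate, assuming a plain variable (model_max_reordering, hypothetical) in place of the new sysctl, initialised to the new limit of 300 mentioned above; update_reordering_model() is illustrative, not the TCP stack's code.

```c
#include <assert.h>

/* Stand-in for the admin-tunable maximum (hypothetical variable). */
static int model_max_reordering = 300;

/* Raise the reordering estimate when a deeper reorder is observed,
 * but never beyond the configured maximum. */
static int update_reordering_model(int current, int observed)
{
	assert(current >= 0 && observed >= 0);
	if (observed > current)
		current = observed;
	if (current > model_max_reordering)
		current = model_max_reordering;
	return current;
}
```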
Correct several coding style warnings and errors.
Compile tested.
Signed-off-by: Roberto Medina <robertoxmed@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>