Commit Graph

535095 Commits

Author SHA1 Message Date
Vivien Didelot 2a778e1b58 net: dsa: change FDB routines prototypes
Change the prototype of port_getnext to include a vid parameter.

This is necessary to introduce the support for VLAN.

Also rename the fdb_{add,del,getnext} function pointers to
port_fdb_{add,del,getnext} since they are specific to a given port.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:03:19 -07:00
Vivien Didelot c5723ac51f net: dsa: mv88e6xxx: rename ATU MAC accessors
Rename the __mv88e6xxx_{read,write}_addr functions to more explicit
_mv88e6xxx_atu_mac_{read,write} functions, which also respect the single
underscore convention used in the file (meaning SMI lock must be held).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:03:19 -07:00
Vivien Didelot 194fea7bd8 net: dsa: mv88e6xxx: extend fid mask
The driver currently manages one FID per port (or bridge group), with a
mask of DSA_MAX_PORTS bits, where 0 means that the FID is in use.

The Marvell 88E6xxx switches support up to 4094 FIDs (from 1 to 0xfff;
FID 0 means that multiple address databases are not being used).

This patch changes the fid_mask for an fid_bitmap of 4096 bits.

>From now on, FIDs 1 to num_ports are reserved for non-bridged ports and
bridge groups (a bridge group gets the FID of its first member). The
remaining bits will be reserved for VLAN entries.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:03:18 -07:00
Vivien Didelot a08df0f0f7 net: dsa: mv88e6xxx: define GLOBAL_ATU_FID
Define register GLOBAL_ATU_FID instead of the raw value 0x01.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:03:18 -07:00
David S. Miller cdf0969763 Revert "Merge branch 'mv88e6xxx-switchdev-fdb'"
This reverts commit f1d5ca4344, reversing
changes made to 4933d85c51.

I applied v2 instead of v3.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 12:00:37 -07:00
Kaixu Xia 2c9c3bbbbf bpf: s390: Fix build error caused by the struct bpf_array member name changed
There is a build error that "'struct bpf_array' has no member
named 'prog'" on s390. In commit 2a36f0b92e ("bpf: Make the
bpf_prog_array_map more generic"), the member 'prog' of struct
bpf_array is replaced by 'ptrs'. So this patch fixes it.

Fixes: 2a36f0b92e ("bpf: Make the bpf_prog_array_map more generic")
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 11:49:40 -07:00
David S. Miller 15ee8fcdd2 Merge branch 'thunder-acpi'
David Daney says:

====================
net: thunder: Add ACPI support.

Change from v1:  Drop PHY binding part, use fwnode_property* APIs.

The first patch (1/2) rearranges the existing code a little with no
functional change to get ready for the second.  The second (2/2) does
the actual work of adding support to extract the needed information
from the ACPI tables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 11:47:57 -07:00
David Daney 46b903a01c net, thunder, bgx: Add support to get MAC address from ACPI.
Currently there is no way to get the MAC address in a firmware
independent manner, so set the MAC address of the device directly from
the ACPI tables.

The binding agrees with the proposed standard here:

http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf

Based on code from: Narinder Dhillon <ndhillon@cavium.com>
                    Tomasz Nowicki <tomasz.nowicki@linaro.org>
                    Robert Richter <rrichter@cavium.com>

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 11:47:57 -07:00
Robert Richter de387e1156 net: thunder: Factor out DT specific code in BGX
Separate DT code in preparation for follow-on ACPI integration.

Based on code from: Tomasz Nowicki <tomasz.nowicki@linaro.org>

Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 11:47:57 -07:00
Atzm Watanabe 07a51cd379 vxlan: fix fdb_dump index calculation
When too many remotes are bound to an FDB entry, index may not be increased.
This problem will be caused on the large scale environment that is based on
the unicast default destination, for instance.

Signed-off-by: Atzm Watanabe <atzm@iij.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 21:15:18 -07:00
Fabio Estevam ed8db18dea mellanox: mlxsw: Use '%zx' to print size_t format
Use '%zx' to print size_t format in order to fix the following build warning:

drivers/net/ethernet/mellanox/mlxsw/item.h:65:3: warning: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Wformat=]

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 21:12:49 -07:00
yalin wang bec7a630a6 isdn: Remove reverse_bits(), use revbit8()
This change isdn driver, remove reverse_bits() function,
use the generic revbit8() function instead.

Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:29:04 -07:00
Dan Carpenter 2f3a87326d cxgb4: cleanup some indenting
Add or remove some tabs so that statements line up correctly.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:26:53 -07:00
Andrew Lunn 6bc6d0a881 dsa: Support multiple MDIO busses
When using a cluster of switches, some topologies will have an MDIO
bus per switch, not one for the whole cluster. Allow this to be
represented in the device tree, by adding an optional mii-bus property
at the switch level. The old platform_device method of instantiation
supports this already, so only the device tree binding needs extending
with an additional optional phandle.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:25:02 -07:00
Andrew Lunn 966bce38e7 net: dsa: mv88e6352: Use mnemonics for EEPROM registers and bits
Add register definitions #defines for accessing the EEPROM.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:24:06 -07:00
David S. Miller c73a91b8f9 Merge branch 'ovs-gre'
Pravin B Shelar says:

====================
GRE: Use flow based tunneling for OVS GRE vport.

Following patches make use of new Using GRE tunnel meta data
collection feature. This allows us to directly use netdev
based GRE tunnel implementation. While doing so I have
removed GRE demux API which were targeted for OVS. Most
of GRE protocol code is now consolidated in ip_gre module.

v5-v4:
Fixed Kconfig dependency for vport-gre module.

v3-v4:
Added interface to ip-gre device to enable meta data collection.
While doing this I split second patch into two patches.

v2-v3:
Add API to create GRE flow based device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:55 -07:00
Pravin B Shelar 9f57c67c37 gre: Remove support for sharing GRE protocol hook.
Support for sharing GREPROTO_CISCO port was added so that
OVS gre port and kernel GRE devices can co-exist. After
flow-based tunneling patches OVS GRE protocol processing
is completely moved to ip_gre module. so there is no need
for GRE protocol hook. Following patch consolidates
GRE protocol related functions into ip_gre module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Pravin B Shelar b2acd1dc39 openvswitch: Use regular GRE net_device instead of vport
Using GRE tunnel meta data collection feature, we can implement
OVS GRE vport. This patch removes all of the OVS
specific GRE code and make OVS use a ip_gre net_device.
Minimal GRE vport is kept to handle compatibility with
current userspace application.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Pravin B Shelar 2e15ea390e ip_gre: Add support to collect tunnel metadata.
Following patch create new tunnel flag which enable
tunnel metadata collection on given device.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Pravin B Shelar a9020fde67 openvswitch: Move tunnel destroy function to oppenvswitch module.
This function will be used in gre and geneve vport implementations.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 14:03:54 -07:00
Rick Jones fb811395cd net: add explicit logging and stat for neighbour table overflow
Add an explicit neighbour table overflow message (ratelimited) and
statistic to make diagnosing neighbour table overflows tractable in
the wild.

Diagnosing a neighbour table overflow can be quite difficult in the wild
because there is no explicit dmesg logged.  Callers to neighbour code
seem to use net_dbg_ratelimit when the neighbour call fails which means
the "base message" is not emitted and the callback suppressed messages
from the ratelimiting can end-up juxtaposed with unrelated messages.
Further, a forced garbage collection will increment a stat on each call
whether it was successful in freeing-up a table entry or not, so that
statistic is only a hint.  So, add a net_info_ratelimited message and
explicit statistic to the neighbour code.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:46:21 -07:00
Nikolay Aleksandrov a7854037da bridge: netlink: add support for vlan_filtering attribute
This patch adds the ability to toggle the vlan filtering support via
netlink. Since we're already running with rtnl in .changelink() we don't
need to take any additional locks.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:36:43 -07:00
David S. Miller e72ee3ed51 Merge branch 'qlcnic-enhancements'
Shahed Shaikh says:

====================
qlcnic: enhancements

This series adds few enhancements.

  o Patch from Harish reorders the sequence of header files inclusion,
    keeping kernel's header files on top.

  o Firmware introduced a new feature which allows driver to increases
    the size of firmware dump of iSCSI function which is being collected
    by NIC driver.

  o Print buffer address which is holding a firmware dump.

  o Use vzalloc() instead kzalloc() for allocating large chunk of memory
    which will avoid potential memory allocation failure.

  o Add new device ID for 0x8C30 which is a 83xx series based VF function.

Please apply this series to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:29 -07:00
Shahed Shaikh 02509f171d qlcnic: Update version to 5.3.63
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:29 -07:00
Shahed Shaikh 0e90ad9bfd qlcnic: Don't use kzalloc unncecessarily for allocating large chunk of memory
Driver allocates a large chunk of temporary buffer using kzalloc
to copy FW image. As there is no real need of this memory to be
physically contiguous, use vzalloc instead.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Shahed Shaikh da286a6fd1 qlcnic: Add new VF device ID 0x8C30
This is a 83xx series based VF device

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Shahed Shaikh 642de51025 qlcnic: Print firmware minidump buffer and template header addresses
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Shahed Shaikh d01a6d3c8a qlcnic: Add support to enable capability to extend minidump for iSCSI
In some cases it is required to capture minidump for iSCSI functions
as part of default minidump collection process. To enable this, firmware
exports it's capability and driver need to enable that capability
by issuing a mailbox command.

With this feature, firmware can provide additional iSCSI function's
minidump with smaller minidump capture mask (0x1f).

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Harish Patil a930a4639d qlcnic: Rearrange ordering of header files inclusion
Include local headers files after kernel's header files.

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:34:28 -07:00
Madalin Bucur 07151bc9f7 net: phy: select copper mode when Marvel 88e1111 in SGMII
For the Marvel 88e1111 PHY only two SGMII modes are available, both
allowing only SGMII to copper mode (with or without clock). SGMII
to fiber mode is not supported. Make sure the fiber/copper registers
selector bits are cleared for selecting copper mode.

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:31:18 -07:00
Kevin Hao c4bc44c65b net: fec: fix the race between xmit and bdp reclaiming path
When we transmit a fragmented skb, we may run into a race like the
following scenario (assume txq->cur_tx is next to txq->dirty_tx):
           cpu 0                                          cpu 1
  fec_enet_txq_submit_skb
    reserve a bdp for the first fragment
    fec_enet_txq_submit_frag_skb
       update the bdp for the other fragment
       update txq->cur_tx
                                                   fec_enet_tx_queue
                                                     bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id);
                                                     This bdp is the bdp reserved for the first segment. Given
                                                     that this bdp BD_ENET_TX_READY bit is not set and txq->cur_tx
                                                     is already pointed to a bdp beyond this one. We think this is a
                                                     completed bdp and try to reclaim it.
    update the bdp for the first segment
    update txq->cur_tx

So we shouldn't update the txq->cur_tx until all the update to the
bdps used for fragments are performed. Also add the corresponding
memory barrier to guarantee that the update to the bdps, dirty_tx and
cur_tx performed in the proper order.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10 13:28:14 -07:00
David S. Miller 2cf1b5ce16 Merge branch 'mlxsw-fixes'
Jiri Pirko says:

====================
mlxsw: Couple of fixes/adjustments

Ido Schimmel (5):
  mlxsw: Call free_netdev when removing port
  mlxsw: Make system port to local port mapping explicit
  mlxsw: Simplify mlxsw_sx_port_xmit function
  mlxsw: Use correct skb length when dumping payload
  mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit

Jiri Pirko (2):
  mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
  mlxsw: Strip FCS from incoming packets
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:31 -07:00
Ido Schimmel e577516b9d mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit
Store the length of the skb before transmitting it and use it for stats
instead of skb->len, since skb might have been freed already.

This issue was discovered using the Kernel Address sanitizer (KASan).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Ido Schimmel 3bfcd34764 mlxsw: Use correct skb length when dumping payload
Do not use the length of the transmitted skb (which was freed), but
that of the response skb.

This issue was discovered using the Kernel Address sanitizer (KASan).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Ido Schimmel d003462a50 mlxsw: Simplify mlxsw_sx_port_xmit function
Previously we only checked if the transmission queue is not full in the
middle of the xmit function. This lead to complex logic due to the fact
that sometimes we need to reallocate the headroom for our Tx header.

Allow the switch driver to know if the transmission queue is not full
before sending the packet and remove this complex logic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Jiri Pirko 7b7b9cff74 mlxsw: Strip FCS from incoming packets
FCS of incoming packets is already checked by HW. Just strip it out.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:10 -07:00
Jiri Pirko 74ed207e2a mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM
This resolves compile errors on um-allyesconfig.

Note that there are many other drivers which have the same issue.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:09 -07:00
Ido Schimmel e61011b5e0 mlxsw: Make system port to local port mapping explicit
System ports are unique identifiers in a multi-ASIC environment that
represent all the available ports in the system. Local ports on the
other hand, are unique only within the local ASIC.

Since system port to local port mapping is not part of the HW-SW
contract and since only single-ASIC configurations are currently
supported, set an explicit 1:1 mapping by configuring the Switch System
Port Record (SSPR) register.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:09 -07:00
Ido Schimmel 26a80f6e54 mlxsw: Call free_netdev when removing port
When removing a port's netdevice we should also free the memory
allocated by alloc_etherdev(). Do this by calling free_netdev() at the
end of the teardown sequence.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:54:09 -07:00
Masanari Iida ecea49914b net: ethernet: Fix double word "the the" in eth.c
This patch fix double word "the the" in
Documentation/DocBook/networking/API-eth-get-headlen.html
Documentation/DocBook/networking/netdev.html
Documentation/DocBook/networking.xml

These files are generated from comment in source,
so I have to fix comment in net/ethernet/eth.c.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:53:00 -07:00
Shaohui Xie 0024f89200 net: phy: add RealTek RTL8211DN phy id
RTL8211DN is compatible with RTL8211E.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:52:15 -07:00
Robert Shearman 118d523463 mpls: Enforce payload type of traffic sent using explicit NULL
RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as a IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:51:42 -07:00
David S. Miller d74a790d52 Merge branch 'bpf-perf'
Kaixu Xia says:

====================
bpf: Introduce the new ability of eBPF programs to access hardware PMU counter

This patchset is base on the net-next:
 git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
commit 9dc20a6496.

Previous patch v6 url:
https://lkml.org/lkml/2015/8/4/188

changes in V7:
 - rebase the whole patch set to net-next tree(9dc20a64);
 - split out the core perf APIs into Patch 1/5;
 - change the return value of function perf_event_attrs()
   from struct perf_event * to const struct perf_event * in
   Patch 1/5;
 - rename the function perf_event_read_internal() to perf_event_
   read_local() and rewrite it in Patch 1/5;
 - rename the function check_func_limit() to check_map_func
   _compatibility() and remove the unnecessary pass pointer to
   a pointer in Patch 4/5;

changes in V6:
 - make the Patch 1/4 commit message more meaning and readable;
 - remove the unnecessary comment in Patch 2/4 and make it clean;
 - declare the function perf_event_release_kernel() in include/
   linux/perf_event.h to fix the build error when CONFIG_PERF_EVENTS
   isn't configured in Patch 2/4;
 - add function perf_event_attrs() to get the struct perf_event_attr
   in Patch 2/4.
 - move the related code from kernel/trace/bpf_trace.c to kernel/
   events/core.c and add function perf_event_read_internal() to
   avoid poking inside of the event outside of perf code in Patch 3/4;
 - generial the func & map match-pair with an array in Patch 3/4;

changes in V5:
 - move struct fd_array_map_ops* fd_ops to bpf_map;
 - move array perf event decrement refcnt function to
   map_free;
 - fix the NULL ptr of perf_event_get();
 - move bpf_perf_event_read() to kernel/bpf/bpf_trace.c;
 - get rid of the remaining struct bpf_prog;
 - move the unnecessay cast on void *;

changes in V4:
 - make the bpf_prog_array_map more generic;
 - fix the bug of event refcnt leak;
 - use more useful errno in bpf_perf_event_read();

changes in V3:
 - collapse V2 patches 1-3 into one;
 - drop the function map->ops->map_traverse_elem() and release
   the struct perf_event in map_free;
 - only allow to access bpf_perf_event_read() from programs;
 - update the perf_event_array_map elem via xchg();
 - pass index directly to bpf_perf_event_read() instead of
   MAP_KEY;

changes in V2:
 - put atomic_long_inc_not_zero() between fdget() and fdput();
 - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
 - Only read the event counter on current CPU or on current
   process;
 - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
   pointer to the struct perf_event;
 - according to the perf_event_map_fd and key, the function
   bpf_perf_event_read() can get the Hardware PMU counter value;

Patch 5/5 is a simple example and shows how to use this new eBPF
programs ability. The PMU counter data can be found in
/sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU
value when 'kprobe/sys_write' sampling)

  $ cat /sys/kernel/debug/tracing/trace_pipe
  $ ./tracex6
       ...
       syslog-ng-548   [000] d..1    76.905673: : CPU-0   681765271
       syslog-ng-548   [000] d..1    76.905690: : CPU-0   681787855
       syslog-ng-548   [000] d..1    76.905707: : CPU-0   681810504
       syslog-ng-548   [000] d..1    76.905725: : CPU-0   681834771
       syslog-ng-548   [000] d..1    76.905745: : CPU-0   681859519
       syslog-ng-548   [000] d..1    76.905766: : CPU-0   681890419
       syslog-ng-548   [000] d..1    76.905783: : CPU-0   681914045
       syslog-ng-548   [000] d..1    76.905800: : CPU-0   681935950
       syslog-ng-548   [000] d..1    76.905816: : CPU-0   681958299
              ls-690   [005] d..1    82.241308: : CPU-5   3138451
              sh-691   [004] d..1    82.244570: : CPU-4   7324988
           <...>-699   [007] d..1    99.961387: : CPU-7   3194027
           <...>-695   [003] d..1    99.961474: : CPU-3   288901
           <...>-695   [003] d..1    99.961541: : CPU-3   383145
           <...>-695   [003] d..1    99.961591: : CPU-3   450365
           <...>-695   [003] d..1    99.961639: : CPU-3   515751
           <...>-695   [003] d..1    99.961686: : CPU-3   579047
       ...

The detail of patches is as follow:

Patch 1/5 add the necessary core perf APIs perf_event_attrs(),
perf_event_get(),perf_event_read_local() when accessing events
counters in eBPF programs

Patch 2/5 rewrites part of the bpf_prog_array map code and make it
more generic;

Patch 3/5 introduces a new bpf map type. This map only stores the
pointer to struct perf_event;

Patch 4/5 implements function bpf_perf_event_read() that get the
selected hardware PMU conuter;

Patch 5/5 gives a simple example.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:06 -07:00
Kaixu Xia 47efb30274 samples/bpf: example of get selected PMU counter value
This is a simple example and shows how to use the new ability
to get the selected Hardware PMU counter value.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:06 -07:00
Kaixu Xia 35578d7984 bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter
According to the perf_event_map_fd and index, the function
bpf_perf_event_read() can convert the corresponding map
value to the pointer to struct perf_event and return the
Hardware PMU counter value.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:06 -07:00
Kaixu Xia ea317b267e bpf: Add new bpf map type to store the pointer to struct perf_event
Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
This map only stores the pointer to struct perf_event. The
user space event FDs from perf_event_open() syscall are converted
to the pointer to struct perf_event and stored in map.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:05 -07:00
Wang Nan 2a36f0b92e bpf: Make the bpf_prog_array_map more generic
All the map backends are of generic nature. In order to avoid
adding much special code into the eBPF core, rewrite part of
the bpf_prog_array map code and make it more generic. So the
new perf_event_array map type can reuse most of code with
bpf_prog_array map and add fewer lines of special code.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:05 -07:00
Kaixu Xia ffe8690c85 perf: add the necessary core perf APIs when accessing events counters in eBPF programs
This patch add three core perf APIs:
 - perf_event_attrs(): export the struct perf_event_attr from struct
   perf_event;
 - perf_event_get(): get the struct perf_event from the given fd;
 - perf_event_read_local(): read the events counters active on the
   current CPU;
These APIs are needed when accessing events counters in eBPF programs.

The API perf_event_read_local() comes from Peter and I add the
corresponding SOB.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:50:05 -07:00
David S. Miller f1d5ca4344 Merge branch 'mv88e6xxx-switchdev-fdb'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: support switchdev FDB objects

This patchset refactors the DSA and mv88e6xxx code to use the switchdev FDB
objects.

The first two patches add minor but necessary changes to switchdev, the third
one implements the switchdev glue in DSA for FDB routines, and the remaining
ones refactor the FDB access functions in the mv88e6xxx code.

Below is an usage example (ports 0-2 belongs to br0, ports 3-4 belongs to br1):

    # bridge fdb add 3c:97:0e:11:30:6e dev swp2
    # bridge fdb add 3c:97:0e:11:40:78 dev swp3
    # bridge fdb add 3c:97:0e:11:50:86 dev swp4
    # bridge fdb del 3c:97:0e:11:40:78 dev swp3
    # bridge fdb
    01:00:5e:00:00:01 dev eth0 self permanent
    01:00:5e:00:00:01 dev eth1 self permanent
    00:50:d2:10:78:15 dev swp0 master br0 permanent
    3c:97:0e:11:30:6e dev swp2 self static
    00:50:d2:10:78:15 dev swp3 master br1 permanent
    3c:97:0e:11:50:86 dev swp4 self static
    # cat /sys/kernel/debug/dsa0/atu
    # DB   T/P  Vec State Addr
    # 001  Port 004   e   3c:97:0e:11:30:6e
    # 004  Port 010   e   3c:97:0e:11:50:86

For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged
ports and bridge groups, and the remaining will be later used by VLANs.

This change is necessary to welcome the support for hardware VLANs (which will
follow soon).

Changes in v2:

 - remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code

 - use ether_addr_copy instead of memcpy for MAC addresses

 - constify MAC address in port_fdb_{add,del}

 - split the mv88e6xxx code refactoring into several patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:48:10 -07:00
Vivien Didelot 878205101f net: dsa: mv88e6xxx: rework FDB add/del operations
Add a low level function for the ATU Load operation, and provide FDB add
and delete wrappers functions.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09 22:48:09 -07:00