We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
very similar, yet they serve very different purpose. This is confusing for
anyone trying to implement a user space tool supporting lwt.
As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
logical to have them in lwtunnel.h together with the encap enum.
Fixes: 3093fbe7ff ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0b50dc4fc9 ("Convert smsc911x to use ACPI as well as DT") makes
the call to smsc911x_probe_config() unconditional, and no longer fails if
there is no device node. device_get_phy_mode() is called unconditionally,
and if there is no phy node configured returns an error code. This error
code is assigned to phy_interface, and interpreted elsewhere in the code
as valid phy mode. This in turn causes qemu to crash when running a
variant of realview_pb_defconfig.
qemu: hardware error: lan9118_read: Bad reg 0x86
Fixes: 0b50dc4fc9 ("Convert smsc911x to use ACPI as well as DT")
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-08-17
1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
From Thomas Egerer.
2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
From Andrzej Hajda.
3) Pass oif to the xfrm lookups so that it gets set on the flow
and the resolver routines can match based on oif.
From David Ahern.
4) Add documentation for the new xfrm garbage collector threshold.
From Alexander Duyck.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse builds have been warning for a really long time now
that etherdevice.h has a conversion that is unsafe.
include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer
This code change fixes the issue and generates the exact
same assembly before/after (checked on x86_64)
Fixes: 2c722fe1c8 (etherdevice: Optimize a few is_<foo>_ether_addr functions)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter says:
====================
net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
This series adds a new private net_device flag indicating that a device may
(and probably should) be used without a queueing discipline attached to it.
This is already common practice for many virtual device types like e.g.
loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these
devices lack an underlying layer which could impose back pressure and therefore
making a TX queue necessary to not slow down senders.
Up to now, drivers being aware of the above applying to them set
dev->tx_queue_len to zero to indicate no qdisc should be attached to the
interface they drive and the kernel reacts upon this by assigning the noop
qdisc instead of the default pfifo_fast. This implicit agreement though leads
to an inconvenient situation once a user tries to attach a real qdisc to these
devices, as the formerly special tx_queue_len value becomes a regular one,
limiting the queue to zero packets and thus prevents any TX from happening. To
overcome this, practically all qdisc implementations intercept and sanitize the
malicious value.
With this series applied, drivers may signal the lack of need for a qdisc
without having to tamper with tx_queue_len, making fallbacks in qdiscs and
caveats in userspace unnecessary.
Upon upstream acceptance, this series will be followed up by a set of patches
converting device drivers, adding a warning so out-of-tree driver authors get
aware of this change and dropping all special handling of tx_queue_len in
net/sched/.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A zero length payload means that no TLV (Type Length Value) data has
been passed. Prior to this patch a non-existing TLV could be sanity
checked with TLV_OK() resulting in random behavior where a user
sending an empty message occasionally got a incorrect "operation not
supported" message back.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware tells driver in case bandwidth configuration for
a specific function exists, but [regretably] the same field has different
meanings depending on the multi-function mode - it can either be
a percentile value or an actual speed.
For newer multi-function modes current logic is incorrect -
driver understands values as actual speeds instead of percentages,
causing the resulting chip configuration to be incorrect.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup()
does not.
This patch solves the typical message at reboot time or device
dismantle :
unregister_netdevice: waiting for eth0 to become free. Usage count = 4
Fixes: 3bfd847203 ("net: Use passed in table for nexthop lookups")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We receive all 802.15.4 frames on the packet handler "lowpan_rcv" this
patch checks if the wpan device belongs to a lowpan interface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes 802.15.4 packet layer registration when mutliple
lowpan interfaces will be added. We need to register the packet layer at
the first lowpan interface and deregister it at the last interface. This
done by open_count variable which is protected by rtnl.
Additional do a quiet fix by adding dev_put(real_dev) when netdev
registration fails, which fix the refcount for the wpan dev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
need a correctly set skb network header.
Unfortunately we cannot rely on the device drivers to set it for us.
Therefore setting it in the beginning of the according ndo_start_xmit
handler.
Fixes: 1d8ab8d3c1 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3d ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
When an interface is purged, the broadcast packets scheduled for this
interface should get purged as well.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The list_del() calls were changed to list_del_init() to prevent
an accidental double deletion in batadv_tt_req_node_new().
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
So far the mcast tvlv handler did not anticipate the processing of
multiple incoming OGMs from the same originator at the same time. This
can lead to various issues:
* Broken refcounting: For instance two mcast handlers might both assume
that an originator just got multicast capabilities and will together
wrongly decrease mcast.num_disabled by two, potentially leading to
an integer underflow.
* Potential kernel panic on hlist_del_rcu(): Two mcast handlers might
one after another try to do an
hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will
cause memory corruption / crashes.
(Reported by: Sven Eckelmann <sven@narfation.org>)
Right in the beginning the code path makes assumptions about the current
multicast related state of an originator and bases all updates on that. The
easiest and least error prune way to fix the issues in this case is to
serialize multiple mcast handler invocations with a spinlock.
Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 60432d756c ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: e17931d1a6 ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 3f4841ffb3 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 17cf0ea455 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The bit was not according to ieee80211 specification.
Fix that.
Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of using the out-of-line average calculation, use the new
DECLARE_EWMA() macro to declare a signal EWMA, and use that.
This actually *reduces* the code size slightly (on x86-64) while
also reducing the station info size by 80 bytes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Having the EWMA parameters stored in the runtime struct imposes
memory requirements for the constant values that could just be
inlined in the code. This particularly makes sense if there are
a lot of such structs, for example in mac80211 in the station
table where each station has a number of these in an array, and
there can be many stations.
Provide a macro DECLARE_EWMA() that declares the necessary struct
and inline functions to access it with the parameters hard-coded;
using this also means the user no longer needs to 'select AVERAGE'
as it's entirely self-contained.
In the mac80211 case, on x86-64, this actually slightly *reduces*
code size, while also saving 80 bytes of runtime memory per sta.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
During hwsim_init_netlink(), we should call genl_unregister_family()
if failed on netlink_register_notifier() since the genetlink is
already registered.
Signed-off-by: Su Kang Yin <cantona@cantona.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define rc_rateidx_vht_mcs_mask array and rate_idx_match_vht_mcs_mask()
method in order to apply mcs mask for vht rates
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define rate_control_apply_mask_ratetbl() in order to apply ratemask in
rate_control_set_rates() for station rate table
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove ieee80211_tx_rate dependency in rate_idx_match_legacy_mask(),
rate_idx_match_mcs_mask() and rate_idx_match_mask() in order to use the
previous logic to define a ratemask in rate_control_set_rates() for
station rate table. Moreover move rate mask definition logic in
rate_control_cap_mask()
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove unnecessary ieee80211_tx_info pointer from rate_control_apply_mask
signature. rate_control_apply_mask() will be used to define a ratemask in
rate_control_set_rates() for station rate table
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Perform the BSS_CHANGED_BSSID action when joining an OCB network.
This is required to set the broadcast BSSID in some network drivers.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently OCB mode accepts frames with bssid==broadcast and type!=beacon.
Some non-data frames are sent matching this, for example probe responses.
This results in unnecessary creation of STA entries.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To make mac80211 accept the multicast rate requested by the user the
rate control should be told that it is operating in BSS mode.
Without this, the default rate is selected in rate_control_send_low
(!pubsta and !txrc->bss)
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow setting multicast rate on OCB interfaces.
Current behaviour results in EOPNOTSUPP when attempting this.
Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If driver failed to setup wiphy params (e.g. rts
threshold, fragmentation treshold) userspace
wasn't properly notified about this. This could
lead to user confusion who would think the command
succeeded even if that wasn't the case.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The original assumption of 20MHz wide channels hasn't been true since
the addition of support for 5 and 10 MHz channels.
Change the code to no longer disable all channels that don't fit into
the 20MHz grid, but instead set the appropriate flags to disable
operation on specific bandwidths.
Signed-off-by: Matthias May <matthias.may@neratec.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
debugfs_create_bool is asking to put u32 type pointer instead of bool
so that passing bool type with u32* cast will cause memory corruption
to read that value since it is handled by 4 bytes instead of 1 byte
inside.
Signed-off-by: Ben Young Tae Kim <ytkim@qca.qualcomm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
David Ahern says:
====================
VRF-lite - v6
In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and in the very least
need different default gateways.
This patch allows the ability to create virtual router domains (aka VRFs
(VRF-lite to be specific) in the linux packet forwarding stack. The main
observation is that through the use of rules and socket binding to interfaces,
all the facilities that we need are already present in the infrastructure. What
is missing is a handle that identifies a routing domain and can be used to
gather applicable rules/tables and uniqify neighbor selection. The scheme used
needs to preserves the notions of ECMP, and general routing principles.
This driver is a cross between functionality that the IPVLAN driver
and the Team drivers provide where a device is created and packets
into/out of the routing domain are shuttled through this device. The
device is then used as a handle to identify the applicable rules. The
VRF device is thus the layer3 equivalent of a vlan device.
The very important point to note is that this is only a Layer3 concept
so L2 tools (e.g., LLDP) do not need to be run in each VRF, processes can
run in unaware mode or select a VRF to be talking through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)
High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
* uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
* Applications can use SO_BINDTODEVICE or cmsg device indentifiers
to pick VRF (ping, traceroute just work)
* Standard IP Rules work, and since they are aggregated against the
device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
the routing plane (and ARP)
N2
N1 (all configs here) +---------------+
+--------------+ | |
|swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
| | | |
|swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
| | +---------------+
| VRF 1 |
| table 5 |
| |
+---------------+
| |
| VRF 2 | N3
| table 6 | +---------------+
| | | |
|swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
| | | |
|swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
+--------------+ +---------------+
Given the topology above, the setup needed to get the basic VRF
functions working would be
Create the VRF devices and associate with a table
ip link add vrf1 type vrf table 5
ip link add vrf2 type vrf table 6
Install the lookup rules that map table to VRF domain
ip rule add pref 200 oif vrf1 lookup 5
ip rule add pref 200 iif vrf1 lookup 5
ip rule add pref 200 oif vrf2 lookup 6
ip rule add pref 200 iif vrf2 lookup 6
ip link set vrf1 up
ip link set vrf2 up
Enslave the routing member interfaces
ip link set swp1 master vrf1
ip link set swp2 master vrf1
ip link set swp3 master vrf2
ip link set swp4 master vrf2
Connected and local routes are automatically moved from main and local
tables to the VRF table.
ping using VRF0 is simply
ping -I vrf0 10.0.1.2
Design Highlights
=================
If a device is enslaved to a VRF device (ie., associated with a VRF)
then:
1. Rx path
The master device index is used as the iif for all lookups.
2. Tx path
Similarly, for Tx the VRF device oif is used in the flow to direct
lookups to the table associated with the VRF via its rule. From there
the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
not be used for FIB table lookups.
3. Connected and local routes
On link up for a device, connected and local routes are added to the
table associated with the VRF device, rather than the local and main
tables.
4. Socket lookups
Sockets operating in the VRF must be bound to the VRF device. As such
socket lookups compare the VRF device index to sk_bound_dev_if.
5. Neighbor entries
Neighbor entries are not impacted by the VRF device. Entries are
associated with a particular interface; the VRF association is indirect
via the interface-to-VRF device enslavement.
Version 6
- addressed comments from DaveM
- added patch to properly set oif in ip_send_unicast_reply. Needs to be
set to VRF device for proper FIB lookup
- added patch to handle IP fragments
Version 5
- dropped patch regarding socket lookups; no longer needed
+ removed vrf helpers no longer needed after this patch is dropped
- removed dev_open and close operations
+ no need to reset vrf data on an ifdown and creates problems if a
slave is deleted while the vrf interface is down (Thanks, Nikolay)
- cleanups for sparse warnings
+ make C=2 is now clean for vrf driver
Version 4
- builds are clean with and without VRF device enabled (no, yes and module)
- tightened the driver implementation
+ device add/delete, slave add/remove, and module unload are all clean
- fixed RCU references
+ with RCU and lock debugging enabled changes are clean through the
suite of tests
- TX path uses custom dst, so patch refactoring rtable allocation is
dropped along with the patch adding rt_nexthop helper
- dropped the task patch that adds default bind to interface for sockets
and the associated chvrf example command
+ the patches are a convenience for running unmodified code. They
are not needed for the core functionality. Any application with
support for SO_BINDTODEVICE works properly with this patch set.
Version 3
- addressed comments from first 2 RFCs with the exception of the name
Nicolas: We will do the name conversion once we agree on what the
correct name should be (vrf, mrf or something else)
- packets flow through the VRF device in both directions allowing the
following:
- tcpdump -i vrf<n>
- tc rules on vrf device
- netfilter rules on vrf device
TO-DO
=====
1. IPv6
2. ipsec, xfrms
- dst patch accepted into ipsec-next; will post VRF patch once merge happens
3. listen filter to allow 1 socket to work with multiple VRF devices
- i.e., bind to VRF's a, b, c only or NOT VRFs e, f, g
Eric B:
I have ipsec working with VRFs implemented using the VRF driver,
including the worst case scenario of complete duplication in the
networking config.
Thanks to Nikolay for his many, many code reviews whipping the device
driver into shape, and bug-Fixes and ideas from Hannes, Roopa Prabhu,
Jon Toppins, Jamal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver borrows heavily from IPvlan and teaming drivers.
Routing domains (VRF-lite) are created by instantiating a VRF master
device with an associated table and enslaving all routed interfaces that
participate in the domain. As part of the enslavement, all connected
routes for the enslaved devices are moved to the table associated with
the VRF device. Outgoing sockets must bind to the VRF device to function.
Standard FIB rules bind the VRF device to tables and regular fib rule
processing is followed. Routed traffic through the box, is forwarded by
using the VRF device as the IIF and following the IIF rule to a table
that is mated with the VRF.
Example:
Create vrf 1:
ip link add vrf1 type vrf table 5
ip rule add iif vrf1 table 5
ip rule add oif vrf1 table 5
ip route add table 5 prohibit default
ip link set vrf1 up
Add interface to vrf 1:
ip link set eth1 master vrf1
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fragmentation cache uses information from the IP header to reassemble
packets. That information can be duplicated across VRFs -- same source
and destination addresses, protocol and id. Handle fragmentation with
VRFs by adding the VRF device index to entries in the cache and the
lookup arg.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If output device is not specified use VRF device if input device is
enslaved. This is needed to ensure tcp acks and resets go out VRF device.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a user passes in a table for new routes use that table for nexthop
lookups. Specifically, this solves the case where a connected route does
not exist in the main table, but only another table and then a subsequent
route is added with a next hop using the connected route. ie.,
$ ip route ls
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
169.254.0.0/16 dev eth0 scope link metric 1003
192.168.56.0/24 dev eth1 proto kernel scope link src 192.168.56.51
$ ip route ls table 10
1.1.1.0/24 dev eth2 scope link
Without this patch adding a nexthop route fails:
$ ip route add table 10 2.2.2.0/24 via 1.1.1.10
RTNETLINK answers: Network is unreachable
With this patch the route is added successfully.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a device associated with a VRF is brought up or down routes
should be added to/removed from the table associated with the VRF.
fib_magic defaults to using the main or local tables. Have it use
the table with the device if there is one.
A part of this is directing prefsrc validations to the correct
table as well.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
if the passed in device is enslaved to a VRF then the table for that VRF
is used for the lookup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently inet_addr_type and inet_dev_addr_type expect local addresses
to be in the local table. With the VRF device local routes for devices
associated with a VRF will be in the table associated with the VRF.
Provide an alternate inet_addr lookup to use a specific table rather
than defaulting to the local table.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For unconnected UDP sockets using a VRF device lookup source address
based on VRF table. This allows the UDP header to be properly setup
before showing up at the VRF device via the dst.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As with ingress use the index of VRF master device for route lookups on
egress. However, the oif should only be used to direct the lookups to a
specific table. Routes in the table are not based on the VRF device but
rather interfaces that are part of the VRF so do not consider the oif for
lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this
latter part.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On ingress use index of VRF master device for route lookups if real device
is enslaved. Rules are expected to be installed for the VRF device to
direct lookups to a specific table.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a VRF_MASTER flag for interfaces and helper functions for determining
if a device is a VRF_MASTER.
Add link attribute for passing VRF_TABLE id.
Add vrf_ptr to netdevice.
Add various macros for determining if a device is a VRF device, the index
of the master VRF device and table associated with VRF device.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is useful information to include in ipv6 netlink messages that
report interface information. IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6. This closes that gap.
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 10e4ea751 ("net: Fix race condition in store_rps_map") has moved the
manipulation of the rps_needed jump label under a spinlock. Since changing
the state of a jump label may sleep this is incorrect and causes warnings
during runtime.
Make rps_map_lock a mutex to allow sleeping under it.
Fixes: 10e4ea751 ("net: Fix race condition in store_rps_map")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>