We do not actually need to set any rx filters for the virtual batman
soft interface. However a dummy handler enables a user to set static
multicast listeners for instance.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
This is a simple batadv_tt_save_orig_buffer() refactoring
aiming to make it more generic and avoid useless casts.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Implement batadv_tt_entries() to get the number of entries
fitting in a given amount of bytes. This computation is done
several times in the code and therefore it is useful to have
an helper function.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
This is not used anymore with the new fragmentation, and it might
actually mess up the bonding code because find_router() assumes it
is only called once per packet.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The module prints a warning when the MTU on the hard interface is too
small to transfer payload traffic without fragmentation. The required
MTU is calculated based on the encapsulation header size. If network
coding is compild into the module its header size is taken into
account as well.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
the icmp and the icmp_rr packets share the same initial
fields since they use the same code to be processed and
forwarded.
Extract the common fields and put them into a separate
struct so that future ICMP packets can be easily added
without bloating the packet definition.
However, keep the seqno field outside of the newly created
common header because future ICMP types may require a
bigger sequence number space.
This change breaks compatibility due to fields reordering
in the ICMP headers.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
When comparing a network ordered value with a constant, it
is better to convert the constant at compile time by means
of htons() instead of converting the value at runtime using
ntohs().
This refactoring may slightly improve the code performance.
Moreover substitute __constant_htons() with htons() since
the latter increase readability and it is smart enough to be
as efficient as the former
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Non-broadcast packets larger than MTU are fragmented and sent with
an encapsulating header. Up to 16 fragments are supported, which are
sent in reverse order on the wire to allow minimal memory copying when
creating fragments.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Fragments arriving at their destination are buffered for later merge.
Merged packets are passed to the main receive function as had they never
been fragmented.
Fragments are forwarded without merging if the MTU of the outgoing
interface is smaller than the size of the merged packet.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Remove the existing fragmentation code before adding the new version
and delete unicast.{h,c}.
batadv_unicast_send_skb() is moved to send.c and renamed to
batadv_send_skb_unicast().
fragmentation entry in sysfs (bat_priv->fragmentation) is kept for use in
the new fragmentation code.
BATADV_UNICAST_FRAG packet type is renamed to BATADV_FRAG for use in the
new fragmentation code.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
In case of a VLAN tagged frame the ethhdr pointer is
moved forward by 4 bytes so that the offset of h_proto
in struct ethhdr matches the real
h_vlan_encapsulated_proto address in the skb. While this
trickery is correct it makes the code harder to understand
and may lead to bugs in case of re-use of ethhdr for other
purposes.
This patch introduces a proto variable to make things
cleaner and easier to understand.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
batadv_tt_global_entry_free_ref uses call_rcu to schedule a
function which will only free the global entry itself.
For this reason call_rcu is useless and kfree_rcu can be
used to simplify the code.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
batadv_tt_global_add_orig is neither used nor implemented
anymore, therefore it is possible to remove its declaration
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
batadv_tt_global_add is not used anymore outside of the TT
code thanks to the TVLV implementation. It can therefore be
declared as static
Last user has been removed by 3de4e64df0f1326db7cc0ef25f5af8522850252d
("batman-adv: tvlv - convert roaming adv packet to use tvlv unicast packets")
Moreover make it return bool since its result can be either 0 or 1.
Reported-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Adding host information for record route is only required for ICMP
requests and replys, and should not be added to just any (future?)
packet type.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the
outgoing interface, so it would be better to initialize it even when no
route is found, otherwise its value will be zero.
While I'm not sure if this could happen in real life, but doing that
will avoid to call the skb_push function with a zero in ip6gre_header
function.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) We need to take a timestamp only for skb that should be cloned.
Other skbs are not in write queue and no rtt estimation is done on them.
2) the unlikely() hint is wrong for receivers (they send pure ACK)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: MF Nowlan <fitz@cs.yale.edu>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netem can leak memory because packets get stored in red-black
tree and it is not cleared on reset.
Reported by: Сергеев Сергей <adron@yapic.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When packet is dropped from rb-tree netem the backlog statistic should
also be updated.
Reported-by: Сергеев Сергей <adron@yapic.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- update emails for A. Quartulli and M. Lindner in MAINTAINERS
- switch to the next on-the-wire protocol version
- introduce the T(ype) V(ersion) L(ength) V(alue) framework
- adjust the existing components to make them use the new TVLV code
- make the TT component use CRC32 instead of CRC16
- totally remove the VIS functionality (has been moved to userspace)
- reorder packet types and flags
- add static checks on packet format
- remove __packed from batadv_ogm_packet
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJSVa1BAAoJEADl0hg6qKeOiZkP/iD8Gehkb+jII3J1AE4+TdaW
X5XzB40A6bRr/RhiqqFXA6FhjLPCwPlaYwo+FL9QC9fIbV6Drofd8B4Vxvp8daQI
A5ZTzA7GFJhUkzM4vvw5eyF5848OYGITAJh5kSkiYiFSC6x9X9UFrG1EBBX7ikGd
7MXCjNa5GXaO7BsSnSbGuuMYw6CuZxwwxUZ3F/fG031yTlRqz88TiBpOGCZXSM5J
eYOf68byDPPsRvj/RgpmYTt7MzWnAgp9gJotp5Owuqpb3kvZC6SkA8uj4O2f29rn
nmQ+1q5onjHpfqukezPOIx9HEKo2XCq3qbFiW74dyNi+EQlYJ6RAGyRBkTTl4hse
cTOeWENANoFco/SPiOi1PjGrytuflAiIegwkVugAvnmYlbsVIi6MXV+BGo/ARybv
RwPBFULCaxO5HNkYz6elcu8cwFY9s3PROEdMVpRj+iNTeQq3ZQHj+QfPsHdedkAz
mkHqiZQo/pgBeWd+KiN98YJtQ4nPV1Qi+yGiayc3hmsk5fIABaOHxP3NV0NrTeOO
PYKWFcKfdLgtvrfcOXbcWpPz0d8SUkyF8zjlzIscXlpJVWK1sOw8oEER3IrDgzDy
PmzHkEL+qfDh4I5hEPOIEXxzPw/E4b+G3WI34bUN/4chGSmJdo6yyaC6XcXe7r9g
AsxmABE84uT2npNSOSkN
=YQD5
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- update emails for A. Quartulli and M. Lindner in MAINTAINERS
- switch to the next on-the-wire protocol version
- introduce the T(ype) V(ersion) L(ength) V(alue) framework
- adjust the existing components to make them use the new TVLV code
- make the TT component use CRC32 instead of CRC16
- totally remove the VIS functionality (has been moved to userspace)
- reorder packet types and flags
- add static checks on packet format
- remove __packed from batadv_ogm_packet
This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).
Currently:
- it is necessary to identify and mark inbound IPsec
packets destined to each vti interface, via netfilter rules in
the mangle table at prerouting hook.
- the vti module cannot retrieve the right tunnel in input since
commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them.
- the i_key is used by the outbound processing as a mark to lookup
for the right SP and SA bundle.
This patch uses the o_key to store the vti mark (instead of i_key) and
enables:
- to avoid the need for previously marking the inbound skbuffs via a
netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
used nor changed by the vti interface. Only the vti interface o_key
is used.
SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.
The vti interface must be created as follows (i_key = 0, o_key = mark):
ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1
The SPs attached to vti1 must be created as follows (mark = vti1 o_key):
ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
proto esp mode tunnel
ip xfrm policy add dir in mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
proto esp mode tunnel
The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:
ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
enc "cbc(aes)" "azertyuiopqsdfgh"
ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
enc "cbc(aes)" "sqbdhgqsdjqjsdfh"
To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should no use the default wildcard mark, but explicitly match mark 0.
To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:
echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm
The outgoing traffic is steered to vti1 by a route via the vti interface:
ip route add 192.168.0.0/16 dev vti1
The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9f00b2e7cf
bridge: only expire the mdb entry when query is received
changed the mdb expiration timer to be armed only when QUERY is
received. Howerver, this causes issues in an environment where
the multicast server socket comes and goes very fast while a client
is trying to send traffic to it.
The root cause is a race where a sequence of LEAVE followed by REPORT
messages can race against QUERY messages generated in response to LEAVE.
The QUERY ends up starting the expiration timer, and that timer can
potentially expire after the new REPORT message has been received signaling
the new join operation. This leads to a significant drop in multicast
traffic and possible complete stall.
The solution is to have REPORT messages update the expiration timer
on entries that already exist.
CC: Cong Wang <xiyou.wangcong@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 634fb979e8 ("inet: includes a sock_common in request_sock")
I forgot that the two ports in sock_common do not have same byte order :
skc_dport is __be16 (network order), but skc_num is __u16 (host order)
So sparse complains because ir_loc_port (mapped into skc_num) is
considered as __u16 while it should be __be16
Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num),
and perform appropriate htons/ntohs conversions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds IPv6 support for IPsec virtual tunnel interfaces
(vti). IPsec virtual tunnel interfaces provide a routable interface
for IPsec tunnel endpoints.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
sk_pacing_rate is read by sch_fq packet scheduler at any time,
with no synchronization, so make sure we update it in a
sensible way. ACCESS_ONCE() is how we instruct compiler
to not do stupid things, like using the memory location
as a temporary variable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 5 :
We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.
This patch includes the needed struct sock_common in front
of struct request_sock
This means there is no more inet6_request_sock IPv6 specific
structure.
Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.
loc_port -> ir_loc_port
loc_addr -> ir_loc_addr
rmt_addr -> ir_rmt_addr
rmt_port -> ir_rmt_port
iif -> ir_iif
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
It's relatively easy to extend the skb using frag_list to allow
more frags to be appended into the last sk_buff.
This still builds very efficient skbs, and allows reaching 45 MSS per
skb.
(45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
sk_buff)
High speed TCP flows benefit from this extension by lowering TCP stack
cpu usage (less packets stored in receive queue, less ACK packets
processed)
Forwarding setups could be hurt, as such skbs will need to be
linearized, although its not a new problem, as GRO could already
provide skbs with a frag_list.
We could make the 65536 bytes threshold a tunable to mitigate this.
(First time we need to linearize skb in skb_needs_linearize(), we could
lower the tunable to ~16*1460 so that following skb_gro_receive() calls
build smaller skbs)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a enhancement.
for the first node in fib_trie, newpos is 0, bit is 1.
Only for the leaf or node with unmatched key need calc pos.
Signed-off-by: baker.zhang <baker.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vis flag is not needed anymore, and since we do a compat bump we
can start with the first bit again
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
As we decreased the struct size from 26 to 24 byte, we can remove
__packed as the compiler will not add any more padding.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Reordering the packet type numbers allows us to handle unicast
packets in a general way - even if we don't know the specific packet
type, we can still forward it. There was already code handling
this for a couple of unicast packets, and this is the more
generalized version to do that.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Since we removed the __packed from most of the packets, we should
make sure that the offset generated by the compiler are correct for
sent/received data.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
This is replaced by a userspace program, we don't need this
functionality to bloat the kernel.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Client flags from bit 0 to 7 are sent over the wire.
BATADV_TT_CLIENT_TEMP is a local flag and is not supposed
to be sent to the network. Therefore it has occupy a
higher bit.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
CRC32C has to be preferred to CRC16 because of its possible
HW native support and because of the reduced collision
probability. With this change the Translation Table
component now uses CRC32C to compute the local and global
table checksum.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Instead of generating roaming specific packets the TVLV unicast API is
used to send roaming information.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Instead of generating TT specific packets the TVLV unicast API is used
to send translation table data.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The translation table meta data (version number, crc checksum, etc)
as well as the translation table diff propgated within OGMs now uses
the newly introduced tvlv infrastructure.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Create DAT container to announce DAT capabilities (if enabled).
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Prior to this patch batman-adv read the advertised uplink bandwidth
from userspace and compressed this information into a single byte
called "gateway class".
Now the download & upload bandwidth information is sent as-is. No
userspace change is necessary since the sysfs API always allowed
to specify a bandwidth.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Spyros Gasteratos <morfeas3000@gmail.com>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
The goal is to provide the infrastructure for sending, receiving and
parsing information 'containers' while preserving backward
compatibility. TVLV (based on the commonly known Type Length Value
technique) was chosen as the format for those containers. Even if a
node does not know the tvlv type of a certain container it can simply
skip the current container and proceed with the next. Past experience
has shown features evolve over time, so a 'version' field was added
right from the start to allow differentiating between feature
variants - hence the name: T(ype) V(ersion) L(ength) V(alue).
This patch introduces the basic TVLV infrastructure:
* register / unregister tvlv containers to be sent with each OGM
(on primary interfaces only)
* register / unregister callback handlers to be called upon
finding the corresponding tvlv type in a tvlv buffer
* unicast tvlv send / receive API calls
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Spyros Gasteratos <morfeas3000@gmail.com>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
With this change batman-adv is breaking compatibility with
older versions and it is moving to compat-version 15.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Steffen Klassert says:
====================
1) We used the wrong netlink attribute to verify the
lenght of the replay window on async events. Fix this by
using the right netlink attribute.
2) Policy lookups can not match the output interface on forwarding.
Add the needed informations to the flow informations.
3) We update the pmtu when we receive a ICMPV6_DEST_UNREACH message
on IPsec with ipv6. This is wrong and leads to strange fragmented
packets, only ICMPV6_PKT_TOOBIG messages should update the pmtu.
Fix this by removing the ICMPV6_DEST_UNREACH check from the IPsec
protocol error handlers.
4) The legacy IPsec anti replay mechanism supports anti replay
windows up to 32 packets. If a user requests for a bigger
anti replay window, we use 32 packets but pretend that we use
the requested window size. Fix from Fan Du.
5) If asynchronous events are enabled and replay_maxdiff is set to
zero, we generate an async event for every received packet instead
of checking whether a timeout occurred. Fix from Thomas Egerer.
6) Policies need a refcount when the state resolution timer is armed.
Otherwise the timer can fire after the policy is deleted.
7) We might dreference a NULL pointer if the hold_queue is empty,
add a check to avoid this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_IPV6=n is still a valid choice ;)
It appears we can remove dead code.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_secret() is only used when CONFIG_IPV6 or CONFIG_INET are selected.
Building a defconfig with both of these symbols unselected (Using the ARM
at91sam9rl_defconfig, for example) leads to the following build warning:
$ make at91sam9rl_defconfig
#
# configuration written to .config
#
$ make net/core/secure_seq.o
scripts/kconfig/conf --silentoldconfig Kconfig
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
make[1]: `include/generated/mach-types.h' is up to date.
CALL scripts/checksyscalls.sh
CC net/core/secure_seq.o
net/core/secure_seq.c:17:13: warning: 'net_secret_init' defined but not used [-Wunused-function]
Fix this warning by protecting the definition of net_secret() with these
symbols.
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since P2P device doesn't have a netdev associated to it,
we cannot prevent the user to start it when in RFKILL.
So refuse to even add it when in RFKILL.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
__ieee80211_scan_completed is called from a worker. This
means that the following flow is possible.
* driver calls ieee80211_scan_completed
* mac80211 cancels the scan (that is already complete)
* __ieee80211_scan_completed runs
When scan_work will finally run, it will see that the scan
hasn't been aborted and might even trigger another scan on
another band. This leads to a situation where cfg80211's
scan is not done and no further scan can be issued.
Fix this by setting a new flag when a HW scan is being
cancelled so that no other scan will be triggered.
Cc: stable@vger.kernel.org
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common
Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 3 :
Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
and parallel SYN processing.
Current inet_ehash_bucket contains two chains, one for ESTABLISH (and
friend states) sockets, another for TIME_WAIT sockets only.
As the hash table is sized to get at most one socket per bucket, it
makes little sense to have separate twchain, as it makes the lookup
slightly more complicated, and doubles hash table memory usage.
If we make sure all socket types have the lookup keys at the same
offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
and ESTABLISHED sockets already have common lookup fields for IPv4.
[ INET_TW_MATCH() is no longer needed ]
I'll provide a follow-up to factorize IPv6 lookup as well, to remove
INET6_TW_MATCH()
This way, SYN_RECV pseudo sockets will be supported the same.
A new sock_gen_put() helper is added, doing either a sock_put() or
inet_twsk_put() [ and will support SYN_RECV later ].
Note this helper should only be called in real slow path, when rcu
lookup found a socket that was moved to another identity (freed/reused
immediately), but could eventually be used in other contexts, like
sock_edemux()
Before patch :
dmesg | grep "TCP established"
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
After patch :
TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
include/linux/netdevice.h
net/core/sock.c
Trivial merge issues.
Removal of "extern" for functions declaration in netdevice.h
at the same time "const" was added to an argument.
Two parallel line additions in net/core/sock.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Steinar reported FQ pacing was not working for UDP flows.
It looks like the initial sk->sk_pacing_rate value of 0 was
a wrong choice. We should init it to ~0U (unlimited)
Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
no real sense. The default rate is really unlimited, and we
need to avoid a zero divide.
Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the calculation of the nlmsg size, by adding the missing
nla_total_size().
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCA_FQ_INITIAL_QUANTUM should set q->initial_quantum
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6
headers. This length is also counted in dev->hard_header_len.
Perhaps, it's more clean to modify the hlen to count only the GRE header
without ipv6 header as the variable name suggest, but the simple way to fix
this without regression risk is simply modify the calculation of the limit
in ip6gre_tunnel_change_mtu function.
Verified in kernel version v3.11.
Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can get classid through cgroup_subsys_state,
this is directviewing and effective.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the tasks have been migrated to the cgroup,
there is no need to call task_netprioidx to get
task's cgroup id.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The since the removal of the routing cache computing
fib_compute_spec_dst() does a fib_table lookup for each UDP multicast
packet received. This has introduced a performance regression for some
UDP workloads.
This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.
Benchmark results from a netperf UDP_RR test:
Before 89789.68 transactions/s
After 90587.62 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.63us RTT
After 12.48us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.
For UDP multicast we can only cache the dst if there is only one
receiving socket on the host. Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.
For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups. Additionally since the hash chains may be
long we only check the first socket to see if it is a match and not
waste extra time searching the whole chain when we might not find an
exact match.
Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After 89789.68 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After 12.63us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sockets can receive packets from multiple endpoints and thus may be
received on multiple receive queues. Since packets packets can arrive
on multiple receive queues we should not mark the napi_id for all
packets. This makes busy read/poll only work for connected UDP sockets.
This additionally enables busy read/poll for UDP multicast packets as
long as the socket is connected by moving the check into
__udp_queue_rcv_skb().
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qdisc_tree_decrease_qlen() is called when some packets are dropped
on a qdisc, and we want to notify parents of qlen changes.
We also can increment parents qdisc qstats drop counters.
This permits more accurate drop counters up to root qdisc.
For example a graft operation typically resets a qdisc
(drops all packets) and call qdisc_tree_decrease_qlen()
Note that callers are responsible for their drop counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’:
net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable]
Create a helper "l2tp_tunnel()" to facilitate this, and as a side
effect get rid of a bunch of unnecessary void pointer casts.
Signed-off-by: David S. Miller <davem@davemloft.net>
When a lowpan link to a wpan device is created, set the hardware address
of the lowpan link to that of the wpan device.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refuse to create 6lowpan links if the actual hardware interface is
of any type other than ARPHRD_IEEE802154.
Signed-off-by: Alan Ott <alan@signal11.us>
Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We might dreference a NULL pointer if the hold_queue is empty,
so add a check to avoid this.
Bug was introduced with git commit a0073fe18 ("xfrm: Add a state
resolution packet queue")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We need to ensure that policies can't go away as long as the hold timer
is armed, so take a refcont when we arm the timer and drop one if we
delete it.
Bug was introduced with git commit a0073fe18 ("xfrm: Add a state
resolution packet queue")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
On Tue, 20 Aug 2013 11:40:04 -0500 Eric Sandeen <sandeen@redhat.com> wrote:
> This was brought up in a Red Hat bug (which may be marked private, I'm sorry):
>
> Bug 987055 - open O_WRONLY succeeds on some root owned files in /proc for process running with unprivileged EUID
>
> "On RHEL7 some of the files in /proc can be opened for writing by an unprivileged EUID."
>
> The flaw existed upstream as well last I checked.
>
> This commit in kernel v3.8 caused the regression:
>
> commit cff109768b
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date: Fri Nov 16 03:03:01 2012 +0000
>
> net: Update the per network namespace sysctls to be available to the network namespace owner
>
> - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
> the netowrk namespace to change sysctls.
> - Allow anyone the uid of the user namespace root the same
> permissions over the network namespace sysctls as the global root.
> - Allow anyone with gid of the user namespace root group the same
> permissions over the network namespace sysctl as the global root group.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> because it changed /sys/net's special permission handler to test current_uid, not
> current_euid; same for current_gid/current_egid.
>
> So in this case, root cannot drop privs via set[ug]id, and retains all privs
> in this codepath.
Modify the code to use current_euid(), and in_egroup_p, as in done
in fs/proc/proc_sysctl.c:test_perm()
Cc: stable@vger.kernel.org
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
drivers/net/wireless/rtlwifi/rtl8188ee/phy.h
drivers/net/wireless/rtlwifi/rtl8192ce/phy.h
drivers/net/wireless/rtlwifi/rtl8192de/phy.h
drivers/net/wireless/rtlwifi/rtl8723ae/phy.h
Just some minor conflicts between the wireless-next changes
and Joe Perches's "extern" removal from function prototypes
in header files.
John W. Linville says:
====================
Regarding the Bluetooth bits, Gustavo says:
"The big work here is from Marcel and Johan. They did a lot of work
in the L2CAP, HCI and MGMT layers. The most important ones are the
addition of a new MGMT command to enable/disable LE advertisement
and the introduction of the HCI user channel to allow applications
to get directly and exclusive access to Bluetooth devices."
As to the ath10k bits, Kalle says:
"Bartosz dropped support for qca98xx hw1.0 hardware from ath10k, it's
just too much to support it. Michal added support for the new firmware
interface. Marek fixed WEP in AP and IBSS mode. Rest of the changes are
minor fixes or cleanups."
And also:
"Major changes are:
* throughput improvements including aligning the RX frames correctly and
optimising HTT layer (Michal)
* remove qca98xx hw1.0 support (Bartosz)
* add support for firmware version 999.999.0.636 (Michal)
* firmware htt statistics support (Kalle)
* fix WEP in AP and IBSS mode (Marek)
* fix a mutex unlock balance in debugfs file (Shafi)
And of course there's a lot of smaller fixes and cleanup."
For the wl12xx bits, Luca says:
"Here are some patches intended for 3.13. Eliad is upstreaming a bunch
of patches that have been pending in the internal tree. Mostly bugfixes
and other small improvements."
Along with that...
Arend and friends bring us a batch of brcmfmac updates, Larry Finger
offers some rtlwifi refactoring, and Sujith sends the usual batch of
ath9k updates. As usual, there are a number of other small updates
from a variety of players as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending out multicast messages, the source address in inet->mc_addr is
ignored and rewritten by an autoselected one. This is caused by a typo in
commit 813b3b5db8 ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the unreg_list and the close_list in dev_close_many preventing
dev_close_many from permuting the unreg_list. The permutations of the
unreg_list have resulted in cases where the loopback device is accessed
it has been freed in code such as dst_ifdown. Resulting in subtle memory
corruption.
This is the second bug from sharing the storage between the close_list
and the unreg_list. The issues that crop up with sharing are
apparently too subtle to show up in normal testing or usage, so let's
forget about being clever and use two separate lists.
v2: Make all callers pass in a close_list to dev_close_many
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530
This patch allows to check the minimum MTU for ipv6 tunnel according to these rules:
-In case the tunnel is configured with ipip6 mode the minimum MTU is 68.
-In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280.
Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio wants to pass in cpumask_of(cpu), make parameter
const to avoid build warnings.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung found following problem :
There are bugs in the SACK processing code, merging part in
tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
skbs FIN flag. When a receiver first SACK the FIN sequence, and later
throw away ofo queue (e.g., sack-reneging), the sender will stop
retransmitting the FIN flag, and hangs forever.
Following packetdrill test can be used to reproduce the bug.
$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`
// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0
+.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4
+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(10000) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
+.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%
First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.
Bug was added in 2.6.29 by commit 832d11c5cd
("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter updates for your net-next tree,
mostly ipset improvements and enhancements features, they are:
* Don't call ip_nest_end needlessly in the error path from me, suggested
by Pablo Neira Ayuso, from Jozsef Kadlecsik.
* Fixed sparse warnings about shadowed variable and missing rcu annotation
and fix of "may be used uninitialized" warnings, also from Jozsef.
* Renamed simple macro names to avoid namespace issues, reported by David
Laight, again from Jozsef.
* Use fix sized type for timeout in the extension part, and cosmetic
ordering of matches and targets separatedly in xt_set.c, from Jozsef.
* Support package fragments for IPv4 protos without ports from Anders K.
Pedersen. For example this allows a hash:ip,port ipset containing the
entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN
tunnels to/from the host. Without this patch only the first package
fragment (with fragment offset 0) was matched.
* Introduced a new operation to get both setname and family, from Jozsef.
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-mathcing
family. Currently such rules are silently accepted and then ignored
instead of generating an error message to the user.
* Reworked extensions support in ipset types from Jozsef. The approach of
defining structures with all variations is not manageable as the
number of extensions grows. Therefore a blob for the extensions is
introduced, somewhat similar to conntrack. The support of extensions
which need a per data destroy function is added as well.
* When an element timed out in a list:set type of set, the garbage
collector skipped the checking of the next element. So the purging
was delayed to the next run of the gc, fixed by Jozsef.
* A small Kconfig fix: NETFILTER_NETLINK cannot be selected and
ipset requires it.
* hash:net,net type from Oliver Smith. The type provides the ability to
store pairs of subnets in a set.
* Comment for ipset entries from Oliver Smith. This makes possible to
annotate entries in a set with comments, for example:
ipset n foo hash:net,net comment
ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B"
* Fix of hash types resizing with comment extension from Jozsef.
* Fix of new extensions for list:set type when an element is added
into a slot from where another element was pushed away from Jozsef.
* Introduction of a common function for the listing of the element
extensions from Jozsef.
* Net namespace support for ipset from Vitaly Lavrov.
* hash:net,port,net type from Oliver Smith, which makes possible
to store the triples of two subnets and a protocol, port pair in
a set.
* Get xt_TCPMSS working with net namespace, by Gao feng.
* Use the proper net netnamespace to allocate skbs, also by Gao feng.
* A couple of cleanups for the conntrack SIP helper, by Holger
Eitzenberger.
* Extend cttimeout to allow setting default conntrack timeouts via
nfnetlink, so we can get rid of all our sysctl/proc interfaces in
the future for timeout tuning, from me.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on tcp listener refactoring, I found that it
would really make things easier if sock_common could include
the IPv6 addresses needed in the lookups, instead of doing
very complex games to get their values (depending on sock
being SYN_RECV, ESTABLISHED, TIME_WAIT)
For this to happen, I need to be sure that tcp6_timewait_sock
and tcp_timewait_sock consume same number of cache lines.
This is possible if we only use 32bits for tw_ttd, as we remove
one 32bit hole in inet_timewait_sock
inet_tw_time_stamp() is defined and used, even if its current
implementation looks like tcp_time_stamp : We might need finer
resolution for tcp_time_stamp in the future.
Before patch : sizeof(struct tcp6_timewait_sock) = 0xc8
After patch : sizeof(struct tcp6_timewait_sock) = 0xc0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Here is another batch of fixes intended for the 3.12 stream...
For the mac80211 bits, Johannes says:
"This time I have two fixes for IBSS (including one for wext, hah), a fix
for extended rates IEs, an active monitor checking fix and a sysfs
registration race fix."
On top of those...
Amitkumar Karwar brings an mwifiex fix for an interrupt loss issue
w/ SDIO devices. The problem was due to a command timeout issue
introduced by an earlier patch.
Felix Fietkau a stall in the ath9k driver. This patch fixes the
regression introduced in the commit "ath9k: use software queues for
un-aggregated data packets".
Stanislaw Gruszka reverts an rt2x00 patch that was found to cause
connection problems with some devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to cap ->msg_namelen or it leads to a buffer overflow when we
to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to
exploit this bug.
The call tree is:
___sys_recvmsg()
move_addr_to_user()
audit_sockaddr()
__audit_sockaddr()
Reported-by: Jüri Aedla <juri.aedla@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix multi soft-interfaces setups with Network Coding enabled by
registering the CODED packet type once only (instead of once per soft-if)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iQIcBAABCAAGBQJSTHvQAAoJEADl0hg6qKeOBl8P/jDJhnH65p+zsXlK5RQ1bOmq
F9cY7nY1cESQ0V4j9BGGqPcvy3ltCbVnaPGvrYKQ78CIVYFIlA0ZwmnjnXzkxi4a
XWgG10Znx8yUOPllFoRp7r7yJht2FVprWnEN1aVCwbflpHxD5jI+L3C8JWULEfbI
7Gm3CcHQWzQSOv8u00XeoBmAo3Q+N0gaEAXl+vogKW4RP59GU4QSCstahyRuPmme
l1C9SrLqi+KJjpvgxEdjHmGD8K0yLYJVw/6iMYlYpKbraU793madj0JNT+LwwAmE
dMTOp83yKy+n8k4XRKYRnvOElAJVVvEjU81V/4ompVHzIfu/7f1xSWyAQpecbhFG
srd/QLqIszScx7ELDQ3IVMacTLs2tMaEotvrymYIooRLz3ecgeAyXth3aBQErSD2
SoDliIpx8+D45c04ri9Hcwu2k1g100VYG0QiJMUC0berYGDyjPnbEdpnmYTioJ6J
4s4Qs3ve70lo0yc2ODDZxYN6n6Rk0PXuxJwj5PeBR6RswEo1izdelOXEcAevVjZE
SRJn0niZmtYlS5gD/6aohkVKnKti9Rd2DrgOU7qCWJ/wLUiFSL5L7Lj9megKbmeG
f4qxD9rC3wKQdX1TtU/ED7IfMWMBY0tcSEnbCYs+otI8kCbtvr0490h1JtNJALHb
po2HXXIMEjqmbhkgsz29
=pqD/
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included change:
- fix multi soft-interfaces setups with Network Coding enabled by
registering the CODED packet type once only (instead of once per soft-if)
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable fully_acked is only assigned the values true and false.
Change its type to bool.
The simplified semantic patch that find this problem is as
follows (http://coccinelle.lip6.fr/):
@exists@
type T;
identifier b;
@@
- T
+ bool
b = ...;
... when any
b = \(true\|false\)
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out the code that extracts the ports from skb_flow_dissect and
add a new function skb_flow_get_ports which can be re-used.
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 2 :
We can use a generic lookup, sockets being in whatever state, if
we are sure all relevant fields are at the same place in all socket
types (ESTABLISH, TIME_WAIT, SYN_RECV)
This patch removes these macros :
inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
And adds :
sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
Then, INET_TW_MATCH() is really the same than INET_MATCH()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Bluetooth specification makes it clear that only one command
should be present in the L2CAP LE signalling packet. So tighten
the checks here and restrict it to exactly one command.
This is different from L2CAP BR/EDR signalling where multiple
commands can be part of the same packet.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When SMP packets are received, make sure they contain at least 1 byte
header for the opcode. If not, drop the packet and disconnect the link.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The ATT fixed channel is only valid when using LE connections. On
BR/EDR it is required to go through L2CAP connection oriented
channel for ATT.
Drop ATT packets when they are received on a BR/EDR connection.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When receiving connectionless packets on a LE connection, just drop
the packet. There is no concept of connectionless channels for LE.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When receiving SMP packets on a BR/EDR connection, then just drop
the packet and do not try to process it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The L2CAP raw sockets are only used for BR/EDR signalling. Packets
on LE links should not be forwarded there.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The switch statement for the various L2CAP fixed channel handlers
is not really ordered.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Changing the device class when BR/EDR is disabled has no visible
effect for remote devices. However to simplify the logic allow it
as long as the controller supports BR/EDR operations.
If it is not allowed, then the overall logic becomes rather
complicated since the class of device values would need clearing
or restoring when BR/EDR setting changes.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Loading long term keys into a BR/EDR only controller make no sense.
The kernel would never use any of these keys. So instead of allowing
userspace to waste memory, reject such operation with a not supported
error message.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Loading link keys into a LE only controller make no sense. The kernel
would never use any of these keys. So instead of allowing userspace
to waste memory, reject such operation with a not supported error
message.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Setting the static address does not depend on LE beeing enabled. It
only depends on a controller with LE support.
When depending on LE enabled this command becomes really complicated
since in case LE gets disabled, it would be required to clear the
static address and also its random address representation inside
the controller. With future support for private addresses such
complex setup should be avoided.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Only when BR/EDR is supported and enabled, allow changing of the SSP
setting. Just checking if the hardware supports SSP is not enough
since it might be the case that BR/EDR is disabled.
In the case that BR/EDR is disabled, but SSP supported by the
controller the not supported error message is now returned.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
commit 3ab5aee7fe ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.
We should instead use inet_twsk_put()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the memset/memcpy uses of 6 to ETH_ALEN
where appropriate.
Also convert some struct definitions and u8 array
declarations of [6] to ETH_ALEN.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_fixup_sndbuf() is underestimating initial send buffer requirements.
It was not noticed because big GSO packets were escaping the limitation,
but with smaller TSO packets (or TSO/GSO/SG off), application hits
sk_sndbuf before having a chance to fill enough packets in socket write
queue.
- initial cwnd can be bigger than 10 for specific routes
- SKB_TRUESIZE() is a bit under real needs in some cases,
because of power-of-two rounding in kmalloc()
- Fast Recovery (RFC 5681 3.2) : Cubic needs 70% factor
- Extra cushion (application might react slowly to POLLOUT)
tcp_v4_conn_req_fastopen() needs to call tcp_init_metrics() before
calling tcp_init_buffer_space()
Then we realize tcp_new_space() should call tcp_fixup_sndbuf()
instead of duplicating this stuff.
Rename tcp_fixup_sndbuf() to tcp_sndbuf_expand() to be more
descriptive.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because 'node' is the i'st child of 'oldnode',
thus, here 'i' equals
tkey_extract_bits(node->key, oldtnode->pos, oldtnode->bits)
we just get 1 more bit,
and need not care the detail value of this bits.
I apologize for the mistake.
I generated the patch on a branch version,
and did not notice the put_child has been changed.
I have redone the test on HEAD version with my patch.
two cases are used.
case 1. inflate a node which has a leaf child node.
case 2: inflate a node which has a an child node with skipped bits
test env:
ip link set eth0 up
ip a add dev eth0 192.168.11.1/32
here, we just focus on route table(MAIN),
so I use a "192.168.11.1/32" address to simplify the test case.
call trace:
+ fib_insert_node
+ + trie_rebalance
+ + + resize
+ + + + inflate
Test case 1: inflate a node which has a leaf child node.
===========================================================
step 1. prepare a fib trie
------------------------------------------
ip r a 192.168.0.0/24 via 192.168.11.1
ip r a 192.168.1.0/24 via 192.168.11.1
we get a fib trie.
root@baker:~# cat /proc/net/fib_trie
Main:
+-- 192.168.0.0/23 1 0 0
|-- 192.168.0.0
/24 universe UNICAST
|-- 192.168.1.0
/24 universe UNICAST
Local:
.....
step 2. Add the third route
------------------------------------------
root@baker:~# ip r a 192.168.2.0/24 via 192.168.11.1
A fib_trie leaf will be inserted in fib_insert_node before trie_rebalance.
For function 'inflate':
'inflate' is called with following trie.
+-- 192.168.0.0/22 1 1 0 <=== tn node
+-- 192.168.0.0/23 1 0 0 <== node a
|-- 192.168.0.0
/24 universe UNICAST
|-- 192.168.1.0
/24 universe UNICAST
|-- 192.168.2.0 <== leaf(node b)
When process node b, which is a leaf. here:
i is 1,
node key "192.168.2.0"
oldnode is (pos:22, bits:1)
unpatch source:
tkey_extract_bits(node->key, oldtnode->pos + oldtnode->bits, 1)
it equals:
tkey_extract_bits("192.168,2,0", 22 + 1, 1)
thus got 0, and call put_child(tn, 2*i, node); <== 2*i=2.
patched source:
tkey_extract_bits(node->key, oldtnode->pos, oldtnode->bits + 1),
tkey_extract_bits("192.168,2,0", 22, 1 + 1) <== get 2.
Test case 2: inflate a node which has a an child node with skipped bits
==========================================================================
step 1. prepare a fib trie.
ip link set eth0 up
ip a add dev eth0 192.168.11.1/32
ip r a 192.168.128.0/24 via 192.168.11.1
ip r a 192.168.0.0/24 via 192.168.11.1
ip r a 192.168.16.0/24 via 192.168.11.1
ip r a 192.168.32.0/24 via 192.168.11.1
ip r a 192.168.48.0/24 via 192.168.11.1
ip r a 192.168.144.0/24 via 192.168.11.1
ip r a 192.168.160.0/24 via 192.168.11.1
ip r a 192.168.176.0/24 via 192.168.11.1
check:
root@baker:~# cat /proc/net/fib_trie
Main:
+-- 192.168.0.0/16 1 0 0
+-- 192.168.0.0/18 2 0 0
|-- 192.168.0.0
/24 universe UNICAST
|-- 192.168.16.0
/24 universe UNICAST
|-- 192.168.32.0
/24 universe UNICAST
|-- 192.168.48.0
/24 universe UNICAST
+-- 192.168.128.0/18 2 0 0
|-- 192.168.128.0
/24 universe UNICAST
|-- 192.168.144.0
/24 universe UNICAST
|-- 192.168.160.0
/24 universe UNICAST
|-- 192.168.176.0
/24 universe UNICAST
Local:
...
step 2. add a route to trigger inflate.
ip r a 192.168.96.0/24 via 192.168.11.1
This command will call serveral times inflate.
In the first time, the fib_trie is:
________________________
+-- 192.168.128.0/(16, 1) <== tn node
+-- 192.168.0.0/(17, 1) <== node a
+-- 192.168.0.0/(18, 2)
|-- 192.168.0.0
|-- 192.168.16.0
|-- 192.168.32.0
|-- 192.168.48.0
|-- 192.168.96.0
+-- 192.168.128.0/(18, 2) <== node b.
|-- 192.168.128.0
|-- 192.168.144.0
|-- 192.168.160.0
|-- 192.168.176.0
NOTE: node b is a interal node with skipped bits.
here,
i:1,
node->key "192.168.128.0",
oldnode:(pos:16, bits:1)
so
tkey_extract_bits(node->key, oldtnode->pos + oldtnode->bits, 1)
it equals:
tkey_extract_bits("192.168,128,0", 16 + 1, 1) <=== 0
tkey_extract_bits(node->key, oldtnode->pos, oldtnode->bits, 1)
it equals:
tkey_extract_bits("192.168,128,0", 16, 1+1) <=== 2
2*i + 0 == 2, so the result is same.
Signed-off-by: baker.zhang <baker.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_established_options assumes opts->options is 0 before calling,
as it read modify writes it.
For the tcp_current_mss() case the opts structure is not zeroed,
so this can be done with uninitialized values.
This is ok, because ->options is not read in this path.
But it's still better to avoid the operation on the uninitialized
field. This shuts up a static code analyzer, and presumably
may help the optimizer.
Cc: netdev@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrong type of L2CAP signalling packets on the wrong type of
either BR/EDR or LE links need to be dropped. When that happens
the packet is dropped, but the memory not freed. So actually
free the memory as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When filling the netlink message we miss to wipe the pad field,
therefore leak one byte of heap memory to userland. Fix this by
setting pad to 0.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Limit the current implementation to a single channel context used by
a single vif, thereby avoiding multi-vif/channel complexities.
Reuse the main function from AP CSA code, but move a portion out in
order to fit the STA scenario.
Add a new mac80211 HW flag so we don't break devices that don't support
channel switch with channel-contexts. The new behavior will be opt-in.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch increments the management interface revision due to the
various fixes, improvements and other changes that have gone in
lately.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
We shouldn't include the simultaneous LE & BR/EDR flags in the LE
advertising data if BR/EDR is disabled on a dual-mode controller. This
patch fixes this issue and ensures that the create_ad function generates
the correct flags when BR/EDR is disabled.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The REJECTED management response should mainly be used when the adapter
is in a state where we cannot accept some command or a specific
parameter value. The NOT_SUPPORTED response in turn means that the
adapter really cannot support the command or parameter value.
This patch fixes this distinction and adds two helper functions to
easily get the appropriate LE or BR/EDR related status response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
On dual-mode BR/EDR/LE and LE only controllers it is possible
to configure a random address. There are two types or random
addresses, one is static and the other private. Since the
random private addresses require special privacy feature to
be supported, the configuration of these two are kept separate.
This command allows for setting the static random address. It is
only supported on controllers with LE support. The static random
address is suppose to be valid for the lifetime of the controller
or at least until the next power cycle. To ensure such behavior,
setting of the address is limited to when the controller is
powered off.
The special BDADDR_ANY address (00:00:00:00:00:00) can be used to
disable the static address. This is also the default value.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
batman-adv saves its table of packet handlers as a global state, so handlers
must be set up only once (and setting them up a second time will fail).
The recently-added network coding support tries to set up its handler each time
a new softif is registered, which obviously fails when more that one softif is
used (and in consequence, the softif creation fails).
Fix this by splitting up batadv_nc_init into batadv_nc_init (which is called
only once) and batadv_nc_mesh_init (which is called for each softif); in
addition batadv_nc_free is renamed to batadv_nc_mesh_free to keep naming
consistent.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Disabling the high speed setting when the controller is powered on has
too many side effects that are not taken care of. And in general it
is not an useful operation anyway. So just make such a command fail
with a rejection error message.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch introduces a new mgmt command for enabling/disabling BR/EDR
functionality. This can be convenient when one wants to make a dual-mode
controller behave like a single-mode one. The command is only available
for dual-mode controllers and requires that LE is enabled before using
it. The BR/EDR setting can be enabled at any point, however disabling it
requires the controller to be powered off (otherwise a "rejected"
response will be sent).
Disabling the BR/EDR setting will automatically disable all other BR/EDR
related settings.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To allow treating dual-mode (BR/EDR/LE) controllers as single-mode ones
(LE-only) we want to introduce a new HCI_BREDR_ENABLED flag to track
whether BR/EDR is enabled or not (previously we simply looked at the
feature bit with lmp_bredr_enabled).
This patch add the new flag and updates the relevant places to test
against it instead of using lmp_bredr_enabled. The flag is by default
enabled when registering an adapter and only cleared if necessary once
the local features have been read during the HCI init procedure.
We cannot completely block BR/EDR usage in case user space uses raw HCI
sockets but the patch tries to block this in places where possible, such
as the various BR/EDR specific ioctls.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the VLAN tci is set in skb->vlan_tci use the
priority field to determine the WMM priority.
Signed-off-by: cedric Voncken <cedric.voncken@acksys.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When hci_sock.c calls hci_dev_open it needs to ensure that there isn't
pending work in progress, such as that which is scheduled for the
initial setup procedure or the one for automatically powering off after
the setup procedure. This adds the necessary calls to ensure that any
previously scheduled work is completed before attempting to call
hci_dev_do_open.
This patch fixes a race with old user space versions where we might
receive a HCIDEVUP ioctl before the setup procedure has been completed.
When that happens the setup procedures callback may fail early and leave
the device in an inconsistent state, causing e.g. the setup callback to
be (incorrectly) called more than once.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The requirements of an external call to hci_dev_open from hci_sock.c are
different to that from within hci_core.c. In the former case we want to
flush any pending work in hdev->req_workqueue whereas in the latter we
don't (since there we are already calling from within the workqueue
itself). This patch does the necessary refactoring to a separate
hci_dev_do_open function (analogous to hci_dev_do_close) but does not
yet introduce the synchronizations relating to the workqueue usage.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth protocol and hardware is pretty much all little endian
and so when running sparse via "make C=2" for example, enable the
endian checks by default.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The HCI User Channel operation is an admin operation that puts the
device into promiscuous mode for single use. It is more suitable
to require CAP_NET_ADMIN than CAP_NET_RAW.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When enabling or disabling high speed setting it is required to send
a new settings event to inform other management interface users about
the changed settings.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Hiding the Bluetooth high speed support behind a module parameter is
not really useful. This can be enabled and disabled at runtime via
the management interface. This also has the advantage that this can
now be changed per controller and not just global.
This patch removes the module parameter and exposes the high speed
setting of the management interface to all controllers.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The controller type is limited to BR/EDR/LE and AMP controllers. This
can be easily encoded with just 2 bits and still leave enough room
for future controller types.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/usb/qmi_wwan.c
drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
include/net/netfilter/nf_conntrack_synproxy.h
include/net/secure_seq.h
The conflicts are of two varieties:
1) Conflicts with Joe Perches's 'extern' removal from header file
function declarations. Usually it's an argument signature change
or a function being added/removed. The resolutions are trivial.
2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
a new value, another changes an existing value. That sort of
thing.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking changes from David Miller:
1) Multiply in netfilter IPVS can overflow when calculating destination
weight. From Simon Kirby.
2) Use after free fixes in IPVS from Julian Anastasov.
3) SFC driver bug fixes from Daniel Pieczko.
4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.
5) Locking and encapsulation fixes to serial line CAN driver, from
Andrew Naujoks.
6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
Eilon Greenstein, and Ariel Elior.
7) In lapb, if no other packets are outstanding, T1 timeouts actually
stall things and no packet gets sent. Fix from Josselin Costanzi.
8) ICMP redirects should not make it to the socket error queues, from
Duan Jiong.
9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.
10) Fix setting of VLAN priority field on via-rhine driver, from Roget
Luethi.
11) Fix TX stalls and VLAN promisc programming in be2net driver from
Ajit Khaparde.
12) Packet padding doesn't get handled correctly in new usbnet SG
support code, from Ming Lei.
13) Fix races in netdevice teardown wrt. network namespace closing.
From Eric W. Biederman.
14) Fix potential missed initialization of net_secret if not TCP
connections are openned. From Eric Dumazet.
15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
Aleksander Morgado.
16) skb_cow_head() can change skb->data and thus packet header pointers,
don't use stale ip_hdr reference in ip_tunnel code.
17) Backend state transition handling fixes in xen-netback, from Paul
Durrant.
18) Packet offset for AH protocol is handled wrong in flow dissector,
from Eric Dumazet.
19) Taking down an fq packet scheduler instance can leave stale packets
in the queues, fix from Eric Dumazet.
20) Fix performance regressions introduced by TCP Small Queues. From
Eric Dumazet.
21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
Hannes Frederic Sowa.
22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
reference to the ipv4/ipv6 specific network device state, so use the
reference put that will check and release the object if the
reference hits zero. From Salam Noureddine.
23) Fix memory corruption in ip_tunnel driver, and use skb_push()
instead of __skb_push() so that similar bugs are less hard to find.
From Steffen Klassert.
24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
Nicolas Dichtel.
25) fq scheduler doesn't accurately rate limit in certain circumstances,
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
pkt_sched: fq: rate limiting improvements
ip6tnl: allow to use rtnl ops on fb tunnel
sit: allow to use rtnl ops on fb tunnel
ip_tunnel: Remove double unregister of the fallback device
ip_tunnel_core: Change __skb_push back to skb_push
ip_tunnel: Add fallback tunnels to the hash lists
ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
qlcnic: Fix SR-IOV configuration
ll_temac: Reset dma descriptors indexes on ndo_open
skbuff: size of hole is wrong in a comment
ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
ethernet: moxa: fix incorrect placement of __initdata tag
ipv6: gre: correct calculation of max_headroom
powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
bonding: Fix broken promiscuity reference counting issue
tcp: TSQ can use a dynamic limit
dm9601: fix IFF_ALLMULTI handling
pkt_sched: fq: qdisc dismantle fixes
...
rtnl ops where introduced by c075b13098 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.
Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824 ("ip6tnl: add x-netns support")).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl ops where introduced by ba3e3f50a0 ("sit: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.
Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 5e6700b3bf ("sit: add support of x-netns")).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When queueing the netdevices for removal, we queue the
fallback device twice in ip_tunnel_destroy(). The first
time when we queue all netdevices in the namespace and
then again explicitly. Fix this by removing the explicit
queueing of the fallback device.
Bug was introduced when network namespace support was added
with commit 6c742e714d ("ipip: add x-netns support").
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
moved the IP header installation to iptunnel_xmit() and
changed skb_push() to __skb_push(). This makes possible
bugs hard to track down, so change it back to skb_push().
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we can not update the tunnel parameters of
the fallback tunnels because we don't find them in the
hash lists. Fix this by adding them on initialization.
Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We might extend the used aera of a skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.
Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:
* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
Patrick McHardy.
* Fix possible weight overflow in lblc and lblcr schedulers due to
32-bits arithmetics, from Simon Kirby.
* Fix possible memory access race in the lblc and lblcr schedulers,
introduced when it was converted to use RCU, two patches from
Julian Anastasov.
* Fix hard dependency on CPU 0 when reading per-cpu stats in the
rate estimator, from Julian Anastasov.
* Fix race that may lead to object use after release, when invoking
ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If allowed in a country, these channels typically require DFS
so mark them as such. Channel 144 is a bit special, it's coming
into use now to allow more VHT 80 channels, but world roaming
with passive scanning is acceptable anyway. It seems fairly
unlikely that it'll be used as the control channel for a VHT
AP, but it needs to be present to allow a full VHT connection
to an AP that uses it as one of the secondary channels.
Also enable VHT 160 on these channels, and also for channels
36-48 to be able to use VHT 160 there.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Default timeouts are currently set via proc/sysctl interface, the
typical pattern is a file name like:
/proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE
This results in one entry per default protocol state timeout.
This patch simplifies this by allowing to set default protocol
timeouts via cttimeout netlink interface.
This should allow us to get rid of the existing proc/sysctl code
in the midterm.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A CAC should fail if it is triggered while the interface is already
running.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are currently seven different NAT hooks used in both
nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in
nf_conntrack_sip, then set from the nf_nat_sip NAT helper.
And because each of them is exported there is quite some overhead
introduced due of this.
By introducing nf_nat_sip_hooks I am able to reduce both text/data
somewhat. For nf_conntrack_sip e. g. I get
text data bss dec
old 15243 5256 32 20531
new 15010 5192 32 20234
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use proper net struct to allocate skb, otherwise
netlink mmap will be of no effect.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the default setting for WMM parameters outside the for loop
to avoid redundant assignment multiple times.
Signed-off-by: Fred Zhou <fred.zy@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some devices may not be able to report A-MSDUs in
single buffers. Drivers for such devices were
forced to re-assemble A-MSDUs which would then
be eventually disassembled by mac80211. This could
lead to CPU cache thrashing and poor performance.
Since A-MSDU has a single sequence number all
subframes share it. This was in conflict with
retransmission/duplication recovery
(IEEE802.11-2012: 9.3.2.10).
Patch introduces a new flag that is meant to be
set for all individually reported A-MSDU subframes
except the last one. This ensures the
last_seq_ctrl is updated after the last subframe
is processed. If an A-MSDU is actually a duplicate
transmission all reported subframes will be
properly discarded.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[johannes: add braces that were missing even before]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The authentication frame has a fixied size of 30 bytes
(including header, algo num, trans seq num, and status)
followed by a variable challenge text.
Allocate using exact size, instead of over-allocation
by sizeof(ieee80211_mgmt).
Signed-off-by: Fred Zhou <fred.zy@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use proper net struct to allocate skb, otherwise netlink mmap
will have no effect.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add support for parsing and setting the dfs region (ETSI, FCC, JP)
when the internal regulatory database is used. Before this
the DFS region was being ignored even if present on the used
db.txt
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can be useful for drivers if they have any failure cases
when joining an IBSS. Also move setting the queue parameters
to before this new call, in case the new driver op needs them
already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
VHT_CAP_BEAMFORMER_ANTENNAS cap is actually defined in the draft as
VHT_CAP_BEAMFORMEE_STS_MAX, and its size is 3 bits long.
VHT_CAP_SOUNDING_DIMENSIONS is also 3 bits long.
Fix the definitions and change the cap masking accordingly.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In some debugfs related functions snprintf was used
while scnprintf should have been used instead.
(blindly adding the return value of snprintf and supplying
it to the next snprintf might result in buffer overflow when
the input is too big)
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
__xfrm4/6_state_addr_check is a four steps check, all we need to do
is checking whether the destination address match when looking SA
using wildcard source address. Passing saddr from flow is worst option,
as the checking needs to reach the fourth step while actually only
one time checking will do the work.
So, simplify this process by only checking destination address when
using wildcard source address for looking up SAs.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If SA is in the process of acquiring, which indicates this SA is more
promising and precise than the fall back option, i.e. using wild card
source address for searching less suitable SA.
So, here bail out, and try again.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Tested on linux-3.4.43.
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for the timer handlers to run after the call to
ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
function in order to do proper cleanup when the refcnt reaches 0.
Otherwise, the refcnt can reach zero without the in_device being
destroyed and we end up leaking a reference to the net_device and
see messages like the following,
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Tested on linux-3.4.43.
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
so initialize max_headroom to zero. Otherwise the
if (encap_limit >= 0) {
max_headroom += 8;
mtu -= 8;
}
increments an uninitialized variable before max_headroom was reset.
Found with coverity: 728539
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c
v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next
v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark code path's likely/unlikely based on most common usage.
* Very few devices use dsa tags.
* Most traffic is Ethernet (not 802.2)
* No sane person uses trailer type or Novell encapsulation
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove old legacy comment and weird if condition.
The comment has outlived it's stay and is throwback to some
early net code (before my time). Maybe Dave remembers what it meant.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TCP Small Queues was added, we used a sysctl to limit amount of
packets queues on Qdisc/device queues for a given TCP flow.
Problem is this limit is either too big for low rates, or too small
for high rates.
Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.
New limit is two packets or at least 1 to 2 ms worth of packets.
Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.
High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]
Example for a single flow on 10Gbp link controlled by FQ/pacing
14 packets in flight instead of 2
$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.
Additional change : skb->destructor should be set to tcp_wfree().
A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
setting fl6.flowi6_flags as zero after memset is redundant, Remove it.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fq_reset() should drops all packets in queue, including
throttled flows.
This patch moves code from fq_destroy() to fq_reset()
to do the cleaning.
fq_change() must stop calling fq_dequeue() if all remaining
packets are from throttled flows.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
err is set once, then first code resets it.
err = tcf_exts_validate(...)
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than returning earlier value (EINVAL), return ENOMEM if
kzalloc fails. Found while reviewing to find another EINVAL condition.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new set that provides similar functionality to ip,port,net
but permits arbitrary size subnets for both the first and last
parameter.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This patch adds netns support for ipset.
Major changes were made in ip_set_core.c and ip_set.h.
Global variables are moved to per net namespace.
Added initialization code and the destruction of the network namespace ipset subsystem.
In the prototypes of public functions ip_set_* added parameter "struct net*".
The remaining corrections related to the change prototypes of public functions ip_set_*.
The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347
Signed-off-by: Vitaly Lavrov <lve@guap.ru>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
The new extensions require zero initialization for the new element
to be added into a slot from where another element was pushed away.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
The destroy function must take into account that resizing doesn't
create new extensions so those cannot be destroyed at resize.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This provides kernel support for creating ipsets with comment support.
This does incur a penalty to flushing/destroying an ipset since all
entries are walked in order to free the allocated strings, this penalty
is of course less expensive than the operation of listing an ipset to
userspace, so for general-purpose usage the overall impact is expected
to be little to none.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This provides kernel support for creating list ipsets with the comment
annotation extension.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This provides kernel support for creating bitmap ipsets with comment
support.
As is the case for hashes, this incurs a penalty when flushing or
destroying the entire ipset as the entries must first be walked in order
to free the comment strings. This penalty is of course far less than the
cost of listing an ipset to userspace. Any set created without support
for comments will be flushed/destroyed as before.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This adds the core support for having comments on ipset entries.
The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate all extensions and call that extension's destroy task if the set
has that extension activated, and if such a task is defined.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
This adds a new set that provides the ability to configure pairs of
subnets. A small amount of additional handling code has been added to
the generic hash header file - this code is conditionally activated by a
preprocessor definition.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Get rid of the structure based extensions and introduce a blob for
the extensions. Thus we can support more extension types easily.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Default timeout and extension offsets are moved to struct set, because
all set types supports all extensions and it makes possible to generalize
extension support.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
In order to support hash:net,net, hash:net,port,net etc. types,
arrays are introduced for the book-keeping of existing cidr sizes
and network numbers in a set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-mathcing
family. Currently such rules are silently accepted and then ignored
instead of generating a clear error message to the user, which is not
helpful.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Enable ipset port set types to match IPv4 package fragments for
protocols that doesn't have ports (or the port information isn't
supported by ipset).
For example this allows a hash:ip,port ipset containing the entry
192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels
to/from the host. Without this patch only the first package fragment
(with fragment offset 0) was matched, while subsequent fragments wasn't.
This is not possible for IPv6, where the protocol is in the fragmented
part of the package unlike IPv4, where the protocol is in the IP header.
IPPROTO_ICMPV6 is deliberately not included, because it isn't relevant
for IPv4.
Signed-off-by: Anders K. Pedersen <akp@surftown.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
net/netfilter/ipset/ip_set_hash_ipportnet.c:275:20:
warning: symbol 'cidr' shadows an earlier one
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
In commit 8ed781668d ("flow_keys: include thoff into flow_keys for
later usage"), we missed that existing code was using nhoff as a
temporary variable that could not always contain transport header
offset.
This is not a problem for TCP/UDP because port offset (@poff)
is 0 for these protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
include/net/xfrm.h
Simple conflict between Joe Perches "extern" removal for function
declarations in header files and the changes in Steffen's tree.
Steffen Klassert says:
====================
Two patches that are left from the last development cycle.
Manual merging of include/net/xfrm.h is needed. The conflict
can be solved as it is currently done in linux-next.
1) We announce the creation of temporary acquire state via an asyc event,
so the deletion should be annunced too. From Nicolas Dichtel.
2) The VTI tunnels do not real tunning, they just provide a routable
IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier
instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When flags IFF_PROMISC and IFF_ALLMULTI are changed, netlink messages are not
consistent. For example, if a multicast daemon is running (flag IFF_ALLMULTI
set in dev->flags but not dev->gflags, ie not exported to userspace) and then a
user sets it via netlink (flag IFF_ALLMULTI set in dev->flags and dev->gflags, ie
exported to userspace), no netlink message is sent.
Same for IFF_PROMISC and because dev->promiscuity is exported via
IFLA_PROMISCUITY, we may send a netlink message after each change of this
counter.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch only prepares the next one, there is no functional change.
Now, __dev_notify_flags() can also be used to notify flags changes via
rtnetlink.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider the scenario where an IPv6 router is advertising a fixed
preferred_lft of 1800 seconds, while the valid_lft begins at 3600
seconds and counts down in realtime.
A client should reset its preferred_lft to 1800 every time the RA is
received, but a bug is causing Linux to ignore the update.
The core problem is here:
if (prefered_lft != ifp->prefered_lft) {
Note that ifp->prefered_lft is an offset, so it doesn't decrease over
time. Thus, the comparison is always (1800 != 1800), which fails to
trigger an update.
The most direct solution would be to compute a "stored_prefered_lft",
and use that value in the comparison. But I think that trying to filter
out unnecessary updates here is a premature optimization. In order for
the filter to apply, both of these would need to hold:
- The advertised valid_lft and preferred_lft are both declining in
real time.
- No clock skew exists between the router & client.
So in this patch, I've set "update_lft = 1" unconditionally, which
allows the surrounding code to be greatly simplified.
Signed-off-by: Paul Marks <pmarks@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While sending packet skb_cow_head() can change skb header which
invalidates inner_iph pointer to skb header. Following patch
avoid using it. Found by code inspection.
This bug was introduced by commit 0e6fbc5b6c (ip_tunnels: extend
iptunnel_xmit()).
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP packets hitting the SYN proxy through the SYNPROXY target are not
validated by TCP conntrack. When th->doff is below 5, an underflow happens
when calculating the options length, causing skb_header_pointer() to
return NULL and triggering the BUG_ON().
Handle this case gracefully by checking for NULL instead of using BUG_ON().
Reported-by: Martin Topholm <mph@one.com>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
mac80211 scan processing could get stuck if roc work for pending, but
not started when a scan request was deferred due to such roc item.
Normally the deferred scan would be started from
ieee80211_start_next_roc(), but ieee80211_sw_roc_work() calls that only
if the finished ROC was started. Fix this by calling
ieee80211_run_deferred_scan() in the case the last ROC was not actually
started.
This issue was hit relatively easily in P2P find operations where Listen
state (remain-on-channel) and Search state (scan) are repeated in a
loop.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When clients are idle for too long, hostapd sends nullfunc frames for
probing. When those are acked by the client, the idle time needs to be
updated.
To make this work (and to avoid unnecessary probing), update sta->last_rx
whenever an ACK was received for a tx packet. Only do this if the flag
IEEE80211_HW_REPORTS_TX_ACK_STATUS is set.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows calls for clients in AP_VLANs (e.g. for 4-addr) to succeed
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.
SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.
u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
To be effectively paced, a flow must use FQ packet scheduler.
Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.
I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>