In preparation for validating the length of a register store, use
nft_validate_register_store() in nft_lookup instead of open coding the
validation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The existing name is ambiguous: data is also loaded when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation for that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
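The length check described above (a value spanning several registers must fit from the destination register onwards) can be sketched in user space as follows. The register count, register size and the sketch_validate_register_store() helper are assumptions for illustration only; they are not the nf_tables constants or the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical register file; the real nf_tables layout differs. */
#define SKETCH_NUM_REGS 4  /* assumed number of data registers */
#define SKETCH_REG_SIZE 16 /* assumed size of one register in bytes */

/*
 * Check that a value of 'len' bytes stored starting at register 'reg'
 * fits in the space from 'reg' to the end of the register file.
 */
static int sketch_validate_register_store(unsigned int reg, size_t len)
{
	if (reg >= SKETCH_NUM_REGS)
		return -ERANGE;
	if (len == 0)
		return -EINVAL;
	/* Bytes available from 'reg' up to the last register. */
	if (len > (size_t)(SKETCH_NUM_REGS - reg) * SKETCH_REG_SIZE)
		return -ERANGE;
	return 0;
}
```

With these assumed sizes, a 17-byte value starting at the last register is rejected, while the same value starting one register earlier is accepted.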
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-04-11
This series contains updates to iflink, ixgbe and ixgbevf.
The entire set of changes comes from Vlad Zolotarov and ultimately adds
the ethtool ops to the VF driver to allow querying the RSS indirection
table and RSS random key.
Currently we support only 82599 and X540 devices. On those devices, VFs
share the RSS redirection table and hash key with a PF. Letting the VF
query this information may introduce some security risks, therefore this
feature will be disabled by default.
The new netdev op allows a system administrator to change the default
behaviour with the "ip link set" command. The relevant iproute2 patch has
already been sent and is waiting for this series to land upstream.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Also convert the spinlock to a mutex.
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
udp_config.local_udp_port is be16, and iproute2 passes
FOU_ATTR_PORT in network byte order.
This doesn't fix any bug, just for consistency.
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
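To illustrate the byte-order point above: a be16 field holds the port in network byte order, so a host-order value has to go through htons() before being stored. The struct below is a simplified stand-in, not the kernel's udp_port_cfg.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the port field of the UDP tunnel config. */
struct sketch_udp_cfg {
	uint16_t local_udp_port; /* network byte order, like a be16 field */
};

/* Store a host-order port number in network byte order. */
static void sketch_set_port(struct sketch_udp_cfg *cfg, uint16_t host_port)
{
	cfg->local_udp_port = htons(host_port);
}
```

Port 7777 is 0x1E61, so in network byte order the first stored byte is 0x1E regardless of host endianness.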
Not a big deal, just for correctness.
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following harmless warning:
./ip/ip fou del port 7777
[ 122.907516] udp_del_offload: didn't find offload for port 7777
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent adoption of skc_cookie in struct sock_common,
struct tcp_timewait_sock size increased from 192 to 200 bytes
on 64bit arches. SLAB then rounds it up to 256 bytes.
It is time to drop SLAB_HWCACHE_ALIGN constraint for twsk_slab.
This saves about 12 MB of memory on typical configuration reaching
262144 timewait sockets, and has no noticeable impact on performance.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
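The rounding effect can be shown with a simple align-up helper. This sketch assumes a 64-byte cache line (common on x86-64, not universal) and models only the alignment rule, not the full slab sizing logic: 192 bytes was already a multiple of 64, while 200 bytes rounds up to 256, leaving 56 bytes of padding per object.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; the real value is architecture dependent. */
#define SKETCH_CACHE_LINE 64

/* Round 'size' up to the next multiple of 'align' (a power of two). */
static size_t sketch_round_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}
```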
Merge tag 'mac80211-next-for-davem-2015-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
There isn't much left, but we have
* new mac80211 internal software queue to allow drivers to have
shorter hardware queues and pull on-demand
* use rhashtable for mac80211 station table
* minstrel rate control debug improvements and some refactoring
* fix noisy message about TX power reduction
* fix continuous message printing and activity if CRDA doesn't respond
* fix VHT-related capabilities with "iw connect" or "iwconfig ..."
* fix Kconfig for cfg80211 wireless extensions compatibility
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
it's almost always equal to current_fsuid(), but there's an exception -
if the first writeback fid is opened by non-root *and* that happens before
root has done any lookups in /, we end up doing attach for root. The
current code leaves the resulting FID owned by root from the server POV
and by non-root from the client one. Unfortunately, it means that e.g.
massive dcache eviction will leave that user buggered - they'll end
up redoing walks from / *and* picking that FID every time. As soon as
they try to create something, things will get nasty.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a configuration setting that lets drivers allow or block querying
of the RSS Redirection Table and Hash Key by individual VFs.
On some devices, VFs share this information with the PF, and querying
it may pose a theoretical security risk. We want to let a system
administrator decide whether to take this risk.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-04-09
We've had enough new patches during the past week (especially from
Marcel) that it'd be good to still get these queued for 4.1.
The majority of the changes are from Marcel with lots of cleanup &
refactoring patches for the HCI UART driver. Marcel also split out some
Broadcom & Intel vendor specific functionality into two new btintel &
btbcm modules.
In addition to the HCI driver changes there's the completion of our
local OOB data interface for pairing, added support for requesting
remote LE features when connecting, as well as a couple of minor fixes
for mac802154.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lookup key for tcp_md5_do_lookup() has to be taken
from addr_sk, not sk (which can be the listener).
Fixes: fd3a154a00 ("tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an FDB entry is added or deleted the information about VLAN
is not passed to listening applications like 'bridge monitor fdb'.
With this patch VLAN ID is passed if it was set in the original
netlink message.
Also remove an unused bdev variable.
Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.
11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S 945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S 3160535375:3160535375(0) ack 945476043 win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
This is because we need to clear skb->tstamp before
entering the lower stack; otherwise net_timestamp_check()
does not set skb->tstamp.
Fixes: 7faee5c0d5 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
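The failure mode can be modelled in a few lines: the receive-side check only stamps a packet whose timestamp is still zero, so a stale transmit-side value survives unless it is cleared first. The struct and functions below are hypothetical stand-ins that model the behaviour described above, not the kernel's sk_buff or net_timestamp_check().

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the timestamp field of an skb. */
struct sketch_skb {
	int64_t tstamp; /* 0 means "not yet stamped" */
};

/* Models the receive-side rule: only stamp packets with a zero timestamp. */
static void sketch_timestamp_check(struct sketch_skb *skb, int64_t now)
{
	if (skb->tstamp == 0)
		skb->tstamp = now;
}

/* The fix: drop the stale transmit-side value before handing the skb down. */
static void sketch_prepare_xmit(struct sketch_skb *skb)
{
	skb->tstamp = 0;
}
```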
handle_offloads() calls skb_reset_inner_headers() to store
the layer pointers to the encapsulated packet. However, we
currently push the VLAN tag (if there is one) onto the packet
afterwards. This changes the MAC header for the encapsulated
packet but it is not reflected in skb->inner_mac_header, which
breaks GSO and drivers which attempt to use this for encapsulation
offloads.
Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
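The ordering problem can be modelled with plain offsets: inserting a 4-byte 802.1Q tag moves the start of the MAC header, so an inner offset saved before the insertion is stale. This is a hypothetical offset model, not sk_buff; pushing the tag before saving the inner headers is shown as one ordering that keeps them consistent.

```c
#include <assert.h>

#define SKETCH_VLAN_HLEN 4 /* size of an 802.1Q tag */

/* Hypothetical packet model using byte offsets into a buffer. */
struct sketch_pkt {
	int data;             /* offset of the current start of the packet */
	int inner_mac_header; /* saved offset of the encapsulated MAC header */
};

static void sketch_push_vlan(struct sketch_pkt *p)
{
	p->data -= SKETCH_VLAN_HLEN; /* MAC header now starts 4 bytes earlier */
}

static void sketch_reset_inner(struct sketch_pkt *p)
{
	p->inner_mac_header = p->data;
}

/* Buggy order: save inner offsets, then insert the tag. */
static void sketch_buggy(struct sketch_pkt *p)
{
	sketch_reset_inner(p);
	sketch_push_vlan(p);
}

/* Consistent order: insert the tag, then save inner offsets. */
static void sketch_fixed(struct sketch_pkt *p)
{
	sketch_push_vlan(p);
	sketch_reset_inner(p);
}
```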
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next tree.
They are:
* nf_tables set timeout infrastructure from Patrick McHardy.
1) Add support for set timeouts.
2) Add support for set element timeouts using the new set extension
infrastructure.
3) Add garbage collection helper functions to get rid of stale elements.
Elements are accumulated in a batch that is asynchronously released
via RCU when the batch is full.
4) Add garbage collection synchronization helpers. This introduces a new
element busy bit to address concurrent access from the netlink API and the
garbage collector.
5) Add timeout support for the nft_hash set implementation. The garbage
collector periodically checks for stale elements from the workqueue.
* iptables/nftables cgroup fixes:
6) Ignore non-full-socket objects in the input path; otherwise the cgroup
match may crash, from Daniel Borkmann.
7) Fix cgroup in nf_tables.
8) Save some cycles from xt_socket by skipping packet header parsing when
skb->sk is already set because of early demux. Also from Daniel.
* br_netfilter updates from Florian Westphal.
9) Save frag_max_size and restore it from the forward path too.
10) Use a per-cpu area to restore the original source MAC address when traffic
is DNAT'ed.
11) Add helper functions to access physical devices.
12) Use these new physdev helper functions from xt_physdev.
13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter
state information.
14) Annotate original layer 2 protocol number in nf_bridge info, instead of
using kludgy flags.
15) Also annotate the pkttype mangling when the packet travels back and forth
from the IP to the bridge layer, instead of using a flag.
* More nf_tables set enhancements from Patrick:
16) Fix possible usage of set variant that doesn't support timeouts.
17) Avoid spurious "set is full" errors from Netlink API when there are pending
stale elements scheduled to be released.
18) Restrict loop checks to set maps.
19) Add support for dynamic set updates from the packet path.
20) Add support to store optional user data (e.g. comments) per set element.
BTW, I have also pulled net-next into nf-next to anticipate the conflict
resolution between your okfn() signature changes and Florian's br_netfilter
updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
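The batched garbage collection mentioned in the set timeout items can be sketched as a fixed-size batch that is flushed once full. The batch size and all names are assumptions, and the release here frees elements immediately, standing in for the asynchronous RCU release the pull request describes.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_GC_BATCH 4 /* assumed batch capacity */

struct sketch_gc_batch {
	void *elems[SKETCH_GC_BATCH];
	int count;    /* stale elements currently queued */
	int released; /* number of batches handed off so far */
};

/* Stand-in for the deferred (RCU) release: frees immediately here. */
static void sketch_gc_release(struct sketch_gc_batch *gc)
{
	for (int i = 0; i < gc->count; i++)
		free(gc->elems[i]);
	gc->count = 0;
	gc->released++;
}

/* Queue a stale element and flush the batch once it is full. */
static void sketch_gc_add(struct sketch_gc_batch *gc, void *elem)
{
	gc->elems[gc->count++] = elem;
	if (gc->count == SKETCH_GC_BATCH)
		sketch_gc_release(gc);
}
```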
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-04-09
1) Prohibit the use/abuse of the xfrm netlink interface on
32/64 bit compatibility tasks. We need a full compat
layer before we can allow this. From Fan Du.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2015-04-09
1) We dereferenced the xfrm outer_mode too early, larval
SAs don't have it set. Move the dereference of the
outer mode below the larval SA check to fix it.
From Alexey Dobriyan.
2) Fix vti6 tunnel uninit on namespace crossing.
From Yao Xiwei.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When establishing a Bluetooth LE connection, read the remote used
features mask to determine which features are supported. This was
not really needed with Bluetooth 4.0, but since Bluetooth 4.1 and
also 4.2 have introduced new optional features, this becomes more
important.
This works the same as with BR/EDR where the connection enters the
BT_CONFIG stage and hci_connect_cfm call is delayed until the remote
features have been retrieved. Only after successfully receiving the
remote features, the connection enters the BT_CONNECTED state.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
For kernel_sendmsg() that eliminates the need to play with set_fs();
for kernel_recvmsg() it does *not* - a couple of callers are using
it with non-NULL ->msg_control, which would be treated as userland
address on recvmsg side of things.
In all cases we are really setting a kvec-backed iov_iter, though.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
trivial conflict in net/socket.c and non-trivial one in crypto -
that one had evaded aio_complete() removal.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
FastOpen requests are not like other regular request sockets.
They do not yet use rsk_timer: tcp_fastopen_queue_check()
simply manually removes one expired request from fastopenq->rskq_rst
list.
Therefore, tcp_check_req() must not call mod_timer_pending(),
otherwise we crash because rsk_timer was not initialized.
Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'nfc-next-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC: 4.1 pull request
This is the NFC pull request for 4.1.
This is a shorter one than usual, as the Intel Field Peak NFC
driver could not make it in time.
We have:
- A new driver for NXP NCI based chipsets, like e.g. the NPC100 or
the PN7150. It currently only supports an i2c physical layer, but
could easily be extended to work on top of e.g. SPI.
This driver also includes support for user space triggered firmware
updates.
- A few minor st21nfc[ab] fixes, cleanups, and comments improvements.
- A pn533 error return fix.
- A few NFC related logs formatting cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
const __read_mostly is a senseless combination. If something
is already const it cannot be __read_mostly. Remove the bogus
__read_mostly in the fou driver.
This fixes section conflicts with LTO.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More recent GCC warns about two kinds of switch statement uses:
1) Switching on an enumeration, but not having an explicit case
statement for all members of the enumeration. To show the
compiler this is intentional, we simply add a default case
with nothing more than a break statement.
2) Switching on a boolean value. I think this warning is dumb
but nevertheless you get it wholesale with -Wswitch.
This patch cures all such warnings in netfilter.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
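Case 1 above looks like this in practice: a switch on an enumeration that deliberately handles only some members gets an empty default case so that -Wswitch knows the omission is intentional. The enum and function are hypothetical examples, not netfilter code.

```c
#include <assert.h>

enum sketch_verdict { SKETCH_ACCEPT, SKETCH_DROP, SKETCH_QUEUE };

/* Only some members need handling; the empty default silences -Wswitch. */
static int sketch_is_terminal(enum sketch_verdict v)
{
	int terminal = 0;

	switch (v) {
	case SKETCH_ACCEPT:
	case SKETCH_DROP:
		terminal = 1;
		break;
	default:
		break;
	}
	return terminal;
}
```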
If a determined set of concurrent senders keep the send queue full,
we can loop forever inside rds_send_xmit. This fix has two parts.
First, we drop out of the while(1) loop after we've processed a
large batch of messages.
Second, we add a generation number that gets bumped each time the
xmit bit lock is acquired. If someone else has jumped in and
made progress in the queue, we skip our goto restart.
Original patch by Chris Mason.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
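The two parts of the fix can be sketched as a bounded send loop plus a decision made after the xmit lock is released. The batch size, the names and the "defer to a worker" outcome are assumptions; the commit text only states that the loop exits after a large batch and that the restart is skipped when another sender has taken the lock since.

```c
#include <assert.h>

#define SKETCH_SEND_BATCH 64 /* assumed batch size, not the RDS value */

enum sketch_next { SKETCH_DONE, SKETCH_RESTART, SKETCH_DEFER };

/* Part one: send at most SKETCH_SEND_BATCH messages, return how many. */
static int sketch_send_batch(int *queued)
{
	int sent = 0;

	while (*queued > 0 && sent < SKETCH_SEND_BATCH) {
		(*queued)--;
		sent++;
	}
	return sent;
}

/*
 * Part two: decide what to do after dropping the xmit lock. 'my_gen' is
 * the generation seen when taking the lock, 'cur_gen' the current one.
 */
static enum sketch_next sketch_after_unlock(int queued, unsigned long my_gen,
					    unsigned long cur_gen, int sent)
{
	if (queued == 0)
		return SKETCH_DONE;
	if (cur_gen != my_gen)
		return SKETCH_DONE;    /* another sender made progress */
	if (sent < SKETCH_SEND_BATCH)
		return SKETCH_RESTART; /* new messages raced in; go again */
	return SKETCH_DEFER;           /* batch exhausted: stop looping here */
}
```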
Passive connections were added for the case where one loopback IB
connection between identical addresses needs another connection to store
the second QP. Unfortunately, they were also created in the case where
the addresses differ and we already have both QPs.
This led to a message reordering bug.
- two different IB interfaces and addresses on a machine: A B
- traffic is sent from A to B
- connection from A-B is created, connect request sent
- listening accepts connect request, B-A is created
- traffic flows, next_rx is incremented
- unacked messages exist on the retrans list
- connection A-B is shut down, new connect request sent
- listen sees existing loopback B-A, creates new passive B-A
- retrans messages are sent and delivered because of 0 next_rx
The problem is that the second connection request saw the previously
existing parent connection. Instead of using it, and using the existing
next_rx_seq state for the traffic between those IPs, it mistakenly
thought that it had to create a passive connection.
We fix this by only using passive connections in the special case where
laddr and faddr match. In this case we'll only ever have one parent
sending connection requests and one passive connection created as the
listening path sees the existing parent connection which initiated the
request.
Original patch by Zach Brown
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
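The corrected rule reduces to a single predicate: create a passive connection only when an existing parent connection is seen and the local and foreign addresses are identical. The helper name and the use of host-order IPv4 integers are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; addresses are host-order IPv4 integers here. */
static bool sketch_use_passive_conn(uint32_t laddr, uint32_t faddr,
				    bool parent_exists)
{
	/* Only loopback between identical addresses needs a second QP. */
	return parent_exists && laddr == faddr;
}
```

In the scenario above, A and B differ, so the listener reuses the existing B-A connection (and its next_rx_seq) instead of creating a new passive one.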
Fixes: 79b16aadea ("udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The socket parameter might legally be NULL, thus sock_net is sometimes
causing a NULL pointer dereference. Using net_device pointer in dst_entry
is more reliable.
Fixes: b6a7719aed ("ipv4: hash net ptr into fragmentation bucket selection")
Reported-by: Rick Jones <rick.jones2@hp.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a new "dynset" expression for dynamic set updates.
A new set op ->update() is added which, for non-existent elements,
invokes an initialization callback and inserts the new element.
For both new and existing elements the extension pointer is returned
to the caller to optionally perform timer updates or other actions.
Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
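The ->update() semantics described above (insert with an init callback when missing, return the element in both cases) can be sketched over a tiny fixed-size array. The data structure, the callback type and the expires field are assumptions for illustration; nf_tables set backends are implemented differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SET_SIZE 8 /* assumed fixed capacity */

struct sketch_elem {
	bool used;
	int key;
	long expires; /* example per-element data the caller may update */
};

struct sketch_set {
	struct sketch_elem slots[SKETCH_SET_SIZE];
};

typedef void (*sketch_init_fn)(struct sketch_elem *e, void *arg);

/*
 * Find 'key'; if absent, insert it and run 'init'. Either way, return the
 * element so the caller can act on it. Returns NULL when the set is full.
 */
static struct sketch_elem *sketch_set_update(struct sketch_set *set, int key,
					     sketch_init_fn init, void *arg)
{
	struct sketch_elem *free_slot = NULL;

	for (size_t i = 0; i < SKETCH_SET_SIZE; i++) {
		struct sketch_elem *e = &set->slots[i];

		if (e->used && e->key == key)
			return e;               /* existing element */
		if (!e->used && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return NULL;
	free_slot->used = true;
	free_slot->key = key;
	init(free_slot, arg);                   /* initialise new element */
	return free_slot;
}

/* Example init callback: set an initial expiry time. */
static void sketch_init_timeout(struct sketch_elem *e, void *arg)
{
	e->expires = *(long *)arg;
}
```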
Currently a set binding is assumed to be related to a lookup and, in
case of maps, a data load.
In order to use bindings for set updates, the loop detection checks
must be restricted to map operations only. Add a flags member to the
binding struct to hold the set "action" flags such as NFT_SET_MAP,
and perform loop detection based on these.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
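The idea of recording action flags per binding and checking for loops only on map bindings can be sketched as below. The flag values and names are hypothetical and do not reproduce the real NFT_SET_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, not the real NFT_SET_* constants. */
#define SKETCH_SET_MAP    0x1 /* binding loads data from a map */
#define SKETCH_SET_UPDATE 0x2 /* binding used for set updates */

struct sketch_binding {
	unsigned int flags; /* "action" flags recorded at bind time */
};

/* Only bindings that load map data can create jump loops. */
static bool sketch_needs_loop_check(const struct sketch_binding *b)
{
	return (b->flags & SKETCH_SET_MAP) != 0;
}
```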
Use atomic operations for the element count to avoid races with async
updates.
To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for separately and
are treated as being already removed. This means for the duration of
a netlink transaction, the limit might be exceeded by the amount of
elements deleted. Set implementations must be prepared to handle this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
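The counting rule can be sketched with C11 atomics: the element count is reserved atomically, and elements deleted in the current transaction are treated as already gone, so the count may exceed the limit by at most that number. The struct and field names are assumptions; ndeact is modelled as a plain integer assumed to be protected by the transaction lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_set_count {
	atomic_uint nelems;  /* elements currently in the set */
	unsigned int ndeact; /* deleted, not committed (transaction-locked) */
	unsigned int size;   /* configured maximum */
};

/* Reserve room for one element; back out if the set is full. */
static bool sketch_count_inc(struct sketch_set_count *c)
{
	unsigned int n = atomic_fetch_add(&c->nelems, 1) + 1;

	/* Pending deletions count as already removed. */
	if (n > c->size + c->ndeact) {
		atomic_fetch_sub(&c->nelems, 1);
		return false;
	}
	return true;
}
```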
The NFT_SET_TIMEOUT flag is ignored in nft_select_set_ops, which may
lead to selection of a set implementation that doesn't actually
support timeouts.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
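The selection rule can be sketched as "a backend qualifies only if it implements every requested feature". This simplified version picks the first qualifying backend; the feature bits, backend names and helper are hypothetical, and the real selection also weighs other criteria.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits. */
#define SKETCH_F_TIMEOUT 0x1
#define SKETCH_F_MAP     0x2

struct sketch_set_ops {
	const char *name;
	unsigned int features; /* features this backend implements */
};

/* Return the first backend that implements all requested features. */
static const struct sketch_set_ops *
sketch_select_ops(const struct sketch_set_ops *ops, size_t n,
		  unsigned int flags)
{
	for (size_t i = 0; i < n; i++) {
		if ((ops[i].features & flags) == flags)
			return &ops[i];
	}
	return NULL;
}
```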
nf_bridge_info->mask is used for several things, for example to
remember if skb->pkt_type was set to OTHER_HOST.
For a bridge, OTHER_HOST is the expected case. For IP forwarding it's a non-starter
though -- routing expects PACKET_HOST.
Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook
invocation and then un-does it after hook traversal.
This information is irrelevant outside of br_netfilter.
After this change, ->mask now only contains flags that need to be
known outside of br_netfilter in fast-path.
Future patch changes mask into a 2bit state field in sk_buff, so that
we can remove skb->nf_bridge pointer for good and consider all remaining
places that access nf_bridge info content a not-so fastpath.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
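The swap-and-restore described above can be sketched with a flag that is private to the bridge netfilter state instead of a bit in a shared mask. The constants use the same values as PACKET_HOST and PACKET_OTHERHOST in linux/if_packet.h; the structs and hook functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as PACKET_HOST / PACKET_OTHERHOST in linux/if_packet.h. */
#define SKETCH_PACKET_HOST      0
#define SKETCH_PACKET_OTHERHOST 3

struct sketch_skb {
	unsigned char pkt_type;
};

/* State private to br_netfilter in this model. */
struct sketch_brnf_state {
	bool pkt_otherhost;
};

/* Before the IP hooks: routing expects PACKET_HOST. */
static void sketch_pre_hook(struct sketch_skb *skb, struct sketch_brnf_state *st)
{
	st->pkt_otherhost = false;
	if (skb->pkt_type == SKETCH_PACKET_OTHERHOST) {
		skb->pkt_type = SKETCH_PACKET_HOST;
		st->pkt_otherhost = true;
	}
}

/* After hook traversal: undo the change. */
static void sketch_post_hook(struct sketch_skb *skb, struct sketch_brnf_state *st)
{
	if (st->pkt_otherhost) {
		skb->pkt_type = SKETCH_PACKET_OTHERHOST;
		st->pkt_otherhost = false;
	}
}
```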
->mask is a bit info field that mixes various use cases.
In particular, we have flags that are mutually exclusive, and flags that
are only used within br_netfilter while others need to be exposed to
other parts of the kernel.
Remove BRNF_8021Q/PPPoE flags. They're mutually exclusive and only
needed within br_netfilter context.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Don't access skb->nf_bridge directly, this pointer will be removed soon.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
right now we store this in the nf_bridge_info struct, accessible
via skb->nf_bridge. This patch prepares removal of this pointer from skb:
Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out
device (or ifindexes).
Followup patches to netfilter will then allow nf_bridge_info to be
obtained by a call into the br_netfilter core, rather than keeping a
pointer to it in sk_buff.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
br_netfilter maintains an extra state, nf_bridge_info, which is attached
to skb via skb->nf_bridge pointer.
Amongst other things we use skb->nf_bridge->data to store the original
mac header for every processed skb.
This is required for ip refragmentation when using conntrack
on top of bridge, because ip_fragment doesn't copy it from original skb.
However there is no need anymore to do this unconditionally.
Move this to the one place where it's needed -- when br_netfilter calls
ip_fragment().
Also switch to percpu storage for this so we can handle fragmenting
without accessing nf_bridge meta data.
Only user left is neigh resolution when DNAT is detected, to hold
the original source mac address (neigh resolution builds new mac header
using bridge mac), so rename ->data and reduce its size to what's needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently in xt_socket, we take advantage of early demuxed sockets
since commit 00028aa370 ("netfilter: xt_socket: use IP early demux")
in order to avoid a second socket lookup in the fast path, but we
only make partial use of this:
We still unnecessarily parse headers, extract proto, {s,d}addr and
{s,d}ports from the skb data, accessing possible conntrack information,
etc even though we were not even calling into the socket lookup via
xt_socket_get_sock_{v4,v6}() due to skb->sk hit, meaning those cycles
can be spared.
After this patch, we only take the slower, manual lookup path
when we have an skb->sk miss, so the time to a match verdict for early
demuxed sockets improves further, which might be interesting
for use cases such as the one mentioned in 681f130f39 ("netfilter: xt_socket:
add XT_SOCKET_NOWILDCARD flag").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
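The fast path can be sketched as an early return before any header parsing when a socket is already attached. The structs and the slow-lookup stand-in are hypothetical; the parsed field only records whether the expensive path ran.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_sock { int id; };

struct sketch_skb {
	struct sketch_sock *sk; /* set by early demux, if it hit */
	int parsed;             /* records whether the slow path ran */
};

/* Stand-in for header parsing plus the manual socket lookup. */
static struct sketch_sock *sketch_slow_lookup(struct sketch_skb *skb)
{
	static struct sketch_sock found = { 42 };

	skb->parsed = 1;
	return &found;
}

static struct sketch_sock *sketch_socket_match(struct sketch_skb *skb)
{
	/* Early demux already attached a socket: skip parsing entirely. */
	if (skb->sk)
		return skb->sk;
	return sketch_slow_lookup(skb);
}
```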
The change to only export WEXT symbols when required could break
the build if CONFIG_CFG80211_WEXT was explicitly disabled while
a driver like orinoco selected it.
Fix this by hiding the symbol when it's required so it can't be
disabled in that case.
Fixes: 2afe38d15c ("cfg80211-wext: export symbols only when needed")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fast Open has been using an experimental option with a magic number
(RFC6994). This patch makes the client by default use the RFC7413
option (34) to get and send Fast Open cookies. This patch makes
the client solicit cookies from a given server first with the
RFC7413 option. If that fails to elicit a cookie, then it tries
the RFC6994 experimental option. If that also fails, it uses the
RFC7413 option on all subsequent connect attempts. If the server
returns a Fast Open cookie then the client caches the form of the
option that successfully elicited a cookie, and uses that form on
later connects when it presents that cookie.
The idea is to gradually obsolete the use of experimental options as
the servers and clients upgrade, while keeping the interoperability
meanwhile.
Signed-off-by: Daniel Lee <Longinus00@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
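The client's option-selection policy described above can be sketched as a small per-destination state machine: RFC7413 first, the experimental option second, RFC7413 on every later attempt, and the cached form once a cookie has been received. All names and the cache layout are assumptions, not the kernel's TCP metrics cache.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_tfo_opt { SKETCH_OPT_RFC7413, SKETCH_OPT_EXP };

/* Hypothetical per-destination cache entry. */
struct sketch_tfo_cache {
	bool has_cookie;
	enum sketch_tfo_opt cookie_form; /* form that elicited the cookie */
	int failed_requests;             /* requests that got no cookie */
};

/* Option form to use in the next SYN to this destination. */
static enum sketch_tfo_opt sketch_next_opt(const struct sketch_tfo_cache *c)
{
	if (c->has_cookie)
		return c->cookie_form;
	/* 1st: RFC7413, 2nd: experimental, afterwards: RFC7413 again. */
	return c->failed_requests == 1 ? SKETCH_OPT_EXP : SKETCH_OPT_RFC7413;
}

/* Record the outcome of a SYN sent with option form 'used'. */
static void sketch_record_result(struct sketch_tfo_cache *c,
				 enum sketch_tfo_opt used, bool got_cookie)
{
	if (got_cookie) {
		c->has_cookie = true;
		c->cookie_form = used;
	} else if (!c->has_cookie) {
		c->failed_requests++;
	}
}
```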
Fast Open has been using the experimental option with a magic number
(RFC6994) to request and grant Fast Open cookies. This patch enables
the server to support the official IANA option 34 in RFC7413 in
addition.
The change has passed all existing Fast Open tests with both
old and new options at Google.
Signed-off-by: Daniel Lee <Longinus00@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes byte backlog accounting for the first of two chained netem instances.
Bytes backlog reported now corresponds to the number of queued packets.
When two netem instances are chained, for instance to apply rate and queue
limitation followed by packet delay, the number of backlogged bytes reported
by the first netem instance is wrong. It reports the sum of bytes in the queues
of the first and second netem. The first netem reports the correct number of
backlogged packets but not bytes. This is shown in the example below.
Consider a chain of two netem schedulers created using the following commands:
$ tc -s qdisc replace dev veth2 root handle 1:0 netem rate 10000kbit limit 100
$ tc -s qdisc add dev veth2 parent 1:0 handle 2: netem delay 50ms
Start an iperf session to send packets out on the specified interface and
monitor the backlog using tc:
$ tc -s qdisc show dev veth2
Output using unpatched netem:
qdisc netem 1: root refcnt 2 limit 100 rate 10000Kbit
Sent 98422639 bytes 65434 pkt (dropped 123, overlimits 0 requeues 0)
backlog 172694b 73p requeues 0
qdisc netem 2: parent 1: limit 1000 delay 50.0ms
Sent 98422639 bytes 65434 pkt (dropped 0, overlimits 0 requeues 0)
backlog 63588b 42p requeues 0
The interface used to produce this output has an MTU of 1500. The output for
backlogged bytes behind netem 1 is 172694b. This value is not correct. Consider
the total number of sent bytes and packets. By dividing the number of sent
bytes by the number of sent packets, we get an average packet size of ~=1504.
If we divide the number of backlogged bytes by packets, we get ~=2365. This is
due to the first netem incorrectly counting the 63588b which are in netem 2's
queue as being in its own queue. To verify this is the case, we subtract them
from the reported value and divide by the number of packets as follows:
172694 - 63588 = 109106 bytes actually backlogged in netem 1
109106 / 73 packets ~= 1494 bytes (which matches our MTU)
The root cause is that the byte accounting is not done at the
same time as packet accounting. The solution is to update the backlog value
every time the packet queue is updated.
Signed-off-by: Joseph D Beshay <joseph.beshay@utdallas.edu>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
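The arithmetic in the analysis above can be checked directly with the figures from the tc output, using integer division. The small queue model after it is a hypothetical illustration of the fix (update the byte backlog whenever the packet queue changes), not the netem code.

```c
#include <assert.h>

/* Figures copied from the tc output above. */
#define SENT_BYTES     98422639L
#define SENT_PKTS      65434L
#define NETEM1_BACKLOG 172694L /* bytes reported by unpatched netem 1 */
#define NETEM1_PKTS    73L
#define NETEM2_BACKLOG 63588L  /* bytes queued in netem 2 */

/* Average packet size, rounded down. */
static long sketch_avg(long bytes, long pkts)
{
	return bytes / pkts;
}

/* Hypothetical queue model: bytes and packets change together. */
struct sketch_q {
	long qlen;
	long backlog;
};

static void sketch_enqueue(struct sketch_q *q, long len)
{
	q->qlen++;
	q->backlog += len;
}

static void sketch_dequeue(struct sketch_q *q, long len)
{
	q->qlen--;
	q->backlog -= len;
}
```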
The Read Local Out Of Band Extended Data mgmt command is specified to
return the SSP values when given a BR/EDR address type as input
parameter. The returned values may include either the 192-bit variants
of C and R, or their 256-bit variants, or both, depending on the status
of Secure Connections and Secure Connections Only modes. If SSP is not
enabled the command will only return the Class of Device value (like it
has done so far).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With this patch, it's possible to dump the list of ids allocated for peer
netns.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, netns ids that are created and deleted are advertised into the
group RTNLGRP_NSID.
Because callers of rtnl_net_notifyid() already know the id of the peer, there is
no need to call __peernet2id() in rtnl_net_fill().
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to initialize err, it will be overridden by the value of nlmsg_parse().
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
That way we can make sure the output paths of ipv4/ipv6 operate on
the UDP socket rather than whatever random thing happens to be in
skb->sk.
Based upon a patch by Jiri Pirko.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
On the output paths in particular, we have to sometimes deal with two
socket contexts. First, and usually skb->sk, is the local socket that
generated the frame.
And second, is potentially the socket used to control a tunneling
socket, such as one that encapsulates using UDP.
We do not want to disassociate skb->sk when encapsulating in order
to fix this, because that would break socket memory accounting.
The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device. We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.
Signed-off-by: David S. Miller <davem@davemloft.net>
It is currently always set to NULL, but nf_queue is adjusted to be
prepared for it being set to a real socket by taking and releasing a
reference to that socket when necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
The hci_recv_stream_fragment function should have never been introduced
in the first place. The Bluetooth core does not need to know anything
about the HCI transport protocol.
With all transport protocol specific details moved back into the
drivers where they belong (mainly generic USB and UART drivers), this
function can now be removed.
This reduces the size of hci_dev structure and also removes an exported
symbol from the Bluetooth core module.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The data pointer provided to hci_recv_stream_fragment function should
have been marked const. The function has no business in modifying the
original data. So fix this now.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Every tracing file must have its own TRACE_SYSTEM defined.
The mac80211 tracepoint header broke this by adding, in the middle
of the file:
#undef TRACE_SYSTEM
#define TRACE_SYSTEM mac80211_msg
Unfortunately, this broke new code in the ftrace infrastructure.
Moving the mac80211_msg into its own trace file with its own
TRACE_SYSTEM defined fixes the issue.
Link: http://lkml.kernel.org/r/1428389938.1841.1.camel@sipsolutions.net
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This reverts commit 89baaa570a.
Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support. (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.) See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.
On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.
[1] "SOCK_MEMALLOC vs loopback"
http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
http://www.spinics.net/lists/ceph-devel/msg23392.html
Conflicts:
net/ceph/messenger.c [ context: tcp_nodelay option ]
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mel Gorman <mgorman@suse.de>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-04-04
Here's what's probably the last bluetooth-next pull request for 4.1:
- Fixes for LE advertising data & advertising parameters
- Fix for race condition with HCI_RESET flag
- New BNEPGETSUPPFEAT ioctl, needed for certification
- New HCI request callback type to get the resulting skb
- Cleanups to use BIT() macro wherever possible
- Consolidate Broadcom device entries in the btusb HCI driver
- Check for valid flags in CMTP, HIDP & BNEP
- Disallow local privacy & OOB data combo to prevent a potential race
- Expose SMP & ECDH selftest results through debugfs
- Expose current Device ID info through debugfs
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the connect request from userspace didn't include an extended
capabilities IE, create one using the driver capabilities. This
fixes VHT associations, since those need to set the operating mode
notification capability.
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As the next patch will require the IE splitting utility functions
in cfg80211, move them there from mac80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the kernel deleted a vti6 interface, this interface was not removed from
the tunnels list. Thus, when the ip6_vti module was removed, this old interface
was found and the kernel tried to delete it again, which led to a kernel
panic.
Fixes: 61220ab349 ("vti6: Enable namespace changing")
Signed-off-by: Yao Xiwei <xiwei.yao@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
https://bugzilla.kernel.org/show_bug.cgi?id=95211
Commit 70be6c91c8
("xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer") added a check
that dereferences ->outer_mode too early, but larval SAs don't have
this pointer set (yet). So check for tunnel stuff later.
Mike Noordermeer reported this bug and patiently applied all the debugging.
Technically this is remote-oops-in-interrupt-context type of thing.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
IP: [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0
...
[<ffffffff81500fc6>] ? xfrm4_esp_rcv+0x36/0x70
[<ffffffff814acc9a>] ? ip_local_deliver_finish+0x9a/0x200
[<ffffffff81471b83>] ? __netif_receive_skb_core+0x6f3/0x8f0
...
RIP [<ffffffff8150dca2>] xfrm_input+0x3c2/0x5a0
Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Conflicts:
drivers/net/ethernet/mellanox/mlx4/cmd.c
net/core/fib_rules.c
net/ipv4/fib_frontend.c
The fib_rules.c and fib_frontend.c conflicts were locking adjustments
in 'net' overlapping addition and removal of code in 'net-next'.
The mlx4 conflict was a bug fix in 'net' happening in the same
place a constant was being replaced with a more suitable macro.
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the description in 'include/net/dsa.h', in cascade switch
configurations where more than one device is interconnected, the
'rtable' array in the 'dsa_chip_data' structure indicates which port
on this switch should be used to send packets destined for the
corresponding switch.
However, dsa_of_setup_routing_table() fills 'rtable' with port numbers
of the _target_ switch rather than the current one.
This commit removes redundant devicetree parsing and adds the needed port
number as a function argument, so dsa_of_setup_routing_table() now just
looks up the target switch number by parsing the parent of the 'link'
device node.
To remove possible misunderstandings with the way of determining target
switch number, a corresponding comment was added to the source code and
to the DSA device tree bindings documentation file.
This was tested on a custom board with two Marvell 88E6095 switches with
the following corresponding routing tables: { -1, 10 } and { 8, -1 }.
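As a rough illustration of the rtable semantics described above, the following userspace C sketch models two cascaded switches. The struct cascade_link type and the build_rtables() helper are hypothetical and are not the kernel's dsa_chip_data API; the only point is that each entry rtable[this][target] holds a port number on this switch, not on the target switch.

```c
#include <stddef.h>

#define NR_SWITCHES 2

/* Hypothetical description of one inter-switch link. */
struct cascade_link {
    int from_switch; /* switch that owns the port */
    int port;        /* port number on from_switch */
    int to_switch;   /* switch reached through that port */
};

/* Fill rtable[i][j] with the port on switch i that leads to switch j,
 * or -1 if there is no direct link. */
static void build_rtables(const struct cascade_link *links, size_t nlinks,
                          int rtable[NR_SWITCHES][NR_SWITCHES])
{
    for (int i = 0; i < NR_SWITCHES; i++)
        for (int j = 0; j < NR_SWITCHES; j++)
            rtable[i][j] = -1;

    for (size_t k = 0; k < nlinks; k++) {
        const struct cascade_link *l = &links[k];

        if (l->from_switch < 0 || l->from_switch >= NR_SWITCHES ||
            l->to_switch < 0 || l->to_switch >= NR_SWITCHES)
            continue; /* ignore malformed entries */
        /* The stored port belongs to the local (from) switch. */
        rtable[l->from_switch][l->to_switch] = l->port;
    }
}
```

With the links {0, 10, 1} and {1, 8, 0}, this reproduces the tables from the test board: { -1, 10 } for switch 0 and { 8, -1 } for switch 1.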
Signed-off-by: Pavel Nakonechny <pavel.nakonechny@skitlab.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 608cd71a9c ("tc: bpf: generalize pedit action") added the
ability for BPF programs in the tc pipeline to mangle packet data.
This patch adds two helpers, bpf_l3_csum_replace() and bpf_l4_csum_replace(),
for fixing up the protocol checksums after the packet mangling.
It also adds a 'flags' argument to the bpf_skb_store_bytes() helper to avoid
unnecessary checksum recomputations when BPF programs adjust l3/l4
checksums, and documents all three helpers in the uapi header.
Moreover, a sample program is added to show how BPF programs can make use
of the mangle and csum helpers.
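Checksum fixups after mangling are usually done incrementally rather than by recomputing over the whole packet. The sketch below shows that technique (RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')) on 16-bit words in userspace C. csum_full() and csum_replace16() are hypothetical names; this is not the signature or implementation of bpf_l3_csum_replace() or bpf_l4_csum_replace(), only the arithmetic such helpers are assumed to rely on.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum (RFC 1071) over n 16-bit words, for comparison. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold32(sum);
}

/* Incremental update when one 16-bit word changes from old_word to
 * new_word: HC' = ~(~HC + ~m + m')  (RFC 1624, eqn. 3). */
static uint16_t csum_replace16(uint16_t check, uint16_t old_word,
                               uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~csum_fold32(sum);
}
```

For example, rewriting one address word of an IPv4 header with csum_replace16() yields the same value as csum_full() over the modified header, without touching the other words.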
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform to this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially hit WARN_ON(1) if we use a
PF_PACKET socket on top of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
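A minimal sketch of the idea, assuming a thread-local counter as a stand-in for the kernel's per-CPU xmit_recursion: settings taken from the originating socket are only honoured at the outermost transmit level. The struct sock_opts type and both functions are hypothetical and only model the frag_size case from point 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU xmit_recursion counter. */
static _Thread_local int xmit_recursion_level;

struct sock_opts {
    int frag_size; /* user-requested fragment size, 0 = unset */
};

/* Consult the socket only at recursion level 0; deeper levels
 * (e.g. after tunnel encapsulation) use the device MTU. */
static int pick_frag_size(const struct sock_opts *sk, int mtu)
{
    if (xmit_recursion_level > 0 || sk == NULL || sk->frag_size == 0)
        return mtu;
    return sk->frag_size < mtu ? sk->frag_size : mtu;
}

/* Model of a tunnel transmit: the encapsulated packet is sent one
 * level deeper, so the inner socket's settings no longer apply. */
static int tunnel_xmit_frag_size(const struct sock_opts *sk, int underlay_mtu)
{
    int ret;

    xmit_recursion_level++;
    ret = pick_frag_size(sk, underlay_mtu);
    xmit_recursion_level--;
    return ret;
}
```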
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
That way we don't have to reinstantiate another nf_hook_state
on the stack of the nf_reinject() path.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of passing a large number of arguments down into the nf_hook()
entry points, create a structure which carries this state down through
the hook processing layers.
This makes it so that if we want to change the types or signatures of
any of these pieces of state, there are fewer places that need to be
changed.
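The refactoring pattern is to bundle per-invocation arguments into one struct and pass a single pointer down through the layers. The userspace C sketch below uses hypothetical names (struct hook_state, run_hooks(), drop_short(), deliver()) and fields chosen for illustration; it does not reproduce the real struct nf_hook_state layout.

```c
#include <stddef.h>

struct pkt {
    int len;
};

/* Hypothetical bundle of per-invocation hook state. */
struct hook_state {
    unsigned int hook; /* which hook point is running */
    int pf;            /* protocol family */
    int in_ifindex;    /* input device, 0 if none */
    int out_ifindex;   /* output device, 0 if none */
    int (*okfn)(const struct hook_state *state, struct pkt *p);
};

typedef int (*hook_fn)(const struct hook_state *state, struct pkt *p);

/* Example hook: accept (1) packets of at least 20 bytes, drop (0) others. */
static int drop_short(const struct hook_state *state, struct pkt *p)
{
    (void)state;
    return p->len >= 20;
}

/* Example continuation called once every hook has accepted the packet. */
static int deliver(const struct hook_state *state, struct pkt *p)
{
    (void)state;
    (void)p;
    return 1;
}

/* Each layer passes one pointer down instead of a long argument list;
 * adding a field to hook_state does not change these signatures. */
static int run_hooks(const hook_fn *hooks, size_t n,
                     const struct hook_state *state, struct pkt *p)
{
    for (size_t i = 0; i < n; i++)
        if (!hooks[i](state, p))
            return 0; /* dropped */
    return state->okfn(state, p);
}
```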
Signed-off-by: David S. Miller <davem@davemloft.net>
The TX power field in the LE advertising data should be placed last
since it needs to be possible to enable kernel controlled TX power,
but still allow for userspace provided flags field.
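To illustrate the ordering, here is a userspace sketch that copies user-provided advertising data (which may contain its own Flags AD structure, type 0x01) first and appends a kernel-generated TX Power Level AD structure (type 0x0a) last, within the 31-byte legacy advertising data limit. build_adv_data() and append_ad() are hypothetical helpers, not the Bluetooth core's functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AD_TYPE_FLAGS    0x01
#define AD_TYPE_TX_POWER 0x0a
#define ADV_DATA_MAX     31

/* Append one AD structure: length (type + data), type, data.
 * Leaves buf unchanged if it would not fit. */
static size_t append_ad(uint8_t *buf, size_t off, uint8_t type,
                        const uint8_t *data, uint8_t len)
{
    if (off + 2 + len > ADV_DATA_MAX)
        return off;
    buf[off] = (uint8_t)(len + 1);
    buf[off + 1] = type;
    memcpy(buf + off + 2, data, len);
    return off + 2 + len;
}

/* User-provided AD structures (possibly including Flags) go first;
 * the TX power field is appended last. Returns the total length,
 * or 0 if the user data alone is too long. */
static size_t build_adv_data(uint8_t buf[ADV_DATA_MAX], const uint8_t *user,
                             size_t user_len, int add_tx_power,
                             int8_t tx_power)
{
    size_t off = 0;

    if (user_len > ADV_DATA_MAX)
        return 0;
    if (user_len) {
        memcpy(buf, user, user_len);
        off = user_len;
    }
    if (add_tx_power) {
        uint8_t p = (uint8_t)tx_power;

        off = append_ad(buf, off, AD_TYPE_TX_POWER, &p, 1);
    }
    return off;
}
```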
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
With this patch, the kernel will be able to handle setup requests. This is
needed if we want to handle control messages with extension headers.
User space will only be responsible for reading setup data and checking
whether the scenario conforms to the specification (dst and src device
bnep role). With new user space, setup data must be left on the queue
(peek msg). The new bnep session will be responsible for handling this
data.
Signed-off-by: Grzegorz Kolodziejczyk <grzegorz.kolodziejczyk@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Handling extended headers of control frames is required BNEP
functionality. This patch refactors the bnep rx frame handling function.
The extended header of control frames shouldn't be skipped, as was
previously done. Every control frame should be checked for an extended
header, and then every extension should be parsed separately.
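A minimal sketch of walking the extension chain, assuming the BNEP layout in which each extension header starts with one byte holding a "more extensions follow" bit (0x80) and a 7-bit extension type, followed by a length byte and that many payload bytes; type 0x00 is taken to be the control extension. parse_extensions() and the macro names are hypothetical. It would be called once the main BNEP header's extension bit is set, and it only counts control extensions where a real parser would dispatch them to the control handler.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_FOLLOWS   0x80 /* another extension header follows */
#define EXT_TYPE_MASK 0x7f
#define EXT_CONTROL   0x00 /* extension carrying a control message */

/* Parse each extension in turn. Returns the number of bytes consumed,
 * or -1 if the chain is truncated. *n_control receives the number of
 * control extensions found. */
static long parse_extensions(const uint8_t *buf, size_t buflen,
                             unsigned *n_control)
{
    size_t off = 0;
    uint8_t hdr;

    *n_control = 0;
    do {
        uint8_t len;

        if (buflen - off < 2)
            return -1;
        hdr = buf[off];
        len = buf[off + 1];
        off += 2;
        if (buflen - off < len)
            return -1;
        if ((hdr & EXT_TYPE_MASK) == EXT_CONTROL)
            (*n_control)++; /* real code: handle buf + off, len */
        off += len;
    } while (hdr & EXT_FOLLOWS);

    return (long)off;
}
```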
Signed-off-by: Grzegorz Kolodziejczyk <grzegorz.kolodziejczyk@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This is needed if user space wants to know which bnep features are
supported by the kernel, e.g. whether the kernel supports sending a
response to a bnep setup control message. Until now there has been no
way to query the bnep features supported by the kernel. The ioctls only
allow adding a connection, deleting a connection, getting the connection
list and getting connection info. Adding a connection, if possible
(establishing the network device connection), is equivalent to starting
a bnep session. The bnep session handles the queue of data transmitted
and received over the bnep channel, which means that once a connection
is added, received/transmitted data is parsed immediately. With get bnep
features, we want to know before the session starts whether we should
leave the setup data on the socket queue and let the kernel handle it,
or, if setup handling is not supported, pull this message and handle
the setup response in user space.
Signed-off-by: Grzegorz Kolodziejczyk <grzegorz.kolodziejczyk@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This adds the ability to read out the skb->priority from an eBPF
program, so that it can be taken into account from a tc filter
or action for the use-case where the priority is not being used
to directly override the filter classification in a qdisc, but
to tag traffic otherwise for the classifier. The priority can be
assigned from various places, including user space, and in the future
we may also mangle it from an eBPF program.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sending the command-not-understood response should be checked for
success, as is done for all other send responses.
Signed-off-by: Grzegorz Kolodziejczyk <grzegorz.kolodziejczyk@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>