We can allow a filter to be added successfully to the underlying
hardware, but still return an error if the cached list memory
allocation fails. This patch fixes that condition.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
with 32 bit userland and 64 bit kernels, it is unlikely but possible
that insertion of new rules fails even tough there are only about 2000
iptables rules.
This happens because the compat delta is using a short int.
Easily reproducible via "iptables -m limit" ; after about 2050
rules inserting new ones fails with -ELOOP.
Note that compat_delta included 2 bytes of padding on x86_64, so
structure size remains the same.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This will cause trouble once CONFIG_COMPAT support is added to ebtables.
xt_compat_*_offset() calculate the kernel/userland structure size delta
using:
XT_ALIGN(size) - COMPAT_XT_ALIGN(size)
If the match/target sizes are aligned at registration time,
delta is always zero.
Should have zero effect for existing systems: xtables uses
XT_ALIGN() whenever it deals with match/target sizes.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
next_offset must be > 0, otherwise this loops forever.
The offset also contains the size of the ebt_entry structure
itself, so anything smaller is invalid.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Normally, each connection needs a unique identity. Conntrack zones allow
to specify a numerical zone using the CT target, connections in different
zones can use the same identity.
Example:
iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
Signed-off-by: Patrick McHardy <kaber@trash.net>
The error handlers might need the template to get the conntrack zone
introduced in the next patches to perform a conntrack lookup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
It is one of these things that iptables cannot catch and which can
cause "Invalid argument" to be printed. Without a hint in dmesg, it is
not going to be helpful.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit 4cd24eaf0c
("net: use netdev_mc_count and netdev_mc_empty when appropriate")
added this hunk to net/mac80211/iface.c:
__dev_addr_unsync(&local->mc_list, &local->mc_count,
- &dev->mc_list, &dev->mc_count);
+ &dev->mc_list, dev->mc_count);
which is definitely not correct, introduced a warning (reported
by Stephen Rothwell):
net/mac80211/iface.c: In function 'ieee80211_stop':
net/mac80211/iface.c:416: warning: passing argument 4 of '__dev_addr_unsync' makes pointer from integer without a cast
include/linux/netdevice.h:1967: note: expected 'int *' but argument is of type 'int'
and is thus reverted here.
Signed-off-by: David S. Miller <davem@davemloft.net>
The function name must be followed by a space, hypen, space, and a
short description.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add code to allow rtnetlink clients to query and set VF information through
the PF driver.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable 'copied' is used in udp_recvmsg() to emphasize that the passed
'len' is adjusted to fit the actual datagram length. But the same can be
done by adjusting 'len' directly. This patch thus removes the indirection.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
DCCP is datagram-oriented but lacks UDP's support for MSG_TRUNC as defined in
recvmsg(2)/recv(2). Hence the following 'Hello world\0' receiver
len = recv(fd, buf, 10, MSG_PEEK | MSG_TRUNC);
wrongly (always) returns 10, while in UDP it returns 12 as expected.
This patch adds the missing MSG_TRUNC support to recvmsg().
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some XFRM attributes were not going through basic validation.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't need to disable bottom half it is already down in the
previous lock. Move some blank lines to group locking in same
context.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Permanent IPV6 addresses should not be removed when the link is
set to admin down, only when device is removed.
When link is lost permanent addresses should be marked as tentative
so that when link comes back they are subject to duplicate address
detection (if DAD was enabled for that address).
Other routing systems keep manually configured IPv6 addresses
when link is set down.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the fib size exceeds what can be dumped in a single skb, the
dump is suspended and resumed once the last skb has been received
by userspace. When the fib is changed while the dump is suspended,
the walker might contain stale pointers, causing a crash when the
dump is resumed.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
PGD 5347a067 PUD 65c7067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
RIP: 0010:[<ffffffffa01bce04>]
[<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
...
Call Trace:
[<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
[<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
[<ffffffff81371af4>] netlink_dump+0x5b/0x19e
[<ffffffff8134f288>] ? consume_skb+0x28/0x2a
[<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
[<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
[<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
[<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
[<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
[<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
[<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
[<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
[<ffffffff810ef152>] ? fget_light+0x2f/0xac
[<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
[<ffffffff81349a65>] sys_recvmsg+0x14d/0x223
Store the serial number when beginning to walk the fib and reload
pointers when continuing to walk after a change occured. Similar
to other dumping functions, this might cause unrelated entries to
be missed when entries are deleted.
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For hardware with IEEE80211_HW_HAS_RATE_CONTROL the rate controller is not
initialized. However, calling functions such as ieee80211_beacon_get result
in the rate_control_get_rate function getting called, which is accessing
(in this case uninitialized) rate control structures unconditionally.
Fix by exiting the function before setting the rates for HW with
IEEE80211_HW_HAS_RATE_CONTROL set. The initialization of the ieee80211_tx_info
struct is intentionally still executed.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Kalle Valo <kalle.valo@nokia.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes a problem in the DCCP getsockopt() API: currently there is no way
for a user to a priori know the number of built-in CCIDs, other than trying
DCCP_SOCKOPT_AVAILABLE_CCIDS in a loop, incrementing the option length until
EINVAL is no longer returned.
This patch truncates the array to the user-provided length. No copy is made
when the length is <= 0.
Due to the length restriction in do_dccp_getsockopt() to sizeof(int), the
minimum array length remains 4, which is a reasonable default (only 3
CCIDs, CCID-2..4, are currently defined).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we treat IGMPv3 reports as if it were an IGMPv2/v1 report.
This is broken as IGMPv3 reports are formatted differently. So we
end up suppressing a bogus multicast group (which should be harmless
as long as the leading reserved field is zero).
In fact, IGMPv3 does not allow membership report suppression so
we should simply ignore IGMPv3 membership reports as a host.
This patch does exactly that. I kept the case statement for it
so people won't accidentally add it back thinking that we overlooked
this case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces dev->mc_count in all drivers (hopefully I didn't miss
anything). Used spatch and did small tweaks and conding style changes when
it was suitable.
Jirka
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
call_rcu() will unconditionally reinitialize RCU head anyway.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Initialize the .cmd member of various ethtool using a designated struct
initializer rather. This makes things a teeny bit more robust, although
the chance of a struct layout changing is extremely remote, and also
makes the code a little easier to read.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In POST_ROUTING hook, calling dev_net(in) is going to oops.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
nf_nat_mangle_tcp_packet() can currently only handle a single mangling
per window because it only maintains two sequence adjustment positions:
the one before the last adjustment and the one after.
This patch makes sequence number adjustment tracking in
nf_nat_mangle_tcp_packet() optional and allows a helper to manually
update the offsets after the packet has been fully handled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add TCP support, which is mandated by RFC3261 for all SIP elements.
SIP over TCP is similar to UDP, except that messages are delimited
by Content-Length: headers and multiple messages may appear in one
packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
When using TCP multiple SIP messages might be present in a single packet.
A following patch will parse them by setting the dptr to the beginning of
each message. The NAT helper needs to reload the dptr value after mangling
the packet however, so it needs to know the offset of the message to the
beginning of the packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
When requests are parsed, the "sip:" part of the SIP URI should be skipped.
Usually this doesn't matter because address parsing skips forward until after
the username part, but in case REGISTER requests it doesn't contain a username
and the address can not be parsed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make the output a bit more informative by showing the helper an expectation
belongs to and the expectation class.
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patchset enables the ethtool layer to program n-tuple
filters to an underlying device. The idea is to allow capable
hardware to have static rules applied that can assist steering
flows into appropriate queues.
Hardware that is known to support these types of filters today
are ixgbe and niu.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure, that TCP has a nonzero RTT estimation after three-way
handshake. Currently, a listening TCP has a value of 0 for srtt,
rttvar and rto right after the three-way handshake is completed
with TCP timestamps disabled.
This will lead to corrupt RTO recalculation and retransmission
flood when RTO is recalculated on backoff reversion as introduced
in "Revert RTO on ICMP destination unreachable"
(f1ecd5d9e7).
This behaviour can be provoked by connecting to a server which
"responds first" (like SMTP) and rejecting every packet after
the handshake with dest-unreachable, which will lead to softirq
load on the server (up to 30% per socket in some tests).
Thanks to Ilpo Jarvinen for providing debug patches and to
Denys Fedoryshchenko for reporting and testing.
Changes since v3: Removed bad characters in patchfile.
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In associated state, when bringing an interface down, existing
BA sessions are torn down. When this is in progress, nothing
prevents mac80211 from accepting another BA session start request.
Use a new station flag to fix this.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The static initial tables are pretty large, and after the net
namespace has been instantiated, they just hang around for nothing.
This commit removes them and creates tables on-demand at runtime when
needed.
Size shrinks by 7735 bytes (x86_64).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The respective xt_table structures already have most of the metadata
needed for hook setup. Add a 'priority' field to struct xt_table so
that xt_hook_link() can be called with a reduced number of arguments.
So should we be having more tables in the future, it comes at no
static cost (only runtime, as before) - space saved:
6807373->6806555.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The calls to ip6t_do_table only show minimal differences, so it seems
like a good cleanup to merge them to a single one too.
Space saving obtained by both patches: 6807725->6807373
("Total" column from `size -A`.)
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
This patch combines all the per-hook functions in a given table into
a single function. Together with the 2nd patch, further
simplifications are possible up to the point of output code reduction.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Rewrite COMPAT_XT_ALIGN in terms of dummy structure hack.
Compat counters logically have nothing to do with it.
Use ALIGN() macro while I'm at it for same types.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Even if the null data frame is not acked by the AP, mac80211
goes into power save. This might lead to loss of frames
from the AP.
Prevent this by restarting dynamic_ps_timer when ack is not
received for null data frames.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The action modules have been prefixed with 'act_', but the Kconfig
description was not changed.
Signed-off-by: Jan Luebbe <jluebbe@debian.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
restructure client create code to handle error cases better and
only cleanup initialized portions of the stack.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Kernel bugzilla #15239
On some workloads, it is quite possible to get a huge dst list to
process in dst_gc_task(), and trigger soft lockup detection.
Fix is to call cond_resched(), as we run in process context.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Options pointer is being moved before calling kfree() which seems
to cause problems. This uses a separate pointer to track and free
original allocation.
Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
The current mac80211 implementation enables power save if there
is no Tx traffic for a specific timeout. Hence, PS is triggered
even if there is a continuous Rx only traffic(like UDP) going on.
This makes the drivers to wait on the tim bit in the next beacon
to awake which leads to redundant sleep-wake cycles.
Fix this by restarting the dynamic ps timer on receiving every
data packet.
Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
CC: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rate_control_alloc is not used by anything outside of
ieee80211_init_rate_ctrl_alg. Both are in rate.c; there's no reason to make
rate_control_alloc visible outside of it.
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
get_tx_stats() driver operation is not currently used anywhere in mac80211
and there are no plans to use it in the not-so-near future. So it can go
without anyone missing it.
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When userspace requests a deauth while the
authentication work is pending in the auth
(not probe) state, we do not properly abort
the work and then things get confused.
Fix that and also improve the checks here
to include the correct virtual interface,
just in case two virtual interfaces would
ever try to connect to the same BSS.
Also fix a bug -- need to use list_del_rcu
instead of just list_del to free a work
item.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In AP mode, the only mode where the parameter
is supposed to be valid, we never assign it!
Fix that to allow drivers to avoid parsing
the TIM IE for the value.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This file helps debugging HT channels since it displays if we are on
ht20 or ht40+/ht40-
Signed-off-by: Benoit Papillault <benoit.papillault@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When there is a need to restart/reconfig hw, tear down all the
aggregation queues and let the mac80211 and driver get in-sync to have
the opportunity to re-establish the aggregation queues again.
Need to wait until driver re-establish all the station information before tear
down the aggregation queues, driver(at least iwlwifi driver) will reject the
stop aggregation queue request if station is not ready. But also need to make
sure the aggregation queues are tear down before waking up the queues, so
mac80211 will not sending frames with aggregation bit set.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Many drivers would like to sleep during station
addition and removal, and currently have a high
complexity there from not being able to.
This introduces two new callbacks sta_add() and
sta_remove() that drivers can implement instead
of using sta_notify() and that can sleep, and
the new sta_add() callback is also allowed to
fail.
The reason we didn't do this previously is that
the IBSS code wants to insert stations from the
RX path, which is a tasklet, so cannot sleep.
This patch will keep the station allocation in
that path, but moves adding the station to the
driver out of line. Since the addition can now
fail, we can have IBSS peer structs the driver
rejected -- in that case we still talk to the
station but never tell the driver about it in
the control.sta pointer. If there will ever be
a driver that has a low limit on the number of
stations and that cannot talk to any stations
that are not known to it, we need to do come up
with a new strategy of handling larger IBSSs,
maybe quicker expiry or rejecting peers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We can now easily determine whether we already
have probe response information for the BSS we
are asked to connect to, in which case there's
little point in probing the BSS again.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Upstream radiotap has adopted the namespace
proposal David Young made and I then took care
of, for which I had adapted the radiotap parser
as a library outside the kernel. This brings
the in-kernel parser up to speed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reinette found the reason for the warnings that
happened occasionally when a hw-offloaded scan
finished; her description of the problem:
mac80211 will defer the handling of scan requests if it is
busy with management work at the time. The scan requests
are deferred and run after the work has completed. When
this occurs there are currently two problems.
* The scan request for hardware scan is not fully populated
with the band and channels to scan not initialized.
* When the scan is queued the state is not correctly updated
to reflect that a scan is in progress. The problem here is
that when the driver completes the scan and calls
ieee80211_scan_completed() a warning will be triggered
since mac80211 was not aware that a scan was in progress.
The reason is that the queued scan work will start
the hw scan right away when the hw_scan_req struct
has already been allocated. However, in the first
pass it will not have been filled, which happens
at the same time as setting the bits. To fix this,
simply move the allocation after the pending work
test as well, so that the first iteration of the
scan work will call __ieee80211_start_scan() even
in the hardware scan case.
Bug-identified-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We only reply to probe request if either the requested SSID is the
broadcast SSID or if the requested SSID matches our own SSID. This
latter case was not properly handled since we were replying to different
SSID with the same length as our own SSID.
Signed-off-by: Benoit Papillault <benoit.papillault@free.fr>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
stat structures contain a size prefix. In our twstat messages
we were including the size of the size prefix in the prefix, which is not
what the protocol wants, and Inferno servers would complain.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
If the user specifies a transport and we can't find it, we failed back
to the default trainsport silently. This patch will make the code
complain more loudly and return an error code.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The 9p virtio transport was not updating its connection status correctly
preventing it from being able to mount the server.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.
Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per C99 6.2.4(2) when temporary table data goes out of scope,
the behaviour is undefined:
if (compat) {
struct foo tmp;
...
private = &tmp;
}
[dereference private]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Expectation hashtable size was simply glued to a variable with no code
to rehash expectations, so it was a bug to allow writing to it.
Make "expect_hashsize" readonly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.
We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).
If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
conntrack, which is located in the data section, might be accidentally
freed when a new namespace is instantiated while the untracked conntrack
is attached to a skb because the reference count it re-initialized.
The best fix would be to use a seperate untracked conntrack per
namespace since it includes a namespace pointer. Unfortunately this is
not possible without larger changes since the namespace is not easily
available everywhere we need it. For now move the untracked conntrack
initialization to the init_net setup function to make sure the reference
count is not re-initialized and handle cleanup in the init_net cleanup
function to make sure namespaces can exit properly while the untracked
conntrack is in use in other namespaces.
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag':
net/ipv4/netfilter/nf_defrag_ipv4.c:62: error: implicit declaration of function 'nf_ct_is_template'
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Early on this was an experimental facility that few
people other than Alexey Kuznetsov played with.
Now it's a pretty fundamental thing and as people add
more features to AF_PACKET sockets this config options
creates ifdef spaghetti.
So kill it off.
Signed-off-by: David S. Miller <davem@davemloft.net>
The report descriptor is read by user space (via the Service
Discovery Protocol), so it is only available during the ioctl
to connect. However, the HID probe function that needs the
descriptor might not be called until a specific module is
loaded. Keep a copy of the descriptor so it is available for
later use.
Signed-off-by: Michael Poole <mdpoole@troilus.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the unused age_list member from the net_bridge
structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GSO/checksum offload to af_packet sockets using
virtio_net_hdr. Based on Rusty's patch to add this support to tun.
It allows GSO/checksum offload to be enabled when using raw socket
backend with virtio_net.
Adds PACKET_VNET_HDR socket option to prepend virtio_net_hdr in the
receive path and process/skip virtio_net_hdr in the send path. This
option is only allowed with SOCK_RAW sockets attached to ethernet
type devices.
v2 updates
----------
Michael's Comments
- Perform length check in packet_snd() when GSO is off even when
vnet_hdr is present.
- Check for SKB_GSO_FCOE type and return -EINVAL
- don't allow tx/rx ring when vnet_hdr is enabled.
Herbert's Comments
- Removed ethernet specific code.
- protocol value is assumed to be passed in by the caller.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_IP_ROUTE_PERVASIVE is missing a corresponding config
IP_ROUTE_PERVASIVE somewhere in KConfig (and missing it for ages
already) so it looks like some aging artefact no longer needed.
Therefor this patch kills of the only remaining reference to that
config Item removing the already unrechable code snipet.
Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing try_to_freeze() to one of the pktgen_thread_worker() code
paths so that it doesn't block suspend/hibernation.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=15006
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>