Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather then
icmp_redirect().
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduce TSQ (TCP Small Queues)
TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.
sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.
If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
On 64 bit arches having efficient unaligned accesses (eg x86_64) we can
use long words to reduce number of instructions for free.
Joe Perches suggested to change ipv6_masked_addr_cmp() to return a bool
instead of 'int', to make sure ipv6_masked_addr_cmp() cannot be used
in a sorting function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer needed. TCP writes metrics, but now in it's own special
cache that does not dirty the route metrics. Therefore there is no
longer any reason to pre-cow metrics in this way.
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintain a local hash table of TCP dynamic metrics blobs.
Computed TCP metrics are no longer maintained in the route metrics.
The table uses RCU and an extremely simple hash so that it has low
latency and low overhead. A simple hash is legitimate because we only
make metrics blobs for fully established connections.
Some tweaking of the default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking. But
the basic design seems sound.
With help from Eric Dumazet and Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.
We violated that expectation in the new fib_lookup() fast path.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some NFC chips will statically create and open pipes for both standard
and proprietary gates. The driver can now pass this information to HCI
such that HCI will not attempt to create and open them, but will instead
directly use the passed pipe ids.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This API should be used by drivers, HCI, SHDLC or NCI stacks to report an
unrecoverable error.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The management frame and remain-on-channel APIs will be
needed in the P2P device abstraction, so move them over
to the new wdev-based APIs. Userspace can still use both
the interface index and wdev identifier for them so it's
backward compatible, but for the P2P Device wdev it will
be able to use the wdev identifier only.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to support a P2P device abstraction and
Bluetooth high-speed AMPs, we need to have a way
to identify virtual interfaces that don't have a
netdev associated.
Do this by adding a NL80211_ATTR_WDEV attribute
to identify a wdev which may or may not also be
a netdev.
To simplify things, use a 64-bit value with the
high 32 bits being the wiphy index for this new
wdev identifier in the nl80211 API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This API call was intended to be used by drivers
if they want to optimize key handling by removing
one key when another is added. Remove it since no
driver is using it. If needed, it can always be
added back.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Hans reports that he's still hitting:
BUG: unable to handle kernel NULL pointer dereference at 000000000000027c
IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60
PGD 0
Oops: 0000 [#3] PREEMPT SMP
CPU 0
It happens when adding a number of containers with do:
nfct_query(h, NFCT_Q_CREATE, ct);
and most likely one namespace shuts down.
this problem was supposed to be fixed by:
70e9942 netfilter: nf_conntrack: make event callback registration per-netns
Still, it was missing one rcu_access_pointer to check if the callback
is set or not.
Reported-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.
It's just pure overhead.
Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.
Also, move fib_num_tclassid_users into the ipv4 network namespace.
Signed-off-by: David S. Miller <davem@davemloft.net>
60g band uses different from .11n MCS scheme, so bitrate
should be calculated differently
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Until now, a u16 value was used to represent bitrate value.
With VHT bitrates this becomes too small.
Introduce a new 32-bit bitrate attribute. nl80211 will report
both the new and the old attribute, unless the bitrate doesn't
fit into the old u16 attribute in which case only the new one
will be reported.
User space tools encouraged to prefer the 32-bit attribute, if
available (since it won't be available on older kernels.)
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
[reword commit message and comments a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes for a simplified conversion away from dst_get_neighbour*().
All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().
Signed-off-by: David S. Miller <davem@davemloft.net>
Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).
Signed-off-by: David S. Miller <davem@davemloft.net>
When a dst_confirm() happens, mark the confirmation as pending in the
dst. Then on the next packet out, when we have the neigh in-hand, do
the update.
This removes the dependency in dst_confirm() of dst's having an
attached neigh.
While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL. So just fix those cases up.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
moving the corresponding protocol part to where it really belongs to.
To clarify, note that we follow two different approaches to support per-net
depending if it's built-in or run-time loadable protocol tracker.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Some drivers require setup before being able to send
management frames in managed mode, in particular in
multi-channel cases.
Introduce API to allow the drivers to do such setup
while being able to sleep waiting for the setup to
finish in the device. This isn't possible inside the
TX call since that can't sleep.
A future patch may also restructure the TX retry to
wait for the driver to report the frame status, as
suggested by Arik in
http://mid.gmane.org/CA+XVXffKSEL6ZQPQ98x-zO-NL2=TNF1uN==mprRyUmAaRn254g@mail.gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IEEE80211_TX_MAX_RATES can be reduced from 5 to 4 as there
is no current hardware supporting a rate chain with 5 multi
rate stages (mrr), so 4 mrr stages are sufficient.
The memory that is freed within the ieee80211_tx_info struct
will be used in the upcoming Transmission Power Control (TPC)
implementation.
Suggested-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The implementation of tx_frags is buggy due to
not handling queue stop, and there's no driver
implementing it so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add enumerations for both cfg80211 and nl80211.
This expands wiphy.bands etc. arrays.
Extend channel <-> frequency translation to cover 60g band
and modify the rate check logic since there are no legacy
mandatory rates (only MCS is used.)
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It was noticed recently that when we send data on a transport, its possible that
we might bundle a sack that arrived on a different transport. While this isn't
a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
2960:
An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
etc.) to the same destination transport address from which it
received the DATA or control chunk to which it is replying. This
rule should also be followed if the endpoint is bundling DATA chunks
together with the reply chunk.
This patch seeks to correct that. It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack. By doing this we guarantee that we only bundle outbound
saks on a transport that has received a chunk since the last sack. This brings
us into stricter compliance with the RFC.
Vlad had initially suggested that we strictly allow only sack bundling on the
transport that last moved the ctsn forward. While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports. In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on. so I've modified the approach to allow for that, by adding a state
variable to each transport that tracks weather it has moved the ctsn since the
last sack. This I think keeps our behavior (and performance), close enough to
our current profile that I think we can do this without a sysctl knob to
enable/disable it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yaseivch <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: sorin serban <sserban@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve debugging of hci_conn objects by: adding print to hci_conn
refcounting, adding object spcifier when missing, change conn to hcon
since conn is heavily used for l2cap_conn objects and this is misleading.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
We do not need it anymore since cfg80211 tracks
monitor channel and monitor channel type.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implements .set_monitor_enabled(wiphy, enabled).
Notifies driver upon change of interface layout.
If only monitor interfaces become present it is
called with 2nd argument being true. If
non-monitor interface appears then 2nd argument
is false. Driver is notified only upon change.
This makes it more obvious about the fact that
cfg80211 supports single monitor channel. Once we
implement multi-channel we don't want to allow
setting monitor channel while other interface
types are running. Otherwise it would be ambiguous
once we start considering num_different_channels.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IBSS may hop between channels. It is necessary to
account this special case when considering
interface combinations.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We need to know which channel is used by a running
AP and mesh for channel context accounting and
finding matching/active interface combination.
STA/IBSS have current_bss already which allows us
to check which channel a vif is tuned to.
Non-fixed channel IBSS can be handled with
additional changes.
Monitor mode is going to be handled differently.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If rpfilter is off (or the SKB has an IPSEC path) and there are not
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.
We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.
Having a way to know whether we need no tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.
Signed-off-by: David S. Miller <davem@davemloft.net>
At Facebook, we do Layer-3 DSR via IP-in-IP tunneling. Our load balancers wrap
an extra IP header on incoming packets so they can be routed to the backend.
In the v4 tunnel driver, when these packets fall on the default tunl0 device,
the behavior is to decapsulate them and drop them back on the stack. So our
setup is that tunl0 has the VIP and eth0 has (obviously) the backend's real
address.
In IPv6 we do the same thing, but the v6 tunnel driver didn't have this same
behavior - if you didn't have an explicit tunnel setup, it would drop the
packet.
This patch brings that v4 feature to the v6 driver.
The same IPv6 address checks are performed as with any normal tunnel,
but as the fallback tunnel endpoint addresses are unspecified, the checks
must be performed on a per-packet basis, rather than at tunnel
configuration time.
[Patch description modified by phil@ipom.com]
Signed-off-by: Ville Nuorvala <ville.nuorvala@gmail.com>
Tested-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Checking for in_dev being NULL is pointless.
In fact, all of our callers have in_dev precomputed already,
so just pass it in and remove the NULL checking.
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NLMSG_GOODSIZE results in multiple pages being used as
nlmsg_new() will automatically add the size of the netlink
header to the payload thus exceeding the page limit.
NLMSG_DEFAULT_SIZE takes this into account.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit changes inet_csk_route_req() so that it uses a pointer to
a struct flowi6, rather than allocating its own on the stack. This
brings its behavior in line with its IPv4 cousin,
inet_csk_route_req(), and allows a follow-on patch to fix a dst leak.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow drivers to advertise their VHT capabilities
and export them to userspace via nl80211.
Signed-off-by: Mahesh Palivela <maheshp@posedge.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The specific destination is the host we direct unicast replies to.
Usually this is the original packet source address, but if we are
responding to a multicast or broadcast packet we have to use something
different.
Specifically we must use the source address we would use if we were to
send a packet to the unicast source of the original packet.
The routing cache precomputes this value, but we want to remove that
precomputation because it creates a hard dependency on the expensive
rpfilter source address validation which we'd like to make cheaper.
There are only three places where this matters:
1) ICMP replies.
2) pktinfo CMSG
3) IP options
Now there will be no real users of rt->rt_spec_dst and we can simply
remove it altogether.
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename it to ip_send_unicast_reply() and add explicit 'saddr'
argument.
This removed one of the few users of rt->rt_spec_dst.
Signed-off-by: David S. Miller <davem@davemloft.net>
This and ieee80211_add_ext_srates_ie() aren't
exported, so can't be used by drivers anyway,
but there's also no reason that they should be
so make them private to mac80211 and use sdata
instead of vif arguments.
Acked-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of using a fixed value of "-1" or "-EMSGSIZE", propagate what
the nla_*() interfaces actually return.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c074da2810.
This change has several unwanted side effects:
1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
thus never create a real cached route.
2) All TCP traffic will use DST_NOCACHE and never use the routing
cache at all.
Signed-off-by: David S. Miller <davem@davemloft.net>
DDOS synflood attacks hit badly IP route cache.
On typical machines, this cache is allowed to hold up to 8 Millions dst
entries, 256 bytes for each, for a total of 2GB of memory.
rt_garbage_collect() triggers and tries to cleanup things.
Eventually route cache is disabled but machine is under fire and might
OOM and crash.
This patch exploits the new TCP early demux, to set a nocache
boolean in case incoming TCP frame is for a not yet ESTABLISHED or
TIMEWAIT socket.
This 'nocache' boolean is then used in case dst entry is not found in
route cache, to create an unhashed dst entry (DST_NOCACHE)
SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache
output dst for syncookies), so after this patch, a machine is able to
absorb a DDOS synflood attack without polluting its IP route cache.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a cleanup.
It adds nf_ct_kfree_compat_sysctl_table to release l4proto's
compat sysctl table and set the compat sysctl table point to NULL.
This new function will be used by follow-up patches.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
l4proto->init contain quite redundant code. We can simplify this
by adding a new parameter l3proto.
This patch prepares that code simplification.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When CONFIG_PM is disabled, no device can possibly
support WoWLAN since it can't go to sleep to start
with. Due to this, mac80211 had even rejected the
hardware registration. By making all the code and
data for WoWLAN depend on CONFIG_PM we can promote
this runtime error to a compile-time error.
Add #ifdef around all WoWLAN code to remove it in
systems that don't need it as they never suspend.
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Every real 802.15.4 transceiver, which works with software MAC layer,
can be classified as a wpan device in this stack. So the wpan device
implementation provides missing link in datapath between the device
drivers and the Linux network queue.
According to the IEEE 802.15.4 standard each packet can be one of the
following types:
- beacon
- MAC layer command
- ACK
- data
This patch adds support for the data packet-type only, but this is
enough to perform data transmission and receiving over radio.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to pick up the serial port and tty changes in Linus's tree to allow
everyone to sync up.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Support configuring an RSSI threshold in dBm (s32) when requesting
scheduled scan, below which a BSS won't be reported by the cfg80211
driver.
Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove use of module parameters on caif hsi device, as
rtnl configuration parameters are already supported.
All caif hsi configuration data is put in cfhsi_config,
and default values in hsi_default_config.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove use of struct platform_device, and replace it with
struct cfhsi_ops. Updated variable names in the same
spirit:
cfhsi_get_dev to cfhsi_get_ops,
cfhsi->dev to cfhsi->ops and,
cfhsi->dev.drv to cfhsi->ops->cb_ops.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add RTNL support for managing the caif hsi interface.
The HSI HW interface is no longer registering as a device,
instead we use symbol_get to get hold of the HSI API.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing kernel doc for sk_rx_dst
Move sk_rx_dst to avoid two 32bit holes on 64bit arches
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With early demux enabled by default for TCP flows, there is high chance that
skb->sk will be non-null. 'unlikely()' was removed from __inet_lookup_skb() but
maybe it can be removed from skb_steal_sock() as well.
Note: skb_steal_sock() is also called by __inet6_lookup_skb() and
__udp4_lib_lookup_skb() but they are protected by their own 'unlikely' calls.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/qmi_wwan.c
net/batman-adv/translation-table.c
net/ipv6/route.c
qmi_wwan.c resolution provided by Bjørn Mork.
batman-adv conflict is dealing merely with the changes
of global function names to have a proper subsystem
prefix.
ipv6's route.c conflict is merely two side-by-side additions
of network namespace methods.
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a few things that make the logging and
debugging in mac80211 less useful than it should
be right now:
* a lot of messages should be pr_info, not pr_debug
* wholesale use of pr_debug makes it require *both*
Kconfig and dynamic configuration
* there are still a lot of ifdefs
* the style is very inconsistent, sometimes the
sdata->name is printed in front
Clean up everything, introducing new macros and
separating out the station MLME debugging into
a new Kconfig symbol.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Don't cache output dst for syncookies, as this adds pressure on IP route
cache and rcu subsystem for no gain.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is meant to add a control for disabling early socket demux.
The main motivation behind this patch is to provide an option to disable
the feature as it adds an additional cost to routing that reduces overall
throughput by up to 5%. For example one of my systems went from 12.1Mpps
to 11.6 after the early socket demux was added. It looks like the reason
for the regression is that we are now having to perform two lookups, first
the one for an established socket, and then the one for the routing table.
By adding this patch and toggling the value for ip_early_demux to 0 I am
able to get back to the 12.1Mpps I was previously seeing.
[ Move local variables in ip_rcv_finish() down into the basic
block in which they are actually used. -DaveM ]
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get current rssi (in dBm) from the driver/FW.
Instead of reporting the signal received in the last
rx packet, which might be inaccurate if rx traffic is
low and beacon filtering is enabled, get the signal
from the driver/FW.
Signed-off-by: Victor Goldenshtein <victorg@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Input packet processing for local sockets involves two major demuxes.
One for the route and one for the socket.
But we can optimize this down to one demux for certain kinds of local
sockets.
Currently we only do this for established TCP sockets, but it could
at least in theory be expanded to other kinds of connections.
If a TCP socket is established then it's identity is fully specified.
This means that whatever input route was used during the three-way
handshake must work equally well for the rest of the connection since
the keys will not change.
Once we move to established state, we cache the receive packet's input
route to use later.
Like the existing cached route in sk->sk_dst_cache used for output
packets, we have to check for route invalidations using dst->obsolete
and dst->ops->check().
Early demux occurs outside of a socket locked section, so when a route
invalidation occurs we defer the fixup of sk->sk_rx_dst until we are
actually inside of established state packet processing and thus have
the socket locked.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't pretend that inet_protos[] and inet6_protos[] are hashes, thay
are just a straight arrays. Remove all unnecessary hash masking.
Document MAX_INET_PROTOS.
Use RAW_HTABLE_SIZE when appropriate.
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Linville says:
====================
This is a sizeable batch of updates intended for 3.6...
The bulk of the changes here are Bluetooth. Gustavo says:
Here goes the first Bluetooth pull request for 3.6, we have
queued quite a lot of work. Andrei Emeltchenko added the AMP
Manager code, a lot of work is needed, but the first bit are
already there. This code is disabled by default. Mat Martineau
changed the whole L2CAP ERTM state machine code, replacing
the old one with a new implementation. Besides that we had
lot of coding style fixes (to follow net rules), more l2cap
core separation from socket and many clean ups and fixed all
over the tree.
Along with the above, there is a healthy dose of ath9k, iwlwifi,
and other driver updates. There is also another pull from the
wireless tree to resolve some merge issues. I also fixed-up some
merge discrepencies between net-next and wireless-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ERROR: "nfqnl_ct_parse" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_seq_adjust" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_put" [net/netfilter/nfnetlink_queue.ko] undefined!
ERROR: "nfqnl_ct_get" [net/netfilter/nfnetlink_queue.ko] undefined!
We have to use CONFIG_NETFILTER_NETLINK_QUEUE_CT in
include/net/netfilter/nfnetlink_queue.h, not CONFIG_NF_CONNTRACK.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move AUTO_OFF_TIMEOUT to other constants changing name to
HCI_AUTO_OFF_TIMEOUT and convert to jiffies.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.
I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.
I also needed to rename nfnetlink_queue.c to nfnetlink_queue_pkt.c
to update the net/netfilter/Makefile to support conditional compilation
of the conntrack integration.
This patch also adds CONFIG_NETFILTER_QUEUE_CT in case you want to explicitly
disable the integration between nf_conntrack and nfnetlink_queue.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As defined in section 13.10.9.3 Case D (802.11-2012), this
control variable is used to limit the mesh STA to send only
one PREQ to a root mesh STA within this interval of time
(in TUs). The default value for this variable is set to
2000 TUs. However, for current implementation, the maximum
configurable of dot11MeshHWMPconfirmationInterval is
restricted by dot11MeshHWMPactivePathTimeout.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[line-break commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo says:
====================
This is the second batch of Netfilter updates for net-next. It contains the
kernel changes for the new user-space connection tracking helper
infrastructure.
More details on this infrastructure are provides here:
http://lwn.net/Articles/500196/
Still, I plan to provide some official documentation through the
conntrack-tools user manual on how to setup user-space utilities for this.
So far, it provides two helper in user-space, one for NFSv3 and another for
Oracle/SQLnet/TNS. Yet in my TODO list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix code style - place the asterisk where it belongs.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are good reasons to supports helpers in user-space instead:
* Rapid connection tracking helper development, as developing code
in user-space is usually faster.
* Reliability: A buggy helper does not crash the kernel. Moreover,
we can monitor the helper process and restart it in case of problems.
* Security: Avoid complex string matching and mangling in kernel-space
running in privileged mode. Going further, we can even think about
running user-space helpers as a non-root process.
* Extensibility: It allows the development of very specific helpers (most
likely non-standard proprietary protocols) that are very likely not to be
accepted for mainline inclusion in the form of kernel-space connection
tracking helpers.
This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).
I had to add the new hook NF_IP6_PRI_CONNTRACK_HELPER to register
ipv[4|6]_helper which results from splitting ipv[4|6]_confirm into
two pieces. This change is required not to break NAT sequence
adjustment and conntrack confirmation for traffic that is enqueued
to our user-space conntrack helpers.
Basic operation, in a few steps:
1) Register user-space helper by means of `nfct':
nfct helper add ftp inet tcp
[ It must be a valid existing helper supported by conntrack-tools ]
2) Add rules to enable the FTP user-space helper which is
used to track traffic going to TCP port 21.
For locally generated packets:
iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp
For non-locally generated packets:
iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp
3) Run the test conntrackd in helper mode (see example files under
doc/helper/conntrackd.conf
conntrackd
4) Generate FTP traffic going, if everything is OK, then conntrackd
should create expectations (you can check that with `conntrack':
conntrack -E expect
[NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.
The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO. The
kernel will consider this a binary blob whose layout is unknown. This
information will be included in the information that is transfered
to user-space via glue code that integrates nfnetlink_queue and
ctnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.
With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:
1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.
There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables
By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch uses the new variable length conntrack extensions.
Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.
This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We can now define conntrack extensions of variable size. This
patch is useful to get rid of these unions:
union nf_conntrack_help
union nf_conntrack_proto
union nf_conntrack_nat_help
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch modifies the struct nf_conntrack_helper to allocate
the room for the helper name. The maximum length is 16 bytes
(this was already introduced in 2.6.24).
For the maximum length for expectation policy names, I have
also selected 16 bytes.
This patch is required by the follow-up patch to support
user-space connection tracking helpers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/ipv6/route.c
Pull in 'net' again to get the revert of Thomas's change
which introduced regressions.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2a0c451ade.
It causes crashes, because now ip6_null_entry is used before
it is initialized.
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/route.c
This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.
Signed-off-by: David S. Miller <davem@davemloft.net>
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.
This opens up a short time frame to access fib_table_hash with its pants
down.
fib6_init() as a whole can't be moved to an earlier position as it also
registers the rtnetlink message handlers which should be registered at
the end. Therefore split it into fib6_init() which is run early and
fib6_init_late() to register the rtnetlink message handlers.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
to handle the error pass the 32-bit info cookie in network byte order
whereas ipv4 passes it around in host byte order.
Like the ipv4 side, we have two helper functions. One for when we
have a socket context and one for when we do not.
ip6ip6 tunnels are not handled here, because they handle PMTU events
by essentially relaying another ICMP packet-too-big message back to
the original sender.
This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all
kinds of situations that simply cannot happen when we do the PMTU
update directly using a fully resolved route.
In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
likely be removed or changed into a BUG_ON() check. We should never
have a prefixed ipv6 route when we get there.
Another piece of strange history here is that TCP and DCCP, unlike in
ipv4, never invoke the update_pmtu() method from their ICMP error
handlers. This is incredibly astonishing since this is the context
where we have the most accurate context in which to make a PMTU
update, namely we have a fully connected socket and associated cached
socket route.
Signed-off-by: David S. Miller <davem@davemloft.net>
With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.
Create two helper functions to facilitate this.
1) ipv4_sk_update_pmtu()
This updates the PMTU when we have a socket context to
work with.
2) ipv4_update_pmtu()
Raw version, used when no socket context is available. For this
interface, we essentially just pass in explicit arguments for
the flow identity information we would have extracted from the
socket.
And you'll notice that ipv4_sk_update_pmtu() is simply implemented
in terms of ipv4_update_pmtu()
Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key(). This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like. Instead, we only want the route that
would get us to the node described by the outermost IP header.
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the mesh configuration parameters dot11MeshHWMProotInterval
and dot11MeshHWMPactivePathToRootTimeout to be used by
proactive PREQ mechanism.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
[line-break commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Conflicts:
MAINTAINERS
drivers/net/wireless/iwlwifi/pcie/trans.c
The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.
The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.
Signed-off-by: David S. Miller <davem@davemloft.net>
Change flags field to matches userspace structure.
This field needs to be converted to little endian before forward it.
Signed-off-by: Jefferson Delfes <jefferson.delfes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This also includes a switch to tty refcounting. It makes sure, the
code no longer can access a freed TTY struct.
Sometimes the only thing needed is to pass tty down to the callies.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use self->spinlock only for ctrl_skb and tx_skb. TTY stuff is now
protected by tty_port->lock. This is needed for further cleanup (and
conversion to tty_port helpers).
This also closes the race in the end of close.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switch to tty_port->flags. And while at it, remove redefined flags for
them.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
And use close/open_wait from there.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping. This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.
When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit. In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.
Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.
If we want to save skb->queue_mapping into skb->cb[], best place is to
add a field in struct qdisc_skb_cb, to make sure it wont conflict with
other layers (eg : Qdiscc, Infiniband...)
This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8
bytes :
netem qdisc for example assumes it can store an u64 in it, without
misalignment penalty.
Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[].
The largest user is CHOKe and it fills it.
Based on a previous patch from Tom Herbert.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Tom Herbert <therbert@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HCI constants are always used in form of jiffies. So just
include the conversion from msecs in the define itself. This has the
advantage of making the code where the timeout is used more readable
and avoiding unnecessary conversions.
The patch is similar to commit ba13ccd9 doing the same job for L2CAP
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
If no peer actually gets attached (either because create is zero or
the peer allocation fails) we'll trigger a BUG because we
unconditionally do an rt{,6}_peer_ptr() afterwards.
Fix this by guarding it with the proper check.
Signed-off-by: David S. Miller <davem@davemloft.net>
We handle NULL in rt{,6}_set_peer but then our caller will try to pass
that NULL pointer into inet_putpeer() which isn't ready for it.
Fix this by moving the NULL check one level up, and then remove the
now unnecessary NULL check from inetpeer_ptr_set_peer().
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implementation can deal with having many inetpeer roots, which is
a necessary prerequisite for per-FIB table rooted peer tables.
Each family (AF_INET, AF_INET6) has a sequence number which we bump
when we get a family invalidation request.
Each peer lookup cheaply checks whether the flush sequence of the
root we are using is out of date, and if so flushes it and updates
the sequence number.
Signed-off-by: David S. Miller <davem@davemloft.net>
There is zero point to this function.
It's only real substance is to perform an extremely outdated BSD4.2
ICMP check, which we can safely remove. If you really have a MTU
limited link being routed by a BSD4.2 derived system, here's a nickel
go buy yourself a real router.
The other actions of ip_rt_frag_needed(), checking and conditionally
updating the peer, are done by the per-protocol handlers of the ICMP
event.
TCP, UDP, et al. have a handler which will receive this event and
transmit it back into the associated route via dst_ops->update_pmtu().
This simplification is important, because it eliminates the one place
where we do not have a proper route context in which to make an
inetpeer lookup.
Signed-off-by: David S. Miller <davem@davemloft.net>
We encode the pointer(s) into an unsigned long with one state bit.
The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.
Later the peer roots will be per-FIB table, and this change works to
facilitate that.
Signed-off-by: David S. Miller <davem@davemloft.net>
fix the coding style related to mesh parameters, especially the indentation,
as pointed out by Johannes Berg.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the missing kernel-doc for mesh configuration parameters as pointed
out by Johannes Berg.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If I build with W=1, for every file that includes <net/route.h>, I get the warning
include/net/route.h: In function 'ip_route_output':
include/net/route.h:135:3: warning: initialized field overwritten [-Woverride-init]
include/net/route.h:135:3: warning: (near initialization for 'fl4') [-Woverride-init]
(This is with "gcc (Debian 4.6.3-1) 4.6.3")
A fix seems pretty trivial: move the initialization of .flowi4_tos
earlier. As far as I can tell, this has no effect on code generation.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only need one interface for this operation, since we always know
which inetpeer root we want to flush.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since it's guarenteed that we will access the inetpeer if we're trying
to do timewait recycling and TCP options were enabled on the
connection, just cache the peer in the timewait socket.
In the future, inetpeer lookups will be context dependent (per routing
realm), and this helps facilitate that as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a few kernel-doc descriptions that were missed
during development.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The get_peer method TCP uses is full of special cases that make no
sense accommodating, and it also gets in the way of doing more
reasonable things here.
First of all, if the socket doesn't have a usable cached route, there
is no sense in trying to optimize timewait recycling.
Likewise for the case where we have IP options, such as SRR enabled,
that make the IP header destination address (and thus the destination
address of the route key) differ from that of the connection's
destination address.
Just return a NULL peer in these cases, and thus we're also able to
get rid of the clumsy inetpeer release logic.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a lot of places that open-code rt{,6}_get_peer() only because
they want to set 'create' to one. So add an rt{,6}_get_peer_create()
for their sake.
There were also a few spots open-coding plain rt{,6}_get_peer() and
those are transformed here as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
With LE/SMP the completion of a security level elavation from medium to
high is indicated by a HCI Encryption Key Refresh Complete event. The
necessary behavior upon receiving this event is a mix of what's done for
auth_complete and encryption_change, which is also where most of the
event handling code has been copied from.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
/proc/net/unix has quadratic behavior, and can hold unix_table_lock for
a while if high number of unix sockets are alive. (90 ms for 200k
sockets...)
We already have a hash table, so its quite easy to use it.
Problem is unbound sockets are still hashed in a single hash slot
(unix_socket_table[UNIX_HASH_TABLE])
This patch also spreads unbound sockets to 256 hash slots, to speedup
both /proc/net/unix and unix_diag.
Time to read /proc/net/unix with 200k unix sockets :
(time dd if=/proc/net/unix of=/dev/null bs=4k)
before : 520 secs
after : 2 secs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add struct net as a parameter of inet_getpeer_v[4,6],
use net to replace &init_net.
and modify some places to provide net for inet_getpeer_v[4,6]
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
now inetpeer doesn't support namespace,the information will
be leaking across namespace.
this patch move the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field peers.
add struct pernet_operations inetpeer_ops to initial pernet
inetpeer data.
and change family_to_base and inet_getpeer to support namespace.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds namespace support for cttimeout.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since the sysctl data for l[3|4]proto now resides in pernet nf_proto_net.
We can now remove this unused fields from struct nf_contrack_l[3,4]proto.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds namespace support for ICMPv6 protocol tracker.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds namespace support for ICMP protocol tracker.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds namespace support for UDP protocol tracker.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds namespace support for TCP protocol tracker.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds namespace support for the generic layer 4 protocol
tracker.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch prepares the namespace support for layer 3 protocol trackers.
Basically, this modifies the following interfaces:
* nf_ct_l3proto_[un]register_sysctl.
* nf_conntrack_l3proto_[un]register.
We add a new nf_ct_l3proto_net is used to get the pernet data of l3proto.
This adds rhe new struct nf_ip_net that is used to store the sysctl header
and l3proto_ipv4,l4proto_tcp(6),l4proto_udp(6),l4proto_icmp(v6) because the
protos such tcp and tcp6 use the same data,so making nf_ip_net as a field
of netns_ct is the easiest way to manager it.
This patch also adds init_net to struct nf_conntrack_l3proto to initial
the layer 3 protocol pernet data.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch prepares the namespace support for layer 4 protocol trackers.
Basically, this modifies the following interfaces:
* nf_ct_[un]register_sysctl
* nf_conntrack_l4proto_[un]register
to include the namespace parameter. We still use init_net in this patch
to prepare the ground for follow-up patches for each layer 4 protocol
tracker.
We add a new net_id field to struct nf_conntrack_l4proto that is used
to store the pernet_operations id for each layer 4 protocol tracker.
Note that AF_INET6's protocols do not need to do sysctl compat. Thus,
we only register compat sysctl when l4proto.l3proto != AF_INET6.
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Redesign all the off-channel code, getting rid of
the generic off-channel work concept, replacing
it with a simple remain-on-channel list.
This fixes a number of small issues with the ROC
implementation:
* offloaded remain-on-channel couldn't be queued,
now we can queue it as well, if needed
* in iwlwifi (the only user) offloaded ROC is
mutually exclusive with scanning, use the new
queue to handle that case -- I expect that it
will later depend on a HW flag
The bigger issue though is that there's a bad bug
in the current implementation: if we get a mgmt
TX request while HW roc is active, and this new
request has a wait time, we actually schedule a
software ROC instead since we can't guarantee the
existing offloaded ROC will still be that long.
To fix this, the queuing mechanism was needed.
The queuing mechanism for offloaded ROC isn't yet
optimal, ideally we should add API to have the HW
extend the ROC if needed. We could add that later
but for now use a software implementation.
Overall, this unifies the behaviour between the
offloaded and software-implemented case as much
as possible.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The IDLE handling in HW off-channel is broken right
now since we turn off IDLE only when the off-channel
period already started. Therefore, all drivers that
use it today (only iwlwifi!) must support off-channel
while idle, so playing with idle isn't needed at all.
Off-channel in general, since it's no longer used for
authentication/association, shouldn't affect PS, so
also remove that logic.
Also document a small caveat for reporting TX status
from off-channel frames in HW remain-on-channel.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that we've removed all uses of the set_channel
API except for the monitor channel and in libertas,
clarify this. Split the libertas mesh use into a
new libertas_set_mesh_channel() operation, just to
keep backward compatibility, and rename the normal
set_channel() to set_monitor_channel().
Also describe the desired set_monitor_channel()
semantics more clearly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
commit 5faa5df1fa (inetpeer: Invalidate the inetpeer tree along with
the routing cache) added a race :
Before freeing an inetpeer, we must respect a RCU grace period, and make
sure no user will attempt to increase refcnt.
inetpeer_invalidate_tree() waits for a RCU grace period before inserting
inetpeer tree into gc_list and waking the worker. At that time, no
concurrent lookup can find a inetpeer in this tree.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like the AP mode patch, instead of setting
the channel and then joining the mesh network,
provide the channel to join the network on to
the join_mesh() function.
Like in AP mode, you can also give the channel
to the join-mesh nl80211 command now.
Unlike AP mode, it picks a default channel if
none was given.
As libertas uses mesh mode interfaces but has
no join_mesh callback and we can't simply break
it, keep some compatibility code for that case
and configure the channel directly for it.
In the non-libertas case, where we store the
channel until join, allow setting it while the
interface is down.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Instead of setting the channel first and then
starting the AP, let cfg80211 store the channel
and provide it as one of the AP settings.
This means that now you have to set the channel
before you can start an AP interface, but since
hostapd/wpa_supplicant always do that we're OK
with this change.
Alternatively, it's now possible to give the
channel as an attribute to the start-ap nl80211
command, overriding any preset channel.
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change cfg80211_can_beacon_sec_chan() to return true
if there is no secondary channel to simplify all the
current users of it. They all check the channel type
before calling the function because it returns false
if there's no secondary channel.
Also actually document the return value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ieee80211_get_operstate() was used by drivers in order to
know whether the sta link is up, but it's no longer needed
(nor used) as mac80211 notifies the drivers about
authorization changes (via the sta_state callback)
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Simplify the use of #ifdef CONFIG_MAC80211_IBSS_DEBUG/#endif
by adding a logging macro to encapsulate the test.
Convert the appropriate uses too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Simplify the use of #ifdef CONFIG_MAC80211_HT_DEBUG/#endif
by adding a logging macro to encapsulate the test.
Convert the appropriate uses too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Low level drivers can now set certain netdev feature bits in
netdev_features member of the ieee80211_hw struct. These will be
propagated to every netdev created from this HW.
The white-listed features currently include only ones related to HW
checksumming.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A2MP doesn't use part of the L2CAP chan ops API so we just create general
empty function instead of the A2MP specific one.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch renames L2CAP_LE_DEFAULT_MTU macro to L2CAP_LE_MIN_MTU
since it represents the minimum MTU value, not the default MTU
value for LE.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Those are not used anywhere in code (and never were since introduction
in 2006) so just remove them.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
A2MP fixed channel do not have sk
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
A2MP status codes copied from Bluez patch sent by Peter Krystad
<pkrystad@codeaurora.org>.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Helper function to build and send A2MP messages.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Define AMP Manager and some basic functions.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This move socket specific code to l2cap_sock.c.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This remove a bit more of socket code from l2cap core, this calls set the
SOCK_ZAPPED and do some clean up depending on the socket state.
Reported-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Use chan instead of void * makes more sense here.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add HCI commands to deal with Bluetooth AMP controllers.
Those commands will be used by bluetooth and softamp code.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Define assigned Protocol and Service Multiplexor (PSM) identifiers
and use them instead of magic numbers.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Define Continuation flag which the only flag used from Flags field
in L2CAP Configuration Request and Response.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Most of the include were unnecessary or already included by some other
header.
Replace module.h by export.h where possible.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Fix all warning and errors reported by checkpatch but license trailing
whitespace and bdaddr_t definition.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Remove magic number with defined link key size.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
HCI_QUIRK_NO_RESET name is misleading - purpose of this quirk is to
reset device on close instead of init, not to not reset at all.
Rename it to HCI_QUIRK_RESET_ON_CLOSE to avoid confusion.
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that l2cap_ctrl is used to set up control fields, these macros are
not needed.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
The ERTM specification requires the retransmit timer to be cancelled
when the monitor timer is set. The retransmit timer cannot be set
again while the monitor timer is pending.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This deletes the receive code that had handlers for each frame type at
the top level, and then had logic to determine the receive state
within each handler.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This fixes a regression from commit
2ead70b839 that is present in all
kernels starting at v3.0.
When L2CAP information was moved to struct l2cap_chan, a check was
added to l2cap_chan_del to avoid certain cleanup operations when ERTM
or streaming mode had not yet been initialized. The logic in the
check did not take in to account that chan->conf_state is set to 0 in
l2cap_chan_ready, so l2cap_chan_del failed to cancel timers and leaked
memory any time the ERTM queues or lists were not empty.
This change makes sure that l2cap_chan_del only returns early if
ERTM initialization was not performed.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This routine will be called by drivers whenever they receive data in target
mode. This should be unexpected events and as such should be handled by a
standalone API (i.e. not as a callback pointer from an existing API).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Userspace gets a netlink event upon target mode activation.
The LLCP layer is also signaled when we get an ATR_REQ in order to get
the remote general bytes.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system
receives a CIPSO tagged packet it is dropped (cipso_v4_validate()
returns non-zero). In most cases this is the correct and desired
behavior, however, in the case where we are simply forwarding the
traffic, e.g. acting as a network bridge, this becomes a problem.
This patch fixes the forwarding problem by providing the basic CIPSO
validation code directly in ip_options_compile() without the need for
the NetLabel or CIPSO code. The new validation code can not perform
any of the CIPSO option label/value verification that
cipso_v4_validate() does, but it can verify the basic CIPSO option
format.
The behavior when NetLabel is enabled is unchanged.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking changes from David S. Miller:
1) Fix IPSEC header length calculation for transport mode in ESP. The
issue is whether to do the calculation before or after alignment.
Fix from Benjamin Poirier.
2) Fix regression in IPV6 IPSEC fragment length calculations, from Gao
Feng. This is another transport vs tunnel mode issue.
3) Handle AF_UNSPEC connect()s properly in L2TP to avoid OOPSes. Fix
from James Chapman.
4) Fix USB ASIX driver's reception of full sized VLAN packets, from
Eric Dumazet.
5) Allow drop monitor (and, more generically, all generic netlink
protocols) to be automatically loaded as a module. From Neil
Horman.
Fix up trivial conflict in Documentation/feature-removal-schedule.txt
due to new entries added next to each other at the end. As usual.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
net/smsc911x: Repair broken failure paths
virtio-net: remove useless disable on freeze
netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG
drop_monitor: Add module alias to enable automatic module loading
genetlink: Build a generic netlink family module alias
net: add MODULE_ALIAS_NET_PF_PROTO_NAME
r6040: Do a Proper deinit at errorpath and also when driver unloads (calling r6040_remove_one)
r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
skb: avoid unnecessary reallocations in __skb_cow
net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens
asix: allow full size 8021Q frames to be received
rds_rdma: don't assume infiniband device is PCI
l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case
mac80211: fix ADDBA declined after suspend with wowlan
wlcore: fix undefined symbols when CONFIG_PM is not defined
mac80211: fix flag check for QoS NOACK frames
ath9k_hw: apply internal regulator settings on AR933x
ath9k_hw: update AR933x initvals to fix issues with high power devices
ath9k: fix a use-after-free-bug when ath_tx_setup_buffer() fails
ath9k: stop rx dma before stopping tx
...
We call the destroy function when a cgroup starts to be removed, such as
by a rmdir event.
However, because of our reference counters, some objects are still
inflight. Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference, some
objects will still have charges, but the code to properly uncharge them
won't be run.
This becomes a problem specially if it is ever enabled again, because now
new charges will be added to the staled charges making keeping it pretty
much impossible.
We just need to be careful with the static branch activation: since there
is no particular preferred order of their activation, we need to make sure
that we only start using it after all call sites are active. This is
achieved by having a per-memcg flag that is only updated after
static_key_slow_inc() returns. At this time, we are sure all sites are
active.
This is made per-memcg, not global, for a reason: it also has the effect
of making socket accounting more consistent. The first memcg to be
limited will trigger static_key() activation, therefore, accounting. But
all the others will then be accounted no matter what. After this patch,
only limited memcgs will have its sockets accounted.
[akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
document enum sock_flag_bits,
convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
redo tcp_update_limit() comment to 80 cols]
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit ad0081e43a
"ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed"
the fragment of packets is incorrect.
because tunnel mode needs IPsec headers and trailer for all fragments,
while on transport mode it is sufficient to add the headers to the
first fragment and the trailer to the last.
so modify mtu and maxfraglen base on ipsec mode and if fragment is first
or last.
with my test,it work well(every fragment's size is the mtu)
and does not trigger slow fragment path.
Changes from v1:
though optimization, mtu_prev and maxfraglen_prev can be delete.
replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL.
add fuction ip6_append_data_mtu to make codes clearer.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull more networking updates from David Miller:
"Ok, everything from here on out will be bug fixes."
1) One final sync of wireless and bluetooth stuff from John Linville.
These changes have all been in his tree for more than a week, and
therefore have had the necessary -next exposure. John was just away
on a trip and didn't have a change to send the pull request until a
day or two ago.
2) Put back some defines in user exposed header file areas that were
removed during the tokenring purge. From Stephen Hemminger and Paul
Gortmaker.
3) A bug fix for UDP hash table allocation got lost in the pile due to
one of those "you got it.. no I've got it.." situations. :-)
From Tim Bird.
4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
try to coalesce overlapping frags and crash. Fix from Eric Dumazet.
5) RCU routing table lookups can race with free_fib_info(), causing
crashes when we deref the device pointers in the route. Fix by
releasing the net device in the RCU callback. From Yanmin Zhang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
tcp: take care of overlaps in tcp_try_coalesce()
ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
mm: add a low limit to alloc_large_system_hash
ipx: restore token ring define to include/linux/ipx.h
if: restore token ring ARP type to header
xen: do not disable netfront in dom0
phy/micrel: Fix ID of KSZ9021
mISDN: Add X-Tensions USB ISDN TA XC-525
gianfar:don't add FCB length to hard_header_len
Bluetooth: Report proper error number in disconnection
Bluetooth: Create flags for bt_sk()
Bluetooth: report the right security level in getsockopt
Bluetooth: Lock the L2CAP channel when sending
Bluetooth: Restore locking semantics when looking up L2CAP channels
Bluetooth: Fix a redundant and problematic incoming MTU check
Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
Bluetooth: Fix EIR data generation for mgmt_device_found
Bluetooth: Fix Inquiry with RSSI event mask
Bluetooth: improve readability of l2cap_seq_list code
Bluetooth: Fix skb length calculation
...
Pull cgroup updates from Tejun Heo:
"cgroup file type addition / removal is updated so that file types are
added and removed instead of individual files so that dynamic file
type addition / removal can be implemented by cgroup and used by
controllers. blkio controller changes which will come through block
tree are dependent on this. Other changes include res_counter cleanup
and disallowing kthread / PF_THREAD_BOUND threads to be attached to
non-root cgroups.
There's a reported bug with the file type addition / removal handling
which can lead to oops on cgroup umount. The issue is being looked
into. It shouldn't cause problems for most setups and isn't a
security concern."
Fix up trivial conflict in Documentation/feature-removal-schedule.txt
* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
res_counter: Account max_usage when calling res_counter_charge_nofail()
res_counter: Merge res_counter_charge and res_counter_charge_nofail
cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
cgroup: remove cgroup_subsys->populate()
cgroup: get rid of populate for memcg
cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
cgroup: make css->refcnt clearing on cgroup removal optional
cgroup: use negative bias on css->refcnt to block css_tryget()
cgroup: implement cgroup_rm_cftypes()
cgroup: introduce struct cfent
cgroup: relocate __d_cgrp() and __d_cft()
cgroup: remove cgroup_add_file[s]()
cgroup: convert memcg controller to the new cftype interface
memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
cgroup: convert all non-memcg controllers to the new cftype interface
cgroup: relocate cftype and cgroup_subsys definitions in controllers
cgroup: merge cft_release_agent cftype array into the base files array
cgroup: implement cgroup_add_cftypes() and friends
cgroup: build list of all cgroups under a given cgroupfs_root
cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()
...
Mostly bool conversions, some inline removals and const additions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_opt_accepted() returns a bool, and can use const pointers
ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- match() method returns a boolean
- return (A && B && C && D) -> return A && B && C && D
- fix indentation
Signed-off-by: Eric Dumazet <edumazet@google.com>
Enable dynamic debugging and remove a bunch of #ifdef/#endifs.
Add a lapb_dbg(level, fmt, ...) macro and replace the
printk(KERN_DEBUG uses.
Add pr_fmt and remove embedded prefixes.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bool conversions where possible.
__inline__ -> inline
space cleanups
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bool/const conversions where possible
__inline__ -> inline
space cleanups
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- sock_flag() accepts a const pointer
- sock_flag() returns a boolean
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
codel_should_drop() logic allows a packet being not dropped if queue
size is under max packet size.
In fq_codel, we have two possible backlogs : The qdisc global one, and
the flow local one.
The meaningful one for codel_should_drop() should be the global backlog,
not the per flow one, so that thin flows can have a non zero drop/mark
probability.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for monitor device intended to capture all the network activity.
This interface could be used by networks sniffers and is already
supported by WireShark. That's a good test point to check that basic
MAC support works.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This stack implementation distinguishes several types of slave
interfaces. Another parameter to 'add_iface_' function is added
to clarify the interface type is going to be registered.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According IEEE 802.15.4 standard each node can be either full functionality
device (FFD) or reduce functionality device (RFD). So 2 sets of operations
are needed. This patch declare RFD operations structure.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Main RX data path implementation between physical and mac layers.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An interface to allocate and register ieee802154 compatible device.
The allocated device has the following representation in memory:
+-----------------------+
| struct wpan_phy |
+-----------------------+
| struct mac802154_priv |
+-----------------------+
| driver's private data |
+-----------------------+
Used by device drivers to register new instance in the stack.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IEEE 802.15.4 Working Group focuses on the standardization of the
bottom two layers of ISO/OSI protocol stack: Physical (PHY) and MAC.
The MAC layer provides access control to a shared channel and reliable
data delivery. The main functions performed by the MAC sublayer are:
association and disassociation, security control, optional star
network topology functions, such as beacon generation and Guaranteed
Time Slots (GTSs) management, generation of ACK frames (if used), and,
finally, application support for the two possible network topologies
described in the standard.
This is an initial commit which describes main data structures needed
for ieee802.15.4 compatible devices representation in the MAC layer.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
defer_setup and suspended are now flags into bt_sk().
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The ERTM and streaming mode transmit queue must only be accessed while
the L2CAP channel lock is held. Locking the channel before calling
l2cap_chan_send ensures that multiple threads cannot simultaneously
manipulate the queue when sending and receiving concurrently.
L2CAP channel locking had previously moved to the l2cap_chan struct
instead of the associated socket, so some of the old socket locking
can also be removed in this patch.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
The mgmt_device_found function expects to receive only the significant
part of the EIR data so it needs to be removed before calling the
function. This patch adds a new eir_get_length() helper function to
calculate the length of the significant part.
Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
It should return bool, not int. The function even
does return true/false.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a flag for the HT format (mixed vs. greenfield)
to allow drivers to report that on receive. Not all
drivers will do that though, so allow drivers to set
which radiotap MCS details they report.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add IV-room in skb also for TKIP and WEP.
Extend patch: "mac80211: support adding IV-room in the skb for CCMP keys"
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We are going to delete the Token ring support. This removes any
special processing in the core networking for token ring, (aside
from net/tr.c itself), leaving the drivers and remaining tokenring
support present but inert.
The mass removal of the drivers and net/tr.c will be in a separate
commit, so that the history of these files that we still care
about won't have the giant deletion tied into their history.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The NFC core code already does that for them.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is now specified that nfc_target_found() and nfc_target_lost() core
functions must not be called from an atomic context. This allow us to
serialize calls and protect the targets table using the nfc device lock
instead of a spinlock.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The NFC Core now caches the active nfc target pointer, thereby avoiding
the need to lookup the target table for each invocation of a driver ops.
Consequently, pn533, HCI and NCI now directly receive an nfc_target
pointer instead of a target index.
Cc: Ilan Elias <ilane@ti.com>
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
David pointed out gcc might generate poor code with 31bit fields.
Using u16 is more than enough and permits a better code output.
Also make the code intent more readable using constants, fixed point arithmetic
not being trivial for everybody.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It fixes L2CAP socket based security level elevation during a
connection. The HID profile needs this (for keyboards) and it is the only
way to achieve the security level elevation when using the management
interface to talk to the kernel (hence the management enabling patch
being the one that exposes this issue).
It enables the userspace a security level change when the socket is
already connected and create a way to notify the socket the result of the
request. At the moment of the request the socket is made non writable, if
the request fails the connections closes, otherwise the socket is made
writable again, POLL_OUT is emmited.
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As Van pointed out, interval/sqrt(count) can be implemented using
multiplies only.
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
This patch implements the Newton method and reciprocal divide.
Total cost is 15 cycles instead of 120 on my Corei5 machine (64bit
kernel).
There is a small 'error' for count values < 5, but we don't really care.
I reuse a hole in struct codel_vars :
- pack the dropping boolean into one bit
- use 31bit to store the reciprocal value of sqrt(count).
Suggested-by: Van Jacobson <van@pollere.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
http://queue.acm.org/detail.cfm?id=2209336
This AQM main input is no longer queue size in bytes or packets, but the
delay packets stay in (FIFO) queue.
As we don't have infinite memory, we still can drop packets in enqueue()
in case of massive load, but mean of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :
target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)
Based on initial work from Dave Taht.
Refactored to help future codel inclusion as a plugin for other linux
qdisc (FQ_CODEL, ...), like RED.
include/net/codel.h contains codel algorithm as close as possible than
Kathleen reference.
net/sched/sch_codel.c contains the linux qdisc specific glue.
Separate structures permit a memory efficient implementation of fq_codel
(to be sent as a separate work) : Each flow has its own struct
codel_vars.
timestamps are taken at enqueue() time with 1024 ns precision, allowing
a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses
usec as base unit.
Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.
Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
tg3 drivers (BQL enabled).
Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
[ interval TIME ] [ ecn ]
qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
maxpacket 1514 ecn_mark 84399 drop_overlimit 0
CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.
One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It actually works on the input queue and will use its read mem
routines, thus it's better to have in in the tcp_input.c file.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dst_check() will take care of SA (and obsolete field), hence
IPsec rekeying scenario is taken into account.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yaseivch <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use more common code for ERTM and streaming mode segmentation and
transmission, and begin using skb control block data for delaying
extended or enhanced header generation until just before the packet is
transmitted. This code is also better suited for resegmentation,
which is needed when L2CAP links are reconfigured after an AMP channel
move.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Reviewed-by: Ulisses Furquim <ulisses@profusion.mobi>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
The Bluetooth Low Energy support so far was disabled by default via
a module parameter. With this change the module parameter will be removed
and Low Energy is enabled by default.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
No one is using hci_le_ltk_neg_reply() in bluetooth subsystem.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
In this API, we were using sizeof operator for an array
given as function argument, which is invalid.
However this API is not used anywhere.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
These values are now in the nested l2cap_ctrl struct.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Instead of using modular division, the offset can be calculated using
only addition and subtraction. The previous calculation did not work
as intended and was more difficult to understand, involving unsigned
integer underflow and a check for a negative value where one was not
possible.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
User-space pass the remote device address type to kernel through
struct sockaddr_l2 what makes the advertising useless. This patch
removes all advertising cache code.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In order to establish a LE connection we need the address type
information. User-space already pass this information to kernel
through struct sockaddr_l2.
This patch adds the dst_type parameter to l2cap_chan_connect so we
are able to pass the address type info from user-space down to
hci_conn layer.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the dst_type parameter to hci_connect function.
Instead of searching the address type in advertising cache, we
use the dst_type parameter to establish LE connections.
The dst_type is ignored for BR/EDR connection establishment.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the helper function bdaddr_to_le to hci_core, so it
can be used in mgmt.c and hci_conn.c.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the address type info to struct sockaddr_l2 so
user-space can inform the remote device address type required
to establish LE connections.
Soon, instead of looking the advertising cache up to discover the
address type, we'll use this address type info to establish LE
connections.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves address type macros to bluetooth.h since they will be
used by management interface and Bluetooth socket interface. It also
replaces the macro prefix MGMT_ADDR_ by BDADDR_.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
No one is using strtoba() in the bluetooth subsystem.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
A sequence list is a data structure used to track frames that need to
be retransmitted, and frames that have been requested for
retransmission by the remote device. It can compactly represent a
list of sequence numbers within the ERTM transmit window. Memory for
the list is allocated once at connection time, and common operations
in ERTM are O(1).
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Some parameters in L2CAP chan are set to default similar way in
socket based channels and A2MP channels. Adds common function which
sets all defaults.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
This patch removes the MGMT_ADDR_INVALID macro. If the address type
isn't LE, we consider it is BR/EDR type.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Following the separation if core and sock code this change avoid
manipulation of sk inside l2cap_chan_create().
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Every field from ERTM control headers is now carried in the control
block so it only has to be parsed or generated once, and can be
efficiently accessed throughout the ERTM code.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Adds some missing values for control field parsing, additional data
for the new state machine, and enumerations for states, incoming
packet classification, and state machine events.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
This patch adds the HCI_PERIODIC_INQ flag to dev_flags. This flag
tracks if periodic inquiry is enabled or not.
Signed-off-by: Andre Guedes <aguedespe@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
This patch adds a handler function to Periodic Inquiry command
complete event.
Signed-off-by: Andre Guedes <aguedespe@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
This patch adds to hci_core the hci_cancel_le_scan function which
should be used to cancel an ongoing LE scan.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
ediv is already in little endian order.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Device ID details need to be programmed into the kernel for every
controller at least once. So provide management command for this.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Device ID information can be provided via Extended Inquiry Data
as well. If a valid source is present, then include it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The Inquiry Response TX power tag should be added to the Extended
Inquiry Data (EIR) as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
We initialize the "struct device" in hci_alloc_dev() for a long time now
so we can access hdev->dev.parent directly. Hence, we can drop the
temporary field hdev->parent which is used in no other place than
hci_add_sysfs().
SET_HCIDEV_DEV() is never called after registering a device by the
drivers so we do not overwrite internal device-state. Furthermore,
hdev->dev is initialized to 0 by kzalloc() inside hci_alloc_dev() so the
default behavior with dev.parent = NULL is kept.
Signed-off-by: David Herrmann <dh.herrmann@googlemail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Correct type warnings reported by sparse to show that this
functions takes ediv argument in __le16 format.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Keep lmp_subver in host byte order. We have following conversion
in hci_cc_read_local_version:
hdev->lmp_subver = __le16_to_cpu(rp->lmp_subver);
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch introduces a new mesh configuration parameter "ht_opmode" and will
allow user to check the current HT protection mode selected. Users could
configure the protection mode by the command "iw mesh_iface set mesh_param
mesh_ht_protection_mode=2". The default protection mode of mesh is set to
non-HT mixed mode.
Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Reviewed-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds hooks to call into the driver to get additional
stats for the ethtool API.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.
The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.
Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.
Make sure the backup socks do not block on reading.
Special thanks to Aleksey Chudov for helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.
sync_refresh_period: in seconds, difference in reported connection
timer that triggers new sync message. It can be used to
avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if connection state
is not changed from last sync.
sync_retries: integer, 0..3, defines sync retries with period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.
Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.
Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.
As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.
Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,
Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.
As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).
Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.
Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.
Thanks to Pablo Neira Ayuso for his valuable feedback!
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]
Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.
There are good reasons to stop supporting automatic helper
assignment, for further information, please read:
http://www.netfilter.org/news.html#2012-04-03
This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
drivers/net/ethernet/intel/e1000e/param.c
drivers/net/wireless/iwlwifi/iwl-agn-rx.c
drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
drivers/net/wireless/iwlwifi/iwl-trans.h
Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell. In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.
In e1000e a conflict arose in the validation code for settings of
adapter->itr. 'net-next' had more sophisticated logic so that
logic was used.
Signed-off-by: David S. Miller <davem@davemloft.net>
It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.
Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Perry Lorier <perryl@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Wilmer van der Gaast <wilmer@google.com>
Cc: Ankur Jain <jankur@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Dave Täht <dave.taht@bufferbloat.net>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
receiver function when application is not blocked in recvmsg().
Function tcp_queue_rcv() is moved a bit to allow its call from
tcp_data_queue()
This gives good results especially if GRO could not kick, and if skb
head is a fragment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed. When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.
While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf
Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.
ER is enabled by sysctl_tcp_early_retrans:
0: Disables ER
1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.
2: (Default) reduce dupthresh like mode 1. In addition, delay
entering fast recovery by RTT/4.
Note: mode 2 is implemented in the third part of this patch series.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change order of init so netns init is ready
when register ioctl and netlink.
Ver2
Whitespace fixes and __init added.
Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572
When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.
RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.
The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).
This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.
After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.
With help from David S. Miller
Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: set fake_rtable's dst to NULL to avoid kernel Oops
when bridge is deleted before tap/vif device's delete, kernel may
encounter an oops because of NULL reference to fake_rtable's dst.
Set fake_rtable's dst to NULL before sending packets out can solve
this problem.
v4 reformat, change br_drop_fake_rtable(skb) to {}
v3 enrich commit header
v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.
[ Use "do { } while (0)" for nop br_drop_fake_rtable()
implementation -DaveM ]
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Huang <peter.huangpeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")
The former moved around the sysctl register/unregister calls, the
later simply removed them.
With help from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.
No functional change expected in this patch, all callers still using the
old sk_rcvbuf limit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
> include/net/ax25.h:447:75: error: expected ';' before '}' token
>
> static inline int ax25_register_dev_sysctl(ax25_dev *ax25_dev) { return 0 };
> static inline void ax25_unregister_dev_sysctl(ax25_dev *ax25_dev) {};
>
> First function: move ';' inside braces.
> Second function: drop the ';'.
Put the semicolons where it makes sense.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
>
> ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined!
> ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined!
>
> when CONFIG_SYSCTL is not enabled.
Add static inline stub functions to gracefully handle the case when sysctl
support is not present.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_ave_rssi need to be declare as export for driver to use it.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This commit moves the (substantial) common code shared between
tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family
independent function, tcp_init_sock().
Centralizing this functionality should help avoid drift issues,
e.g. where the IPv4 side is updated without a corresponding update to
IPv6. There was already some drift: IPv4 initialized snd_cwnd to
TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to
2 (in this case it should not matter, since snd_cwnd is also
initialized in tcp_init_metrics(), but the general risks and
maintenance overhead remain).
When diffing the old and new code, note that new tcp_init_sock()
function uses the order of steps from the tcp_v4_init_sock()
implementation (the order is slightly different in
tcp_v6_init_sock()).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes (according the the previous description):
* TCP_REPAIR sockoption
This one just puts the socket in/out of the repair mode.
Allowed for CAP_NET_ADMIN and for closed/establised sockets only.
When repair mode is turned off and the socket happens to be in
the established state the window probe is sent to the peer to
'unlock' the connection.
* TCP_REPAIR_QUEUE sockoption
This one sets the queue which we're about to repair. The
'no-queue' is set by default.
* TCP_QUEUE_SEQ socoption
Sets the write_seq/rcv_nxt of a selected repaired queue.
Allowed for TCP_CLOSE-d sockets only. When the socket changes
its state the other seq-s are changed by the kernel according
to the protocol rules (most of the existing code is actually
reused).
* Ability to forcibly bind a socket to a port
The sk->sk_reuse is set to SK_FORCE_REUSE.
* Immediate connect modification
The connect syscall initializes the connection, then directly jumps
to the code which finalizes it.
* Silent close modification
The close just aborts the connection (similar to SO_LINGER with 0
time) but without sending any FIN/RST-s to peer.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just the preparation patch, which makes the needed for
TCP repair code ready for use.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of the users have been converted to use registera_net_sysctl so we
no longer need register_net_sysctl.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't use struct ctl_path anymore so delete the exported constants.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There isn't much advantage here except that strings paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sysctl core no longer natively understands sysctl tables
with .child entries.
Split the ipv6_table to remove the .child entries.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't register/unregister every ax25 table in a batch. Instead register
and unregister per device ax25 sysctls as ax25 devices come and go.
This moves ax25 to be a completely modern sysctl user. Registering the
sysctls in just the initial network namespace, removing the use of
.child entries that are no longer natively supported by the sysctl core
and taking advantage of the fact that there are no longer any ordering
constraints between registering and unregistering different sysctl
tables.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sysctl no longer requires explicit creation of directories. The neigh
directory is always populated with at least a default entry so this
won't cause any user visible changes.
Delete the ipv4_path and the ipv4_skeleton these are no longer needed.
Directly register the ipv4_route_table.
And since I am an idiot remove the header definitions that I should
have removed in the previous patch.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_sysctl_rotable never caught on as an interesting way to
register sysctls. My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace. What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.
That is a very silly way to go. Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.
The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf
I really don't expect anyone will miss them if they can't read them in a
child user namespace.
CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrasturce is not setup
soon enough. So to avoid races call net_sysctl_init from sock_init().
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now all of the networking sysctl registrations are running in a
compatibiity mode. The natvie sysctl registration api takes a cstring
for a path and a simple ctl_table. Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatiblity code in the sysctl core.
Switching from a ctl_path to a cstring results in less boiler plate
and denser code that is a little easier to read.
I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel but gcc will allow a
ctl_path pointer to be passed to a char * pointer with only issuing a
warning resulting in completely incorrect code can be built. Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel which makes clean conversion patches a bit easier
to read and write.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warning in net/sock.h:
Warning(include/net/sock.h:377): No description found for parameter 'sk_peek_off'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functionally, this change is a NOP.
Semantically, rt6_clean_expires() wants to do rt->dst.from = NULL instead of
rt->dst.expires = 0. It is clearing the RTF_EXPIRES flag, so the union is going
to be treated as a pointer (dst.from) not a long (dst.expires).
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1716a961 (ipv6: fix problem with expired dst cache) broke PMTU
discovery. rt6_update_expires() calls dst_set_expires(), which only updates
dst->expires if it has not been set previously (expires == 0) or if the new
expires is earlier than the current dst->expires.
rt6_update_expires() needs to zero rt->dst.expires, otherwise it will contain
ivalid data left over from rt->dst.from and will confuse dst_set_expires().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add utility function to provide the average rssi per vif
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Commit b82d1bb4 inadvertendly placed unrelated new code between
TCPCB_EVER_RETRANS and TCPCB_RETRANS and the other macros that refer
to the sacked field in the struct tcp_skb_cb (probably because there
was a misleading empty line there). This commit fixes up the
formatting so that all macros related to the sacked field are adjacent
again.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My grand plan to allow drivers to gradually move over
to advertising virtual interface combinations and only
enforce with drivers that do want it enforced doesn't
seem to be working out, only Christian ever added the
advertising (to carl9170), nobody else did.
Begin enforcing combinations in cfg80211 so that users
can rely on the information reported about a device.
Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Nick Kossifidis <mickflemm@gmail.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Gertjan van Wingerde <gwingerde@gmail.com>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If a key is non persistent then it should not be used in future
connections but it should be kept for current connection. And it
should be removed when connecion is removed.
Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch changes the return type of function hci_persistent_key
from int to bool because it makes more sense to return information
whether a key is persistent or not as a bool.
Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Conflicts:
drivers/net/ethernet/atheros/atlx/atl1.c
drivers/net/ethernet/atheros/atlx/atl1.h
Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.
We achieve this by adding a relaxation parameter to
inet_csk_bind_conflict. When 'relax' parameter is off
we return a conflict whenever the current searched
pair (addr, port) is not unique.
This tries to address the problems reported in patch:
8d238b25b1
Revert "tcp: bind() fix when many ports are bound"
Tests where ran for creating and binding(0) many sockets
on 100 IPs. The results are, on average:
* 60000 sockets, 600 ports / IP:
* 0.210 s, 620 (IP, port) duplicates without patch
* 0.219 s, no duplicates with patch
* 100000 sockets, 1000 ports / IP:
* 0.371 s, 1720 duplicates without patch
* 0.373 s, no duplicates with patch
* 200000 sockets, 2000 ports / IP:
* 0.766 s, 6900 duplicates without patch
* 0.768 s, no duplicates with patch
* 500000 sockets, 5000 ports / IP:
* 2.227 s, 41500 duplicates without patch
* 2.284 s, no duplicates with patch
Signed-off-by: Alex Copot <alex.mihai.c@gmail.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updates some comments to track RFC6298
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes hw.conf.channel usage from the following functions:
* ieee80211_mandatory_rates
* ieee80211_sta_get_rates
* ieee80211_frame_duration
* ieee80211_rts_duration
* ieee80211_ctstoself_duration
This is in preparation for multi-channel operation.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If the current channel is known, add frequency and channel type to
NL80211_CMD_GET_INTERFACE.
Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
net/core/rtnetlink.c: In function ‘rtnl_create_link’:
net/core/rtnetlink.c:1645:3: warning: passing argument 2 of ‘ops->get_tx_queues’ from incompatible pointer type [enabled by default]
net/core/rtnetlink.c:1645:3: note: expected ‘const struct nlattr **’ but argument is of type ‘struct nlattr **’
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh_table_init_no_netlink() is only used in net/core/neighbour.c file.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most machines dont use UDP encapsulation (L2TP)
Adds a static_key so that udp_queue_rcv_skb() doesnt have to perform a
test if L2TP never setup the encap_rcv on a socket.
Idea of this patch came after Simon Horman proposal to add a hook on TCP
as well.
If static_key is not yet enabled, the fast path does a single JMP .
When static_key is enabled, JMP destination is patched to reach the real
encap_type/encap_rcv logic, possibly adding cache misses.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix spelling and references in rtnetlink.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change get_tx_queues, drop unsused arg/return value real_tx_queues,
and use return by value (with error) rather than call by reference.
Probably bonding should just change to LLTX and the whole get_tx_queues
API could disappear!
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet.
this dst cache will not check expire because it has no RTF_EXPIRES flag.
So this dst cache will always be used until the dst gc run.
Change the struct dst_entry,add a union contains new pointer from and expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use.
we can use this field to point to where the dst cache copy from.
The dst.from is only used in IPV6.
rt6_check_expired check if rt6_info.dst.from is expired.
ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.
ip6_dst_destroy release the ort.
Add some functions to operate the RTF_EXPIRES flag and expires(from) together.
and change the code to use these new adding functions.
Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use new adding functions.
Only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement aggregation algorithm, combining more data into a single
HSI transfer. 4 different traffic categories are supported:
1. TC_PRIO_CONTROL .. TC_PRIO_MAX (CTL)
2. TC_PRIO_INTERACTIVE (VO)
3. TC_PRIO_INTERACTIVE_BULK (VI)
4. TC_PRIO_BESTEFFORT, TC_PRIO_BULK, TC_PRIO_FILLER (BEBK)
Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set traffic class for CAIF packets, based on socket
priority, CAIF protocol type, or type of message.
Traffic class mapping for different packet types:
- control: TC_PRIO_CONTROL;
- flow control: TC_PRIO_CONTROL;
- at: TC_PRIO_CONTROL;
- rfm: TC_PRIO_INTERACTIVE_BULK;
- other sockets: equals to socket's TC;
- network data: no change.
Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As specified in RFC6106, DNSSL option contains one or more domain names
of DNS suffixes. 8-bit identifier of the DNSSL option type as assigned
by the IANA is 31. This option should also be treated as userland.
Signed-off-by: Alexey I. Froloff <raorn@raorn.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some HW/drivers get notifications when a tag moves out of the radio field.
This notification is now forwarded to user space through netlink.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The target index can be used by userspace to uniquely identify a target
and thus should be kept unique, per NFC adapter. Moreover, some protocols
do not provide a logical index when discovering new targets, so we have to
generate one for them.
For NCI or pn533 to fetch their logical index, we added a logical_idx field
to the target structure.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Most NFC HCI chipsets actually use a simplified HDLC link layer to
carry HCI payloads.
This implementation registers itself as an HCI device on behalf of the
NFC driver.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is an implementation of ETSI TS 102 622 specification.
Many NFC chipsets use HCI as the host <-> target protocol on top of a
serial link like i2c.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>