TCP listener refactoring, part 5 :
We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.
This patch includes the needed struct sock_common in front
of struct request_sock
This means there is no more inet6_request_sock IPv6 specific
structure.
Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.
loc_port -> ir_loc_port
loc_addr -> ir_loc_addr
rmt_addr -> ir_rmt_addr
rmt_port -> ir_rmt_port
iif -> ir_iif
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_IPV6=n is still a valid choice ;)
It appears we can remove dead code.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 4 :
To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common
Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.
Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).
inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.
inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.
We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 3 :
Our goal is to hash SYN_RECV sockets into main ehash for fast lookup,
and parallel SYN processing.
Current inet_ehash_bucket contains two chains, one for ESTABLISH (and
friend states) sockets, another for TIME_WAIT sockets only.
As the hash table is sized to get at most one socket per bucket, it
makes little sense to have separate twchain, as it makes the lookup
slightly more complicated, and doubles hash table memory usage.
If we make sure all socket types have the lookup keys at the same
offsets, we can use a generic and faster lookup. It turns out TIME_WAIT
and ESTABLISHED sockets already have common lookup fields for IPv4.
[ INET_TW_MATCH() is no longer needed ]
I'll provide a follow-up to factorize IPv6 lookup as well, to remove
INET6_TW_MATCH()
This way, SYN_RECV pseudo sockets will be supported the same.
A new sock_gen_put() helper is added, doing either a sock_put() or
inet_twsk_put() [ and will support SYN_RECV later ].
Note this helper should only be called in real slow path, when rcu
lookup found a socket that was moved to another identity (freed/reused
immediately), but could eventually be used in other contexts, like
sock_edemux()
Before patch :
dmesg | grep "TCP established"
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
After patch :
TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
include/linux/netdevice.h
net/core/sock.c
Trivial merge issues.
Removal of "extern" for functions declaration in netdevice.h
at the same time "const" was added to an argument.
Two parallel line additions in net/core/sock.c
Signed-off-by: David S. Miller <davem@davemloft.net>
The since the removal of the routing cache computing
fib_compute_spec_dst() does a fib_table lookup for each UDP multicast
packet received. This has introduced a performance regression for some
UDP workloads.
This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.
Benchmark results from a netperf UDP_RR test:
Before 89789.68 transactions/s
After 90587.62 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.63us RTT
After 12.48us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.
For UDP multicast we can only cache the dst if there is only one
receiving socket on the host. Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.
For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups. Additionally since the hash chains may be
long we only check the first socket to see if it is a match and not
waste extra time searching the whole chain when we might not find an
exact match.
Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After 89789.68 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After 12.63us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
drivers/net/wireless/rtlwifi/rtl8188ee/phy.h
drivers/net/wireless/rtlwifi/rtl8192ce/phy.h
drivers/net/wireless/rtlwifi/rtl8192de/phy.h
drivers/net/wireless/rtlwifi/rtl8723ae/phy.h
Just some minor conflicts between the wireless-next changes
and Joe Perches's "extern" removal from function prototypes
in header files.
John W. Linville says:
====================
Regarding the Bluetooth bits, Gustavo says:
"The big work here is from Marcel and Johan. They did a lot of work
in the L2CAP, HCI and MGMT layers. The most important ones are the
addition of a new MGMT command to enable/disable LE advertisement
and the introduction of the HCI user channel to allow applications
to get directly and exclusive access to Bluetooth devices."
As to the ath10k bits, Kalle says:
"Bartosz dropped support for qca98xx hw1.0 hardware from ath10k, it's
just too much to support it. Michal added support for the new firmware
interface. Marek fixed WEP in AP and IBSS mode. Rest of the changes are
minor fixes or cleanups."
And also:
"Major changes are:
* throughput improvements including aligning the RX frames correctly and
optimising HTT layer (Michal)
* remove qca98xx hw1.0 support (Bartosz)
* add support for firmware version 999.999.0.636 (Michal)
* firmware htt statistics support (Kalle)
* fix WEP in AP and IBSS mode (Marek)
* fix a mutex unlock balance in debugfs file (Shafi)
And of course there's a lot of smaller fixes and cleanup."
For the wl12xx bits, Luca says:
"Here are some patches intended for 3.13. Eliad is upstreaming a bunch
of patches that have been pending in the internal tree. Mostly bugfixes
and other small improvements."
Along with that...
Arend and friends bring us a batch of brcmfmac updates, Larry Finger
offers some rtlwifi refactoring, and Sujith sends the usual batch of
ath9k updates. As usual, there are a number of other small updates
from a variety of players as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the unreg_list and the close_list in dev_close_many preventing
dev_close_many from permuting the unreg_list. The permutations of the
unreg_list have resulted in cases where the loopback device is accessed
it has been freed in code such as dst_ifdown. Resulting in subtle memory
corruption.
This is the second bug from sharing the storage between the close_list
and the unreg_list. The issues that crop up with sharing are
apparently too subtle to show up in normal testing or usage, so let's
forget about being clever and use two separate lists.
v2: Make all callers pass in a close_list to dev_close_many
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio wants to pass in cpumask_of(cpu), make parameter
const to avoid build warnings.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter updates for your net-next tree,
mostly ipset improvements and enhancements features, they are:
* Don't call ip_nest_end needlessly in the error path from me, suggested
by Pablo Neira Ayuso, from Jozsef Kadlecsik.
* Fixed sparse warnings about shadowed variable and missing rcu annotation
and fix of "may be used uninitialized" warnings, also from Jozsef.
* Renamed simple macro names to avoid namespace issues, reported by David
Laight, again from Jozsef.
* Use fix sized type for timeout in the extension part, and cosmetic
ordering of matches and targets separatedly in xt_set.c, from Jozsef.
* Support package fragments for IPv4 protos without ports from Anders K.
Pedersen. For example this allows a hash:ip,port ipset containing the
entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN
tunnels to/from the host. Without this patch only the first package
fragment (with fragment offset 0) was matched.
* Introduced a new operation to get both setname and family, from Jozsef.
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-mathcing
family. Currently such rules are silently accepted and then ignored
instead of generating an error message to the user.
* Reworked extensions support in ipset types from Jozsef. The approach of
defining structures with all variations is not manageable as the
number of extensions grows. Therefore a blob for the extensions is
introduced, somewhat similar to conntrack. The support of extensions
which need a per data destroy function is added as well.
* When an element timed out in a list:set type of set, the garbage
collector skipped the checking of the next element. So the purging
was delayed to the next run of the gc, fixed by Jozsef.
* A small Kconfig fix: NETFILTER_NETLINK cannot be selected and
ipset requires it.
* hash:net,net type from Oliver Smith. The type provides the ability to
store pairs of subnets in a set.
* Comment for ipset entries from Oliver Smith. This makes possible to
annotate entries in a set with comments, for example:
ipset n foo hash:net,net comment
ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B"
* Fix of hash types resizing with comment extension from Jozsef.
* Fix of new extensions for list:set type when an element is added
into a slot from where another element was pushed away from Jozsef.
* Introduction of a common function for the listing of the element
extensions from Jozsef.
* Net namespace support for ipset from Vitaly Lavrov.
* hash:net,port,net type from Oliver Smith, which makes possible
to store the triples of two subnets and a protocol, port pair in
a set.
* Get xt_TCPMSS working with net namespace, by Gao feng.
* Use the proper net netnamespace to allocate skbs, also by Gao feng.
* A couple of cleanups for the conntrack SIP helper, by Holger
Eitzenberger.
* Extend cttimeout to allow setting default conntrack timeouts via
nfnetlink, so we can get rid of all our sysctl/proc interfaces in
the future for timeout tuning, from me.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on tcp listener refactoring, I found that it
would really make things easier if sock_common could include
the IPv6 addresses needed in the lookups, instead of doing
very complex games to get their values (depending on sock
being SYN_RECV, ESTABLISHED, TIME_WAIT)
For this to happen, I need to be sure that tcp6_timewait_sock
and tcp_timewait_sock consume same number of cache lines.
This is possible if we only use 32bits for tw_ttd, as we remove
one 32bit hole in inet_timewait_sock
inet_tw_time_stamp() is defined and used, even if its current
implementation looks like tcp_time_stamp : We might need finer
resolution for tcp_time_stamp in the future.
Before patch : sizeof(struct tcp6_timewait_sock) = 0xc8
After patch : sizeof(struct tcp6_timewait_sock) = 0xc0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);
Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. Also we trim a
pointer from struct bonding because we no longer need to keep the hash
function, now there's only a single hash function - bond_xmit_hash that
works based on bond->params.xmit_policy.
The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out the code that extracts the ports from skb_flow_dissect and
add a new function skb_flow_get_ports which can be re-used.
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 2 :
We can use a generic lookup, sockets being in whatever state, if
we are sure all relevant fields are at the same place in all socket
types (ESTABLISH, TIME_WAIT, SYN_RECV)
This patch removes these macros :
inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
And adds :
sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
Then, INET_TW_MATCH() is really the same than INET_MATCH()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And thus we have only one function definition
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jamal sent patch to add tc user simple actions to iproute2
but required header was not being exported.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a function to provide the phy address which should be used to the
Gigabit Ethernet driver connected to ssb.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Slave Page Response Timeout event indicates to the Host that a
slave page response timeout has occurred in the BR/EDR Controller.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 110
"7.7.72 Slave Page Response Timeout Event [New Section]
...
Note: this event will be generated if the slave BR/EDR Controller
responds to a page but does not receive the master FHS packet
(see Baseband, Section 8.3.3) within pagerespTO.
Event Parameters: NONE"
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Synchronization Train Complete event indicates that the Start
Synchronization Train command has completed.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 103
"7.7.67 Synchronization Train Complete Event [New Section]
...
Event Parameters:
Status 0x00 Start Synchronization Train command completed
successfully.
0x01-0xFF Start Synchronization Train command failed.
See Part D, Error Codes, for error codes and
descriptions."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Start_Synchronization_Train command controls the Synchronization
Train functionality in the BR/EDR Controller.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 86
"7.1.51 Start Synchronization Train Command [New Section]
...
If connectionless slave broadcast mode is not enabled, the Command
Disallowed (0x0C) error code shall be returned. After receiving this
command and returning a Command Status event, the Baseband starts
attempting to send synchronization train packets containing information
related to the enabled Connectionless Slave Broadcast packet timing.
Note: The AFH_Channel_Map used in the synchronization train packets is
configured by the Set_AFH_Channel_Classification command and the local
channel classification in the BR/EDR Controller.
The synchronization train packets will be sent using the parameters
specified by the latest Write_Synchronization_Train_Parameters command.
The Synchronization Train will continue until synchronization_trainTO
slots (as specified in the last Write_Synchronization_Train command)
have passed or until the Host disables the Connectionless Slave Broadcast
logical transport."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
he Set_Connectionless_Slave_Broadcast command controls the
Connectionless Slave Broadcast functionality in the BR/EDR
Controller.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 78
"7.1.49 Set Connectionless Slave Broadcast Command [New Section]
...
The LT_ADDR indicated in the Set_Connectionless_Slave_Broadcast shall be
pre-allocated using the HCI_Set_Reserved_LT_ADDR command. If the
LT_ADDR has not been reserved, the Unknown Connection Identifier (0x02)
error code shall be returned. If the controller is unable to reserve
sufficient bandwidth for the requested activity, the Connection Rejected
Due to Limited Resources (0x0D) error code shall be returned.
The LPO_Allowed parameter informs the BR/EDR Controller whether it is
allowed to sleep.
The Packet_Type parameter specifies which packet types are allowed. The
Host shall either enable BR packet types only, or shall enable EDR and DM1
packet types only.
The Interval_Min and Interval_Max parameters specify the range from which
the BR/EDR Controller must select the Connectionless Slave Broadcast
Interval. The selected Interval is returned."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Write_Synchronization_Train_Parameters command configures
the Synchronization Train functionality in the BR/EDR Controller.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 97
"7.3.90 Write Synchronization Train Parameters Command [New Section]
...
Note: The AFH_Channel_Map used in the Synchronization Train packets is
configured by the Set_AFH_Channel_Classification command and the local
channel classification in the BR/EDR Controller.
Interval_Min and Interval_Max specify the allowed range of
Sync_Train_Interval. Refer to [Vol. 2], Part B, section 2.7.2 for
a detailed description of Sync_Train_Interval. The BR/EDR Controller shall
select an interval from this range and return it in Sync_Train_Interval.
If the Controller is unable to select a value from this range, it shall
return the Invalid HCI Command Parameters (0x12) error code.
Once started (via the Start_Synchronization_Train Command) the
Synchronization Train will continue until synchronization_trainTO slots have
passed or Connectionless Slave Broadcast has been disabled."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Set_Connectionless_Slave_Broadcast_Data command provides the
ability for the Host to set Connectionless Slave Broadcast data in
the BR/EDR Controller.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 93
"7.3.88 Set Connectionless Slave Broadcast Data Command [New Section]
...
If connectionless slave broadcast mode is disabled, this data shall be
kept by the BR/EDR Controller and used once connectionless slave broadcast
mode is enabled. If connectionless slave broadcast mode is enabled,
and this command is successful, this data will be sent starting with
the next Connectionless Slave Broadcast instant.
The Data_Length field may be zero, in which case no data needs to be
provided.
The Host may fragment the data using the Fragment field in the command. If
the combined length of the fragments exceeds the capacity of the largest
allowed packet size specified in the Set Connectionless Slave Broadcast
command, all fragments associated with the data being assembled shall be
discarded and the Invalid HCI Command Parameters error (0x12) shall be
returned."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Delete_Reserved_LT_ADDR command requests that the BR/EDR
Controller cancel the reservation for a specific LT_ADDR reserved for the
purposes of Connectionless Slave Broadcast.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 92
"7.3.87 Delete Reserved LT_ADDR Command [New Section]
...
If the LT_ADDR indicated in the LT_ADDR parameter is not reserved by the
BR/EDR Controller, it shall return the Unknown Connection Identifier (0x02)
error code.
If connectionless slave broadcast mode is still active, then the Controller
shall return the Command Disallowed (0x0C) error code."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Set_Reserved_LT_ADDR command allows the host to request that the
BR/EDR Controller reserve a specific LT_ADDR for Connectionless Slave
Broadcast.
The Core Spec Addendum 4 adds this command in part B Connectionless
Slave Broadcast.
Bluetooth Core Specification Addendum 4 - Page 90
"7.3.86 Set Reserved LT_ADDR Command [New Section]
...
If the LT_ADDR indicated in the LT_ADDR parameter is already in use by the
BR/EDR Controller, it shall return the ACL Connection Already Exists (0x0B)
error code. If the LT_ADDR indicated in the LT_ADDR parameter is out of
range, the controller shall return the Invalid HCI Command Parameters (0x12)
error code. If the command succeeds, then the reserved LT_ADDR shall be
used when issuing subsequent Set Connectionless Slave Broadcast Data and
Set Connectionless Slave Broadcast commands.
To ensure that the reserved LT_ADDR is not already allocated, it is
recommended that this command be issued at some point after HCI_Reset is
issued but before page scanning is enabled or paging is initiated."
Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com>
Signed-off-by: C S Bhargava <cs.bhargava@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Limit the current implementation to a single channel context used by
a single vif, thereby avoiding multi-vif/channel complexities.
Reuse the main function from AP CSA code, but move a portion out in
order to fit the STA scenario.
Add a new mac80211 HW flag so we don't break devices that don't support
channel switch with channel-contexts. The new behavior will be opt-in.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On dual-mode BR/EDR/LE and LE only controllers it is possible
to configure a random address. There are two types or random
addresses, one is static and the other private. Since the
random private addresses require special privacy feature to
be supported, the configuration of these two are kept separate.
This command allows for setting the static random address. It is
only supported on controllers with LE support. The static random
address is suppose to be valid for the lifetime of the controller
or at least until the next power cycle. To ensure such behavior,
setting of the address is limited to when the controller is
powered off.
The special BDADDR_ANY address (00:00:00:00:00:00) can be used to
disable the static address. This is also the default value.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This patch introduces a new mgmt command for enabling/disabling BR/EDR
functionality. This can be convenient when one wants to make a dual-mode
controller behave like a single-mode one. The command is only available
for dual-mode controllers and requires that LE is enabled before using
it. The BR/EDR setting can be enabled at any point, however disabling it
requires the controller to be powered off (otherwise a "rejected"
response will be sent).
Disabling the BR/EDR setting will automatically disable all other BR/EDR
related settings.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To allow treating dual-mode (BR/EDR/LE) controllers as single-mode ones
(LE-only) we want to introduce a new HCI_BREDR_ENABLED flag to track
whether BR/EDR is enabled or not (previously we simply looked at the
feature bit with lmp_bredr_enabled).
This patch add the new flag and updates the relevant places to test
against it instead of using lmp_bredr_enabled. The flag is by default
enabled when registering an adapter and only cleared if necessary once
the local features have been read during the HCI init procedure.
We cannot completely block BR/EDR usage in case user space uses raw HCI
sockets but the patch tries to block this in places where possible, such
as the various BR/EDR specific ioctls.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Hiding the Bluetooth high speed support behind a module parameter is
not really useful. This can be enabled and disabled at runtime via
the management interface. This also has the advantage that this can
now be changed per controller and not just global.
This patch removes the module parameter and exposes the high speed
setting of the management interface to all controllers.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The BDADDR_LOCAL is a relict from userspace and has never been used
within the kernel. So remove that constant and replace it with a new
BDADDR_NONE that is similar to HCI_DEV_NONE with all bits set.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/usb/qmi_wwan.c
drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
include/net/netfilter/nf_conntrack_synproxy.h
include/net/secure_seq.h
The conflicts are of two varieties:
1) Conflicts with Joe Perches's 'extern' removal from header file
function declarations. Usually it's an argument signature change
or a function being added/removed. The resolutions are trivial.
2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
a new value, another changes an existing value. That sort of
thing.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking changes from David Miller:
1) Multiply in netfilter IPVS can overflow when calculating destination
weight. From Simon Kirby.
2) Use after free fixes in IPVS from Julian Anastasov.
3) SFC driver bug fixes from Daniel Pieczko.
4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.
5) Locking and encapsulation fixes to serial line CAN driver, from
Andrew Naujoks.
6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
Eilon Greenstein, and Ariel Elior.
7) In lapb, if no other packets are outstanding, T1 timeouts actually
stall things and no packet gets sent. Fix from Josselin Costanzi.
8) ICMP redirects should not make it to the socket error queues, from
Duan Jiong.
9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.
10) Fix setting of VLAN priority field on via-rhine driver, from Roget
Luethi.
11) Fix TX stalls and VLAN promisc programming in be2net driver from
Ajit Khaparde.
12) Packet padding doesn't get handled correctly in new usbnet SG
support code, from Ming Lei.
13) Fix races in netdevice teardown wrt. network namespace closing.
From Eric W. Biederman.
14) Fix potential missed initialization of net_secret if not TCP
connections are openned. From Eric Dumazet.
15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
Aleksander Morgado.
16) skb_cow_head() can change skb->data and thus packet header pointers,
don't use stale ip_hdr reference in ip_tunnel code.
17) Backend state transition handling fixes in xen-netback, from Paul
Durrant.
18) Packet offset for AH protocol is handled wrong in flow dissector,
from Eric Dumazet.
19) Taking down an fq packet scheduler instance can leave stale packets
in the queues, fix from Eric Dumazet.
20) Fix performance regressions introduced by TCP Small Queues. From
Eric Dumazet.
21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
Hannes Frederic Sowa.
22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
reference to the ipv4/ipv6 specific network device state, so use the
reference put that will check and release the object if the
reference hits zero. From Salam Noureddine.
23) Fix memory corruption in ip_tunnel driver, and use skb_push()
instead of __skb_push() so that similar bugs are less hard to find.
From Steffen Klassert.
24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
Nicolas Dichtel.
25) fq scheduler doesn't accurately rate limit in certain circumstances,
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
pkt_sched: fq: rate limiting improvements
ip6tnl: allow to use rtnl ops on fb tunnel
sit: allow to use rtnl ops on fb tunnel
ip_tunnel: Remove double unregister of the fallback device
ip_tunnel_core: Change __skb_push back to skb_push
ip_tunnel: Add fallback tunnels to the hash lists
ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
qlcnic: Fix SR-IOV configuration
ll_temac: Reset dma descriptors indexes on ndo_open
skbuff: size of hole is wrong in a comment
ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
ethernet: moxa: fix incorrect placement of __initdata tag
ipv6: gre: correct calculation of max_headroom
powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
bonding: Fix broken promiscuity reference counting issue
tcp: TSQ can use a dynamic limit
dm9601: fix IFF_ALLMULTI handling
pkt_sched: fq: qdisc dismantle fixes
...
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:
* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
Patrick McHardy.
* Fix possible weight overflow in lblc and lblcr schedulers due to
32-bits arithmetics, from Simon Kirby.
* Fix possible memory access race in the lblc and lblcr schedulers,
introduced when it was converted to use RCU, two patches from
Julian Anastasov.
* Fix hard dependency on CPU 0 when reading per-cpu stats in the
rate estimator, from Julian Anastasov.
* Fix race that may lead to object use after release, when invoking
ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Default timeouts are currently set via proc/sysctl interface, the
typical pattern is a file name like:
/proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE
This results in one entry per default protocol state timeout.
This patch simplifies this by allowing to set default protocol
timeouts via cttimeout netlink interface.
This should allow us to get rid of the existing proc/sysctl code
in the midterm.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are currently seven different NAT hooks used in both
nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in
nf_conntrack_sip, then set from the nf_nat_sip NAT helper.
And because each of them is exported there is quite some overhead
introduced due of this.
By introducing nf_nat_sip_hooks I am able to reduce both text/data
somewhat. For nf_conntrack_sip e. g. I get
text data bss dec
old 15243 5256 32 20531
new 15010 5192 32 20234
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some devices may not be able to report A-MSDUs in
single buffers. Drivers for such devices were
forced to re-assemble A-MSDUs which would then
be eventually disassembled by mac80211. This could
lead to CPU cache thrashing and poor performance.
Since A-MSDU has a single sequence number all
subframes share it. This was in conflict with
retransmission/duplication recovery
(IEEE802.11-2012: 9.3.2.10).
Patch introduces a new flag that is meant to be
set for all individually reported A-MSDU subframes
except the last one. This ensures the
last_seq_ctrl is updated after the last subframe
is processed. If an A-MSDU is actually a duplicate
transmission all reported subframes will be
properly discarded.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[johannes: add braces that were missing even before]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can be useful for drivers if they have any failure cases
when joining an IBSS. Also move setting the queue parameters
to before this new call, in case the new driver op needs them
already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
VHT_CAP_BEAMFORMER_ANTENNAS cap is actually defined in the draft as
VHT_CAP_BEAMFORMEE_STS_MAX, and its size is 3 bits long.
VHT_CAP_SOUNDING_DIMENSIONS is also 3 bits long.
Fix the definitions and change the cap masking accordingly.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since commit c93bdd0e03 ("netvm: allow skb allocation to use PFMEMALLOC
reserves"), hole size is one bit less than what is written in the comment.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x makes a dangerous use of skb_is_gso_v6().
It should first make sure skb is a gso packet
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Warning(include/net/sock.h:411): No description found for parameter
'sk_max_pacing_rate'
Lets please "make htmldocs" and kbuild bot.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c
v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next
v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSSfNNAAoJEGcL54qWCgDybGYQAJGm4/vd7/rWZ49KIjGFGkFo
sCt0UOK6Y6ALhUOIlIreXsQ+Iwn9aAoIIRgx8UwnB+hO6PGnSyFuJZZx1KE8V2kj
6JlE5FbsWV+3uFQzNJQsNcoj7NZMzIRZT7x+7QansBOdSQjgQc3ig2sAMWREZjn8
GxMOl8FNRrnP8gRom30ZScgMp1YDM8J1ql80S/nbxh2NOLBsvgg9VapzJhhqkMyl
b7WKX4Qbg4AeSaxIAIrIwcZ7L2YS09JGC40VSybQARs0/7J8fjOZPs7CmrUCoB5F
DmT5vfEC4+dqDf8PMyoFVfxK5ua5Sb/FGQmagYYa8bSgY7Uq03akYI++co+4PZU1
f3SN6CSvVffzGMdXAhUupOZQbkKvKFxR2MTGy8s7dxdkQudd4RioYPDmLfCHlbmb
VY5kFh/Duqso1FCrcfvZoC88ElrWUz5yoVzZyECOEwCs1wjI6bjmGdSqCSbU75Lm
Z0XOAn1cStwFvGwCbGZPUzlvueji3coDdCFPBXAOFHzisLYoo/Lxenw7l5D1qM5b
02iZllcIo340vw8wxHZxVebecFo33P90X1gjv0HQQkV/6EeNgq4D47SWTPxRq3Ai
Dl9MFjTPl51oseDLrH6I/hBvcqjksB1M1+WjifT0bCIi3Y0HAea2U0wgweHS3vAd
QHqIpIJxNHDjPBMDWEZW
=ScfI
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver
* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...
patch(1) can't handle zero-length files - it appears to simply not create
the file, so my powerpc build fails.
Put something in here to make life easier.
Cc: Hugh Dickins <hughd@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds netns support for ipset.
Major changes were made in ip_set_core.c and ip_set.h.
Global variables are moved to per net namespace.
Added initialization code and the destruction of the network namespace ipset subsystem.
In the prototypes of public functions ip_set_* added parameter "struct net*".
The remaining corrections related to the change prototypes of public functions ip_set_*.
The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347
Signed-off-by: Vitaly Lavrov <lve@guap.ru>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Conflicts:
include/linux/netdevice.h
More extern removals from Joe Perches.
Minor conflict with the dev_notify_flags changes which added a new
argument to __dev_notify_flags().
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the core support for having comments on ipset entries.
The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate all extensions and call that extension's destroy task if the set
has that extension activated, and if such a task is defined.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Get rid of the structure based extensions and introduce a blob for
the extensions. Thus we can support more extension types easily.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Default timeout and extension offsets are moved to struct set, because
all set types supports all extensions and it makes possible to generalize
extension support.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
In order to support hash:net,net, hash:net,port,net etc. types,
arrays are introduced for the book-keeping of existing cidr sizes
and network numbers in a set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-mathcing
family. Currently such rules are silently accepted and then ignored
instead of generating a clear error message to the user, which is not
helpful.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Conflicts:
include/net/xfrm.h
Simple conflict between Joe Perches "extern" removal for function
declarations in header files and the changes in Steffen's tree.
Steffen Klassert says:
====================
Two patches that are left from the last development cycle.
Manual merging of include/net/xfrm.h is needed. The conflict
can be solved as it is currently done in linux-next.
1) We announce the creation of temporary acquire state via an asyc event,
so the deletion should be annunced too. From Nicolas Dichtel.
2) The VTI tunnels do not real tunning, they just provide a routable
IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier
instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch only prepares the next one, there is no functional change.
Now, __dev_notify_flags() can also be used to notify flags changes via
rtnetlink.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of RCU api makes vxlan code easier to understand. It also
fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference.
In rare case without ACCESS_ONCE() compiler might omit vs on
sk_user_data dereference.
Compiler can use vs as alias for sk->sk_user_data, resulting in
multiple sk_user_data dereference in rcu read context which
could change.
CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP packets hitting the SYN proxy through the SYNPROXY target are not
validated by TCP conntrack. When th->doff is below 5, an underflow happens
when calculating the options length, causing skb_header_pointer() to
return NULL and triggering the BUG_ON().
Handle this case gracefully by checking for NULL instead of using BUG_ON().
Reported-by: Martin Topholm <mph@one.com>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Here are some HyperV and MEI driver fixes for 3.12-rc3. They resolve some
issues that people have been reporting for them.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (GNU/Linux)
iEYEABECAAYFAlJIgcwACgkQMUfUDdst+ynRGwCgoIIGIXG2RTVaTDm8wezRERGA
ZDoAoKZe5Bec2CSqydcPNZWYg4ATTMjn
=6r+r
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some HyperV and MEI driver fixes for 3.12-rc3. They resolve
some issues that people have been reporting for them"
* tag 'char-misc-3.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: vmbus: Terminate vmbus version negotiation on timeout
Drivers: hv: util: Correctly support ws2008R2 and earlier
mei: cancel stall timers in mei_reset
mei: bus: stop wait for read during cl state transition
mei: make me client counters less error prone
Pull drm fixes from Dave Airlie:
"Nothing too major, radeon still has some dpm changes for off by
default.
Radeon, intel, msm:
- radeon: a few more dpm fixes (still off by default), uvd fixes
- i915: runtime warn backtrace and regression fix
- msm: iommu changes fallout"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (27 commits)
drm/msm: use drm_gem_dumb_destroy helper
drm/msm: deal with mach/iommu.h removal
drm/msm: Remove iommu include from mdp4_kms.c
drm/msm: Odd PTR_ERR usage
drm/i915: Fix up usage of SHRINK_STOP
drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
drm/i915: preserve pipe A quirk in i9xx_set_pipeconf
drm/i915/tv: clear adjusted_mode.flags
drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFER
drm/radeon/cik: fix overflow in vram fetch
drm/radeon: add missing hdmi callbacks for rv6xx
drm/i915: Use a temporary va_list for two-pass string handling
drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
drm/radeon: disable tests/benchmarks if accel is disabled
drm/radeon: don't set default clocks for SI when DPM is disabled
drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
drm/radeon/dpm: fetch the max clk from voltage dep tables helper
...
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.
SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.
u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
To be effectively paced, a flow must use FQ packet scheduler.
Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.
I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().
The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.
Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:
- the TTL, expressed as __u8.
The allowed values are in the [1-255].
A value of 0 means that the TTL is not specified.
- the TOS, expressed as __s16.
The allowed values are in the range [0,255].
A value of -1 means that the TOS is not specified.
- the priority, expressed as a char and computed when
handling the ancillary data.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A host might need net_secret[] and never open a single socket.
Problem added in commit aebda156a5
("net: defer net_secret[] initialization")
Based on prior patch from Hannes Frederic Sowa.
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is currently serialization network namespaces exiting and
network devices exiting as the final part of netdev_run_todo does not
happen under the rtnl_lock. This is compounded by the fact that the
only list of devices unregistering in netdev_run_todo is local to the
netdev_run_todo.
This lack of serialization in extreme cases results in network devices
unregistering in netdev_run_todo after the loopback device of their
network namespace has been freed (making dst_ifdown unsafe), and after
the their network namespace has exited (making the NETDEV_UNREGISTER,
and NETDEV_UNREGISTER_FINAL callbacks unsafe).
Add the missing serialization by a per network namespace count of how
many network devices are unregistering and having a wait queue that is
woken up whenever the count is decreased. The count and wait queue
allow default_device_exit_batch to wait until all of the unregistration
activity for a network namespace has finished before proceeding to
unregister the loopback device and then allowing the network namespace
to exit.
Only a single global wait queue is used because there is a single global
lock, and there is a single waiter, per network namespace wait queues
would be a waste of resources.
The per network namespace count of unregistering devices gives a
progress guarantee because the number of network devices unregistering
in an exiting network namespace must ultimately drop to zero (assuming
network device unregistration completes).
The basic logic remains the same as in v1. This patch is now half
comment and half rtnl_lock_unregistering an expanded version of
wait_event performs no extra work in the common case where no network
devices are unregistering when we get to default_device_exit_batch.
Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a router is doing DNAT for 6to4/6rd packets the latest
anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
6to4 and 6rd") will drop them because the IPv6 address embedded does
not match the IPv4 destination. This patch will allow them to pass by
testing if we have an address that matches on 6to4/6rd interface. I
have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
Also, log the dropped packets (with rate limit).
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 638c5115a7949(USBNET: support DMA SG) introduces DMA SG
if the usb host controller is capable of building packet from
discontinuous buffers, but missed handling padding packet when
building DMA SG.
This patch attachs the pre-allocated padding packet at the
end of the sg list, so padding packet can be sent to device
if drivers require that.
Reported-by: David Laight <David.Laight@aculab.com>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull s390 lockref enablement from Heiko Carstens:
"Enabling the new lockless lockref variant on s390 would have been
trivial until Tony Luck added a cpu_relax() call into the
CMPXCHG_LOOP(), with commit d472d9d98b ("lockref: Relax in cmpxchg
loop")
As already mentioned cpu_relax() is very expensive on s390 since it
yields() the current virtual cpu. So we are talking of several
thousand cycles. Considering this enabling the lockless lockref
variant would contradict the intention of the new semantics. And also
some quick measurements show performance regressions of 50% and more.
Simply removing the cpu_relax() call again seems also not very
desireable since Waiman Long reported that for some workloads the call
improved performance by 5%."
* 'lockref' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: enable ARCH_USE_CMPXCHG_LOCKREF
lockref: use arch_mutex_cpu_relax() in CMPXCHG_LOOP()
mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef
Clean-up to fix some warnings for !OF builds and spelling fixes in docs
- Clean-up openrisc prom.h
- Fix warnings caused by of_irq.h ifdefs
- Spelling fix for Synopsys
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSQk1CAAoJEMhvYp4jgsXilvEH/RztxbKNiscz7TYg3BDEKMaY
Wg029sdygyhl+Xb78FaS3Qwt98pK2JSWqzoNv+Tv4tfksGeLWVpe6HZlGvE/PEhb
2gUppV/ILyFSH8NfeaPkfyqdqr2XpCkcpkyGgYz7jQzNti5si448LC4oJR4tcXbo
FyKguRPpTtCwDsG86fPpQgc3fHK+C7W0oQu+PCB0DEk2ApMR05rGpPiMEYsKlj56
FE9n0DMZ+hfMCUivraVTyHB9Xjo4/kTv9s88hkjwkli+N3UrIUgOGsQpNt27QIFP
F95roRtVOmgRbgtMAFaEy7nfd+3QuVfonjfYNJzxpRHWew7LxFcVEwWRjb0zaKQ=
=KDHt
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Clean-up to fix some warnings for !OF builds and spelling fixes in
docs:
- Clean-up openrisc prom.h
- Fix warnings caused by of_irq.h ifdefs
- Spelling fix for Synopsys"
* tag 'devicetree-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dts: Fix misspelling of Synopsys
of: clean-up ifdefs in of_irq.h
openrisc: clean-up prom.h
Linus suggested to replace
#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
#define arch_mutex_cpu_relax() cpu_relax()
#endif
with just a simple
#ifndef arch_mutex_cpu_relax
# define arch_mutex_cpu_relax() cpu_relax()
#endif
to get rid of CONFIG_HAVE_CPU_RELAX_SIMPLE. So architectures can
simply define arch_mutex_cpu_relax if they want an architecture
specific function instead of having to add a select statement in
their Kconfig in addition.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
More radeon fixes for 3.12. Kind of all over the place: UVD, DPM,
tiling, etc.
* 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix hdmi audio on DCE3.0/3.1 asics
drm/radeon/cik: fix overflow in vram fetch
drm/radeon: add missing hdmi callbacks for rv6xx
drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
drm/radeon: disable tests/benchmarks if accel is disabled
drm/radeon: don't set default clocks for SI when DPM is disabled
drm/radeon/dpm/ci: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/si: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/ni: filter clocks based on voltage/clk dep tables
drm/radeon/dpm/btc: filter clocks based on voltage/clk dep tables
drm/radeon/dpm: fetch the max clk from voltage dep tables helper
drm/radeon: fix missed variable sized access
drm/radeon: Make r100_cp_ring_info() and radeon_ring_gfx() safe (v2)
drm/radeon/cik: Add tiling mode index for 1D tiled depth/stencil surfaces
drm/radeon/cik: Fix encoding of number of banks in tiling configuration info
drm/radeon/cik: Fix printing of client name on VM protection fault
drm/radeon: additional gcc fixes for radeon_atombios.c
drm/radeon: avoid UVD corruption on AGP cards using GPU gart
Dumping routes on a system with lots rt6_infos in the fibs causes up to
11-order allocations in seq_file (which fail). While we could switch
there to vmalloc we could just implement the streaming interface for
/proc/net/ipv6_route. This patch switches /proc/net/ipv6_route from
single_open_net to seq_open_net.
loff_t *pos tracks dst entries.
Also kill never used struct rt6_proc_arg and now unused function
fib6_clean_all_ro.
Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
The current code does not correctly negotiate the version numbers for the util
driver when hosted on earlier hosts. The version numbers presented by this
driver were not compatible with the version numbers supported by Windows Server
2008. Fix this problem.
I would like to thank Olaf Hering (ohering@suse.com) for identifying the problem.
Reported-by: Olaf Hering <ohering@suse.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It will be useful to get first/last element.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a possibility to iterate through netdev_adjacent's private, currently
only for lower neighbours.
Add both RCU and RTNL/other locking variants of iterators, and make the
non-rcu variant to be safe from removal.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, even though we can access any linked device, we can't attach
anything to it, which is vital to properly manage them.
To fix this, add a new void *private to netdev_adjacent and functions
setting/getting it (per link), so that we can save, per example, bonding's
slave structures there, per slave device.
netdev_master_upper_dev_link_private(dev, upper_dev, private) links dev to
upper dev and populates the neighbour link only with private.
netdev_lower_dev_get_private{,_rcu}() returns the private, if found.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we distinguish neighbours (first-level linked devices) from
non-neighbours by the neighbour bool in the netdev_adjacent. This could be
quite time-consuming in case we would like to traverse *only* through
neighbours - cause we'd have to traverse through all devices and check for
this flag, and in a (quite common) scenario where we have lots of vlans on
top of bridge, which is on top of a bond - the bonding would have to go
through all those vlans to get its upper neighbour linked devices.
This situation is really unpleasant, cause there are already a lot of cases
when a device with slaves needs to go through them in hot path.
To fix this, introduce a new upper/lower device lists structure -
adj_list, which contains only the neighbours. It works always in
pair with the all_adj_list structure (renamed from upper/lower_dev_list),
i.e. both of them contain the same links, only that all_adj_list contains
also non-neighbour device links. It's really a small change visible,
currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't
change the main linked logic at all.
Also, add some comments a fix a name collision in
netdev_for_each_upper_dev_rcu() and rework the naming by the following
rules:
netdev_(all_)(upper|lower)_*
If "all_" is present, then we work with the whole list of upper/lower
devices, otherwise - only with direct neighbours. Uninline functions - to
get better stack traces.
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Determine if we've created a new file by examining the directory change
attribute and/or the O_EXCL flag.
This fixes a regression when doing a non-exclusive create of a new file.
If the FILE_CREATED flag is not set, the atomic_open() command will
perform full file access permissions checks instead of just checking
for MAY_OPEN.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It will be used later by the IBSS CSA implementation of mac80211.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If it is needed to disconnect multiple virtual interfaces after
(WoWLAN-) suspend, the most obvious approach would be to iterate
all interfaces by calling ieee80211_iterate_active_interfaces()
and then call ieee80211_resume_disconnect() for each one. This
is what the iwlmvm driver does.
Unfortunately, this causes a locking dependency from mac80211's
iflist_mtx to the key_mtx. This is problematic as the former is
intentionally never held while calling any driver operation to
allow drivers to iterate with their own locks held. The key_mtx
is held while installing a key into the driver though, so this
new lock dependency means drivers implementing the logic above
can no longer hold their own lock while iterating.
To fix this, add a new ieee80211_iterate_active_interfaces_rtnl()
function that iterates while the RTNL is already held. This is
true during suspend/resume, so that then the locking dependency
isn't introduced.
While at it, also refactor the various interface iterators and
keep only a single implementation called by the various cases.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A few fixes for dm-snapshot, a 32 bit fix for dm-stats, a couple error
handling fixes for dm-multipath. A fix for the thin provisioning target
to not expose non-zero discard limits if discards are disabled.
Lastly, add two DM module parameters which allow users to tune the
emergency memory reserves that DM mainatins per device -- this helps fix
a long-standing issue for dm-multipath. The conservative default
reserve for request-based dm-multipath devices (256) has proven
problematic for users with many multipathed SCSI devices but relatively
little memory. To responsibly select a smaller value users should use
the new nr_bios tracepoint info (via commit 75afb352 "block: Add nr_bios
to block_rq_remap tracepoint") to determine the peak number of bios
their workloads create.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSQMVHAAoJEMUj8QotnQNaOXgIAJS6/XJKMoHfiDJ9M+XD34rZ
Uyr9TEnubX3DKCRBiY23MUcCQn3fx6BjCGv5/c8L4jQFIuLyDi2yatqpwXcbGSJh
G/S/y6u0Axek+ew7TS80OFop4nblW6MoKnoh9/4N55Ofa+1WvKM4ERUGjHGbauyS
TxmLQPToCFPLYRIOZ+imd6hQuIZ1+FFdJFvi7kY9O6Llx2sLD6fWi1iruBd/Da2H
ByMX3biGN45mSpcBzRbSC/FkJ9CRIvT9n82BDPS0o3Tllt8NaVlEDaovB7h4ncc0
bFuT2Z3Q38B9uZ8Lj0bqdGzv3kXMLCkLo6WhWjyUt84hmDPAzRpBwt60jUqWyZs=
=bjVp
-----END PGP SIGNATURE-----
Merge tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device-mapper fixes from Mike Snitzer:
"A few fixes for dm-snapshot, a 32 bit fix for dm-stats, a couple error
handling fixes for dm-multipath. A fix for the thin provisioning
target to not expose non-zero discard limits if discards are disabled.
Lastly, add two DM module parameters which allow users to tune the
emergency memory reserves that DM mainatins per device -- this helps
fix a long-standing issue for dm-multipath. The conservative default
reserve for request-based dm-multipath devices (256) has proven
problematic for users with many multipathed SCSI devices but
relatively little memory. To responsibly select a smaller value users
should use the new nr_bios tracepoint info (via commit 75afb352
"block: Add nr_bios to block_rq_remap tracepoint") to determine the
peak number of bios their workloads create"
* tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: add reserved_bio_based_ios module parameter
dm: add reserved_rq_based_ios module parameter
dm: lower bio-based mempool reservation
dm thin: do not expose non-zero discard limits if discards disabled
dm mpath: disable WRITE SAME if it fails
dm-snapshot: fix performance degradation due to small hash size
dm snapshot: workaround for a false positive lockdep warning
dm stats: fix possible counter corruption on 32-bit systems
dm mpath: do not fail path on -ENOSPC