By semantics, xfrm layer is fully name space aware,
so will the locks, e.g. xfrm_state/pocliy_lock.
Ensure exclusive access into state/policy link list
for different name space with one global lock is not
right in terms of semantics aspect at first place,
as they are indeed mutually independent with each
other, but also more seriously causes scalability
problem.
One practical scenario is on a Open Network Stack,
more than hundreds of lxc tenants acts as routers
within one host, a global xfrm_state/policy_lock
becomes the bottleneck. But onces those locks are
decoupled in a per-namespace fashion, locks contend
is just with in specific name space scope, without
causing additional SPD/SAD access delay for other
name space.
Also this patch improve scalability while as without
changing original xfrm behavior.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
because the home agent could surely be run on a different
net namespace other than init_net. The original behavior
could lead into inconsistent of key info.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
xfrm code always searches for unused policy index for
newly created policy regardless whether or not user
space policy index hint supplied.
This patch enables such feature so that using
"ip xfrm ... index=xxx" can be used by user to set
specific policy index.
Currently this beahvior is broken, so this patch make
it happen as expected.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We must clear local_df when passing the skb between namespaces as the
packet is not local to the new namespace any more and thus may not get
fragmented by local rules. Fred Templin noticed that other namespaces
do fragment IPv6 packets while forwarding. Instead they should have send
back a PTB.
The same problem should be present when forwarding DF-IPv4 packets
between namespaces.
Reported-by: Templin, Fred L <Fred.L.Templin@boeing.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
kill memcg_tcp_enter_memory_pressure. The only function of
memcg_tcp_enter_memory_pressure was to reduce deal with the
unnecessary abstraction that was tcp_memcontrol. Now that struct
tcp_memcontrol is gone remove this unnecessary function, the
unnecessary function pointer, and modify sk_enter_memory_pressure to
set this field directly, just as sk_leave_memory_pressure cleas this
field directly.
This fixes a small bug I intruduced when killing struct tcp_memcontrol
that caused memcg_tcp_enter_memory_pressure to never be called and
thus failed to ever set cg_proto->memory_pressure.
Remove the cg_proto enter_memory_pressure function as it now serves
no useful purpose.
Don't test cg_proto->memory_presser in sk_leave_memory_pressure before
clearing it. The test was originally there to ensure that the pointer
was non-NULL. Now that cg_proto is not a pointer the pointer does not
matter.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Michael pointed out that when max_burst is 0, it just disable
max_burst. It declared in rfc6458#section-8.1.24. so add the check
in sctp_transport_burst_limited, when it 0, just do nothing.
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Suggested-by: Michael Tuexen <Michael.Tuexen@lurchi.franken.de>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
SIOCGHWTSTAMP ioctl
1. Add the SIOCGHWTSTAMP ioctl and update the timestamping
documentation.
2. Implement SIOCGHWTSTAMP in most drivers that support SIOCSHWTSTAMP.
3. Add a test program to exercise SIOC{G,S}HWTSTAMP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
- Stable fix for a loop on irrecoverable errors when returning delegations
- Fix a 3-way deadlock between layoutreturn, open, and state recovery
- Update the MAINTAINERS file with contact information for Trond Myklebust
- Close needs to handle NFS4ERR_ADMIN_REVOKED
- Enabling v4.2 should not recompile nfsd and lockd
- Fix a couple of compile warnings
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSoLTpAAoJEGcL54qWCgDy2dgQAIKkKAXccg3OG2b1SxJmiaja
PcrovNmgg3HvYQ7clUMqtrMByiXEpSybl6tAeXYUWE3sS1DISSBVEwO3MoOiASiM
951Ssx+CoyhsHYo5aH83sUIiWFl/YsRhpKmSr2cdQd13DQTFbPq896k64Inf6L2/
9fngoqOD7FunQHn8AiVPoDOQzObB0OuKhYCwuwLt47oPiwgmm12JQNCDxU1i4sxb
lkGUBLkPMs6D5IyI8XHaMyX3+8MvmPiIsjIKaNJRdhkuX/k7ollucTJXyvyEQKK0
PhBIWyUULmKcAXYwCfHf9UoyGZFvmj47YggyKcBd26OZUEFekcWrULfym46F1xak
EcO6D4mlTy5i5W0RBqYCj1oGud57rixZBmhLTbeq6sSJaiqBfGEs225Q17H7rsEB
YIghHiEFNnBmVWELhHxbJHQoY6HOugmZOuc0dxopaikN/7to8gnYoVyTIVlMfe/t
UNXZoer6GOOohJGtZ7s7v4Al7EzvwnVnBCBklEAKFJ7Ca2LEmq+b58oQW3nJ1mPn
y4TnihxYXsSEbqy+Lds9rumRhJLG1oVTpwficAm7N3HdK3abzCIPEt6iOHoCmXQz
J1B4gmwOKsDqVlCSpBsnc3ZiBlSJGOn6MmVQUCNFpzv/DetWn/BxEUPE8cNm8DaI
WioD0grC0/9bR8oD1m+w
=UZ51
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
- Stable fix for a loop on irrecoverable errors when returning
delegations
- Fix a 3-way deadlock between layoutreturn, open, and state recovery
- Update the MAINTAINERS file with contact information for Trond
Myklebust
- Close needs to handle NFS4ERR_ADMIN_REVOKED
- Enabling v4.2 should not recompile nfsd and lockd
- Fix a couple of compile warnings
* tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix do_div() warning by instead using sector_div()
MAINTAINERS: Update contact information for Trond Myklebust
NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
SUNRPC: do not fail gss proc NULL calls with EACCES
NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
NFSv4: Update list of irrecoverable errors on DELEGRETURN
NFSv4 wait on recovery for async session errors
NFS: Fix a warning in nfs_setsecurity
NFS: Enabling v4.2 should not recompile nfsd and lockd
The values in l2cap_chan that are used for actually transmitting data
only need to be initialized right after we've received an L2CAP Connect
Request or just before we send one. The only thing that we need to
initialize though bind() and connect() is the chan->mode value. This way
all other initializations can be done in the l2cap_le_flowctl_init
function (which now becomes private to l2cap_core.c) and the
l2cap_le_flowctl_start function can be completely removed.
Also, since the l2cap_sock_init function initializes the imtu and omtu
to adequate values these do not need to be part of l2cap_le_flowctl_init.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds entries to debugfs to control the values used for the
MPS and Credits for LE Flow Control Mode.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
LE PSM values have different ranges than those for BR/EDR. The valid
ranges for fixed, SIG assigned values is 0x0001-0x007f and for dynamic
PSM values 0x0080-0x00ff. We need to ensure that bind() and connect()
calls conform to these ranges when operating on LE CoC sockets.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
LE CoC used differend CIC ranges than BR/EDR L2CAP. The start of the
range is the same (0x0040) but the range ends at 0x007f (unlike BR/EDR
where it goes all the way to 0xffff).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The outgoing MTU should only be set upon channel creation to the initial
minimum value (23) or from a remote connect req/rsp PDU.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It doesn't make sense to have an MPS value greater than the configured
MTU value since we will then not be able to fill up the packets to their
full possible size. We need to set the MPS both in flowctl_init()
as well as flowctl_start() since the imtu may change after init() but
start() is only called after we've sent the LE Connection Request PDU
which depends on having a valid MPS value.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If the peer gives us zero credits in its connection request or response
we must call the suspend channel callback so the BT_SK_SUSPEND flag gets
set and user space is blocked from sending data to the socket.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Unlike BR/EDR, for LE when we're in the BT_CONNECT state we may or may
not have already have sent the Connect Request. This means that we need
some extra tracking of the request. This patch adds an extra channel
flag to prevent the request from being sent a second time.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When receiving fragments over an LE Connection oriented Channel they
need to be collected up and eventually merged into a single SDU. This
patch adds the necessary code for collecting up the fragment skbs to the
channel context and passing them to the recv() callback when the entire
SDU has been received.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds segmentation support for outgoing data packets. Packets
are segmented based on the MTU and MPS values. The l2cap_chan struct
already contains many helpful variables from BR/EDR Enhanced L2CAP which
can be used for this.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Setting the BT_SK_SUSPEND socket flag from the L2CAP core is causing a
dependency on the socket. So instead of doing that, use a channel
callback into the socket handling to suspend.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Since LE CoC support needs to be enabled through a module option for now
we need to reject any related signaling PDUs in addition to rejecting
the creation of LE CoC sockets (which we already do).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the necessary discipline for reacting to LE L2CAP
Credits packets, sending those packets, and modifying the known credits
accordingly.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We should return credits to the remote side whenever they fall below a
certain level (in our case under half of the initially given amount).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds new socket options for LE sockets since the existing
L2CAP_OPTIONS socket option is not usable for LE. For now, the new
socket options also require LE CoC support to be explicitly enabled to
leave some playroom in case something needs to be changed in a backwards
incompatible way.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Most of the values in L2CAP_OPTIONS are not applicable for LE and those
that are have different semantics. It makes therefore sense to
completely block this socket option for LE and add (in a separate patch)
a new socket option for tweaking the values that do make sense (mainly
the send and receive MTU). Legacy user space ATT code still depends on
getsockopt for L2CAP_OPTIONS though so we need to plug a hole for that
for backwards compatibility.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds tracking of L2CAP connection oriented channel local and
remote credits to struct l2cap_chan and ensures that connect requests
and responses contain the right values.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The LE connection oriented channels have their own mode with its own
data transfer rules. In order to implement this properly we need to
distinguish L2CAP channels operating in this mode from other modes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch makes the error handling and return logic of l2cap_le_sig_cmd
consistent with its BR/EDR counterpart.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The normal L2CAP Disconnect request and response are also used for LE
connection oriented channels. Therefore, we can simply use the existing
handler functions for terminating LE based L2CAP channels.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Sending of the L2CAP Disconnect request should also be performed for LE
based channels. The proper thing to do is therefore to look at whether
there's a PSM specified for the channel instead of looking at the link
type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the necessary boiler plate code to handle receiving
L2CAP connect requests over LE and respond to them with a proper connect
response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll need to have a separate code path for LE based connection
rejection so it's cleaner to move out the response construction code
into its own function (and later a second one for LE).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This function is needed both by the smp_conn_security function as well
as upcoming code to check for the security requirements when receiving
an L2CAP connect request over LE.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the necessary code to send an LE L2CAP Connect Request
and handle its response when user space has provided us with an LE
socket with a PSM instead of a fixed CID.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Once connection oriented L2CAP channels over LE are supported they will
need a completely separate handling from BR/EDR channels.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The LE signaling PDU length is already calculated in the
l2cap_le_sig_channel function so we can just pass the value to the
various handler functions to avoid unnecessary recalculations (byte
order conversions). Right now the only user is the connection parameter
update procedure, but as new LE signaling operations become available
(for connection oriented channels) they will also be able to make use of
the value.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With connection oriented L2CAP channels some code paths will be shared
with BR/EDR links. It is therefore necessary to allow the
l2cap_chan_check_security function to be usable also for LE links in
addition to BR/EDR ones. This means that smp_conn_security() needs to be
called instead of hci_conn_security() in the case of an LE link.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Once connection oriented L2CAP channels become possible for LE we need
to be able to specify the link type we're interested in when looking up
L2CAP channels. Therefore, add a link_type parameter to the
l2cap_global_chan_by_psm() function which gets compared to the address
type associated with each l2cap_chan.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Along with the L2CAP Connection Oriented Channels features it is now
allowed to use both custom fixed CIDs as well as PSM based (connection
oriented connections). Since the support for this (with the subsequent
patches) is still on an experimental stage, add a module parameter to
enable it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch is just a trivial coding style fix to remove unnecessary
braces from a one-line if-statement.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The AES cipher is used in ECB mode by SMP and therefore doesn't use an
IV (crypto_blkcipher_ivsize returns 0) so the code trying to set the IV
was never getting called. Simply remove this code to avoid anyone from
thinking it actually makes some difference.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This function was always just making a single get_random_bytes() call
and always returning the value 0. It's simpler to just call
get_random_bytes() directly where needed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hci_disconn_complete_evt() logic is more complicated than what it
should be, making it hard to follow and add new features.
So this patch does some code refactoring by handling the error cases
in the beginning of the function and by moving the main flow into the
first level of function scope. No change is done in the event handling
logic itself.
Besides organizing this messy code, this patch makes easier to add
code for handling LE auto connection (which will be added in a further
patch).
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
According to b644ba336 (patch that introduced HCI_CONN_MGMT_CONNECTED
flag), the HCI_CONN_MGMT_CONNECTED flag tracks when mgmt has been
notified about the connection.
That being said, there is no point in calling mgmt_disconnect_failed()
conditionally based on this flag. mgmt_disconnect_failed() removes
pending MGMT_OP_DISCONNECT commands, it doesn't matter if that
connection was notified or not.
Moreover, if the Disconnection Complete event has status then we have
nothing else to do but call mgmt_disconnect_failed() and return.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The send parameter has only been used for determining whether to send a
Pairing Failed PDU or not. However, the function can equally well use
the already existing reason parameter to make this choice and send the
PDU whenever a non-zero value was passed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We can safely remove the link type check from hci_disconn_complete_
evt() since this check in not required for mgmt_disconnect_failed()
and mgmt_device_disconnected() does it internally.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an extra check in mgmt_device_disconnected() so we only
send the "Device Disconnected" event if it is ACL_LINK or LE_LINK link
type.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Check the address and address type in mgmt_disconnect_failed() otherwise
we may wrongly fail the MGMT_OP_DISCONNECT command.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The list of supported commands of a controller can not change during
its lifetime. So store the list just once during the setup procedure
and not every time the HCI command is executed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The complete list of local features are available through debugfs and
so there is no need to add a debug print here.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The default own address type is currently set at every power on of
a controller. This overwrites the value set via debugfs. To avoid
this issue, set the default own address type only during controller
setup.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
There is an old Panasonic module with a Zeevo chip in there that is
not really operating according to Bluetooth core specification when
it comes to setting the IAC LAP for limited discoverable mode.
For reference, this is the vendor information about this module:
< HCI Command: Read Local Version Information (0x04|0x0001) plen 0
> HCI Event: Command Complete (0x0e) plen 12
Read Local Version Information (0x04|0x0001) ncmd 1
Status: Success (0x00)
HCI version: Bluetooth 1.2 (0x02) - Revision 196 (0x00c4)
LMP version: Bluetooth 1.2 (0x02) - Subversion 61 (0x003d)
Manufacturer: Zeevo, Inc. (18)
The module reports only the support for one IAC at a time. And that
is totally acceptable according to the Bluetooth core specification
since the minimum supported IAC is only one.
< HCI Command: Read Number of Supported IAC (0x03|0x0038) plen 0
> HCI Event: Command Complete (0x0e) plen 5
Read Number of Supported IAC (0x03|0x0038) ncmd 1
Status: Success (0x00)
Number of IAC: 1
The problem arises when trying to program two IAC into the module
on a controller that only supports one.
< HCI Command: Write Current IAC LAP (0x03|0x003a) plen 7
Number of IAC: 2
Access code: 0x9e8b00 (Limited Inquiry)
Access code: 0x9e8b33 (General Inquiry)
> HCI Event: Command Status (0x0f) plen 4
Write Current IAC LAP (0x03|0x003a) ncmd 1
Status: Unknown HCI Command (0x01)
While this looks strange, but according to the Bluetooth core
specification it is a legal operation. The controller has to
ignore the other values and only program as many as it supports.
This command shall clear any existing IACs and stores Num_Current_IAC
and the IAC_LAPs in to the controller. If Num_Current_IAC is greater
than Num_Support_IAC then only the first Num_Support_IAC shall be
stored in the controller, and a Command Complete event with error
code Success (0x00) shall be generated.
This specific controller has a bug here and just returns an error. So
in case the number of supported IAC is less than two and the limited
discoverable mode is requested, now only the LIAC is written to
the controller.
< HCI Command: Write Current IAC LAP (0x03|0x003a) plen 4
Number of IAC: 1
Access code: 0x9e8b00 (Limited Inquiry)
> HCI Event: Command Complete (0x0e) plen 4
Write Current IAC LAP (0x03|0x003a) ncmd 1
Status: Success (0x00)
All other controllers that only support one IAC seem to handle this
perfectly fine, but this fix will only write the LIAC for these
controllers as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.
For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
The commit 6094628bfd
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
&& rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
= 8192
c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
while (ret) {
tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
[tmp = 65536 - 57632 = 7904]
conn->c_xmit_data_off += tmp;
[c_xmit_data_off = 57632 + 7904 = 65536]
ret -= tmp;
[ret = 8240 - 7904 = 336]
if (conn->c_xmit_data_off == sg->length) {
conn->c_xmit_data_off = 0;
sg++;
conn->c_xmit_sg++;
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
[c_xmit_sg = 1, rm->data.op_nents = 1]
What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.
Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The behaviour of blackhole and prohibit routes has been corrected by setting
the input and output pointers of the dst variable appropriately. For
blackhole routes, they are set to dst_discard and to ip6_pkt_discard and
ip6_pkt_discard_out respectively for prohibit routes.
ipv6: ip6_pkt_prohibit(_out) should not depend on
CONFIG_IPV6_MULTIPLE_TABLES
We need ip6_pkt_prohibit(_out) available without
CONFIG_IPV6_MULTIPLE_TABLES
Signed-off-by: Kamala R <kamala@aristanetworks.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
when dealing with a RA message, if accept_ra_defrtr is false,
the kernel will not add the default route, and then deal with
the following route information options. Unfortunately, those
options maybe contain default route, so let's judge the
accept_ra_defrtr before calling rt6_route_rcv.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Here is a pile of bug fixes that accumulated while I was in Europe"
1) In fixing kernel leaks to userspace during copying of socket
addresses, we broke a case that used to work, namely the user
providing a buffer larger than the in-kernel generic socket address
structure. This broke Ruby amongst other things. Fix from Dan
Carpenter.
2) Fix regression added by byte queue limit support in 8139cp driver,
from Yang Yingliang.
3) The addition of MSG_SENDPAGE_NOTLAST buggered up a few sendpage
implementations, they should just treat it the same as MSG_MORE.
Fix from Richard Weinberger and Shawn Landden.
4) Handle icmpv4 errors received on ipv6 SIT tunnels correctly, from
Oussama Ghorbel. In particular we should send an ICMPv6 unreachable
in such situations.
5) Fix some regressions in the recent genetlink fixes, in particular
get the pmcraid driver to use the new safer interfaces correctly.
From Johannes Berg.
6) macvtap was converted to use a per-cpu set of statistics, but some
code was still bumping tx_dropped elsewhere. From Jason Wang.
7) Fix build failure of xen-netback due to missing include on some
architectures, from Andy Whitecroft.
8) macvtap double counts received packets in statistics, fix from Vlad
Yasevich.
9) Fix various cases of using *_STATS_BH() when *_STATS() is more
appropriate. From Eric Dumazet and Hannes Frederic Sowa.
10) Pktgen ipsec mode doesn't update the ipv4 header length and checksum
properly after encapsulation. Fix from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
net/mlx4_en: Remove selftest TX queues empty condition
{pktgen, xfrm} Update IPv4 header total len and checksum after tranformation
virtio_net: make all RX paths handle erors consistently
virtio_net: fix error handling for mergeable buffers
virtio_net: Fixed a trivial typo (fitler --> filter)
netem: fix gemodel loss generator
netem: fix loss 4 state model
netem: missing break in ge loss generator
net/hsr: Support iproute print_opt ('ip -details ...')
net/hsr: Very small fix of comment style.
MAINTAINERS: Added net/hsr/ maintainer
ipv6: fix possible seqlock deadlock in ip6_finish_output2
ixgbe: Make ixgbe_identify_qsfp_module_generic static
ixgbe: turn NETIF_F_HW_L2FW_DOFFLOAD off by default
ixgbe: ixgbe_fwd_ring_down needs to be static
e1000: fix possible reset_task running after adapter down
e1000: fix lockdep warning in e1000_reset_task
e1000: prevent oops when adapter is being closed and reset simultaneously
igb: Fixed Wake On LAN support
inet: fix possible seqlock deadlocks
...
Drivers with hardware rate control were given
sta->rx_nss set to 0. This was because rx_nss
calculation procedure was protected by hw/sw rate
control check.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When external CSA IEs are received (beacons or action messages), a
channel switch is triggered as well. This should only be allowed on
devices which actually support channel switches, otherwise disconnect.
(For the corresponding userspace invocation, the wiphy flag is checked
in nl80211).
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The channel switch announcement code has some major locking problems
which can cause a deadlock in worst case. A series of fixes has been
proposed, but these are non-trivial and need to be tested first.
Therefore disable CSA completely for 3.13.
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current channel switch code has a potential deadlock:
1) * cfg80211_stop_ap acquires wdev-lock
* ieee80211_stop_ap calls cancel_work_sync for the csa_finalize_work,
which acquires the associated worker-lock
2) * ieee80211_csa_finalize_work holds the worker-lock when run
* it calls cfg80211_ch_switch_notify which will claim the wdev-lock,
and also needs to claim the sdata-lock (which is the same as the
wdev-lock) to modify the beacons.
It is sufficient to just set the channel switch active to false. If the
worker is running later, it will find the channel switch to not be
active anymore and returns immediately without changing anything.
Canceling the worker is done anyway when the interface goes down
(ieee80211_do_stop).
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The channel switch notification should be sent under the
wdev/sdata-lock, preferably in the same moment as the channel change
happens, to avoid races by other callers (e.g. start/stop_ap).
This also adds the previously missing sdata_lock protection in
csa_finalize_work.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The csa finalize worker needs to change the beacon information (for
different modes). These are normally protected under rtnl lock, but the
csa finalize worker is called by drivers and should not acquire the RTNL
lock. Therefore change access protection for beacons to sdata/wdev lock.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[fix sdata_dereference]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To avoid race conditions in functions which modify the beacon
information, lock these using the wdev lock. This is especially required
to avoid problems for csa handling functions which modify beacons but
can not be called under rtnl lock.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The local TSF timer is used to compute the timing offset between
mesh peers on beacon reception. However, asking the device for
the TSF is not very accurate, so we prefer to use rx->mactime
if available. In the latter case, calling drv_get_tsf() just
adds more delay into the RX path, so skip it if we can.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change cfg80211 and mac80211 to use cfg80211_mgmt_tx_params
struct to aggregate parameters for mgmt_tx functions.
This makes the functions' signatures less clumsy and allows
less painful parameters extension.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
[fix all other drivers]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's a bug in tracking HT opmode changes in mac80211, it
fails to update the driver when the channel parameters don't
change.
Move the code to do the HT opmode checking independently of
the channel/bandwidth tracking.
Signed-off-by: Avri Altman <avri.altman@intel.com>
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
All interface types now properly clean up their stations
using some form of sta_info_flush() themselves, so there's
no need to try it again at teardown. Remove the call to
get rid of the extra delay from the synchronize_net() and
rcu_barrier() calls.
Reported-by: Moshe Benji <moshe.benji@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Measure TX latency and jitter statistics per station per TID.
These Measurements are disabled by default and can be enabled
via debugfs.
Features included for each station's TID:
1. Keep count of the maximum and average latency of Tx frames.
2. Keep track of many frames arrived in a specific time range
(need to enable through debugfs and configure the bins ranges)
Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nla_put_flag needs a real nl80211 attribute id, not a wiphy flag bit.
While at it, split 5 and 10 MHz capability flags in case we ever need
to support hardware that can only do one of the two.
Also move the flag settings to the split-only information so we don't
increase the space needed for old userspace.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[change location of flag setting]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit a553e4a631 ("[PKTGEN]: IPSEC support")
tried to support IPsec ESP transport transformation for pktgen, but acctually
this doesn't work at all for two reasons(The orignal transformed packet has
bad IPv4 checksum value, as well as wrong auth value, reported by wireshark)
- After transpormation, IPv4 header total length needs update,
because encrypted payload's length is NOT same as that of plain text.
- After transformation, IPv4 checksum needs re-caculate because of payload
has been changed.
With this patch, armmed pktgen with below cofiguration, Wireshark is able to
decrypted ESP packet generated by pktgen without any IPv4 checksum error or
auth value error.
pgset "flag IPSEC"
pgset "flows 1"
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from developers of the alternative loss models, downloaded from:
http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
"in case 2, of the switch we change the direction of the inequality to
net_random()>clg->a3, because clg->a3 is h in the GE model and when h
is 0 all packets will be lost."
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch from developers of the alternative loss models, downloaded from:
http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
"In the case 1 of the switch statement in the if conditions we
need to add clg->a4 to clg->a1, according to the model."
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a missing break statement in the Gilbert Elliot loss model
generator which makes state machine behave incorrectly.
Reported-by: Martin Burri <martin.burri@ch.abb.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements the rtnl_link_ops fill_info routine for HSR.
Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 stats are 64 bits and thus are protected with a seqlock. By not
disabling bottom-half we could deadlock here if we don't disable bh and
a softirq reentrantly updates the same mib.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit c9e9042994 ("ipv4: fix possible seqlock deadlock") I left
another places where IP_INC_STATS_BH() were improperly used.
udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from
process context, not from softirq context.
This was detected by lockdep seqlock support.
Reported-by: jongman heo <jongman.heo@samsung.com>
Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag MSG_SENDPAGE_NOTLAST, similar to
MSG_MORE.
algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
and need to see the new flag as identical to MSG_MORE.
This fixes sendfile() on AF_ALG.
v3: also fix udp
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> # 3.4.x + 3.2.x
Reported-and-tested-by: Shawn Landden <shawnlandden@gmail.com>
Original-patch: Richard Weinberger <richard@nod.at>
Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
original code that would lead to memory corruption in the kernel if you
had audit configured. If you didn't have audit configured it was
harmless.
There are some programs such as beta versions of Ruby which use too
large of a buffer and returning an error code breaks them. We should
clamp the ->msg_namelen value instead.
Fixes: 1661bf364a ("net: heap overflow in __audit_sockaddr()")
Reported-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Eric Wong <normalperson@yhbt.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we're using plain spin_lock() in prb_shutdown_retire_blk_timer(),
however the timer might fire right in the middle and thus try to re-aquire
the same spinlock, leaving us in a endless loop.
To fix that, use the spin_lock_bh() to block it.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
CC: "David S. Miller" <davem@davemloft.net>
CC: Daniel Borkmann <dborkman@redhat.com>
CC: Willem de Bruijn <willemb@google.com>
CC: Phil Sutter <phil@nwl.cc>
CC: Eric Dumazet <edumazet@google.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
since f9242b6b28
inet: Sanitize inet{,6} protocol demux.
there are not pretended hash tables for ipv4 or
ipv6 protocol handler.
Signed-off-by: Baker Zhang <Baker.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In failure case, we should use kfree_skb not
dev_kfree_skb to free skbuff, dev_kfree_skb
is defined as consume_skb.
Trace takes advantage of this point.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently retransmitted DATA chunks could also be used for
RTT measurements since there are no flag to identify whether
the transmitted DATA chunk is a new one or a retransmitted one.
This problem is introduced by commit ae19c5486 ("sctp: remove
'resent' bit from the chunk") which inappropriately removed the
'resent' bit completely, instead of doing this, we should set
the resent bit only for the retransmitted DATA chunks.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pmcraid driver is abusing the genetlink API and is using its
family ID as the multicast group ID, which is invalid and may
belong to somebody else (and likely will.)
Make it use the correct API, but since this may already be used
as-is by userspace, reserve a family ID for this code and also
reserve that group ID to not break userspace assumptions.
My previous patch broke event delivery in the driver as I missed
that it wasn't using the right API and forgot to update it later
in my series.
While changing this, I noticed that the genetlink code could use
the static group ID instead of a strcmp(), so also do that for
the VFS_DQUOT family.
Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netlink/genetlink.c: In function ‘genl_validate_assign_mc_groups’:
net/netlink/genetlink.c:217: warning: ‘err’ may be used uninitialized in this
function
Commit 2a94fe48f3 ("genetlink: make multicast
groups const, prevent abuse") split genl_register_mc_group() in multiple
functions, but dropped the initialization of err.
Initialize err to zero to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise RPCSEC_GSS_DESTROY messages are not sent.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Both sides of the comparison are the same, looks like a cut-and-paste error.
Spotted by Coverity.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ATM minstrel_ht does not check whether a sampling rate is supported.
Unsupported rates attempts can trigger when there are holes in bitfields
of supported MCSes belonging to the same group (e.g many devices are
MCS32 capable without MCS33->39 capable, also we systematically have a
hole for CCK rates).
Drop any attempts to sample unsupported rates, as suggested by Felix.
This is not a problem in minstrel which fills a per STA sample table
with only supported rates (though only at init).
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This lets us later reuse the more generic reg_dfs_region_str().
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Only allow DFS to be set if the DFS regions agree.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
u8 was used in some other places, just stick to the enum,
this forces us to express the values that are expected.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ATM, only the first array value returned by get_random_bytes is used.
This change moves the call to get_random_bytes from the nested loop it
is in to its parent.
While at it, replace get_random_bytes with prandom_bytes since PRNs are
way enough for the selection process.
After this, minstrel_ht reclaims 80 PR-bytes instead of 640 R-bytes.
minstrels use sample tables to probe different rates in a randomized
manner.
minstrel_ht inits one single sample table upon registration (during
subsys_initcalls) and minstrel uses one per STA addition in minstrel.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Consecutive MCSes in [8*(NSS-1)->8*NSS[ have the same number NSS of
streams (except for MCS32 which is mishandled ATM).
ATM minstrel_ht uses MCS_GROUP_RATES in place of this 8 modulus.
This change replaces such occurences and by doing so allows for different
values of MCS_GROUP_RATES (e.g to cope with VHT MCS8,9).
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new field to ieee80211_chanctx_conf to indicate
the min required channel configuration.
Tuning to a narrower channel might help reducing
the noise level and saving some power.
The min required channel definition is the max of
all min required channel definitions of the interfaces
bound to this channel context.
In AP mode, use 20MHz when there are no connected station.
When a new station is added/removed, calculate the new max
bandwidth supported by any of the stations (e.g. 80MHz when
80MHz and 40MHz stations are connected).
In other cases, simply use bss_conf.chandef as the
min required chandef.
Notify drivers about changes to this field by calling
drv_change_chanctx with a new CHANGE_MIN_WIDTH notification.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce shift and mask defines for beamformee STS cap and number
of sounding dimensions cap as these can take any 3 bit value.
While at it also cleanup an unrequired parenthesis.
Signed-off-by: Eyal Shapira <eyal@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There is no reason why we should have only one channel switch
announcement at a time for a single phy. When support for channel
switch with multiple contexts and multiple vifs per context is
implemented, we will need the chandef data for each vif. Move the
csa_chandef structure to sdata to prepare for this.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
[Fixed compilation with mesh]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use put_unaligned_le16 and put_unaligned_le32 for
mesh_path_error_tx and mesh_path_sel_frame_tx.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use put_unaligned_le16 in mesh_plink_frame_tx.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Certain vendors may want to disable the processing of
country IEs so that they can continue using the regulatory
domain the driver or user has set. Currently there is no
way to stop the core from processing country IEs, so add
support to the core to ignore country IE hints.
Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
802.11 cards may have different country IE parsing behavioural
preferences and vendors may want to support these. These preferences
were managed by the REGULATORY_CUSTOM_REG and the REGULATORY_STRICT_REG
flags and their combination. Instead of using this existing notation,
split out the country IE behavioural preferences as a new flag. This
will allow us to add more customizations easily and make the code more
maintainable.
Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up conflicts]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
use put_unaligned_le16 for precedence value in mesh
channel switch support
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reflects that case is now completely separated.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This splits up the driver regulatory update on its
own, this helps simplify the reading the case.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This splits out the user regulatory update on its
own, this helps simplify reading the case.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This splits up the core regulatory update to be
set on its own helper. This should make it easier
to read exactly what type of requests should be
expected there. In this case its clear that
NL80211_REGDOM_SET_BY_CORE is only used by the
core for updating the world regulatory domain.
This is consistant with the nl80211.h documentation.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[add warning to default switch case to avoid compiler warning]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
last_request is RCU protected, since we're getting it
on set_regdom() we might as well pass it to ensure the
same request is being processed, otherwise there is a
small race it could have changed. This makes processing
of the request atomic.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers that set the WIPHY_FLAG_CUSTOM_REGULATORY skip
the core world regulatory domain updates, but do want
their reg_notifier() called. Move the check for this
closer to the source of the check that detected skipped
was required and while at it add a helper for the notifier
calling. This has no functional changes. This brings together
the place where we call the reg_notifier() will be called.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It seems some out of tree drivers were using a regulatory_hint("00")
to trigger off the wiphy regulatory notifier, for those cases just
setting the WIPHY_FLAG_CUSTOM_REGULATORY would suffice to call
the reg_notifier() for a world regulatory domain update. If drivers
find other needs for calling the reg_notifier() a proper implemenation
is preferred.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
All the regulatory request process routines use the
same pattern.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This enforces proper RCU APIs accross the code.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is common code, this reduces the chance of making
a mistake of how we free it.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
By dealing with non country IE conficts first we can shift
the code that deals with the conflict to the left. This has
no functional changes.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is the last split up of the old unified __regultory_hint()
processing set of functionality, it moves the country IE processing
all on its own. This makes it easier to follow and read what exactly
is going on for the case of processing country IEs.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes the code easier to read and follow.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes the code path easier to read and lets us
split out some functionality that is only user specific,
that makes it easier to read the other types of requests.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes the code path easier to read for the core case.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[add warning to default case in switch to avoid compile warning]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently nl80211 allows userspace to send the kernel
a bogus regulatory domain with at most 32 rules set
and it won't reject it until after its allocated
memory. Let's be smart about it and take advantage
that the last_request is now available under RTNL
and check if the alpha2 matches an expected request
and reject any bogus userspace requests prior to
hitting the memory allocator.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a custom regulatory domain is passed and if a rule for a
channel indicates it should be disabled that channel should
always remain disabled as per its documentation and design.
Likewise if WIPHY_FLAG_STRICT_REGULATORY flag is set and a
regulatory_hint() is issued if a channel is disabled that
channel should remain disabled.
Without this change only drivers that set the _orig flags
appropriately on their own would ensure disallowed channels
remaind disabled. This helps drivers save code by relying on
the APIS provided to entrust channels that should not be enabled
be respected by only having to use wiphy_apply_custom_regulatory()
or regulatory_hint() with the WIPHY_FLAG_STRICT_REGULATORY set.
If wiphy_apply_custom_regulatory() is used together with
WIPHY_FLAG_STRICT_REGULATORY and a regulatory_hint() issued
later, the incoming regulatory domain can override previously
set _orig parameters from the initial custom regulatory
setting.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If genregdb.awk assumes the file will end with an
extra empty line or a comment line. This is could
not be true so just address this.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This has no functional change, this just lets us reuse
helpers at a later time.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This adds generic cipher scheme support to mac80211, such schemes
are fully under control by the driver. On hw registration drivers
may specify additional HW ciphers with a scheme how these ciphers
have to be handled by mac80211 TX/RR. A cipher scheme specifies a
cipher suite value, a size of the security header to be added to
or stripped from frames and how the PN is to be verified on RX.
Signed-off-by: Max Stepanov <Max.Stepanov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow beconing after we pass Channel Availability Check (CAC).
Allow non-DFS and DFS channels mix. All DFS channels have to
be in NL80211_DFS_AVAILABLE state (pass CAC).
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To report channel width correctly we have
to send correct channel parameters from
mac80211 when calling cfg80211_cac_event().
This is required in case of using channel width
higher than 20MHz and we have to set correct
dfs channel state after CAC (NL80211_DFS_AVAILABLE).
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no code calling ieee80211_key_replace() with both
arguments NULL and it wouldn't make sense, but in the
interest of maintainability add a warning for it. As a
side effect, this also shuts up a smatch warning.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As the flag is entirely implemented in cfg80211, it should
have been a global feature flag (which I believe didn't
exist at the time). However, there's no reason to allow
drivers to unset the flag, so don't allow it and remove
the validation of NL80211_SCAN_FLAG_FLUSH.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Improve readability of the function by adding the break,
there's no functional impact but it's confusing to fall
through.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Coverity points out that checking assoc_data->ie is
completely useless since it's an array in the struct
and can't be NULL - remove the useless checks.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
802.11-2012 13.3.1 implicitly limits the mesh local link
ID range to that of AID, since for mesh PS the local link
ID must be indicated in the TIM IE, which only holds
IEEE80211_MAX_AID bits.
Also the code was allowing a local link ID of 0, but this
is not correct since that TIM bit is used for indicating
buffered mcast frames.
Generate a random, unique, link ID from 1 - 2007, and drop
a modulo conversion for the local link ID, but keep it for
the peer link ID in case he chose something > MAX_AID.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
[fix some indentation, squash llid assignment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If we store the peer link ID right after initializing a
new neighbor, there is no need to do it later in the
peering FSM.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
All other paths return an error code, do the same here.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The ignore_plink_timer flag is set when doing mod_timer() if
the timer was not previously active. This is to avoid executing
the timeout if del_timer() was subsequently called. However,
del_timer() only happens if we are moving to ESTAB state or
get a close frame while in HOLDING.
We cannot leave HOLDING and re-enter ESTAB unless we receive a
close frame (in which case ignore_plink_timer is already set) or
if the timeout expires, so there actually isn't a case where
this is needed on mod_timer().
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The matches_local check can just be done when looking at the
individual action types.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use C instead of cpp for type checking. Also swap the arguments
into the usual sdata -> sta order.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The initial frame checks differ depending on whether this is
a new peer or not, but they were all intermixed with sta checks
as necessary. Group them together so the two cases are clearer.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reject and accepted close events always put the host in the
holding state and compute a reason code based only on the
current state. Likewise on establish we always do the same
setup. Put these in functions to save some duplicated code.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rather than unlock at the end of each case, do it once after
all is said and done.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Do all frame transfers in one place at the end of the
big switch statements. sta->plid and sta->reason can
be passed in any case, since they are only used for
the frames that need them. Remove assignments to locals
for values already stored in the sta structure.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
According to IEEE 802.11-2012 (8.4.2.104), no peering
management element exists with length 7. This code is checking
to see if llid is present to ignore close frames with different
llid, which would be IEs with length 8.
Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The iniator is already available to us, so use it.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
wiphy_apply_custom_regulatory() implies WIPHY_FLAG_CUSTOM_REGULATORY
but we never enforced it, do that now and warn if the driver
didn't set it. All drivers should be following this today already.
Having WIPHY_FLAG_CUSTOM_REGULATORY does not however mean you will
use wiphy_apply_custom_regulatory() though, you may have your own
_orig value set up tools / helpers. The intel drivers are examples
of this type of driver.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Check chandef we get in CAC request is usable for CAC.
All channels have to be DFS channels. Allow DFS_USABLE
and DFS_AVAILABLE channels mix. At least one channel
has to be DFS_USABLE (require CAC).
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add helper fuctions for start/end freq.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently, frames that go into the reordering buffer are stored at
index ieee80211_sn_sub(sn, tid_rx->ssn) % tid_rx->buf_size.
The offset calculation to the starting sequence number (SSN) is
useless and just adds overhead so simply use sn % tid_rx->buf_size.
This means the reordering buffer will start to be filled somewhere
in the middle (at SSN % buf_size) and continue to get used from
there, but there's no reason to start from the beginning.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These two flags are used for the same purpose, just
combine them into a no-ir flag to annotate no initiating
radiation is allowed.
Old userspace sending either flag will have it treated as
the no-ir flag. To be considerate to older userspace we
also send both the no-ir flag and the old no-ibss flags.
Newer userspace will have to be aware of older kernels.
Update all places in the tree using these flags with the
following semantic patch:
@@
@@
-NL80211_RRF_PASSIVE_SCAN
+NL80211_RRF_NO_IR
@@
@@
-NL80211_RRF_NO_IBSS
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_PASSIVE_SCAN
+IEEE80211_CHAN_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IBSS
+IEEE80211_CHAN_NO_IR
@@
@@
-NL80211_RRF_NO_IR | NL80211_RRF_NO_IR
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IR | IEEE80211_CHAN_NO_IR
+IEEE80211_CHAN_NO_IR
@@
@@
-(NL80211_RRF_NO_IR)
+NL80211_RRF_NO_IR
@@
@@
-(IEEE80211_CHAN_NO_IR)
+IEEE80211_CHAN_NO_IR
Along with some hand-optimisations in documentation, to
remove duplicates and to fix some indentation.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[do all the driver updates in one go]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ATM, the first call of ieee80211_do_open will configure the hw as
non-idle, even if the interface being brought up is not a monitor, and
this leads to inconsistent sequences like:
register_hw()
do_open(sta)
hw_config(non-idle)
(.. sta is non-idle ..)
scan(sta)
hw_config(idle) (after scan finishes)
do_stop(sta)
do_open(sta)
(.. sta is idle ..)
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 3e8b1eb "mac80211/minstrel_ht: improve rate selection stability"
introduced a local capped prob in minstrel_ht_calc_tp but omitted to use
it to compute the per rate throughput.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes wpa_supplicant p2p_find on 5GHz-only devices
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 392b9ff ("mac80211: change beacon/connection polling")
removed the IEEE80211_STA_BEACON_POLL flag.
However, it accidentally removed the setting of
IEEE80211_STA_CONNECTION_POLL, making the connection polling
completely useless (the flag is always clear, so the result
is never being checked). Fix it.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Mesh STA receiving the mesh CSA action frame is not able to trigger
the mesh channel switch due to the incorrect handling and comparison
of mesh channel switch parameters element (MCSP)'s TTL. Make sure
the MCSP's TTL is updated accordingly before calling the
ieee80211_mesh_process_chnswitch. Also, we update the beacon before
forwarding the CSA action frame, so MCSP's precedence value and
initiator flag need to be updated prior to this.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Multicast frames can't be transmitted as part of an aggregation
session (such a session couldn't even be set up) so don't try to
reorder them. Trying to do so would cause the reorder to stop
working correctly since multicast QoS frames (as transmitted by
the Aruba APs this was found with) would cause sequence number
confusion in the buffer.
Cc: stable@vger.kernel.org
Reported-by: Blaise Gassend <blaise@suitabletech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Due to nl80211 API breakage, 5/10 MHz support is broken for
all drivers. Fixing it requires adding new API, but that
can't be done as a bugfix commit since that would require
either updating all APIs in the trees needing the bugfix or
cause different kernels to have incompatible API.
Therefore, just disable 5/10 MHz support for all drivers.
Cc: stable@vger.kernel.org [3.12]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When initializing rates selections starting indexes upon stats update,
the minstrel_sta->max_* rates should be 'group * MCS_GROUP_RATES + i'
not 'i'. This affects settings where one of the peers does not support
any of the rates of the group 0 (i.e. when ht_cap.mcs.rx_mask[0] == 0).
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Mesh beacon was not being rebuild after user triggered a mesh
powersave change.
To solve this issue use ieee80211_mbss_info_change_notify instead
of ieee80211_bss_info_change_notify. This helper function forces
mesh beacon to be rebuild and then notifies the driver about the
beacon change.
Signed-off-by: Javier Lopez <jlopex@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit "mac80211: implement SMPS for AP" applies to AP_VLAN as well.
It assumes that sta->sdata->vif.bss_conf.bssid is present, which did not
get set for AP_VLAN.
Initialize it to sdata->vif.addr like for other interface types.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Coverity pointed out that we might dereference NULL later
if nla_nest_start() returns a failure. This isn't really
true since we'd bomb out before, but we should check the
return value directly, so do that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Coverity pointed out that in the (practically impossible)
error case we leak the message - fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Coverity pointed out that in a few functions we don't
check the return value of the nla_put_*() calls. Most
of these are fairly harmless because the input isn't
very dynamic and controlled by the kernel, but the
pattern is simply wrong, so fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When changing cfg80211 to use RTNL locking, this caused a
deadlock in mac80211 as it calls cfg80211_sched_scan_stopped()
from a work item that's on a workqueue that is flushed with
the RTNL held.
Fix this by simply using schedule_work(), the work only needs
to finish running before the wiphy is unregistered, no other
synchronisation (e.g. with suspend) is really required since
for suspend userspace is already blocked anyway when we flush
the workqueue so will only pick up the event after resume.
Cc: stable@vger.kernel.org
Fixes: 5fe231e873 ("cfg80211: vastly simplify locking")
Reported-and-tested-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Setup chandef for radar event correctly, before we
will clear this in ieee80211_dfs_cac_cancel() function.
Without this patch mac80211 will report wrong channel
width in case we will get radar event during active CAC.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The wext internal chandefs for ibss should be created using the
cfg80211_chandef_create() functions. Initializing fields manually is
error-prone.
Reported-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit ee1f668136.
The aformentioned commit added a check to allow
'iw wlan0 set power_save off' to work for mesh interfaces.
However, this is problematic because it also allows
'iw wlan0 set power_save on', which will crash in short order
because all of the subsequent code manipulates sdata->u.mgd.
The power-saving states for mesh interfaces can be manipulated
through the mesh config, e.g:
'iw wlan0 set mesh_param mesh_power_save=active' (which,
despite the name, actualy disables power saving since the
setting refers to the type of sleep the interface undergoes).
Cc: stable@vger.kernel.org
Fixes: ee1f668136 ("mac80211: allow disable power save in mesh")
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a too small burst is inadvertently set on TBF, we might trigger
a bug in tbf_segment(), as 'skb' instead of 'segs' was used in a
qdisc_reshape_fail() call.
tc qdisc add dev eth0 root handle 1: tbf latency 50ms burst 1KB rate
50mbit
Fix the bug, and add a warning, as such configuration is not
going to work anyway for non GSO packets.
(For some reason, one has to use a burst >= 1520 to get a working
configuration, even with old kernels. This is a probable iproute2/tc
bug)
Based on a report and initial patch from Yang Yingliang
Fixes: e43ac79a4b ("sch_tbf: segment too big GSO packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offenders don't have port numbers, so set it to 0.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bceaa90240 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.
As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.
This broke traceroute and such.
Fixes: bceaa90240 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send icmpv6 error with type "destination unreachable" and code
"address unreachable" when receiving icmpv4 error and sufficient
data bytes are available
This patch enhances the compliance of sit tunnel with section 3.4 of
rfc 4213
Signed-off-by: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch simplifies the checksum verification in tcpX_gro_receive
by reusing the CHECKSUM_COMPLETE code for CHECKSUM_NONE. All it
does for CHECKSUM_NONE is compute the partial checksum and then
treat it as if it came from the hardware (CHECKSUM_COMPLETE).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases we may receive IP packets that are longer than
their stated lengths. Such packets are never merged in GRO.
However, we may end up computing their checksums incorrectly
and end up allowing packets with a bogus checksum enter our
stack with the checksum status set as verified.
Since such packets are rare and not performance-critical, this
patch simply skips the checksum verification for them.
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Thanks,
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function sctp_check_transmitted(transport t, ...) would iterate all of
transport->transmitted queue and looking for the highest __newly__ acked tsn.
The original algorithm would depend on the order of the assoc->transport_list
(in function sctp_outq_sack line 1215 - 1226). The result might not be the
expected due to the order of the tranport_list.
Solution: checking if the exising is smaller than the new one before assigning
Signed-off-by: Chang Xiangzhong <changxiangzhong@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix memory leaks and other issues in mwifiex driver, from Amitkumar
Karwar.
2) skb_segment() can choke on packets using frag lists, fix from
Herbert Xu with help from Eric Dumazet and others.
3) IPv4 output cached route instantiation properly handles races
involving two threads trying to install the same route, but we
forgot to propagate this logic to input routes as well. Fix from
Alexei Starovoitov.
4) Put protections in place to make sure that recvmsg() paths never
accidently copy uninitialized memory back into userspace and also
make sure that we never try to use more that sockaddr_storage for
building the on-kernel-stack copy of a sockaddr. Fixes from Hannes
Frederic Sowa.
5) R8152 driver transmit flow bug fixes from Hayes Wang.
6) Fix some minor fallouts from genetlink changes, from Johannes Berg
and Michael Opdenacker.
7) AF_PACKET sendmsg path can race with netdevice unregister notifier,
fix by using RCU to make sure the network device doesn't go away
from under us. Fix from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
gso: handle new frag_list of frags GRO packets
genetlink: fix genl_set_err() group ID
genetlink: fix genlmsg_multicast() bug
packet: fix use after free race in send path when dev is released
xen-netback: stop the VIF thread before unbinding IRQs
wimax: remove dead code
net/phy: Add the autocross feature for forced links on VSC82x4
net/phy: Add VSC8662 support
net/phy: Add VSC8574 support
net/phy: Add VSC8234 support
net: add BUG_ON if kernel advertises msg_namelen > sizeof(struct sockaddr_storage)
net: rework recvmsg handler msg_name and msg_namelen logic
bridge: flush br's address entry in fdb when remove the
net: core: Always propagate flag changes to interfaces
ipv4: fix race in concurrent ip_route_input_slow()
r8152: fix incorrect type in assignment
r8152: support stopping/waking tx queue
r8152: modify the tx flow
r8152: fix tx/rx memory overflow
netfilter: ebt_ip6: fix source and destination matching
...
Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit 0a06ff068f
("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS").
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently GRO started generating packets with frag_lists of frags.
This was not handled by GSO, thus leading to a crash.
Thankfully these packets are of a regular form and are easy to
handle. This patch handles them in two ways. For completely
non-linear frag_list entries, we simply continue to iterate over
the frag_list frags once we exhaust the normal frags. For frag_list
entries with linear parts, we call pskb_trim on the first part
of the frag_list skb, and then process the rest of the frags in
the usual way.
This patch also kills a chunk of dead frag_list code that has
obviously never ever been run since it ends up generating a bogus
GSO-segmented packet with a frag_list entry.
Future work is planned to split super big packets into TSO
ones.
Fixes: 8a29111c7c ("net: gro: allow to build full sized skb")
Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reported-by: Jerry Chu <hkchu@google.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unfortunately, I introduced a tremendously stupid bug into
genlmsg_multicast() when doing all those multicast group
changes: it adjusts the group number, but then passes it
to genlmsg_multicast_netns() which does that again.
Somehow, my tests failed to catch this, so add a warning
into genlmsg_multicast_netns() and remove the offending
group ID adjustment.
Also add a warning to the similar code in other functions
so people who misuse them are more loudly warned.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salam reported a use after free bug in PF_PACKET that occurs when
we're sending out frames on a socket bound device and suddenly the
net device is being unregistered. It appears that commit 827d9780
introduced a possible race condition between {t,}packet_snd() and
packet_notifier(). In the case of a bound socket, packet_notifier()
can drop the last reference to the net_device and {t,}packet_snd()
might end up suddenly sending a packet over a freed net_device.
To avoid reverting 827d9780 and thus introducing a performance
regression compared to the current state of things, we decided to
hold a cached RCU protected pointer to the net device and maintain
it on write side via bind spin_lock protected register_prot_hook()
and __unregister_prot_hook() calls.
In {t,}packet_snd() path, we access this pointer under rcu_read_lock
through packet_cached_dev_get() that holds reference to the device
to prevent it from being freed through packet_notifier() while
we're in send path. This is okay to do as dev_put()/dev_hold() are
per-cpu counters, so this should not be a performance issue. Also,
the code simplifies a bit as we don't need need_rls_dev anymore.
Fixes: 827d978037 ("af-packet: Use existing netdev reference for bound sockets.")
Reported-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes a code line that is between a "return 0;" and an error label.
This code line can never be reached.
Found by Coverity (CID: 1130529)
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2013-11-21
Please pull this batch of fixes intended for the 3.13 stream!
For the Bluetooth bits, Gustavo says:
"A few fixes for 3.13. There is 3 fixes to the RFCOMM protocol. One
crash fix to L2CAP. A simple fix to a bad behaviour in the SMP
protocol."
On top of that...
Amitkumar Karwar sends a quintet of mwifiex fixes -- two fixes related
to failure handling, two memory leak fixes, and a NULL pointer fix.
Felix Fietkau corrects and earlier rt2x00 HT descriptor handling fix
to address a crash.
Geyslan G. Bem fixes a memory leak in brcmfmac.
Larry Finger address more pointer arithmetic errors in rtlwifi.
Luis R. Rodriguez provides a regulatory fix in the shared ath code.
Sujith Manoharan brings a couple ath9k initialization fixes.
Ujjal Roy offers one more mwifiex fix to avoid invalid memory accesses
when unloading the USB driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter fixes for net
The following patchset contains fixes for your net tree, they are:
* Remove extra quote from connlimit configuration in Kconfig, from
Randy Dunlap.
* Fix missing mss option in syn packets sent to the backend in our
new synproxy target, from Martin Topholm.
* Use window scale announced by client when sending the forged
syn to the backend, from Martin Topholm.
* Fix IPv6 address comparison in ebtables, from Luís Fernando
Cornachioni Estrozi.
* Fix wrong endianess in sequence adjustment which breaks helpers
in NAT configurations, from Phil Oester.
* Fix the error path handling of nft_compat, from me.
* Make sure the global conntrack counter is decremented after the
object has been released, also from me.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In that case it is probable that kernel code overwrote part of the
stack. So we should bail out loudly here.
The BUG_ON may be removed in future if we are sure all protocols are
conformant.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs bits and pieces from Al Viro:
"Assorted bits that got missed in the first pull request + fixes for a
couple of coredump regressions"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fold try_to_ascend() into the sole remaining caller
dcache.c: get rid of pointless macros
take read_seqbegin_or_lock() and friends to seqlock.h
consolidate simple ->d_delete() instances
gfs2: endianness misannotations
dump_emit(): use __kernel_write(), not vfs_write()
dump_align(): fix the dumb braino
Pull slave-dmaengine changes from Vinod Koul:
"This brings for slave dmaengine:
- Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
dmaengine can only transfer and not verify validaty of dma
transfers
- Bunch of fixes across drivers:
- cppi41 driver fixes from Daniel
- 8 channel freescale dma engine support and updated bindings from
Hongbo
- msx-dma fixes and cleanup by Markus
- DMAengine updates from Dan:
- Bartlomiej and Dan finalized a rework of the dma address unmap
implementation.
- In the course of testing 1/ a collection of enhancements to
dmatest fell out. Notably basic performance statistics, and
fixed / enhanced test control through new module parameters
'run', 'wait', 'noverify', and 'verbose'. Thanks to Andriy and
Linus [Walleij] for their review.
- Testing the raid related corner cases of 1/ triggered bugs in
the recently added 16-source operation support in the ioatdma
driver.
- Some minor fixes / cleanups to mv_xor and ioatdma"
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
dma: mv_xor: Remove unneeded NULL address check
ioat: fix ioat3_irq_reinit
ioat: kill msix_single_vector support
raid6test: add new corner case for ioatdma driver
ioatdma: clean up sed pool kmem_cache
ioatdma: fix selection of 16 vs 8 source path
ioatdma: fix sed pool selection
ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
dmatest: verbose mode
dmatest: convert to dmaengine_unmap_data
dmatest: add a 'wait' parameter
dmatest: add basic performance metrics
dmatest: add support for skipping verification and random data setup
dmatest: use pseudo random numbers
dmatest: support xor-only, or pq-only channels in tests
dmatest: restore ability to start test at module load and init
dmatest: cleanup redundant "dmatest: " prefixes
dmatest: replace stored results mechanism, with uniform messages
Revert "dmatest: append verify result to results"
...
bridge dev
When the following commands are executed:
brctl addbr br0
ifconfig br0 hw ether <addr>
rmmod bridge
The calltrace will occur:
[ 563.312114] device eth1 left promiscuous mode
[ 563.312188] br0: port 1(eth1) entered disabled state
[ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects
[ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9
[ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8
[ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8
[ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78
[ 563.468209] Call Trace:
[ 563.468218] [<ffffffff814d1c92>] dump_stack+0x6a/0x78
[ 563.468234] [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100
[ 563.468242] [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge]
[ 563.468247] [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge]
[ 563.468254] [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0
[ 563.468259] [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b
[ 570.377958] Bridge firewalling registered
--------------------------- cut here -------------------------------
The reason is that when the bridge dev's address is changed, the
br_fdb_change_mac_address() will add new address in fdb, but when
the bridge was removed, the address entry in the fdb did not free,
the bridge_fdb_cache still has objects when destroy the cache, Fix
this by flushing the bridge address entry when removing the bridge.
v2: according to the Toshiaki Makita and Vlad's suggestion, I only
delete the vlan0 entry, it still have a leak here if the vlan id
is other number, so I need to call fdb_delete_by_port(br, NULL, 1)
to flush all entries whose dst is NULL for the bridge.
Suggested-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following commit:
b6c40d68ff
net: only invoke dev->change_rx_flags when device is UP
tried to fix a problem with VLAN devices and promiscuouse flag setting.
The issue was that VLAN device was setting a flag on an interface that
was down, thus resulting in bad promiscuity count.
This commit blocked flag propagation to any device that is currently
down.
A later commit:
deede2fabe
vlan: Don't propagate flag changes on down interfaces
fixed VLAN code to only propagate flags when the VLAN interface is up,
thus fixing the same issue as above, only localized to VLAN.
The problem we have now is that if we have create a complex stack
involving multiple software devices like bridges, bonds, and vlans,
then it is possible that the flags would not propagate properly to
the physical devices. A simple examle of the scenario is the
following:
eth0----> bond0 ----> bridge0 ---> vlan50
If bond0 or eth0 happen to be down at the time bond0 is added to
the bridge, then eth0 will never have promisc mode set which is
currently required for operation as part of the bridge. As a
result, packets with vlan50 will be dropped by the interface.
The only 2 devices that implement the special flag handling are
VLAN and DSA and they both have required code to prevent incorrect
flag propagation. As a result we can remove the generic solution
introduced in b6c40d68ff and leave
it to the individual devices to decide whether they will block
flag propagation or not.
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPUs can ask for local route via ip_route_input_noref() concurrently.
if nh_rth_input is not cached yet, CPUs will proceed to allocate
equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
via rt_cache_route()
Most of the time they succeed, but on occasion the following two lines:
orig = *p;
prev = cmpxchg(p, orig, rt);
in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
so dst is leaking. dst_destroy() is never called and 'lo' device
refcnt doesn't go to zero, which can be seen in the logs as:
unregister_netdevice: waiting for lo to become free. Usage count = 1
Adding mdelay() between above two lines makes it easily reproducible.
Fix it similar to nh_pcpu_rth_output case.
Fixes: d2d68ba9fe ("ipv4: Cache input routes in fib_info nexthops.")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Mostly these are fixes for fallout due to merge window changes, as
well as cures for problems that have been with us for a much longer
period of time"
1) Johannes Berg noticed two major deficiencies in our genetlink
registration. Some genetlink protocols we passing in constant
counts for their ops array rather than something like
ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
using fixed IDs for their multicast groups.
We have to retain these fixed IDs to keep existing userland tools
working, but reserve them so that other multicast groups used by
other protocols can not possibly conflict.
In dealing with these two problems, we actually now use less state
management for genetlink operations and multicast groups.
2) When configuring interface hardware timestamping, fix several
drivers that simply do not validate that the hwtstamp_config value
is one the driver actually supports. From Ben Hutchings.
3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.
4) In dev_forward_skb(), set the skb->protocol in the right order
relative to skb_scrub_packet(). From Alexei Starovoitov.
5) Bridge erroneously fails to use the proper wrapper functions to make
calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
Makita.
6) When detaching a bridge port, make sure to flush all VLAN IDs to
prevent them from leaking, also from Toshiaki Makita.
7) Put in a compromise for TCP Small Queues so that deep queued devices
that delay TX reclaim non-trivially don't have such a performance
decrease. One particularly problematic area is 802.11 AMPDU in
wireless. From Eric Dumazet.
8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
here. Fix from Eric Dumzaet, reported by Dave Jones.
9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.
10) When computing mergeable buffer sizes, virtio-net fails to take the
virtio-net header into account. From Michael Dalton.
11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
bumping, this one has been with us for a while. From Eric Dumazet.
12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
Hugne.
13) 6lowpan bit used for traffic classification was wrong, from Jukka
Rissanen.
14) macvlan has the same issue as normal vlans did wrt. propagating LRO
disabling down to the real device, fix it the same way. From Michal
Kubecek.
15) CPSW driver needs to soft reset all slaves during suspend, from
Daniel Mack.
16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.
17) The xen-netfront RX buffer refill timer isn't properly scheduled on
partial RX allocation success, from Ma JieYue.
18) When ipv6 ping protocol support was added, the AF_INET6 protocol
initialization cleanup path on failure was borked a little. Fix
from Vlad Yasevich.
19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
blocks we can do the wrong thing with the msg_name we write back to
userspace. From Hannes Frederic Sowa. There is another fix in the
works from Hannes which will prevent future problems of this nature.
20) Fix route leak in VTI tunnel transmit, from Fan Du.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
genetlink: make multicast groups const, prevent abuse
genetlink: pass family to functions using groups
genetlink: add and use genl_set_err()
genetlink: remove family pointer from genl_multicast_group
genetlink: remove genl_unregister_mc_group()
hsr: don't call genl_unregister_mc_group()
quota/genetlink: use proper genetlink multicast APIs
drop_monitor/genetlink: use proper genetlink multicast APIs
genetlink: only pass array to genl_register_family_with_ops()
tcp: don't update snd_nxt, when a socket is switched from repair mode
atm: idt77252: fix dev refcnt leak
xfrm: Release dst if this dst is improper for vti tunnel
netlink: fix documentation typo in netlink_set_err()
be2net: Delete secondary unicast MAC addresses during be_close
be2net: Fix unconditional enabling of Rx interface options
net, virtio_net: replace the magic value
ping: prevent NULL pointer dereference on write to msg_name
bnx2x: Prevent "timeout waiting for state X"
bnx2x: prevent CFC attention
bnx2x: Prevent panic during DMAE timeout
...
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.
This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.
At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to unregister the multicast group if the
generic netlink family is registered immediately after.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The quota code is abusing the genetlink API and is using
its family ID as the multicast group ID, which is invalid
and may belong to somebody else (and likely will.)
Make the quota code use the correct API, but since this
is already used as-is by userspace, reserve a family ID
for this code and also reserve that group ID to not break
userspace assumptions.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The drop monitor code is abusing the genetlink API and is
statically using the generic netlink multicast group 1, even
if that group belongs to somebody else (which it invariably
will, since it's not reserved.)
Make the drop monitor code use the proper APIs to reserve a
group ID, but also reserve the group id 1 in generic netlink
code to preserve the userspace API. Since drop monitor can
be a module, don't clear the bit for it on unregistration.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.
The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
snd_nxt must be updated synchronously with sk_send_head. Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.
Here is a kernel panic from my host.
[ 103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[ 146.301158] Call Trace:
[ 146.301158] [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0
Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.
In that moment we had a socket where write queue looks like this:
snd_una snd_nxt write_seq
|_________|________|
|
sk_send_head
After a second switching from repair mode the state of socket was
changed:
snd_una snd_nxt, write_seq
|_________ ________|
|
sk_send_head
This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.
Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.
tcp_write_wakeup
skb = tcp_send_head(sk);
tcp_fragment
if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
tcp_adjust_pcount(sk, skb, diff);
tcp_event_new_data_sent
tp->packets_out += tcp_skb_pcount(skb);
I think update of snd_nxt isn't required, when a socket is switched from
repair mode. Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.
I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.
otherwise causing dst memory leakage.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The parameter is just 'group', not 'groups', fix the documentation typo.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCSHWTSTAMP returns the real configuration to the application
using it, but there is currently no way for any other
application to find out the configuration non-destructively.
Add a new ioctl for this, making it unprivileged.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
This bug was introduced on commit 0898f99a2. This just recovers two
checks that existed before as suggested by Bart De Schuymer.
Signed-off-by: Luís Fernando Cornachioni Estrozi <lestrozi@uolinc.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We don't need to check that ifr_data itself is a valid user pointer,
but we should check &ifr_data is. Thankfully the copy of ifr_name is
checked, so this can only leak a few bytes from immediately above the
user address limit.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6d0bfe2261
net: ipv6: Add IPv6 support to the ping socket
introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse pointed out that the new flags variable I had added
shadowed an existing one, rename the new one to avoid that,
making the code clearer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.
If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.
Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SYSCTL=n the following build warning happens:
net/ipv6/ndisc.c:1730:1: warning: label 'out' defined but not used [-Wunused-label]
The 'out' label is only used when CONFIG_SYSCTL=y, so move it inside the
'ifdef CONFIG_SYSCTL' block.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch 0ca743a55991: "netfilter: nf_tables: add compatibility
layer for x_tables", leads to the following Smatch
warning: "net/netfilter/nft_compat.c:140 nft_parse_compat()
warn: signedness bug returning '(-34)'"
This nft_parse_compat function returns error codes but the return
type is u8 so the error codes are transformed into small positive
values. The callers don't check the return.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In commit 41d73ec053, sequence number adjustments were moved to a
separate file. Unfortunately, the sequence numbers that are stored
in the nf_ct_seqadj structure are expressed in host byte order. The
necessary ntohl call was removed when the call to adjust_tcp_sequence
was collapsed into nf_ct_seqadj_set. This broke the FTP NAT helper.
Fix it by adding back the byte order conversions.
Reported-by: Dawid Stawiarski <dawid.stawiarski@netart.pl>
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Timestamp are used to store additional syncookie parameters such as sack,
ecn, and wscale. The wscale value we need to encode is the client's
wscale, since we can't recover that later in the session. Next overwrite
the wscale option so the later synproxy_send_client_synack will send
the backend's wscale to the client.
Signed-off-by: Martin Topholm <mph@one.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When the synproxy_parse_options is called on the client ack the mss
option will not be present. Consequently mss wont be included in the
backend syn packet, which falls back to 536 bytes mss.
Therefore XT_SYNPROXY_OPT_MSS is explicitly flagged when recovering mss
value from cookie.
Signed-off-by: Martin Topholm <mph@one.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSh95hAAoJEGcL54qWCgDy1wgP/1zc4C7sMBQFWpIo676MHT4n
m5v4bWgYhRBC0dne5GG8dC4+Q2cPkua4H7cWHCJKQmMuDmbzgOB33RVyQdwU/YNp
ItLIZLz2EySCKo8OOKvbf4l5jDFeoBYEbheB2bmcE42BgixaTbiHKXpgCtoHr5pT
qOX0JI29QtstAY3heiLW52bA3OqNJGwfE595KKEHXZwcD0n8izjqOU7Vrqj0E8/Q
S+Xw9a613fo7chzbdcugR+iW6kkr7qtjxXiI5OXvplGyHycbBJRfvAqHkg01Z69k
At9Y43cTEFiEx/zfKflmiFkn+IF9xFhABYNCKvpTtLFvQkwJDfYHa6h2jrFac/87
mTRZHIzJ0nghhE1VxOEjA2zvIE3Hd5Xk4By+2BKJaB/Tp0RPbSsHs7t0s8t7RdHi
ZwP/bNDynZY3S+HlbMor3A3900bUXLQBpCpRt/0+Hvc5bGLRszA5/Jinv+EqwOT9
LHXTE/CsQGJCOz72SjDZT4Gsa0t11UKdRpznk4XCEvH9tflK78nS32XUktZEC9u/
bCycLbvX+LrquxjQ9WN2TCmwnwyEiv45tSK2b8gf8JS1zJmePDKdnQ1dpHbiZAIO
uhEhAqDwAY64+T2+AGncITh8ZfthZhU6wkfGoepqYvC1/5AaeSWrFidDvE1NJUGh
xjcsGH6Ym8NnnT3rt/qp
=uOIM
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
- Stable fix for a deep recursion/stack overflow bug in rpc_release_client
- Stable fix for infinite looping when mounting a NFSv4.x volume
- Fix a typo in the nfs mount option parser
- Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is
* tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix pnfs Kconfig defaults
NFS: correctly report misuse of "migration" mount option.
nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
SUNRPC: Avoid deep recursion in rpc_release_client
SUNRPC: Fix a data corruption issue when retransmitting RPC calls
Pull nfsd changes from Bruce Fields:
"This includes miscellaneous bugfixes and cleanup and a performance fix
for write-heavy NFSv4 workloads.
(The most significant nfsd-relevant change this time is actually in
the delegation patches that went through Viro, fixing a long-standing
bug that can cause NFSv4 clients to miss updates made by non-nfs users
of the filesystem. Those enable some followup nfsd patches which I
have queued locally, but those can wait till 3.14)"
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (24 commits)
nfsd: export proper maximum file size to the client
nfsd4: improve write performance with better sendspace reservations
svcrpc: remove an unnecessary assignment
sunrpc: comment typo fix
Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation"
nfsd4: fix discarded security labels on setattr
NFSD: Add support for NFS v4.2 operation checking
nfsd4: nfsd_shutdown_net needs state lock
NFSD: Combine decode operations for v4 and v4.1
nfsd: -EINVAL on invalid anonuid/gid instead of silent failure
nfsd: return better errors to exportfs
nfsd: fh_update should error out in unexpected cases
nfsd4: need to destroy revoked delegations in destroy_client
nfsd: no need to unhash_stid before free
nfsd: remove_stid can be incorporated into nfs4_put_delegation
nfsd: nfs4_open_delegation needs to remove_stid rather than unhash_stid
nfsd: nfs4_free_stid
nfsd: fix Kconfig syntax
sunrpc: trim off EC bytes in GSSAPI v2 unwrap
gss_krb5: document that we ignore sequence number
...
Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicates
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For performance reasons, sch_fq tried hard to not setup timers for every
sent packet, using a quantum based heuristic : A delay is setup only if
the flow exhausted its credit.
Problem is that application limited flows can refill their credit
for every queued packet, and they can evade pacing.
This problem can also be triggered when TCP flows use small MSS values,
as TSO auto sizing builds packets that are smaller than the default fq
quantum (3028 bytes)
This patch adds a 40 ms delay to guard flow credit refill.
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing")
obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.
Suggested by David Miller
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().
Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...
A macvlan device has always LRO disabled so that calling
dev_disable_lro() on it does nothing. If we need to disable LRO
e.g. because
- the macvlan device is inserted into a bridge
- IPv6 forwarding is enabled for it
- it is in a different namespace than lowerdev and IPv4
forwarding is enabled in it
we need to disable LRO on its underlying device instead (as we
do for 802.1q VLAN devices).
v2: use newly introduced netif_is_macvlan()
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
If priority/traffic class field in IPv6 header is set (seen when
using ssh), the uncompression sets the TC and Flow fields incorrectly.
Example:
This is IPv6 header of a sent packet. Note the priority/TC (=1) in
the first byte.
00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: 02 1e ab ff fe 4c 52 57
This gets compressed like this in the sending side
00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16
00000010: aa 2d fe 92 86 4e be c6 ....
In the receiving end, the packet gets uncompressed to this
IPv6 header
00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: ab ff fe 4c 52 57 ec c2
First four bytes are set incorrectly and we have also lost
two bytes from destination address.
The fix is to switch the case values in switch statement
when checking the TC field.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following Smatch warning:
net/tipc/link.c:2364 tipc_link_recv_fragment()
warn: variable dereferenced before check '*head' (see line 2361)
A null pointer might be passed to skb_try_coalesce if
a malicious sender injects orphan fragments on a link.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
some robustness fixes for broken virtio devices, plus minor tweaks.
[vs last pull request: added the virtio-scsi broken vq escape patch, which
I somehow lost.]
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSgDsJAAoJENkgDmzRrbjxEE4P/jXqZHS/HdlxW9k0BjKKlEIF
PdtCoP3UhWTdskXvy2pD8m6nYn214MEJYUIa4HFlIEZsdxhuexzQHY19Ynkjagyv
57sRsUUm5fYQLIL7IUh2DUD1VU38hUFinno/y333szzvCj9qITDA/QABsiWxK8NO
dq+Lmeixgrhc5yN9iryW+gZV+hekJIZ4LsU5ejSaJucKblzXUH8qIbmSthG7RTYJ
tr4J7xTTXbhxY4CoC5Dpx2hvsFkvzaAIvI4Nr1mDjfq5cR8BaYvnC89U1IbhdAey
p1AbZE58JLrY+Z8K8LBRGV2KjO8qSZ6R47hbZ9nAnodJYB7sZLyj6jUe1q+/htuC
Dh9Xm9O4eW2xNaFk20dYeIF4UU5/HzdsbvG/IlH8x4sm8/K706ocYyAOHlzYUg2T
k7gltrgDzDokMgb2R44gwnr4oaJ2q8Gne6JXswlPEv2eRs6vNnA5Xhc0rEHGkU6C
gYn1vNFN6yx0vf2syG/Ce5pZtMxGpefKQkHzzWdq8FKr1B9s54dDuf2hls7J8A9t
OQT1gE33yURSelf4Kh4k9zWXaWk/Ohv9l2R1cqpALnJ4/+q0fP5t7HdK500S7aax
DxLeFeqvsBw7nlWgsGxQmt+fjITQFHhcDiwst0ehnt6RbDEW7XPIguz0K/gyhxYG
+UNbl/5Gr64jnUX3YCzm
=vY2L
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"Nothing really exciting: some groundwork for changing virtio endian,
and some robustness fixes for broken virtio devices, plus minor
tweaks"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_scsi: verify if queue is broken after virtqueue_get_buf()
x86, asmlinkage, lguest: Pass in globals into assembler statement
virtio: mmio: fix signature checking for BE guests
virtio_ring: adapt to notify() returning bool
virtio_net: verify if queue is broken after virtqueue_get_buf()
virtio_console: verify if queue is broken after virtqueue_get_buf()
virtio_blk: verify if queue is broken after virtqueue_get_buf()
virtio_ring: add new function virtqueue_is_broken()
virtio_test: verify if virtqueue_kick() succeeded
virtio_net: verify if virtqueue_kick() succeeded
virtio_ring: let virtqueue_{kick()/notify()} return a bool
virtio_ring: change host notification API
virtio_config: remove virtio_config_val
virtio: use size-based config accessors.
virtio_config: introduce size-based accessors.
virtio_ring: plug kmemleak false positive.
virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.
Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If 'hsr_get_node_data()' returns error, going directly to 'fail' label
doesn't free the memory pointed by 'skb_out'.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial sch_fq implementation copied code from pfifo_fast to classify
a packet as a high prio packet.
This clashes with setups using PRIO with say 7 bands, as one of the
band could be incorrectly (mis)classified by FQ.
Packets would be queued in the 'internal' queue, and no pacing ever
happen for this special queue.
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:
@@
identifier ops;
@@
+const
struct genl_ops ops[] = {
...
};
(except the struct thing in net/openvswitch/datapath.c)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow making the ops array const by not modifying the ops
flags on registration but rather only when ops are sent
out in the family information.
No users are updated yet except for the pre_doit/post_doit
calls in wireless (the only ones that exist now.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using a linked list, use an array. This reduces
the data size needed by the users of genetlink, for example
in wireless (net/wireless/nl80211.c) on 64-bit it frees up
over 1K of data space.
Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
-EINVAL anyway, therefore no such event could ever be sent.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
genl_register_ops() is still needed for internal registration,
but is no longer available to users of the API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This simplifies the code since there's no longer a need to
have error handling in the registration.
Unfortunately it means more extern function declarations are
needed, but the overall goal would seem to justify this.
Due to the removal of duplication in the netlink policies,
this reduces the size of wimax by almost 1k.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>