If there are keys left during station removal, then a
synchronize_net() will be done (for each key, I have a
patch to address this for 3.10), otherwise it won't be
done at all which causes issues because the station
could be used for TX while it's being removed from the
driver -- that might confuse the driver.
Fix this by always doing synchronize_net() if no key
was present any more.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There is a corner case which wasn't being covered:
userspace may authenticate and allocate stations,
but still leave the peering up to the kernel.
Initialize the peering timer if the MPM is not in
userspace, in a path which is taken by both the kernel and
userspace when allocating stations.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the user requested a userspace MPM, automatically
disable auto_open_plinks to fully disable the kernel MPM.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Earlier mac80211 would check whether some kind of mesh
security was enabled, when the real question was "is the
MPM in userspace"?
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The mesh station types used to refer to whether the
station was secure or nonsecure. Really the salient
information is whether it is managed by the kernel or
userspace
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Secure mesh had the implicit requirement that the Mesh
Peering Management entity be in userspace. However
userspace might want to implement an open MPM as well, so
specify a mesh setup parameter to indicate this.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch improves the way minstrel sorts rates according to throughput
and success probability. 3 FOR-loops across the entire rate set in function
minstrel_update_stats() which where used to determine the fastest, second
fastest and most robust rate are reduced to 1 FOR-loop.
The sorted list of rates according throughput is extended to the best four
rates as we need them in upcoming joint rate and power control. The sorting
is done via the new function minstrel_sort_best_tp_rates().
The most robust rate selection is aligned with minstrel_ht's approach.
Once any success probability is above 95% the one with the highest
throughput is chosen as most robust rate. If success probabilities of all
rates are below 95%, the rate with the highest succ. prob. is elected as
most robust one
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Based on minstrel_ht this patch treats success probabilities below 10% as
implausible values for throughput calculation in minstrel's statistics.
Current throughput per rate with such a low success probability is reset
to 0 MBit/s.
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While minstrel bootstraps and fills the success probabilities of each
rate the lowest rate has typically a very high success probability
(often 100% in our tests).
Its statistics are never updated but considered to setup the mrr chain.
In our tests we see that especially the 3rd mrr stage (which is that
rate providing highest success probability) is filled with the lowest rate
because its initial high sucess probability is never updated. By design
the 4th mrr stage is filled with the lowest rate so often 3rd and 4th
mrr stage are equal.
This patch follows minstrels general approach of assuming as little
as possible about rate dependencies. Consequently we include the
lowest rate into the random sampling table to get balanced up-to-date
statistics of all rates and therefore balanced decisions.
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Minstrel's decision which rate should be directly sampled within the
1st mrr stage is limited to such rates faster than the current max
throughput rate. All rates below the current max. throughput rate
are indirectly sampled via the 2nd mrr stage.
This approach leads to deprecated per rate statistics and therfore
a deprecated mrr chain setup.
This patch uses the sampling approach from minstrel_ht. A counter is
added to sum all indirect sample attempts per rate. After 20 indirect
sampling attempts the rate is directly sampled within the 1st mrr stage.
Therefore more up-to-date statistics for all rates are maintained and
used to setup the mrr chain.
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add documentation and more verbose variable names to minstrel's
multi-rate-retry setup within function minstrel_get_rate() to
increase the readability of the algorithm.
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Both minstrel versions use individual ways to scale up integer values
to perform calculations. Merge minstrel_ht's scaling macros into
minstrels header file and use them in both minstrel versions.
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Both rate control algorithms (minstrel and minstrel_ht) calculate
averages based on EWMA. Shift function minstrel_ewma() into
rc80211_minstrel.h and make use of it in both minstrel version.
Also shift the default EWMA level (75%) definition to the header file
and clean up variable usage.
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The last minstrel_ht changes increased the sampling frequency for
potentially useful rates to decrease the response time to rate
fluctuations. This caused an increase in sampling frequency that can
slightly reduce throughput, so this patch limits the sampling attempts
to one per rate instead of two.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For VHT, the wider bandwidths (up to 160 MHz) need
to be allowed. Since world roaming only covers the
case of connecting to an AP, it can be opened up
there, we will rely on the AP to know the local
regulations.
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no reason TDLS should be prevented on P2P client
interfaces, and most of the code already handles it, so
allow adding stations for it.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement restricting peer VHT capabilities to the device's own
capabilities. This is useful when a single driver supports more
than one device and the devices have different capabilities
(often they will differ in the number of spatial streams), but
in particular is also necessary for VHT capability overrides to
work correctly -- otherwise it'd be possible to e.g. advertise,
due to overrides, that TX-STBC is not supported, but then still
use it to TX to the AP because it supports RX-STBC.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
HT capabilites are asymmetric -- e.g. beamforming is both an
RX and TX capability. If, for example, we support RX but not
TX, the RX capability of the AP station is masked out (if it
supports it). This works correctly if it's really the driver
capability.
If, on the other hand, the reason for not supporting TX BF
is that it was removed by HT capability overrides then the
wrong thing happens: the AP's TX capability will be removed
rather than its RX capability, because the override function
works on own capabilities, not remote ones, and doesn't take
the asymmetry into account.
To fix this make a copy of our own capabilities, apply the
overrides to them (where needed) and then use that to set up
the peer's capabilities.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The HT overrides are intended only for the connection
to the AP, not for any other purpose. Therefore, don't
apply them to TDLS peers that are also stations added
to a managed station interface.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For AP interfaces, there's no need to flush stations
or keys again when the interface is stopped as already
happened when the BSS was stopped on the interface.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since hostapd will remove keys this isn't usually
an issue, but we shouldn't leak keys to the next
BSS started on the same interface. For VLANs this
also fixes a bug, keys that aren't removed would
otherwise be leaked.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
During roaming, the crypto_tx_tailroom_needed_cnt counter
will often take values 2,1,0,1,2 because first keys are
removed and then new keys are added. This is inefficient
because during the 0->1 transition, synchronize_net must
be called to avoid packet races, although typically no
packets would be flowing during that time.
To avoid that, defer the decrement (2->1, 1->0) when keys
are removed (by half a second). This means the counter
will really have the values 2,2,2,3,4 ... 2, thus never
reaching 0 and having to do the 0->1 transition.
Note that this patch entirely disregards the drivers for
which this optimisation was done to start with, for them
the key removal itself will be expensive because it has
to synchronize_net() after the counter is incremented to
remove the key from HW crypto. For them the sequence will
look like this: 0,1,0,1,0,1,0,1,0 (*) which is clearly a
lot more inefficient. This could be addressed separately,
during key removal the 0->1->0 sequence isn't necessary.
(*) it starts at 0 because HW crypto is on, then goes to
1 when HW crypto is disabled for a key, then back to
0 because the key is deleted; this happens for both
keys in the example. When new keys are added, it goes
to 1 first because they're added in software; when a
key is moved to hardware it goes back to 0
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no driver using this flag, so it seems
that all drivers support HW crypto with WMM or
don't support it at all. Remove the flag and
code setting it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove not used any longer suspend/resume code.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove not used any longer suspend/resume code.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove not used any longer suspend/resume code.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since now we disconnect before suspend, various code which save
connection state can now be removed from suspend and resume
procedure. Cleanup on resume side is smaller as ieee80211_reconfig()
is also used for H/W restart.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If possible that after suspend, cfg80211 will receive request to
disconnect what require action on interface that was removed during
suspend.
Problem can manifest itself by various warnings similar to below one:
WARNING: at net/mac80211/driver-ops.h:12 ieee80211_bss_info_change_notify+0x2f9/0x300 [mac80211]()
wlan0: Failed check-sdata-in-driver check, flags: 0x4
Call Trace:
[<c043e0b3>] warn_slowpath_fmt+0x33/0x40
[<f83707c9>] ieee80211_bss_info_change_notify+0x2f9/0x300 [mac80211]
[<f83a660a>] ieee80211_recalc_ps_vif+0x2a/0x30 [mac80211]
[<f83a6706>] ieee80211_set_disassoc+0xf6/0x500 [mac80211]
[<f83a9441>] ieee80211_mgd_deauth+0x1f1/0x280 [mac80211]
[<f8381b36>] ieee80211_deauth+0x16/0x20 [mac80211]
[<f8261e70>] cfg80211_mlme_down+0x70/0xc0 [cfg80211]
[<f8264de1>] __cfg80211_disconnect+0x1b1/0x1d0 [cfg80211]
To fix the problem disconnect from any associated network before
suspend. User space is responsible to establish connection again
after resume. This basically need to be done by user space anyway,
because associated stations can go away during suspend (for example
NetworkManager disconnects on suspend and connect on resume by default).
Patch also handle situation when driver refuse to suspend with wowlan
configured and try to suspend again without it.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since two years no mac80211 driver implement support for NAPI. Looks
this feature is unneeded, so remove it from generic mac80211 code.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A sample attempt should only count in mi->sample_tries if the sample
attempt wasn't skipped based on slower rate criteria.
This patch increases the sampling frequency for potentially desirable
rates and thus enables faster recovery from interference or collisions.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a rate is below the max_tp_rate, sample it frequently if:
- it is above max_tp_rate2, or
- it is above max_prob_rate and is a candidate for max_prob_rate
(has fewer streams than max_tp_rate).
This helps the retry chain recover more quickly from bad statistics
caused by collisions or interference, and slightly reduces throughput
fluctuations with higher rates.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Try to sample all available rates, as sample attempts do not cost much
airtime and are appropriately spaced based on the average A-MPDU length.
This helps with faster recovery on rate fluctuations.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
max_prob_rate should be selected to be very reliable, however limiting
it to single-stream on 3-stream devices is a bit much.
Allow max_prob_rate to use one stream less than the max_tp_rate.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
At high data rates the average frame transmission durations are small
enough for rounding errors to matter, sometimes causing minstrel to use
slightly lower transmit rates than necessary.
To fix this, change the unit of the duration value to nanoseconds
instead of microseconds, and reorder the multiplications/divisions when
calculating the throughput metric so that they don't overflow or
truncate prematurely.
At 2-stream HT40 this makes TCP throughput a bit more stable.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add NL80211_CMD_UPDATE_FT_IES to support update of FT IEs to the WLAN
driver and NL80211_CMD_FT_EVENT to send FT events from the WLAN driver.
This will carry the target AP's MAC address along with the relevant
Information Elements. This event is used to report received FT IEs
(MDIE, FTIE, RSN IE, TIE, RICIE). These changes allow FT to be supported
with drivers that use an internal SME instead of user space option (like
FT implementation in wpa_supplicant with mac80211-based drivers).
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's not useful to specify a 0 keepalive interval, this
would send too much data. Prohibit this to also avoid
device issues.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some devices can handle remain on channel requests differently
based on the request type/priority. Add support to
differentiate between different ROC types, i.e., indicate that
the ROC is required for sending managment frames.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211_mlme_assoc() has grown far too many arguments,
make the caller build almost all of the driver struct
and pass that to the function instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For testing it's sometimes useful to be able to
override certain VHT capability advertisement,
add the ability to do that in cfg80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is the sort of thing gcc's LTO could do, but since
we don't have that yet we can also do it manually. The
advantage is reduced code, both source and binary, e.g.
on x86-64
text data bss dec hex filename
442825 56230 776 499831 7a077 cfg80211.ko (before)
441585 56230 776 498591 79b9f cfg80211.ko (after)
a reduction of ~1k.
But in order to not complicate the code move only those
functions that are simple wrappers, not those that have
functionality of their own.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add back the channel width and extended capability data
to wiphy information if split information is supported.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move the sequence number arithmetic code from mac80211 to
ieee80211.h so others can use it. Also rename the functions
from _seq to _sn, they operate on the sequence number, not
the sequence_control field.
Also move macros to convert the sequence control to/from
the sequence number value from various drivers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add back the previously removed TCP WoWLAN information,
but only if userspace is prepared to deal with large
wiphy capability data dumps.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If userspace is updated to deal with large split wiphy
information dumps, add back the radar information that
could otherwise push the data over the limit of the
netlink dump messages.
Cc: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The per-wiphy information is getting large, to the point
where with more than the typical number of channels it's
too large and overflows, and userspace can't get any of
the information at all.
To address this (in a way that doesn't require making all
messages bigger) allow userspace to specify that it can
deal with wiphy information split across multiple parts
of the dump, and if it can split up the data. This also
splits up each channel separately so an arbitrary number
of channels can be supported.
Additionally, since GET_WIPHY has the same problem, add
support for filtering the wiphy dump and get information
for a single wiphy only, this allows userspace apps to
use dump in this case to retrieve all data from a single
device.
As userspace needs to know if all this this is supported,
add a global nl80211 feature set and include a bit for
this behaviour in it.
Cc: Dennis H Jensen <dennis.h.jensen@siemens.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The comment says something about __skb_push(), but that
isn't even called in the code any more. Looking at the
git history, that comment never even made sense when it
was still called, so just replace that part to note it
still works even when align isn't 0 or 2.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
'rfkill_regulator_ops' is used only in this file. Hence make it static.
Silences the following warning:
net/rfkill/rfkill-regulator.c:54:19: warning:
symbol 'rfkill_regulator_ops' was not declared. Should it be static?
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The station change API isn't being checked properly before
drivers are called, and as a result it is difficult to see
what should be allowed and what not.
In order to comprehensively check the API parameters parse
everything first, and then have the driver call a function
(cfg80211_check_station_change()) with the additionally
information about the kind of station that is being changed;
this allows the function to make better decisions than the
old code could.
While at it, also add a few checks, particularly in mesh
and clarify the TDLS station lifetime in documentation.
To be able to reduce a few checks, ignore any flag set bits
when the mask isn't set, they shouldn't be applied then.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make the ability to leave the plink_state unchanged not use a
magic -1 variable that isn't in the enum, but an explicit change
flag; reject invalid plink states or actions and move the needed
constants for plink actions to the right header file. Also
reject plink_state changes for non-mesh interfaces.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a router forwards a packet that contains the IPv4 timestamp option,
if there is no space left in the option for the router to add its own
timestamp, then the router increments the Overflow value in the option.
However, if the addresses of the routers are prespecified in the option,
then the overflow condition cannot happen: the option is structured so
that each prespecified router has a place to write its timestamp. Other
routers do not add a timestamp, so there will never be a lack of space.
This fix ensures that the Overflow value in the IPv4 timestamp option is
not incremented when the addresses of the routers are prespecified, even
if the Pointer value is greater than the Length value.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should use time_after_eq() to get maximum latency of two ticks,
instead of three.
Bug added in commit 24f8b2385 (net: increase receive packet quantum)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix new kernel-doc warnings in net/core/dev.c:
Warning(net/core/dev.c:4788): No description found for parameter 'new_carrier'
Warning(net/core/dev.c:4788): Excess function parameter 'new_carries' description in 'dev_change_carrier'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
QFQ+ can select for service only 'eligible' aggregates, i.e.,
aggregates that would have started to be served also in the emulated
ideal system. As a consequence, for QFQ+ to be work conserving, at
least one of the active aggregates must be eligible when it is time to
choose the next aggregate to serve.
The set of eligible aggregates is updated through the function
qfq_update_eligible(), which does guarantee that, after its
invocation, at least one of the active aggregates is eligible.
Because of this property, this function is invoked in
qfq_deactivate_agg() to guarantee that at least one of the active
aggregates is still eligible after an aggregate has been deactivated.
In particular, the critical case is when there are other active
aggregates, but the aggregate being deactivated happens to be the only
one eligible.
However, this precaution is not needed for QFQ+ to be work conserving,
because update_eligible() is always invoked also at the beginning of
qfq_choose_next_agg(). This patch removes the additional invocation of
update_eligible() in qfq_deactivate_agg().
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By definition of (the algorithm of) QFQ+, the system virtual time must
be pushed up only if there is no 'eligible' aggregate, i.e. no
aggregate that would have started to be served also in the ideal
system emulated by QFQ+. QFQ+ serves only eligible aggregates, hence
the aggregate currently in service is eligible. As a consequence, to
decide whether there is no eligible aggregate, QFQ+ must also check
whether there is no aggregate in service.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aggregate budgets are computed so as to guarantee that, after an
aggregate has been selected for service, that aggregate has enough
budget to serve at least one maximum-size packet for the classes it
contains. For this reason, after a new aggregate has been selected
for service, its next packet is immediately dequeued, without any
further control.
The maximum packet size for a class, lmax, can be changed through
qfq_change_class(). In case the user sets lmax to a lower value than
the the size of some of the still-to-arrive packets, QFQ+ will
automatically push up lmax as it enqueues these packets. This
automatic push up is likely to happen with TSO/GSO.
In any case, if lmax is assigned a lower value than the size of some
of the packets already enqueued for the class, then the following
problem may occur: the size of the next packet to dequeue for the
class may happen to be larger than lmax, after the aggregate to which
the class belongs has been just selected for service. In this case,
even the budget of the aggregate, which is an unsigned value, may be
lower than the size of the next packet to dequeue. After dequeueing
this packet and subtracting its size from the budget, the latter would
wrap around.
This fix prevents the budget from wrapping around after any packet
dequeue.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If no aggregate is in service, then the function qfq_dequeue() does
not dequeue any packet. For this reason, to guarantee QFQ+ to be work
conserving, a just-activated aggregate must be set as in service
immediately if it happens to be the only active aggregate.
This is done by the function qfq_enqueue().
Unfortunately, the function qfq_add_to_agg(), used to add a class to
an aggregate, does not perform this important additional operation.
In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
of a class from a source aggregate, becoming, for this move, inactive,
to a destination aggregate, becoming instead active, and 2) the
destination aggregate becomes the only active aggregate, then this
aggregate is not however set as in service. QFQ+ remains then in a
non-work-conserving state until a new invocation of qfq_enqueue()
recovers the situation.
This fix solves the problem by moving the logic for setting an
aggregate as in service directly into the function qfq_activate_agg().
Hence, from whatever point qfq_activate_aggregate() is invoked, QFQ+
remains work conserving. Since the more-complex logic of this new
version of activate_aggregate() is not necessary, in qfq_dequeue(), to
reschedule an aggregate that finishes its budget, then the aggregate
is now rescheduled by invoking directly the functions needed.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
QFQ+ schedules the active aggregates in a group using a bucket list
(one list per group). The bucket in which each aggregate is inserted
depends on the aggregate's timestamps, and the number
of buckets in a group is enough to accomodate the possible (range of)
values of the timestamps of all the aggregates in the group. For this
property to hold, timestamps must however be computed correctly. One
necessary condition for computing timestamps correctly is that the
number of bits dequeued for each aggregate, while the aggregate is in
service, does not exceed the maximum budget budgetmax assigned to the
aggregate.
For each aggregate, budgetmax is proportional to the number of classes
in the aggregate. If the number of classes of the aggregate is
decreased through qfq_change_class(), then budgetmax is decreased
automatically as well. Problems may occur if the aggregate is in
service when budgetmax is decreased, because the current remaining
budget of the aggregate and/or the service already received by the
aggregate may happen to be larger than the new value of budgetmax. In
this case, when the aggregate is eventually deselected and its
timestamps are updated, the aggregate may happen to have received an
amount of service larger than budgetmax. This may cause the aggregate
to be assigned a higher virtual finish time than the maximum
acceptable value for the last bucket in the bucket list of the group.
This fix introduces a cap that addresses this issue.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DTR/RTS need to be raised, regardless of the open() mode, but not
if the port has already shutdown.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without a memory and compiler barrier, the task state change
can migrate relative to the condition testing in a blocking loop.
However, the task state change must be visible across all cpus
prior to testing those conditions. Failing to do this can result
in the familiar 'lost wakeup' and this task will hang until killed.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although tty_lock() already protects concurrent update to
blocked_open, that fails to meet the separation-of-concerns between
tty_port and tty.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saving the port count bump is unsafe. If the tty is hung up while
this open was blocking, the port count is zeroed.
Explicitly check if the tty was hung up while blocking, and correct
the port count if not.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"A moderately sized pile of fixes, some specifically for merge window
introduced regressions although others are for longer standing items
and have been queued up for -stable.
I'm kind of tired of all the RDS protocol bugs over the years, to be
honest, it's way out of proportion to the number of people who
actually use it.
1) Fix missing range initialization in netfilter IPSET, from Jozsef
Kadlecsik.
2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
Berg.
3) Fix DMA syncing in SFC driver, from Ben Hutchings.
4) Fix regression in BOND device MAC address setting, from Jiri
Pirko.
5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.
6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
fix from Dmitry Kravkov.
7) Missing cfgspace_lock initialization in BCMA driver.
8) Validate parameter size for SCTP assoc stats getsockopt(), from
Guenter Roeck.
9) Fix SCTP association hangs, from Lee A Roberts.
10) Fix jumbo frame handling in r8169, from Francois Romieu.
11) Fix phy_device memory leak, from Petr Malat.
12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
Mehrtens.
13) Missing socket refcount release in L2TP, from Guillaume Nault.
14) sctp_endpoint_init should respect passed in gfp_t, rather than use
GFP_KERNEL unconditionally. From Dan Carpenter.
15) Add AISX AX88179 USB driver, from Freddy Xin.
16) Remove MAINTAINERS entries for drivers deleted during the merge
window, from Cesar Eduardo Barros.
17) RDS protocol can try to allocate huge amounts of memory, check
that the user's request length makes sense, from Cong Wang.
18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own,
bogus, definition. From Cong Wang.
19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
from Frank Li. Also, fix a build error introduced in the merge
window.
20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.
21) Don't double count RTT measurements when we leave the TCP receive
fast path, from Neal Cardwell."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
tcp: fix double-counted receiver RTT when leaving receiver fast path
CAIF: fix sparse warning for caif_usb
rds: simplify a warning message
net: fec: fix build error in no MXC platform
net: ipv6: Don't purge default router if accept_ra=2
net: fec: put tx to napi poll function to fix dead lock
sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
rds: limit the size allocated by rds_message_alloc()
MAINTAINERS: remove eexpress
MAINTAINERS: remove drivers/net/wan/cycx*
MAINTAINERS: remove 3c505
caif_dev: fix sparse warnings for caif_flow_cb
ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
sctp: use the passed in gfp flags instead GFP_KERNEL
ipv[4|6]: correct dropwatch false positive in local_deliver_finish
l2tp: Restore socket refcount when sendmsg succeeds
net/phy: micrel: Disable asymmetric pause for KSZ9021
bgmac: omit the fcs
phy: Fix phy_device_free memory leak
bnx2x: Fix KR2 work-around condition
...
We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both
before and after going to step5. That wastes CPU and double-counts the
receiver-side RTT sample.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following sparse warning:
net/caif/caif_usb.c:84:16: warning: symbol 'cfusbl_create' was not
declared. Should it be static?
Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting net.ipv6.conf.<interface>.accept_ra=2 causes the kernel
to accept RAs even when forwarding is enabled. However, enabling
forwarding purges all default routes on the system, breaking
connectivity until the next RA is received. Fix this by not
purging default routes on interfaces that have accept_ra=2.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't definite its own MAX_KMALLOC_SIZE, use the one
defined in mm.
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixed the following sparse warning:
net/caif/caif_dev.c:121:6: warning: symbol 'caif_flow_cb' was not
declared. Should it be static?
Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Don't allow NFS silly-renamed files to be deleted
- Don't start the retransmission timer when out of socket space
- Fix a couple of pnfs-related Oopses.
- Fix one more NFSv4 state recovery deadlock
- Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRMpMhAAoJEGcL54qWCgDy4BMP/0Zl7Ei7x9bJSb1C1lpPSo5p
Lr9XoHLYqhPcAwKUXQfgM5IkC69bE62bD5esmdDqkgZYqnmGE0E4LG6MsbsMmvzk
yug5WOxmjOFee7Bdpd8B86Z0qsa4l2TkQu2h9G3zE36P2rPKQaNzpteIjhis5UEQ
EfNyLoBdFcuUSh4ztMVZOzbAyDcbNfsyl03XVmlv+Qn/o0l42Zjth0qwOP60bjuM
zJF1CkHi5NLbXEhmOev9mA6UYz6zWRbiA/Yu92pomtXVDtOtzWpUniBIcf/S1ZH/
V8Gj6bWdHHyFCa2PjhY1/QdLBOPRPdxpAAJk+q48AKmzyiOU6g3lIHBp5ai1WZNI
1C+SYxABE/EJgq9SoQYGqq6SUiolrFulqnFHXF0jHF+ifdjoHjSRmpGQAoyoZ0k1
aSl+Ojqx7QHibJd8GZBavWc3upRDzhHDRRB3tkQCENi+hryBZxEyeS2Z54NmBRUN
tsOuyac6rtknZdD8Do4DMt9uc9u1DWicaiZbLfkP1VL1Angh6NKSA7qbmH6giLBS
9Y+DPcIk5e34uKQ21WTxFydGD+SMg0EMnOmfr6EYXWEHBhKNYVR+cHyH0mAF6RzX
enU2g0H2m+3vUQqajPUP0DV/eLGtdsvWvMjiskc3KX90CWfHmV2C8GFSxjV2OkT1
vG1KFrICO6DR2943Udit
=FMtb
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"We've just concluded another Connectathon interoperability testing
week, and so here are the fixes for the bugs that were discovered:
- Don't allow NFS silly-renamed files to be deleted
- Don't start the retransmission timer when out of socket space
- Fix a couple of pnfs-related Oopses.
- Fix one more NFSv4 state recovery deadlock
- Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER"
* tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: One line comment fix
NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
SUNRPC: add call to get configured timeout
PNFS: set the default DS timeout to 60 seconds
NFSv4: Fix another open/open_recovery deadlock
nfs: don't allow nfs_find_actor to match inodes of the wrong type
NFSv4.1: Hold reference to layout hdr in layoutget
pnfs: fix resend_to_mds for directio
SUNRPC: Don't start the retransmission timer when out of socket space
NFS: Don't allow NFS silly-renamed files to be deleted, no signal
When setting a monitor interface up or down, the idle state needs to be
recalculated, otherwise the hardware will just stay in its previous idle
state.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch doesn't change how the code works because in the current
kernel gfp is always GFP_KERNEL. But gfp was obviously intended
instead of GFP_KERNEL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I had a report recently of a user trying to use dropwatch to localise some frame
loss, and they were getting false positives. Turned out they were using a user
space SCTP stack that used raw sockets to grab frames. When we don't have a
registered protocol for a given packet, we record it as a drop, even if a raw
socket receieves the frame. We should only record the drop in the event a raw
socket doesnt exist to receive the frames
Tested by the reported successfully
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
Tested-by: William Reich <reich@ulticom.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: William Reich <reich@ulticom.com>
CC: eric.dumazet@gmail.com
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
This is another flurry of fixes intended for the 3.9 stream...
A mac80211 pull from Johannes:
"Seth fixes a stupid bug I introduced into one of his earlier patches,
Chun-Yeow fixes mesh forwarding and Felix fixes monitor mode. I myself
fixed a small locking issue and, the biggest change here, removed some
nl80211 information with which sometimes the per wiphy information was
getting too large for the typical 4k-minus-overhead. In my -next tree I
have a patch to allow splitting that and add back the information
removed now."
An iwlwifi pull from Johannes:
"I have a fix for a pretty important bug regarding DMA mapping, that
could cause the DMA engine to overwrite data we wanted to send to it, so
that the next time we send it it would be bad. This particularly affects
calibration results. Other than that, three little fixes for the MVM
driver."
But wait, there's more!
Avinash Patil fixes an incorrectly timed delay in mwifiex.
Bing Zhao prevents a crash in SD8688 caused by failing to properly
set a flag before issuing a command.
Felix Fietkau is the big here this time, providing a trio of minor
ath9k fixes and correcting the advertised interface combinations for
rt2x00 when mesh support is disabled.
Finally, Hauke Mehrtens gives us a patch that correctlin initializes
a spin lock in the bcma code.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The sendmsg() syscall handler for PPPoL2TP doesn't decrease the socket
reference counter after successful transmissions. Any successful
sendmsg() call from userspace will then increase the reference counter
forever, thus preventing the kernel's session and tunnel data from
being freed later on.
The problem only happens when writing directly on L2TP sockets.
PPP sockets attached to L2TP are unaffected as the PPP subsystem
uses pppol2tp_xmit() which symmetrically increase/decrease reference
counters.
This patch adds the missing call to sock_put() before returning from
pppol2tp_sendmsg().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VHT MCSes we advertise to the AP were supposed to
be restricted to the AP, but due to a bug in the logic
mac80211 will advertise rates to the AP that aren't
even supported by the local device. To fix this skip
any adjustment if the NSS isn't supported at all.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduced with de74a1d903
"mac80211: fix WPA with VLAN on AP side with ps-sta".
Apparently overwrites the sdata pointer with non-valid data in
the case of mesh.
Fix this by checking for IFTYPE_AP_VLAN.
Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Given a device with many channels capabilities the wiphy
information can still overflow even though its size in
3.9 was reduced to 3.8 levels. For new userspace and
kernel 3.10 we're going to implement a new "split dump"
protocol that can use multiple messages per wiphy.
For now though, add a workaround to be able to send more
information to userspace. Since generic netlink doesn't
have a way to set the minimum dump size globally, and we
wouldn't really want to set it globally anyway, increase
the size only when needed, as described in the comments.
As userspace might not be prepared for large buffers, we
can only use 4k.
Also increase the size for the get_wiphy command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull nfsd changes from J Bruce Fields:
"Miscellaneous bugfixes, plus:
- An overhaul of the DRC cache by Jeff Layton. The main effect is
just to make it larger. This decreases the chances of intermittent
errors especially in the UDP case. But we'll need to watch for any
reports of performance regressions.
- Containerized nfsd: with some limitations, we now support
per-container nfs-service, thanks to extensive work from Stanislav
Kinsbursky over the last year."
Some notes about conflicts, since there were *two* non-data semantic
conflicts here:
- idr_remove_all() had been added by a memory leak fix, but has since
become deprecated since idr_destroy() does it for us now.
- xs_local_connect() had been added by this branch to make AF_LOCAL
connections be synchronous, but in the meantime Trond had changed the
calling convention in order to avoid a RCU dereference.
There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.
* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
SUNRPC: make AF_LOCAL connect synchronous
nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
svcrpc: fix rpc server shutdown races
svcrpc: make svc_age_temp_xprts enqueue under sv_lock
lockd: nlmclnt_reclaim(): avoid stack overflow
nfsd: enable NFSv4 state in containers
nfsd: disable usermode helper client tracker in container
nfsd: use proper net while reading "exports" file
nfsd: containerize NFSd filesystem
nfsd: fix comments on nfsd_cache_lookup
SUNRPC: move cache_detail->cache_request callback call to cache_read()
SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
SUNRPC: rework cache upcall logic
SUNRPC: introduce cache_detail->cache_request callback
NFS: simplify and clean cache library
NFS: use SUNRPC cache creation and destruction helper for DNS cache
nfsd4: free_stid can be static
nfsd: keep a checksum of the first 256 bytes of request
sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
sunrpc: fix comment in struct xdr_buf definition
...
Pull Ceph updates from Sage Weil:
"A few groups of patches here. Alex has been hard at work improving
the RBD code, layout groundwork for understanding the new formats and
doing layering. Most of the infrastructure is now in place for the
final bits that will come with the next window.
There are a few changes to the data layout. Jim Schutt's patch fixes
some non-ideal CRUSH behavior, and a set of patches from me updates
the client to speak a newer version of the protocol and implement an
improved hashing strategy across storage nodes (when the server side
supports it too).
A pair of patches from Sam Lang fix the atomicity of open+create
operations. Several patches from Yan, Zheng fix various mds/client
issues that turned up during multi-mds torture tests.
A final set of patches expose file layouts via virtual xattrs, and
allow the policies to be set on directories via xattrs as well
(avoiding the awkward ioctl interface and providing a consistent
interface for both kernel mount and ceph-fuse users)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
libceph: add support for HASHPSPOOL pool flag
libceph: update osd request/reply encoding
libceph: calculate placement based on the internal data types
ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
ceph: update "ceph_features.h"
libceph: decode into cpu-native ceph_pg type
libceph: rename ceph_pg -> ceph_pg_v1
rbd: pass length, not op for osd completions
rbd: move rbd_osd_trivial_callback()
libceph: use a do..while loop in con_work()
libceph: use a flag to indicate a fault has occurred
libceph: separate non-locked fault handling
libceph: encapsulate connection backoff
libceph: eliminate sparse warnings
ceph: eliminate sparse warnings in fs code
rbd: eliminate sparse warnings
libceph: define connection flag helpers
rbd: normalize dout() calls
rbd: barriers are hard
rbd: ignore zero-length requests
...
Returns the configured timeout for the xprt of the rpc client.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In sctp_ulpq_tail_data(), use return values 0,1 to indicate whether
a complete event (with MSG_EOR set) was delivered. A return value
of -ENOMEM continues to indicate an out-of-memory condition was
encountered.
In sctp_ulpq_retrieve_partial() and sctp_ulpq_retrieve_first(),
correct message reassembly logic for SCTP partial delivery.
Change logic to ensure that as much data as possible is sent
with the initial partial delivery and that following partial
deliveries contain all available data.
In sctp_ulpq_partial_delivery(), attempt partial delivery only
if the data on the head of the reassembly queue is at or before
the cumulative TSN ACK point.
In sctp_ulpq_renege(), use the modified return values from
sctp_ulpq_tail_data() to choose whether to attempt partial
delivery or to attempt to drain the reassembly queue as a
means to reduce memory pressure. Remove call to
sctp_tsnmap_mark(), as this is handled correctly in call to
sctp_ulpq_tail_data().
Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
In sctp_ulpq_renege_list(), events being reneged from the
ordering queue may correspond to multiple TSNs. Identify
all affected packets; sum freed space and renege from the
tsnmap.
Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
In sctp_ulpq_renege_list(), do not renege packets below the
cumulative TSN ACK point.
Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
In sctp_tsnmap_mark(), correct off-by-one error when calculating
size value for sctp_tsnmap_grow().
In sctp_tsnmap_grow(), correct off-by-one error when copying
and resizing the tsnmap. If max_tsn_seen is in the LSB of the
word, this bit can be lost, causing the corresponding packet
to be transmitted again and to be entered as a duplicate into
the SCTP reassembly/ordering queues. Change parameter name
from "gap" (zero-based index) to "size" (one-based) to enhance
code readability.
Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
It doesn't appear that anyone actually needs to connect asynchronously.
Also, using a workqueue for the connect means we lose the namespace
information from the original process. This is a problem since there's
no way to explicitly pass in a filesystem namespace for resolution of an
AF_LOCAL address.
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After Felix's patch it was still broken in case you
used more than just a single monitor interface. Fix
it better now.
Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building sctp may fail with:
In function ‘copy_from_user’,
inlined from ‘sctp_getsockopt_assoc_stats’ at
net/sctp/socket.c:5656:20:
arch/x86/include/asm/uaccess_32.h:211:26: error: call to
‘copy_from_user_overflow’ declared with attribute error: copy_from_user()
buffer size is not provably correct
if built with W=1 due to a missing parameter size validation
before the call to copy_from_user.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
charset comes from skb->data. It's a number in the 0-255 range.
If we have debugging turned on then this could cause a read beyond
the end of the array.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is another case of data increasing the size of the
wiphy information significantly with a new feature, for
now remove this as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
The legacy behavior adds the pgid seed and pool together as the input for
CRUSH. That is problematic because each pool's PGs end up mapping to the
same OSDs: 1.5 == 2.4 == 3.3 == ...
Instead, if the HASHPSPOOL flag is set, we has the ps and pool together and
feed that into CRUSH. This ensures that two adjacent pools will map to
an independent pseudorandom set of OSDs.
Advertise our support for this via a protocol feature flag.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Use the new version of the encoding for osd requests and replies. In the
process, update the way we are tracking request ops and reply lengths and
results in the struct ceph_osd_request. Update the rbd and fs/ceph users
appropriately.
The main changes are:
- we keep pointers into the request memory for fields we need to update
each time the request is sent out over the wire
- we keep information about the result in an array in the request struct
where the users can easily get at it.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type. This allows us to
pass the full 32-bit precision of the pgid.seed to the callers. It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features.
These have been present in ceph.git since v0.42, Feb 2012. Require these
features to simplify support; nobody is running older userspace.
Note that the new request and reply encoding is still not in place, so the new
code is not yet functional.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Always decode data into our cpu-native ceph_pg type that has the correct
field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the
legacy protocol.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Rename the old version this type to distinguish it from the new version.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Pablo Neira Ayuso says:
====================
The following patchset contains two bugfixes for netfilter/ipset via
Jozsef Kadlecsik, they are:
* Fix timeout corruption if sets are resized, by Josh Hunt.
* Fix bogus error report if the flag nomatch is set, from Jozsef.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some mlme work structs are not cancelled on disassociation
nor interface deletion, which leads to them running after
the memory has been freed
There is not a clean way to cancel these in the disassociation
logic because they must be canceled outside of the ifmgd->mtx
lock, so just cancel them in mgd_stop logic that tears down
the station.
This fixes the crashes we see in 3.7.9+. The crash stack
trace itself isn't so helpful, but this warning gives
more useful info:
WARNING: at /home/greearb/git/linux-3.7.dev.y/lib/debugobjects.c:261 debug_print_object+0x7c/0x8d()
ODEBUG: free active (active state 0) object type: work_struct hint: ieee80211_sta_monitor_work+0x0/0x14 [mac80211]
Modules linked in: [...]
Pid: 14743, comm: iw Tainted: G C O 3.7.9+ #11
Call Trace:
[<ffffffff81087ef8>] warn_slowpath_common+0x80/0x98
[<ffffffff81087fa4>] warn_slowpath_fmt+0x41/0x43
[<ffffffff812a2608>] debug_print_object+0x7c/0x8d
[<ffffffff812a2bca>] debug_check_no_obj_freed+0x95/0x1c3
[<ffffffff8114cc69>] slab_free_hook+0x70/0x79
[<ffffffff8114ea3e>] kfree+0x62/0xb7
[<ffffffff8149f465>] netdev_release+0x39/0x3e
[<ffffffff8136ad67>] device_release+0x52/0x8a
[<ffffffff812937db>] kobject_release+0x121/0x158
[<ffffffff81293612>] kobject_put+0x4c/0x50
[<ffffffff8148f0d7>] netdev_run_todo+0x25c/0x27e
Cc: stable@vger.kernel.org
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Re-order the quiesce code so that timers are always
stopped before work-items are flushed. This was not
the problem I saw, but I think it may still be more
correct.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When not using channel contexts with only monitor mode interfaces being
active, report local->monitor_chandef to userspace.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the driver does not want a monitor mode VIF, no channel context is
allocated for it. This causes ieee80211_recalc_idle to put the hardware
into idle mode if only a monitor mode is active, breaking injection.
Fix this by checking local->monitors in addition to active channel
contexts.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 6c17b77b67 (mac80211: Fix tx queue
handling during scans) contains a bug that causes off-channel frames to
get queued when they should be handed down to the driver for transmit.
Prevent this from happening.
Reported-by: Fabio Rossi <rossi.f@inwind.it>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull networking fixes from David Miller:
1) ping_err() ICMP error handler looks at wrong ICMP header, from Li
Wei.
2) TCP socket hash function on ipv6 is too weak, from Eric Dumazet.
3) netif_set_xps_queue() forgets to drop mutex on errors, fix from
Alexander Duyck.
4) sum_frag_mem_limit() can deadlock due to lack of BH disabling, fix
from Eric Dumazet.
5) TCP SYN data is miscalculated in tcp_send_syn_data(), because the
amount of TCP option space was not taken into account properly in
this code path. Fix from yuchung Cheng.
6) MLX4 driver allocates device queues with the wrong size, from Kleber
Sacilotto.
7) sock_diag can access past the end of the sock_diag_handlers[] array,
from Mathias Krause.
8) vlan_set_encap_proto() makes incorrect assumptions about where
skb->data points, rework the logic so that it works regardless of
where skb->data happens to be. From Jesse Gross.
9) Fix gianfar build failure with NET_POLL enabled, from Paul
Gortmaker.
10) Fix Ipv4 ID setting and checksum calculations in GRE driver, from
Pravin B Shelar.
11) bgmac driver does:
int i;
for (i = 0; ...; ...) {
...
for (i = 0; ...; ...) {
effectively corrupting the outer loop index, use a seperate
variable for the inner loops. From Rafał Miłecki.
12) Fix suspend bugs in smsc95xx driver, from Ming Lei.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
usbnet: smsc95xx: rename FEATURE_AUTOSUSPEND
usbnet: smsc95xx: fix broken runtime suspend
usbnet: smsc95xx: fix suspend failure
bgmac: fix indexing of 2nd level loops
b43: Fix lockdep splat on module unload
Revert "ip_gre: propogate target device GSO capability to the tunnel device"
IP_GRE: Fix GRE_CSUM case.
VXLAN: Use tunnel_ip_select_ident() for tunnel IP-Identification.
IP_GRE: Fix IP-Identification.
net/pasemi: Fix missing coding style
vmxnet3: fix ethtool ring buffer size setting
vmxnet3: make local function static
bnx2x: remove dead code and make local funcs static
gianfar: fix compile fail for NET_POLL=y due to struct packing
vlan: adjust vlan_set_encap_proto() for its callers
sock_diag: Simplify sock_diag_handlers[] handling in __sock_diag_rcv_msg
sock_diag: Fix out-of-bounds access to sock_diag_handlers[]
vxlan: remove depends on CONFIG_EXPERIMENTAL
mlx4_en: fix allocation of CPU affinity reverse-map
mlx4_en: fix allocation of device tx_cq
...
- SRP error handling fixes from Bart Van Assche
- Implementation of memory windows for mlx4 from Shani Michaeli
- Lots of cxgb4 HW driver fixes from Vipul Pandya
- Make iSER work for virtual functions, other fixes from Or Gerlitz
- Fix for bug in qib HW driver from Mike Marciniszyn
- IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
- Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRLPNoAAoJEENa44ZhAt0h0bMP/1xqlgDR3DGdoUoV4yTd6O9G
Uccdb7og5o5tedVo+8xZz01y4at99P8FWW4hJ4os6k8n6yKoBVwo7qjN+BOR+JG5
q8+O+ynUSIg4tGrb5sXcMnKXAXbw/vkftMWYNA41cbrM24DTYzB/2SLpvhwbFoTT
tdQc2tgz5QaDqzWbagyCR4+k/IgO+Llrz/RvIdtz4dsTnTDogN7QCoSffX8n/Lpb
DxtyXK4sdl3DAtd3CsIdsB/TSMb3RkHLCoSvmrWlLnqMdsbRxVnCVfBm4BOghW3J
Y2K3joRoCjjIZSRNs/i0FMFkT/jbCXg1oXg9ek/a6YFNcgyk7z8iGyXrRY7fOnno
8U2SfxJ69YpVYeJr+DSjaeHcmjsaYU7NN7JPxzvPKcJOIsxQJ/euJDXAXau3lEQY
o9/p4JsGty0WHi1NanyygvghvBAoP1C5/59Sl4bHH5gckPyJT1kinPSCTT76YXGS
WkSHg2mDhiJHy7Pnuy85iZldPoy2/5z09/I4aGMeL+8kUZbD4iFqzXIJU0HTsAim
EONoRXDhIcN5DNVSVH1ig6nJ2a7Vhov4Z0r/vB8P4KhslBcqFwf2leC0eCoe5mNt
SzcKhqosZDXoL8AwzpntzGIOid8pWmHbUx/PgIcoVXPjtl0h2ULNIFoYYyMZ3cyU
AyN2tSiUZVddTV1/aKGL
=RAQw
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband update from Roland Dreier:
"Main batch of InfiniBand/RDMA changes for 3.9:
- SRP error handling fixes from Bart Van Assche
- Implementation of memory windows for mlx4 from Shani Michaeli
- Lots of cxgb4 HW driver fixes from Vipul Pandya
- Make iSER work for virtual functions, other fixes from Or Gerlitz
- Fix for bug in qib HW driver from Mike Marciniszyn
- IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
- Various cleanups and warning fixes from Julia Lawall, Paul Bolle,
Wei Yongjun"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits)
IB/mlx4: Advertise MW support
IB/mlx4: Support memory window binding
mlx4: Implement memory windows allocation and deallocation
mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
mlx4_core: Disable memory windows for virtual functions
IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
IB/srp: Fail I/O requests if the transport is offline
IB/srp: Avoid endless SCSI error handling loop
IB/srp: Avoid sending a task management function needlessly
IB/srp: Track connection state properly
IB/mlx4: Remove redundant NULL check before kfree
IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
IB/iser: Enable iser when FMRs are not supported
IB/iser: Avoid error prints on EAGAIN registration failures
IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
IB/uverbs: Implement memory windows support in uverbs
IB/core: Add "type 2" memory windows support
mlx4_core: Propagate MR deregistration failures to caller
mlx4_core: Rename MPT-related functions to have mpt_ prefix
...
Pull slave-dmaengine updates from Vinod Koul:
"This is fairly big pull by my standards as I had missed last merge
window. So we have the support for device tree for slave-dmaengine,
large updates to dw_dmac driver from Andy for reusing on different
architectures. Along with this we have fixes on bunch of the drivers"
Fix up trivial conflicts, usually due to #include line movement next to
each other.
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (111 commits)
Revert "ARM: SPEAr13xx: Pass DW DMAC platform data from DT"
ARM: dts: pl330: Add #dma-cells for generic dma binding support
DMA: PL330: Register the DMA controller with the generic DMA helpers
DMA: PL330: Add xlate function
DMA: PL330: Add new pl330 filter for DT case.
dma: tegra20-apb-dma: remove unnecessary assignment
edma: do not waste memory for dma_mask
dma: coh901318: set residue only if dma is in progress
dma: coh901318: avoid unbalanced locking
dmaengine.h: remove redundant else keyword
dma: of-dma: protect list write operation by spin_lock
dmaengine: ste_dma40: do not remove descriptors for cyclic transfers
dma: of-dma.c: fix memory leakage
dw_dmac: apply default dma_mask if needed
dmaengine: ioat - fix spare sparse complain
dmaengine: move drivers/of/dma.c -> drivers/dma/of-dma.c
ioatdma: fix race between updating ioat->head and IOAT_COMPLETION_PENDING
dw_dmac: add support for Lynxpoint DMA controllers
dw_dmac: return proper residue value
dw_dmac: fill individual length of descriptor
...
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhnacements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.
I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.
There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.
The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.
XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before it they are ready."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...
Unicast frame with unknown forwarding information always trigger
the path discovery assuming destination is always located inside the
MBSS. This patch allows the forwarding to look for mesh gate if path
discovery inside the MBSS has failed.
Reported-by: Cedric Voncken <cedric.voncken@acksys.fr>
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Just like the radar information, the TCP WoWLAN capability
data can increase the wiphy information and make it too
big. Remove the TCP WoWLAN information; no driver supports
it and new userspace tools will be required as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The wiphy information is getting very close to being too
much for a typical netlink dump message and adding the
radar attributes to channels and interface combinations
can push it over the limit, which means userspace gets no
information whatsoever. Therefore, remove these again for
now, no driver actually supports radar detection anyway
and a modified userspace is required as well.
We're working on a solution that will allow userspace to
request splitting the information across multiple netlink
messages, which will allow us to add this back.
Cc: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The ieee80211_beacon_add_tim() function might be called
by drivers with BHs enabled, which causes a potential
deadlock if TX happens at the same time and attempts to
lock the tim_lock as well. Use spin_lock_bh to fix it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This just converts a manually-implemented loop into a do..while loop
in con_work(). It also moves handling of EAGAIN inside the blocks
where it's already been determined an error code was returned.
Also update a few dout() calls near the affected code for
consistency.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This just rearranges the logic in con_work() a little bit so that a
flag is used to indicate a fault has occurred. This allows both the
fault and non-fault case to be handled the same way and avoids a
couple of nearly consecutive gotos.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An error occurring on a ceph connection is treated as a fault,
causing the connection to be reset. The initial part of this fault
handling has to be done while holding the connection mutex, but
it must then be dropped for the last part.
Separate the part of this fault handling that executes without the
lock into its own function, con_fault_finish(). Move the call to
this new function, as well as call that drops the connection mutex,
into ceph_fault(). Rename that function con_fault() to reflect that
it's only handling the connection part of the fault handling.
The motivation for this was a warning from sparse about the locking
being done here. Rearranging things this way keeps all the mutex
manipulation within ceph_fault(), and this stops sparse from
complaining.
This partially resolves:
http://tracker.ceph.com/issues/4184
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Collect the code that tests for and implements a backoff delay for a
ceph connection into a new function, ceph_backoff().
Make the debug output messages in that part of the code report
things consistently by reporting a message in the socket closed
case, and by making the one for PREOPEN state report the connection
pointer like the rest.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Eliminate most of the problems in the libceph code that cause sparse
to issue warnings.
- Convert functions that are never referenced externally to have
static scope.
- Pass NULL rather than 0 for a pointer argument in one spot in
ceph_monc_delete_snapid()
This partially resolves:
http://tracker.ceph.com/issues/4184
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define and use functions that encapsulate operations performed on
a connection's flags.
This resolves:
http://tracker.ceph.com/issues/4234
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
commit "ip_gre: allow CSUM capable devices to handle packets"
aa0e51cdda, broke GRE_CSUM case.
GRE_CSUM needs checksum computed for inner packet. Therefore
csum-calculation can not be offloaded if tunnel device requires
GRE_CSUM. Following patch fixes it by computing inner packet checksum
for GRE_CSUM type, for all other type of GRE devices csum is offloaded.
CC: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GRE-GSO generates ip fragments with id 0,2,3,4... for every
GSO packet, which is not correct. Following patch fixes it
by setting ip-header id unique id of fragments are allowed.
As Eric Dumazet suggested it is optimized by using inner ip-header
whenever inner packet is ipv4.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This member of struct virtio_chan is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deadlock might be caused by allocating memory with GFP_KERNEL in
runtime_resume and runtime_suspend callback of network devices in iSCSI
situation, so mark network devices and its ancestor as 'memalloc_noio'
with the introduced pm_runtime_set_memalloc_noio().
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Decotigny <david.decotigny@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sock_diag_lock_handler() and sock_diag_unlock_handler() actually
make the code less readable. Get rid of them and make the lock usage
and access to sock_diag_handlers[] clear on the first sight.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userland can send a netlink message requesting SOCK_DIAG_BY_FAMILY
with a family greater or equal then AF_MAX -- the array size of
sock_diag_handlers[]. The current code does not test for this
condition therefore is vulnerable to an out-of-bound access opening
doors for a privilege escalation.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
- not enough memory for file structures
- operation is not allowed
- user is over its limit
Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.
Return error through pointer. Change all callers to preserve this error code.
[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the socket is full, we're better off just waiting until it empties,
or until the connection is broken. The reason why we generally don't
want to time out is that the call to xprt->ops->release_xprt() will
trigger a connection reset, which isn't helpful...
Let's make an exception for soft RPC calls, since they have to provide
timeout guarantees.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
In fast open the sender unncessarily reduces the space available
for data in SYN by 12 bytes. This is because in the sender
incorrectly reserves space for TS option twice in tcp_send_syn_data():
tcp_mtu_to_mss() already accounts for TS option space. But it further
reserves MAX_TCP_OPTION_SPACE when computing the payload space.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mem cgroup socket limit is only used if the config option is
enabled. Found with sparse
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch found a locking bug in netif_set_xps_queue in which we were not
releasing the lock in the case of an allocation failure.
This change corrects that so that we release the xps_map_mutex before
returning -ENOMEM in the case of an allocation failure.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we handle icmp errors in each transport protocol's err_handler,
for icmp protocols, that is ping_err. Since this handler only care
of those icmp errors triggered by echo request, errors triggered
by echo reply(which sent by kernel) are sliently ignored.
So wrap ping_err() with icmp_err() to deal with those icmp errors.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull s390 update from Martin Schwidefsky:
"The most prominent change in this patch set is the software dirty bit
patch for s390. It removes __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY and
the page_test_and_clear_dirty primitive which makes the common memory
management code a bit less obscure.
Heiko fixed most of the PCI related fallout, more often than not
missing GENERIC_HARDIRQS dependencies. Notable is one of the 3270
patches which adds an export to tty_io to be able to resize a tty.
The rest is the usual bunch of cleanups and bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
s390/module: Add missing R_390_NONE relocation type
drivers/gpio: add missing GENERIC_HARDIRQ dependency
drivers/input: add couple of missing GENERIC_HARDIRQS dependencies
s390/cleanup: rename SPP to LPP
s390/mm: implement software dirty bits
s390/mm: Fix crst upgrade of mmap with MAP_FIXED
s390/linker skript: discard exit.data at runtime
drivers/media: add missing GENERIC_HARDIRQS dependency
s390/bpf,jit: add vlan tag support
drivers/net,AT91RM9200: add missing GENERIC_HARDIRQS dependency
iucv: fix kernel panic at reboot
s390/Kconfig: sort list of arch selected config options
phylib: remove !S390 dependeny from Kconfig
uio: remove !S390 dependency from Kconfig
dasd: fix sysfs cleanup in dasd_generic_remove
s390/pci: fix hotplug module init
s390/pci: cleanup clp page allocation
s390/pci: cleanup clp inline assembly
s390/perf: cpum_cf: fallback to software sampling events
s390/mm: provide PAGE_SHARED define
...
Pull trivial tree from Jiri Kosina:
"Assorted tiny fixes queued in trivial tree"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits)
DocBook: update EXPORT_SYMBOL entry to point at export.h
Documentation: update top level 00-INDEX file with new additions
ARM: at91/ide: remove unsused at91-ide Kconfig entry
percpu_counter.h: comment code for better readability
x86, efi: fix comment typo in head_32.S
IB: cxgb3: delay freeing mem untill entirely done with it
net: mvneta: remove unneeded version.h include
time: x86: report_lost_ticks doesn't exist any more
pcmcia: avoid static analysis complaint about use-after-free
fs/jfs: Fix typo in comment : 'how may' -> 'how many'
of: add missing documentation for of_platform_populate()
btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
sound: soc: Fix typo in sound/codecs
treewide: Fix typo in various drivers
btrfs: fix comment typos
Update ibmvscsi module name in Kconfig.
powerpc: fix typo (utilties -> utilities)
of: fix spelling mistake in comment
h8300: Fix home page URL in h8300/README
xtensa: Fix home page URL in Kconfig
...
Merge misc patches from Andrew Morton:
- Florian has vanished so I appear to have become fbdev maintainer
again :(
- Joel and Mark are distracted to welcome to the new OCFS2 maintainer
- The backlight queue
- Small core kernel changes
- lib/ updates
- The rtc queue
- Various random bits
* akpm: (164 commits)
rtc: rtc-davinci: use devm_*() functions
rtc: rtc-max8997: use devm_request_threaded_irq()
rtc: rtc-max8907: use devm_request_threaded_irq()
rtc: rtc-da9052: use devm_request_threaded_irq()
rtc: rtc-wm831x: use devm_request_threaded_irq()
rtc: rtc-tps80031: use devm_request_threaded_irq()
rtc: rtc-lp8788: use devm_request_threaded_irq()
rtc: rtc-coh901331: use devm_clk_get()
rtc: rtc-vt8500: use devm_*() functions
rtc: rtc-tps6586x: use devm_request_threaded_irq()
rtc: rtc-imxdi: use devm_clk_get()
rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
rtc: rtc-pcf8583: use dev_warn() instead of printk()
rtc: rtc-sun4v: use pr_warn() instead of printk()
rtc: rtc-vr41xx: use dev_info() instead of printk()
rtc: rtc-rs5c313: use pr_err() instead of printk()
rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
rtc: rtc-ds2404: use dev_err() instead of printk()
rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
...