often in the ring buffer. At first I thought it had to do with some
of the changes I was working on, but then testing something else I
realized that the bug was in 3.17 itself. I ran several bisects as the
bug was not very reproducible, and finally came up with the commit
that I could reproduce easily within a few minutes, and without the change
I could run the tests over an hour without issue. The change fit the
bug and I figured out a fix. That bad commit was:
Commit 651e22f270 "ring-buffer: Always reset iterator to reader page"
This commit fixed a bug, but in the process created another one. It used
the wrong value as the cached value that is used to see if things changed
while an iterator was in use. This made it look like a change always
happened, and could cause the iterator to go into an infinite loop.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJULvrkAAoJEKQekfcNnQGuD+sH/iHPE2qb2ojCP9+hqOpszdd1
d8rN8BNZsDlJxfWQELw2vGVXTmeW7txW5DFWQ3I8qSSjwYa6l27M4mHsw2QLagtw
kIrcazis3IAcYCH8OE4ruD5nAGYLFqRIt0MOa/NAJD0r00xM7nvOhII2+6uAXF+A
1JbQDRq8eleCKMUMV0XchqWx6pYTXL8cLh1YEXZ0BTUFKIz+y22HjWnMf+odDhLB
okQic67/+i7mJDAAW4U+pyevd0QBZdDOohjQtbj+irv2pb7WtWqylKcYhAYSpgsy
MtPzzYyPDs/aHLNcnIJVdVtbKfNXsaHuCgEvKKgLXnKMMcS5UxSIxj+Q1IxSIOM=
=B7HS
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull trace ring buffer iterator fix from Steven Rostedt:
"While testing some new changes for 3.18, I kept hitting a bug every so
often in the ring buffer. At first I thought it had to do with some
of the changes I was working on, but then testing something else I
realized that the bug was in 3.17 itself. I ran several bisects as
the bug was not very reproducible, and finally came up with the commit
that I could reproduce easily within a few minutes, and without the
change I could run the tests over an hour without issue. The change
fit the bug and I figured out a fix. That bad commit was:
Commit 651e22f270 "ring-buffer: Always reset iterator to reader page"
This commit fixed a bug, but in the process created another one. It
used the wrong value as the cached value that is used to see if things
changed while an iterator was in use. This made it look like a change
always happened, and could cause the iterator to go into an infinite
loop"
* tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Fix infinite spin in reading buffer
Pull cifs/smb3 fixes from Steve French:
"Fix for CIFS/SMB3 oops on reconnect during readpages (3.17 regression)
and for incorrectly closing file handle in symlink error cases"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix readpages retrying on reconnects
Fix problem recognizing symlinks
Herton R. Krzesinski says:
====================
Small fixes/changes for RDS
I got a report of one issue within RDS (after investigation it was a double
free), and I'm sending the fix (patch 3/3) which reporter said it works (no more
WARNING triggered on a specially instrumented kernel). The report/test was done
on a very old kernel (RHEL 5, 2.6.18 based with backports), but the problem the
patch handles still exists and should not change. Besides that, while
reviewing some of the code but being unable to reproduce with rds_tcp, I
noticed two small improvements/fixes which are in patches 1 and 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I got a report of a double free happening at RDS slab cache. One
suspicion was that may be somewhere we were doing a sock_hold/sock_put
on an already freed sock. Thus after providing a kernel with the
following change:
static inline void sock_hold(struct sock *sk)
{
- atomic_inc(&sk->sk_refcnt);
+ if (!atomic_inc_not_zero(&sk->sk_refcnt))
+ WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n",
+ sk, sk->sk_family);
}
The warning successfuly triggered:
Trying to hold sock already gone: ffff81f6dda61280 (family: 21)
WARNING: at include/net/sock.h:350 sock_hold()
Call Trace:
<IRQ> [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b
[<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf
[<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc
[<ffffffff8009899a>] tasklet_action+0x8f/0x12b
[<ffffffff800125a2>] __do_softirq+0x89/0x133
[<ffffffff8005f30c>] call_softirq+0x1c/0x28
[<ffffffff8006e644>] do_softirq+0x2c/0x7d
[<ffffffff8006e4d4>] do_IRQ+0xee/0xf7
[<ffffffff8005e625>] ret_from_intr+0x0/0xa
<EOI>
Looking at the call chain above, the only way I think this would be
possible is if somewhere we already released the same socket->sock which
is assigned to the rds_message at rds_send_remove_from_sock. Which seems
only possible to happen after the tear down done on rds_release.
rds_release properly calls rds_send_drop_to to drop the socket from any
rds_message, and some proper synchronization is in place to avoid race
with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see
a very narrow window where it may be possible we touch a sock already
released: when rds_release races with rds_send_drop_acked, we check
RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this
specific case we don't clear rm->m_rs. In this case, it seems we could
then go on at rds_send_drop_to and after it returns, the sock is freed
by last sock_put on rds_release, with concurrently we being at
rds_send_remove_from_sock; then at some point in the loop at
rds_send_remove_from_sock we process an rds_message which didn't have
rm->m_rs unset for a freed sock, and a possible sock_hold on an sock
already gone at rds_release happens.
This hopefully address the described condition above and avoids a double
free on "second last" sock_put. In addition, I removed the comment about
socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to
in rds_release and we should have things properly serialized there, thus
I can't see the comment being accurate there.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I see two problems if we consider the sock->ops->connect attempt to fail in
rds_tcp_conn_connect. The first issue is that for example we don't remove the
previously added rds_tcp_connection item to rds_tcp_tc_list at
rds_tcp_set_callbacks, which means that on a next reconnect attempt for the
same rds_connection, when rds_tcp_conn_connect is called we can again call
rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list,
leading to list corruption: to avoid this just make sure we call
properly rds_tcp_restore_callbacks before we exit. The second issue
is that we should also release the sock properly, by setting sock = NULL
only if we are returning without error.
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer says:
====================
qdisc: bulk dequeue support
This patchset uses DaveM's recent API changes to dev_hard_start_xmit(),
from the qdisc layer, to implement dequeue bulking.
Patch01: "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"
- Implement basic qdisc dequeue bulking
- This time, 100% relying on BQL limits, no magic safe-guard constants
Patch02: "qdisc: dequeue bulking also pickup GSO/TSO packets"
- Extend bulking to bulk several GSO/TSO packets
- Seperate patch, as it introduce a small regression, see test section.
We do have a patch03, which exports a userspace tunable as a BQL
tunable, that can byte-cap or disable the bulking/bursting. But we
could not agree on it internally, thus not sending it now. We
basically strive to avoid adding any new userspace tunable.
Testing patch01:
================
Demonstrating the performance improvement of qdisc dequeue bulking, is
tricky because the effect only "kicks-in" once the qdisc system have a
backlog. Thus, for a backlog to form, we need either 1) to exceed wirespeed
of the link or 2) exceed the capability of the device driver.
For practical use-cases, the measureable effect of this will be a
reduction in CPU usage
01-TCP_STREAM:
--------------
Testing effect for TCP involves disabling TSO and GSO, because TCP
already benefit from bulking, via TSO and especially for GSO segmented
packets. This patch view TSO/GSO as a seperate kind of bulking, and
avoid further bulking of these packet types.
The measured perf diff benefit (at 10Gbit/s) for a single netperf
TCP_STREAM were 9.24% less CPU used on calls to _raw_spin_lock()
(mostly from sch_direct_xmit).
If my E5-2695v2(ES) CPU is tuned according to:
http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html
Then it is possible that a single netperf TCP_STREAM, with GSO and TSO
disabled, can utilize all bandwidth on a 10Gbit/s link. This will
then cause a standing backlog queue at the qdisc layer.
Trying to pressure the system some more CPU util wise, I'm starting
24x TCP_STREAMs and monitoring the overall CPU utilization. This
confirms bulking saves CPU cycles when it "kicks-in".
Tool mpstat, while stressing the system with netperf 24x TCP_STREAM, shows:
* Disabled bulking: sys:2.58% soft:8.50% idle:88.78%
* Enabled bulking: sys:2.43% soft:7.66% idle:89.79%
02-UDP_STREAM
-------------
The measured perf diff benefit for UDP_STREAM were 6.41% less CPU used
on calls to _raw_spin_lock(). 24x UDP_STREAM with packet size -m 1472 (to
avoid sending UDP/IP fragments).
03-trafgen driver test
----------------------
The performance of the 10Gbit/s ixgbe driver is limited due to
updating the HW ring-queue tail-pointer on every packet. As
previously demonstrated with pktgen.
Using trafgen to send RAW frames from userspace (via AF_PACKET), and
forcing it through qdisc path (with option --qdisc-path and -t0),
sending with 12 CPUs.
I can demonstrate this driver layer limitation:
* 12.8 Mpps with no qdisc bulking
* 14.8 Mpps with qdisc bulking (full 10G-wirespeed)
Testing patch02:
================
Testing Bulking several GSO/TSO packets:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec for 10G while transmitting
approx 813Kpps).
Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec, for 10G
while transmitting approx 813Kpps).
Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.
Corrosponding to:
(10000*10^6)*((0.50-0.47)/10^3)/8 = 37500 bytes
(10000*10^6)*((0.50-0.38)/10^3)/8 = 150000 bytes
37500/1500 = 25 pkts
150000/1500 = 100 pkts
Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
diff of 0.12ms corrosponding to 1500 bytes at 100Mbit/s. Bulking
several GSOs together, result in a stable high-prio queue delay of
2.23ms.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The TSO and GSO segmented packets already benefit from bulking
on their own.
The TSO packets have always taken advantage of the only updating
the tailptr once for a large packet.
The GSO segmented packets have recently taken advantage of
bulking xmit_more API, via merge commit 53fda7f7f9 ("Merge
branch 'xmit_list'"), specifically via commit 7f2e870f2a ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of remaining list. And via commit
ce93718fb7 ("net: Don't keep around original SKB when we
software segment GSO frames.").
This patch allow further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.
Testing:
Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec).
Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec).
Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.
Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.
This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().
The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
(1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet. The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
(2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.
One restriction of the new API is that every SKB must belong to the
same TXQ. This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.
Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()). In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit(). In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits. In-effect bulking will only happen for BQL enabled drivers.
Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.
For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets. They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.
Jointed work with Hannes, Daniel and Florian.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the ethernet driver for Agere et131x devices to
drivers/net/ethernet.
The driver being added has been in the staging tree for some time, and will be
removed from there in a seperate patch. This one merely disables the staging
version to prevent two instances being built.
Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm fixes from Dave Airlie:
"Nothing too major or scary.
One i915 regression fix, nouveau has a tmds regression fix, along with
a regression fix for the runtime pm code for optimus laptops not
restoring the display hw correctly"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: make sure display hardware is reinitialised on runtime resume
drm/nouveau: punt fbcon resume out to a workqueue
drm/nouveau: fix regression on original nv50 board
drm/nv50/disp: fix dpms regression on certain boards
drm/i915: Flush the PTEs after updating them before suspend
Function prototypes are never definitions, so remove any 'extern' keyword
from the funcion prototypes in cpu_ops.h. Fixes warnings emited by
checkpatch.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
of_device_ids (i.e. compatible strings and the respective data) are not
supposed to change at runtime. All functions working with of_device_ids
provided by <linux/of.h> work with const of_device_ids. So mark the
only non-const struct in arch/arm64 as const, too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We have to register the notifiers in the masquerade expression from
the the module _init and _exit path.
This fixes crashes when removing the masquerade rule with no
ipt_MASQUERADE support in place (which was masking the problem).
Fixes: 9ba1f72 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The uas driver uses the block layer tag for USB3 stream IDs. With
blk-mq we can get larger tag numbers that the queue depth, which breaks
this assumption. A fix is under way for 3.18, but sits on top of
large changes so can't easily be backported. Set the disable_blk_mq
path so that a uas device can't easily crash the system when using
blk-mq for SCSI.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
hwreg_present() and hwreg_write() temporarily change the VBR register to
another vector table. This table contains a valid bus error handler
only, all other entries point to arbitrary addresses.
If an interrupt comes in while the temporary table is active, the
processor will start executing at such an arbitrary address, and the
kernel will crash.
While most callers run early, before interrupts are enabled, or
explicitly disable interrupts, Finn Thain pointed out that macsonic has
one callsite that doesn't, causing intermittent boot crashes.
There's another unsafe callsite in hilkbd.
Fix this for good by disabling and restoring interrupts inside
hwreg_present() and hwreg_write().
Explicitly disabling interrupts can be removed from the callsites later.
Reported-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
This reverts commit a86713b153.
Kevin Hilman writes:
Multiple boot failures on ARM[1] were bisected down to this
patch.
How was this patch tested, and on which platforms?
Also, the changelog states that this should be done only for
UART_CAP_SLEEP, but the patch does it for every UART.
Greg, I suggest this patch be dropped from tty-next until it has
been better described and tested.
[1] http://lists.linaro.org/pipermail/kernel-build-reports/2014-October/005550.html
Reported-by: Kevin Hilman <khilman@kernel.org>
Cc: Sudhir Sreedharan <ssreedharan@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- A recent cpufreq core fix went too far and introduced a regression
in the system suspend code path. Fix from Viresh Kumar.
- An ACPI-related commit in the i915 driver that fixed backlight
problems for some Thinkpads inadvertently broke a Dell machine
(in 3.16). Fix from Aaron Lu.
- The pcc-cpufreq driver was broken during the 3.15 cycle by a
commit that put wait_event() under a spinlock by mistake. Fix
that (Rafael J Wysocki).
- The return value type of integrator_cpufreq_remove() is void, but
should be int. Fix from Arnd Bergmann.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJULgB3AAoJEILEb/54YlRxeToP/R5grVhxifKrXl7t0+QYkzWx
Znm0ZLkKq4bjqXJiBTq5f3MehW3QHYfR/LXMdK+4YQqfZYoJfpa7Ui2XT8PMDyH6
IXwF81ABNVlcgeiAmHMmRslWuL/F3cMfWeFh49qdozhRrSV72Awljp8esRuMCDcu
p2OBf+gvxYWxxPzP7pgimsLBxxVfw9L7lL3cjogM1vU46bWz1SVP0dw5XEtBlWs5
z+R9KwStmskfxX50xmU/gxfVDapllh4OvbEsTOQVfjWyB9edkkLFSkqwfs3MaVwi
kJlYyPgEGTI7Fu6kMuVHdwCItqnKbDWqoXmIiY87708G9KQLEp4kQLpMJykCEYR1
jzJ/UxiQOfUHLKfvM2L8bdkLp8MPwsOvesW7oU9G1TLO7+t+ZQannW0E+JmBX00t
RtNAt5O4XiPigdd8+OUHjPeTVV+w6BTVN3Wd9BRUOzhcKfV2BlH2brmkFNmLeLrQ
rR3Tt6VyB61v6xCx6exWeBFVi/FuRI97Pur4Oc2J3GE0zjxfSv5sPByF6qDqX1hv
dgA/xrnIzROWXf4b9GVULcJZgHuTeA8p7PEDugYhLitgN/sk2HsxtURJ2plqnT8X
HSh+T3PM+uD3Lm0riF1KjTknrAHPLuDafySlhSlrNblYS7R1ggGQdnbSpyVneM1g
G449tBjoQnC2oPOTMUCR
=jOuf
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are three regression fixes (cpufreq core, pcc-cpufreq, i915 /
ACPI) and one trivial fix for a callback return value mismatch in the
cpufreq integrator driver.
Specifics:
- A recent cpufreq core fix went too far and introduced a regression
in the system suspend code path. Fix from Viresh Kumar.
- An ACPI-related commit in the i915 driver that fixed backlight
problems for some Thinkpads inadvertently broke a Dell machine (in
3.16). Fix from Aaron Lu.
- The pcc-cpufreq driver was broken during the 3.15 cycle by a commit
that put wait_event() under a spinlock by mistake. Fix that
(Rafael J Wysocki).
- The return value type of integrator_cpufreq_remove() is void, but
should be int. Fix from Arnd Bergmann"
* tag 'pm+acpi-3.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: update 'cpufreq_suspended' after stopping governors
ACPI / i915: Update the condition to ignore firmware backlight change request
cpufreq: integrator: fix integrator_cpufreq_remove return type
cpufreq: pcc-cpufreq: Fix wait_event() under spinlock
final regression fix for 3.17.
* tag 'drm-intel-fixes-2014-10-02' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Flush the PTEs after updating them before suspend
The runtime pm calls need to be done before populating the children via the
i2c_add_adapter call. If this is not done, a child can run into issues trying
to do i2c read/writes due to the pm_runtime_sync failing.
Signed-off-by: Andy Gross <agross@codeaurora.org>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
i2cdetect -q was broken (everything was a false positive, and no transfers were
actually being sent over i2c). The way it works is by sending a 0 length write
request and checking for NACK. This patch fixes the 0 length writes and actually
sends them.
Reported-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Alexandru M Stan <amstan@chromium.org>
Tested-by: Doug Anderson <dianders@chromium.org>
Tested-by: Max Schwarz <max.schwarz@online.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Merge fixes from Andrew Morton:
"5 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: page_alloc: fix zone allocation fairness on UP
perf: fix perf bug in fork()
MAINTAINERS: change git URL for mpc5xxx tree
mm: memcontrol: do not iterate uninitialized memcgs
ocfs2/dlm: should put mle when goto kill in dlm_assert_master_handler
The zone allocation batches can easily underflow due to higher-order
allocations or spills to remote nodes. On SMP that's fine, because
underflows are expected from concurrency and dealt with by returning 0.
But on UP, zone_page_state will just return a wrapped unsigned long,
which will get past the <= 0 check and then consider the zone eligible
until its watermarks are hit.
Commit 3a025760fc ("mm: page_alloc: spill to remote nodes before
waking kswapd") already made the counter-resetting use
atomic_long_read() to accomodate underflows from remote spills, but it
didn't go all the way with it.
Make it clear that these batches are expected to go negative regardless
of concurrency, and use atomic_long_read() everywhere.
Fixes: 81c0a2bb51 ("mm: page_alloc: fair zone allocator policy")
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process. This is bad..
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The repository for mpc5xxx has been moved, update git URL to new
location.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cgroup iterators yield css objects that have not yet gone through
css_online(), but they are not complete memcgs at this point and so the
memcg iterators should not return them. Commit d8ad305597 ("mm/memcg:
iteration skip memcgs not yet fully initialized") set out to implement
exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does
not meet the ordering requirements for memcg, and so the iterator may
skip over initialized groups, or return partially initialized memcgs.
The cgroup core can not reasonably provide a clear answer on whether the
object around the css has been fully initialized, as that depends on
controller-specific locking and lifetime rules. Thus, introduce a
memcg-specific flag that is set after the memcg has been initialized in
css_online(), and read before mem_cgroup_iter() callers access the memcg
members.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In dlm_assert_master_handler, the mle is get in dlm_find_mle, should be
put when goto kill, otherwise, this mle will never be released.
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJULb5NAAoJEAhfPr2O5OEVMUsQAJMfLQen1kX/kw2J7GRV2kQM
3MrQoBERrEi+QOZIfpsAAlytk3ebIVi5HCIXiTa7SUc18M5NoeOz4JiVVrTo5iDi
yJ5aLA0WiY31weRg4JMieGjjYJqyYBiadYPuft6GidX5Gqi5+2Ta/vbq6N4SE+Be
eIUXWDOKTbeinUt3b8AqwUt9S02A8DRjqGWQyn8jVSYUl/Db0+VBhw95z7CWOfjM
8MgDjbPYYoLBAkcMsZ4d/1Eyv7qT2miYKFQo7Udgd+Zi0jExFTkY6uFMFRSlFZ1D
g+h3/diHdmuTPNsiYQaESxSgNxdxfJk4nLhsxnP+E/71c6qUzYwAf6peXvl3JE39
E7sdDKlmQjoECUUuWBiJffcEg5VG/jKF2paKqCm+ti5jM/c4WQc/kQaIWk63hSYG
on9Y6t9xyGWHoIylDtqI3m3mwJfituxCshBl3FAenXNDq45qU4BUWRzpsoJK6P6U
znp25UXborgkvPDb1/IlDqrxLZ9dXP7OKHNUbW/tbMAqpLWH6GEDHWfkvPMI0sMm
oaYPPp2wj/1FAEZzTIyUH4BHUnob6aKEa/WaOoO87qLshmShkWMamMrEVer59CR6
CdhFslYNRkvhLyE+ywj9XSbG8zE7HQE6vdGFxfvcUFmFTj2jlHmZWL6cv+Zs4E72
sCo7y0KxnQmJjc0vwxLq
=hkpj
-----END PGP SIGNATURE-----
Merge tag 'media/v3.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"One last time regression fix at em28xx. The removal of .reset_resume
broke suspend/resume on this driver for some devices.
There are more fixes to be done for em28xx suspend/resume to be better
handled, but I'm opting to let them to stay for a while at the media
devel tree, in order to get more tests. So, for now, let's just
revert this patch"
* tag 'media/v3.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "[media] media: em28xx - remove reset_resume interface"
Commit 651e22f270 "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.
A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
its not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.
The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.
Commit 651e22f270 changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is suppose to set the cache to the
current value and there's locks that keep a consuming reader from
having access to the data.
Fixes: 651e22f270 "ring-buffer: Always reset iterator to reader page"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Use min_t instead of min function in emxx_udc.c
Fix checkpatch.pl warnings:
WARNING: min() should probably be min_t(u32, iBufSize, ep->ep.maxpacket)
WARNING: min() should probably be min_t(u32, data_size, ep->ep.maxpacket)
WARNING: min() should probably be min_t(u16, udc->ctrl.wLength, sizeof(status_data))
Changes in v2:
- Fixed min function call as min_t
Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following patch fixes the checkpatch.pl warning:
drivers/staging/media/omap4iss/iss_csi2.c:811 warning: else is not generally useful after a break or return
Signed-off-by: Yeliz Taneroglu <yeliztaneroglu@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following patch fixes the checkpatch.pl warning:
drivers/staging/media/omap4iss/iss_ipipe.c:184 warning: else is not generally useful after a break or return
Signed-off-by: Yeliz Taneroglu <yeliztaneroglu@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch extends the start and end address of initrd to be page aligned,
so that we can free all memory including the un-page aligned head or tail
page of initrd, if the start or end address of initrd are not page
aligned, the page can't be freed by free_initrd_mem() function.
Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>