Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
a flurry of hundreds of warnings at kernel/res_counter.c:96, where
res_counter_uncharge_locked() does WARN_ON(counter->usage < val).
The first trace of each flurry implicates __mem_cgroup_cancel_charge()
of mc.precharge, and an audit of mc.precharge handling points to
mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e8
("memcg: avoid THP split in task migration").
Checking !mc.precharge is good everywhere else, when a single page is to
be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to
follow, is liable to result in underflow (a lot can change since the
precharge was estimated).
Simply check against HPAGE_PMD_NR: there's probably a better
alternative, trying precharge for more, splitting if unsuccessful; but
this one-liner is safer for now - no kernel/res_counter.c:96 warnings
seen in 26 hours.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following compile error caused by commit 39febc0 (mtd: nand: gpmi:
adopt pinctrl support).
CC drivers/mtd/nand/gpmi-nand/gpmi-nand.o
drivers/mtd/nand/gpmi-nand/gpmi-nand.c: In function ‘acquire_resources’:
drivers/mtd/nand/gpmi-nand/gpmi-nand.c:499:45: error: ‘pdev’ undeclared (first use in this function)
Reported-by: Subodh Nijsure <snijsure@grid-net.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Add the Kconfig/Makefile stuff for the palmas regulator driver
Signed-off-by: Graeme Gregory <gg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Palmas has both Switched Mode (SMPS) and Linear (LDO) regulators in it.
This regulator driver allows software control of these regulators.
The regulators available on Palmas series chips vary depending on the muxing.
This is handled automatically in the driver by reading the mux info from OTP.
Signed-off-by: Graeme Gregory <gg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
If the allfrag feature has been set on a host route (due to an ICMPv6
Packet Too Big received indicating a MTU of less than 1280), we hit a
very slow behavior in TCP stack, because all big packets are dropped and
only a retransmit timer is able to push one MSS frame every 200 ms.
One way to handle this is to disable GSO on the socket the first time a
super packet is dropped. Adding a specific dst_allfrag() in the fast
path is probably overkill since the dst_allfrag() case almost never
happen.
Result on netperf TCP_STREAM, one flow :
Before : 60 kbit/sec
After : 1.6 Gbit/sec
Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update our reference driver to use netdev_alloc_frag() API instead of
the temporary custom allocator I introduced in commit 8d4057a938
(tg3: provide frags as skb head)
This removes the memory leak we had, since we could leak one page at
device dismantle.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call consume_skb() in place of kfree_skb() were appropriate.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Repeat pull of soc-core to bring in a bugfix.
* 'soc-core' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/renesas:
ARM: mach-shmobile: sh73a0: fixup PINT/IRQ16-IRQ31 irq number conflict
Mostly bool conversions, some inline removals and const additions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already unconditionally dereference 'sk' via lock_sock(sk) earlier
in this function, and our caller (sock_do_ioctl()) makes takes similar
liberties.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quoting Tore Anderson from :
If the allfrag feature has been set on a host route (due to an ICMPv6
Packet Too Big received indicating a MTU of less than 1280),
TCP SYN/ACK packets to that destination appears to get an incorrect
TCP checksum. This in turn means they are thrown away as invalid.
In the case of an IPv4 client behind a link with a MTU of less than
1260, accessing an IPv6 server through a stateless translator,
this means that the client can only download a single large file
from the server, because once it is in the server's routing cache
with the allfrag feature set, new TCP connections can no longer
be established.
</endquote>
It appears ip6_fragment() doesn't handle CHECKSUM_PARTIAL properly.
As network drivers are not prepared to fetch correct transport header, a
safe fix is to call skb_checksum_help() before fragmenting packet.
Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move blocks of code around to avoid function prototypes.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce and use a debug macro to test and print.
Convert printks to pr_<level>.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just some stylings.
Use #include <linux... not #include <asm...
Convert a test and print to a printk_once.
Combine an "if (foo) { if (bar) {" to single "if (foo && bar) {"
to save an indent level.
Convert single line "if (foo) bar;" to multiple lines.
Move some braces.
Align some long lines a bit better.
Long lines and printks with KERN_ checkpatch complaints
still exist.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neaten the comments and reflow the code without
changing anything other than whitespace.
git diff -w shows just comment neatening and a few
line removals.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol. This should make it a lot more
clear what action the programmer needs to take and should help us find
the reason if additional symbol bugs show up.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols. Let the relocs program know that those should be treated as
relative symbols.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.
In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.
16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.
The relocs tool is moved to arch/x86/tools/relocs.c, and added new
target archscripts that can be used to build scripts needed building
an architecture. be compiled before building the arch/x86 tree.
[ hpa: accelerating this because it detects invalid absolute
relocations, a serious bug in binutils 2.22.52.0.x which currently
produces bad kernels. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
We need to use a different loop index for mlx4_counter_alloc() and for
device_create_file() iterations: the mlx4_counter_alloc() loop index
is used in the error flow to free counters.
If the same loop index is used for device_create_file() and, say, the
device_create_file() loop fails on the first iteration, the allocated
counters will not be freed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
It needs parentheses around the argument, so that it can be used with
complex arguments (e.g., "n+5").
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The current error flow code was releasing the IB connection object and
calling iscsi_destroy_endpoint() directly without going through the
reference counting mechanism introduced in commit 39ff05d ("IB/iser:
Enhance disconnection logic for multi-pathing"). This resulted in a
double free of the iscsi endpoint object, which causes a kernel NULL
pointer dereference. Fix that by plugging into the IB conn reference
counting correctly.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Enable IB ULPs to use a larger portion of the device EQs (which map to
IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs
to be divided among the device ports. In this scheme, for each IB
port, the number of allocated EQs follows the number of cores, subject
to other system constraints, such as number available MSI-X vectors.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When the thin pool target clears the discard_passdown parameter
internally, it incorrectly changes the table line reported to userspace.
This breaks dumb string comparisons on these table lines in generic
userspace device-mapper library code and leads to tables being reloaded
repeatedly when nothing is actually meant to be changing.
This patch corrects this by no longer changing the table line when
discard passdown was disabled.
We can still tell when discard passdown is overridden by looking for the
message "Discard unsupported by data device (sdX): Disabling discard passdown."
This automatic detection is also moved from the 'load' to the 'resume'
so that it is re-evaluated should the properties of underlying devices
change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Without this patch, recovery will crash
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUAT7bVGznsnt1WYoG5AQL0ohAAvpF47NkSnuqw9gko25/Ndr5akrIC9kfz
YNzxsd0FvNSEQipGS4KgBcRxnFxbcVkmHL+pN8McmiDxNew1LW2+zdtV47RJngnQ
2slTUiSBLSHcvnOFblBrwIh+XpMdhv4FEMG13Si4tQaETtHp3JiNDr2JqvwbuKbo
aO/nW38DFNjEcB59X+9npknZYaymvavxso4J7iH/ec0Iys2c2h+jX5afewCXnbYr
Q6X2YlySUiUM3AXbgV0QI/w6xDM0TD+WWguHq411pgU1wHMYpHelYYwRGHSU73TJ
061K5WB83Q53A8czeRXK33x0xGr/AhQesTel+mlV3eODHoGSnsYxH8Zja7sV220h
HgTICrSvAFL2cvBz8dqJQ59KYH3yNo4Cie5xzmBxCuX6yyTxizgrQTp7L/HUy7RL
eL1SqNFTRJcwSl3MwmgbL57saBLw+/ejV+og7PNBCXSS9gZg39uJyn1uTqJZTrcL
PuBaizNG9HCYuR/pRTMsQGNWrksqk9r+hxwCJDIGb7xh9SBFI82YtQfK9F5fSB84
vdmIec8yUvwS/Gxhz+I+YB3kqk/nwfFRbVGfBWvJdRBOR0DsZphsBapsJ83LZqFb
VAa2nQL+NoHOweixv6RgiH4Y96SNHdWkvDJjj7R01BUDdpHeHMZuff07Effgqd64
Knk64XvWKQc=
=s7vy
-----END PGP SIGNATURE-----
Merge tag 'md-3.4-fixes' of git://neil.brown.name/md
Pull one more md bugfix from NeilBrown:
"Fix bug in recent fix to RAID10.
Without this patch, recovery will crash"
* tag 'md-3.4-fixes' of git://neil.brown.name/md:
md/raid10: fix transcription error in calc_sectors conversion.
Pull tile tree bugfix from Chris Metcalf:
"This fixes a security vulnerability (and correctness bug) in tilegx"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tilegx: enable SYSCALL_WRAPPERS support
The old code was
sector_div(stride, fc);
the new code was
sector_dir(size, conf->near_copies);
'size' is right (the stride various wasn't really needed), but
'fc' means 'far_copies', and that is an important difference.
Signed-off-by: NeilBrown <neilb@suse.de>
Merge misc fixes from Andrew Morton.
* emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
frv: delete incorrect task prototypes causing compile fail
slub: missing test for partial pages flush work in flush_all()
fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.
Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around. Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.
Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a28784028e: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds. And it
just makes sense to update i_mode in the same place we update i_uid/gid.
Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead. However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor. We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows querying the QP state before flushing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Using kfifos for ID management was limiting the number of QPs and
preventing NP384 MPI jobs. So replace it with a simple bitmap
allocator.
Remove IDs from the IDR tables before deallocating them. This bug was
causing the BUG_ON() in insert_handle() to fire because the ID was
getting reused before being removed from the IDR table.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This allows dumping thousands of QPs. Log active open failures of
interest.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add module option db_fc_threshold which is the count of active QPs
that trigger automatic db flow control mode. Automatically transition
to/from flow control mode when the active qp count crosses
db_fc_theshold.
Add more db debugfs stats
On DB DROP event from the LLD, recover all the iwarp queues.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use GFP_ATOMIC in _insert_handle() if ints are disabled.
Don't panic if we get an abort with no endpoint found. Just log a
warning.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Get FULL/EMPTY/DROP events from LLD. On FULL event, disable normal
user mode DB rings.
Add modify_qp semantics to allow user processes to call into the
kernel to ring doobells without overflowing.
Add DB Full/Empty/Drop stats.
Mark queues when created indicating the doorbell state.
If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.
Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can
know if the driver supports the kernel mode db ringing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
recover LLD EQs for DB drop interrupts. This includes adding a new
db_lock, a spin lock disabling BH too, used by the recovery thread and
the ring_tx_db() paths to allow db drop recovery.
Clean up initial DB avoidance code.
Add read_eq_indices() - this allows the LLD to use the PCIe mw to
efficiently read hw eq contexts.
Add cxgb4_sync_txq_pidx() - called by iw_cxgb4 to sync up the sw/hw
pidx value.
Add flush_eq_cache() and cxgb4_flush_eq_cache(). This allows iw_cxgb4
to flush the sge eq context cache before beginning db drop recovery.
Add module parameter, dbfoifo_int_thresh, to allow tuning the db
interrupt threshold value.
Add dbfifo_int_thresh to cxgb4_lld_info so iw_cxgb4 knows the threshold.
Add module parameter, dbfoifo_drain_delay, to allow tuning the amount
of time delay between DB FULL and EMPTY upcalls to iw_cxgb4.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add platform-specific callback functions for interrupts. This is
needed to do a single read-clear of the CAUSE register and then call
out to platform specific functions for DB threshold interrupts and DB
drop interrupts.
Add t4_mem_win_read_len() - mem-window reads for arbitrary lengths.
This is used to read the CIDX/PIDX values from EC contexts during DB
drop recovery.
Add t4_fwaddrspace_write() - sends addrspace write cmds to the fw.
Needed to flush the sge eq context cache.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The PRE_BUILD and POST_BUILD options of ktest are added to allow the
user to add temporary patch to the system and remove it on builds. This
is sometimes use to take a change from another git branch and add it to
a series without the fix so that this series can be tested, when an
unrelated bug exists in the series.
The problem comes when a tagged commit is being used. For example, if
v3.2 is being tested, and we add a patch to it, the kernelrelease for
that commit will be 3.2.0+, but without the patch the version will be
3.2.0. This can cause problems when the kernelrelease is determined for
creating the /lib/modules directory. The kernel booting has the '+' but
the module directory will not, and the modules will be missing for that
boot, and may not allow the kernel to succeed.
The fix is to put the creation of the kernelrelease in the POST_BUILD
logic, before it applies the POST_BUILD operation. The POST_BUILD is
where the patch may be removed, removing the '+' from the kernelrelease.
The calculation of the kernelrelease will also stay in its current
location but will be ignored if it was already calculated previously.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
BugLink: http://bugs.launchpad.net/bugs/955892
All failures from __d_path where being treated as disconnected paths,
however __d_path can also fail when the generated pathname is too long.
The initial ENAMETOOLONG error was being lost, and ENAMETOOLONG was only
returned if the subsequent dentry_path call resulted in that error. Other
wise if the path was split across a mount point such that the dentry_path
fit within the buffer when the __d_path did not the failure was treated
as a disconnected path.
Signed-off-by: John Johansen <john.johansen@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/978038
also affects apparmor portion of
BugLink: http://bugs.launchpad.net/bugs/987371
The unconfined profile is not stored in the regular profile list, but
change_profile and exec transitions may want access to it when setting
up specialized transitions like switch to the unconfined profile of a
new policy namespace.
Signed-off-by: John Johansen <john.johansen@canonical.com>
commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.
1) list_splice() arguments were in the wrong order
2) list_splice(list, head) has undefined behavior if head is not
initialized.
3) We should use the list_splice_init() variant to clear pktgen_threads
list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch re-spin.
Incorporated review comments by Ben Hutchings.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some discussion with the glibc mailing lists revealed that this was
necessary for 64-bit platforms with MIPS-like sign-extension rules
for 32-bit values. The original symptom was that passing (uid_t)-1 to
setreuid() was failing in programs linked -pthread because of the "setxid"
mechanism for passing setxid-type function arguments to the syscall code.
SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
proper sign-extension and is thus the appropriate fix for this problem.
On other platforms (s390, powerpc, sparc64, and mips) this was fixed
in 2.6.28.6. The general issue is tracked as CVE-2009-0029.
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
csummode variable is always CHECKSUM_NONE in ip6_append_data()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>