* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
i.MX SDMA: Fix burstsize settings
ARM: mach-shmobile: both USB DMAC instances on sh7372 are slave-only
dma: sh_dma: not all SH DMAC implementations support MEMCPY
at_hdmac: bugfix for enabling channel irq
dmaengine: fix missing 'cnt' in ?: in dmatest
- Fix breakage with MTD suspend caused by the API rework
- Fix a problem with resetting the MX28 BCH module
- A couple of other trivial fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iEYEABECAAYFAk8s6HsACgkQdwG7hYl686MIiACgxpNoUWFvq8z+2UGXxsLnNrio
hhcAn31H7TY3KUuIQBo4CqG2dEjNwpCw
=DRWp
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.3' of git://git.infradead.org/~dwmw2/mtd-3.3
- Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1
- Fix breakage with MTD suspend caused by the API rework
- Fix a problem with resetting the MX28 BCH module
- A couple of other trivial fixes
* tag 'for-linus-3.3-20120204' of git://git.infradead.org/~dwmw2/mtd-3.3:
Revert "mtd: atmel_nand: optimize read/write buffer functions"
mtd: fix MTD suspend
jffs2: do not initialize variable unnecessarily
mtd: gpmi-nand bugfix: reset the BCH module when it is not MX23
mtd: nand: fix typo in comment
Commit bf118a342f (NFSv4: include bitmap
in nfsv4 get acl data) introduces the 'acl_scratch' page for the case
where we may need to decode multi-page data. However it fails to take
into account the fact that the variable may be NULL (for the case where
we're not doing multi-page decode), and it also attaches it to the
encoding xdr_stream rather than the decoding one.
The immediate result is an Oops in nfs4_xdr_enc_getacl due to the
call to page_address() with a NULL page pointer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: stable@vger.kernel.org
Store the last used mclk configuration for the PLL.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
can_get_echo_skb() is usually called in the TX complete handler.
The stats->tx_packets and stats->tx_bytes should be updated there, too.
This patch simplifies to figure out the size of the sent CAN frame.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This fixes the race in process_vm_core found by Oleg (see
http://article.gmane.org/gmane.linux.kernel/1235667/
for details).
This has been updated since I last sent it as the creation of the new
mm_access() function did almost exactly the same thing as parts of the
previous version of this patch did.
In order to use mm_access() even when /proc isn't enabled, we move it to
kernel/fork.c where other related process mm access functions already
are.
Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a possible data corruption if an RNDIS message goes beyond page
boundary in the sending code path. This patch fixes the problem.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bugs, x86: Fix printk levels for panic, softlockups and stack dumps
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf top: Fix number of samples displayed
perf tools: Fix strlen() bug in perf_event__synthesize_event_type()
perf tools: Fix broken build by defining _GNU_SOURCE in Makefile
x86/dumpstack: Remove unneeded check in dump_trace()
perf: Fix broken interrupt rate throttling
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW
sched: Fix ancient race in do_exit()
sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug
sched/s390: Fix compile error in sched/core.c
sched: Fix rq->nr_uninterruptible update race
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/reboot: Remove VersaLogic Menlow reboot quirk
x86/reboot: Skip DMI checks if reboot set by user
x86: Properly parenthesize cmpxchg() macro arguments
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
lib: Fix 32-bit sparc udiv_qrnnd() definition in mpilib's longlong.h
lib: Fix multiple definitions of clz_tab
lib/digsig: checks for NULL return value
lib/mpi: added missing NULL check
lib/mpi: added comment on divide by 0 case
lib/mpi: check for possible zero length
lib/digsig: pkcs_1_v1_5_decode_emsa cleanup
lib/digsig: additional sanity checks against badly formated key payload
lib/mpi: removed unused functions
lib/mpi: checks for zero divisor length
lib/mpi: return error code on dividing by zero
lib/mpi: replaced MPI_NULL with normal NULL
lib/mpi: added missing NULL check
The usb/ch9.h will be installed to /usr/include/linux,
and be used from user space.
But le16_to_cpu() is only defined for kernel code.
Without this patch, user space compile will be broken.
Special thanks to Stefan Becker
Reported-by: Stefan Becker <chemobejk@gmail.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes merge conflict resolution breakage introduced by merge
d3712b9dfc ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream").
The commit changed 'mtd_can_have_bb()' function and made it always
return zero, which is incorrect. Instead, we need it to return whether
the underlying flash device can have bad eraseblocks or not. UBI needs
this information because it affects how it handles the underlying flash.
E.g., if the underlying flash is NOR, it cannot have bad blocks and any
write or erase error is fatal, and all we can do is to switch to R/O
mode. We do not need to reserve a pool of good eraseblocks for bad
eraseblocks handling, and so on.
This patch also removes 'mtd_can_have_bb()' invocations from Logfs to
ensure correct Logfs behavior.
I've tested that with this patch UBI works on top of NOR and NAND
flashes emulated by mtdram and nandsim correspondingly.
This patch is based on patch from Linus Torvalds.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jörn Engel <joern@logfs.org>
Acked-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A struct device parameter is used in the enable and disable callbacks to
distinguish between different gpio_keys devices.
Platforms that don't use these callbacks may not include struct device
at all, as seen on arch/arm/mach-s3c2410/mach-n30.c
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Add a flag to allow platforms to specify, whether a DMAC instance supports
the MEMCPY operation. To avoid regressions, preserve the current default.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
MPI_NULL is replaced with normal NULL.
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Reviewed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs
in the below lines of bdi_dirty_limit():
bdi_dirty *= numerator;
do_div(bdi_dirty, denominator);
1) divide error: do_div() only uses the lower 32 bit of the denominator,
which may trimmed to be 0 when PROP_MAX_SHIFT > 32.
2) overflow: (bdi_dirty * numerator) could easily overflow if numerator
used up to 48 bits, leaving only 16 bits to bdi_dirty
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
This patch makes sure we use appropriate memory barriers before
publishing tp->md5sig_info, allowing tcp_md5_do_lookup() being used from
tcp_v4_send_reset() without holding socket lock (upcoming patch from
Shawn Lu)
Note we also need to respect rcu grace period before its freeing, since
we can free socket without this grace period thanks to
SLAB_DESTROY_BY_RCU
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few important bug fixes for LogFS
Shortlog:
Joern Engel (5):
logfs: Prevent memory corruption
logfs: remove useless BUG_ON
logfs: Free areas before calling generic_shutdown_super()
logfs: Grow inode in delete path
Logfs: Allow NULL block_isbad() methods
Prasad Joshi (5):
logfs: update page reference count for pined pages
logfs: take write mutex lock during fsync and sync
logfs: set superblock shutdown flag after generic sb shutdown
logfs: Propagate page parameter to __logfs_write_inode
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
Diffstat:
MAINTAINERS | 1 +
fs/logfs/dev_mtd.c | 26 +++++++++++-------------
fs/logfs/dir.c | 2 +-
fs/logfs/file.c | 2 +
fs/logfs/gc.c | 2 +-
fs/logfs/inode.c | 4 ++-
fs/logfs/journal.c | 1 -
fs/logfs/logfs.h | 5 +++-
fs/logfs/readwrite.c | 51 +++++++++++++++++++++++++++++++++----------------
fs/logfs/segment.c | 51 ++++++++++++++++++++++++++++++++++++++-----------
fs/logfs/super.c | 3 +-
11 files changed, 99 insertions(+), 49 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPKByhAAoJEDFA/f+3K+ZNQSUP/3gACcIwcsl+FnXPWBtz9XIG
g0DjXoRDd/sR0u25nLgjCVdBJgx5FVEyA+PvLgvvUU2KCAsqI5F/EQ+fLJs21YEN
TzepBO5aHtFZbNEjo6WiXOlDbBePTtk44WrN6jqoCHM/aDeT4Wof3NZBmHWNN1PX
B2RtEZ0ypJ7/b1OY2LUNcQfTaJXNgVoP8Hkx4KGY5LUVxVrBXxvDTU7YbkS8a+ys
1Yje/EQ4XD4RyZB42TmFEuTenvGPRgMGVFdnkJKuON8EmJQ8Hc61jEf5d7Q8sWef
dH5F/ptoAaR9a9LbbO8LoYuBZ8MR8848NPsrNPpr/gWntj46Z79yII8Jr7YoSDyw
zq5G2dZbwlbVrtVWKGae47THkNB8bljR/g4cijvPAkvuIAku6mg+dgjVHAhZ/t+J
xu8+Gy2sWHUH2gmoSXuoNyppOvYpPIRd5RB16PizMH3bw+sMad2K8/rfOKnmF1/r
HTM2jZ5bDcHVDjSuVI6u2m/mQX+PmPXUTffreaFXuSI75YpT0dqN3nponTX4EgFI
Ad9ZBQvdg8w1LGDsNxIAaqrGx4Q87RxqfUV4W/wo6N8gKsp+I2y4GtYMeD/CEKyi
wncKg10YwoMXZj7cBAkWgPlgrOBYCPwYZc/1DVRHvqrHo/m13SJrWDKkNKVvoXzH
2y4Tfi5w1WDRUT7yeoyK
=TA1A
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream
There are few important bug fixes for LogFS
* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
Logfs: Allow NULL block_isbad() methods
logfs: Grow inode in delete path
logfs: Free areas before calling generic_shutdown_super()
logfs: remove useless BUG_ON
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
logfs: Propagate page parameter to __logfs_write_inode
logfs: set superblock shutdown flag after generic sb shutdown
logfs: take write mutex lock during fsync and sync
logfs: Prevent memory corruption
logfs: update page reference count for pined pages
Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
"mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL
block_isbad() methods" clashing with the abstraction changes in the
commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and
d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly".
This resolution takes the semantics from commit f2933e86ad, and just
makes mtd_block_isbad() return zero (false) if the 'block_isbad'
function is NULL. But that also means that now "mtd_can_have_bb()"
always returns 0.
Now, "mtd_block_markbad()" will obviously return an error if the
low-level driver doesn't support bad blocks, so this is somewhat
non-symmetric, but it actually makes sense if a NULL "block_isbad"
function is considered to mean "I assume that all my blocks are always
good".
In order to be able to support proper RST messages for TCP MD5 flows, we
need to allow access to MD5 keys without locking listener socket.
This conversion is a nice cleanup, and shrinks size of timewait sockets
by 80 bytes.
IPv6 code reuses generic code found in IPv4 instead of duplicating it.
Control path uses GFP_KERNEL allocations instead of GFP_ATOMIC.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shawn Lu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow to set mcs masks through nl80211. We also allow to set MCS
rates but no legacy rates (and vice versa).
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
text data bss dec hex filename
8455963 532732 1810804 10799499 a4c98b vmlinux.o.before
8448899 532732 1810804 10792435 a4adf3 vmlinux.o
This change also removes commented-out copy of __nlmsg_put
which was last touched in 2005 with "Enable once all users
have been converted" comment on top.
Changes in v2: rediffed against net-next.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing major, largest thing here is the removal of some drivers that
did not work at all. Other than that, the normal collection of bugfixes
and new device ids.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk8m8JEACgkQMUfUDdst+ymCFQCeNhTHopHy1PQbuCDwk8bSH4DW
1/YAn2k0YaaCrOo0HCzOslAVX18vGnWl
=TNNB
-----END PGP SIGNATURE-----
Merge tag 'usb-3.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Here are a bunch of USB patches for 3.3-rc1.
Nothing major, largest thing here is the removal of some drivers that
did not work at all. Other than that, the normal collection of bugfixes
and new device ids.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* tag 'usb-3.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (52 commits)
uwb & wusb: fix kconfig error
USB: Realtek cr: fix autopm scheduling while atomic
USB: ftdi_sio: Add more identifiers
xHCI: Cleanup isoc transfer ring when TD length mismatch found
usb: musb: omap2430: minor cleanups.
qcaux: add more Pantech UML190 and UML290 ports
Revert "drivers: usb: Fix dependency for USB_HWA_HCD"
usb: mv-otg - Fix build if CONFIG_USB is not set
USB: cdc-wdm: Avoid hanging on interface with no USB_CDC_DMM_TYPE
usb: add support for STA2X11 host driver
drivers: usb: Fix dependency for USB_HWA_HCD
kernel-doc: fix new warning in usb.h
USB: OHCI: fix new compiler warnings
usb: serial: kobil_sct: fix compile warning:
drivers/usb/host/ehci-fsl.c: add missing iounmap
USB: cdc-wdm: better allocate a buffer that is at least as big as we tell the USB core
USB: cdc-wdm: call wake_up_all to allow driver to shutdown on device removal
USB: cdc-wdm: use two mutexes to allow simultaneous read and write
USB: cdc-wdm: updating desc->length must be protected by spin_lock
USB: usbsevseg: fix max length
...
Commits 3fe4bae884 and
079c985e7a broke MTD suspend in 2 ways:
1. When the '->suspend' method is not present, we return -EOPNOTSUPP, but
the callers of 'mtd_suspend()' expects 0 instead.
2. Checking of the 'mtd' parameter against NULL has been incorrectly removed
in 'mtd_cls_suspend()'.
This patch fixes the breakages. This has been found, analyzed, reported
and tested by Rafael J. Wysocki <rjw@sisk.pl>.
Note, this patch is not needed in the stable tree because it causes a
regression introduced during the v3.3 merge window.
Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commit 2aede851dd
PM / Hibernate: Freeze kernel threads after preallocating memory
introduced a mechanism by which kernel threads were frozen after
the preallocation of hibernate image memory to avoid problems with
frozen kernel threads not responding to memory freeing requests.
However, it overlooked the s2disk code path in which the
SNAPSHOT_CREATE_IMAGE ioctl was run directly after SNAPSHOT_FREE,
which caused freeze_workqueues_begin() to BUG(), because it saw
that worqueues had been already frozen.
Although in principle this issue might be addressed by removing
the relevant BUG_ON() from freeze_workqueues_begin(), that would
reintroduce the very problem that commit 2aede851dd
attempted to avoid into that particular code path. For this reason,
to fix the issue at hand, introduce thaw_kernel_threads() and make
the SNAPSHOT_FREE ioctl execute it.
Special thanks to Srivatsa S. Bhat for detailed analysis of the
problem.
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@kernel.org
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk8jKW4ACgkQMUfUDdst+ynAUwCfVWwHJxpb4DSSMVZhGOnHMQrL
ZjIAn00gPeSs5u8y1nPvFrFikbon4FDs
=bzVy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Here are some patches for the 3.3-rc1 tree.
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
* tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
stable: update documentation to ask for kernel version
base/core.c:fix typo in comment in function device_add
Documentation: devres: add allocation functions to list of supported calls
Documentation update for the driver model core
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warnings in device.h
driver core: remove drivers/base/sys.c and include/linux/sysdev.h
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm: (31 commits)
ARM: 7304/1: ioremap: fix boundary check when reusing static mapping
ARM: 7301/1: Rename the T() macro to TUSER() to avoid namespace conflicts
ARM: 7299/1: ftrace: clear zero bit in reported IPs for Thumb-2
ARM: 7298/1: realview: fix mapping of MPCore private memory region
PCMCIA: fix sa1111 oops on remove
ARM: 7288/1: mach-sa1100: add missing module_init() call
ARM: 7297/1: smp_twd: make sure timer is stopped before registering it
ARM: 7296/1: proc-v7.S: remove HARVARD_CACHE preprocessor guards
ARM: 7295/1: cortex-a7: move proc_info out of !CONFIG_ARM_LPAE block
ARM: 7293/1: logical_cpu_map: decouple CPU mapping from SMP
ARM: 7291/1: cache: assume 64-byte L1 cachelines for ARMv7 CPUs
ARM: 7290/1: vmlinux.lds.S: align the exception fixup table to a 4-byte boundary
ARM: 7289/1: vmlinux.lds.S: do not hardcode cacheline size as 32 bytes
MFD: ucb1x00-ts: fix resume failure
MFD: ucb1x00-core: fix gpiolib direction_output handling
MFD: ucb1x00-core: fix missing restore of io output data on resume
MFD: mcp-core: fix mcp_priv() to be more type safe
MFD: mcp-core: fix complaints from the genirq layer
Revert "ARM: sa11x0: Implement autoloading of codec and codec pdata for mcp bus."
Revert "ARM: sa1100: Refactor mcp-sa11x0 to use platform resources."
...
Fix up conflict due to arch/arm/mach-mx5/Kconfig having been merged into
mach-imx5 (commit 784a90c0a7d8: "ARM i.MX: Merge i.MX5 support into
mach-imx"), but the ARM_L1_CACHE_SHIFT_6 entry was moved to be driven by
the CPU_V7 logic from it in the old location in rmk's branch (commit
a092f2b15399: "ARM: 7291/1: cache: assume 64-byte L1 cachelines for
ARMv7 CPUs").
A mesh node that joins the mesh network is by default a forwarding entity. This patch allows
the mesh node to set as non-forwarding entity. Whenever dot11MeshForwarding is set to 0, the
mesh node can prevent itself from forwarding the traffic which is not destined to him.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes the sampling interrupt throttling mechanism.
It was broken in v3.2. Events were not being unthrottled. The
unthrottling mechanism required that events be checked at each
timer tick.
This patch solves this problem and also separates:
- unthrottling
- multiplexing
- frequency-mode period adjustments
Not all of them need to be executed at each timer tick.
This third version of the patch is based on my original patch +
PeterZ proposal (https://lkml.org/lkml/2012/1/7/87).
At each timer tick, for each context:
- if the current CPU has throttled events, we unthrottle events
- if context has frequency-based events, we adjust sampling periods
- if we have reached the jiffies interval, we multiplex (rotate)
We decoupled rotation (multiplexing) from frequency-mode sampling
period adjustments. They should not necessarily happen at the same
rate. Multiplexing is subject to jiffies_interval (currently at 1
but could be higher once the tunable is exposed via sysfs).
We have grouped frequency-mode adjustment and unthrottling into the
same routine to minimize code duplication. When throttled while in
frequency mode, we scan the events only once.
We have fixed the threshold enforcement code in __perf_event_overflow().
There was a bug whereby it would allow more than the authorized rate
because an increment of hwc->interrupts was not executed at the right
place.
The patch was tested with low sampling limit (2000) and fixed periods,
frequency mode, overcommitted PMU.
On a 2.1GHz AMD CPU:
$ cat /proc/sys/kernel/perf_event_max_sample_rate
2000
We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000):
$ perf record -e cycles,cycles -c 700000 noploop 10
$ perf report -D | tail -21
Aggregated stats:
TOTAL events: 80086
MMAP events: 88
COMM events: 2
EXIT events: 4
THROTTLE events: 19996
UNTHROTTLE events: 19996
SAMPLE events: 40000
cycles stats:
TOTAL events: 40006
MMAP events: 5
COMM events: 1
EXIT events: 4
THROTTLE events: 9998
UNTHROTTLE events: 9998
SAMPLE events: 20000
cycles stats:
TOTAL events: 39996
THROTTLE events: 9998
UNTHROTTLE events: 9998
SAMPLE events: 20000
For 10s, the cap is 2x2000x10 = 40000 samples.
We get exactly that: 20000 samples/event.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org> # v3.2+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120126160319.GA5655@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It might be useful to get a counter of failed tcp_retransmit_skb()
calls.
Reported-by: Satoru Moriya <satoru.moriya@hds.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-greg' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: musb: omap2430: minor cleanups.
usb: dwc3: unmap the proper number of sg entries
usb: musb: fix shutdown while usb gadget is in use
usb: gadget: f_mass_storage: Use "bool" instead of "int" in fsg_module_parameters
usb: gadget: check for streams only for SS udcs
usb: gadget: fsl_udc: fix the usage of udc->max_ep
drivers: usb: otg: Fix dependencies for some OTG drivers
usb: renesas: silence uninitialized variable report in usbhsg_recip_run_handle()
usb: gadget: SS Isoc endpoints use comp_desc->bMaxBurst too
usb: gadget: storage: endian fix
usb: dwc3: ep0: fix compile warning
usb: musb: davinci: fix build breakage
usb: gadget: langwell: don't call gadget's disconnect()
usb: gadget: langwell: drop langwell_otg support
usb: otg: kill langwell_otg driver
usb: dwc3: ep0: tidy up Pending Request handling
Quoth Len:
"This fixes a merge-window regression due to a conflict
between error injection and preparation to remove atomicio.c
Here we fix that regression and complete the removal
of atomicio.c.
This also re-orders some idle initialization code to
complete the merge window series that allows cpuidle
to cope with bringing processors on-line after boot."
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
Use acpi_os_map_memory() instead of ioremap() in einj driver
ACPI, APEI, EINJ, cleanup 0 vs NULL confusion
ACPI, APEI, EINJ Allow empty Trigger Error Action Table
thermal: Rename generate_netlink_event
ACPI / PM: Add Sony Vaio VPCCW29FX to nonvs blacklist.
ACPI: Remove ./drivers/acpi/atomicio.[ch]
ACPI, APEI: Add RAM mapping support to ACPI
ACPI, APEI: Add 64-bit read/write support for APEI on i386
ACPI processor hotplug: Delay acpi_processor_start() call for hotplugged cores
ACPI processor hotplug: Split up acpi_processor_add
Davem says:
1) Fix JIT code generation on x86-64 for divide by zero, from Eric Dumazet.
2) tg3 header length computation correction from Eric Dumazet.
3) More build and reference counting fixes for socket memory cgroup
code from Glauber Costa.
4) module.h snuck back into a core header after all the hard work we
did to remove that, from Paul Gortmaker and Jesper Dangaard Brouer.
5) Fix PHY naming regression and add some new PCI IDs in stmmac, from
Alessandro Rubini.
6) Netlink message generation fix in new team driver, should only advertise
the entries that changed during events, from Jiri Pirko.
7) SRIOV VF registration and unregistration fixes, and also add a
missing PCI ID, from Roopa Prabhu.
8) Fix infinite loop in tx queue flush code of brcmsmac, from Stanislaw Gruszka.
9) ftgmac100/ftmac100 build fix, missing interrupt.h include.
10) Memory leak fix in net/hyperv do_set_mutlicast() handling, from Wei Yongjun.
11) Off by one fix in netem packet scheduler, from Vijay Subramanian.
12) TCP loss detection fix from Yuchung Cheng.
13) TCP reset packet MD5 calculation uses wrong address, fix from Shawn Lu.
14) skge carrier assertion and DMA mapping fixes from Stephen Hemminger.
15) Congestion recovery undo performed at the wrong spot in BIC and CUBIC
congestion control modules, fix from Neal Cardwell.
16) Ethtool ETHTOOL_GSSET_INFO is unnecessarily restrictive, from Michał Mirosław.
17) Fix triggerable race in ipv6 sysctl handling, from Francesco Ruggeri.
18) Statistics bug fixes in mlx4 from Eugenia Emantayev.
19) rds locking bug fix during info dumps, from your's truly.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
rds: Make rds_sock_lock BH rather than IRQ safe.
netprio_cgroup.h: dont include module.h from other includes
net: flow_dissector.c missing include linux/export.h
team: send only changed options/ports via netlink
net/hyperv: fix possible memory leak in do_set_multicast()
drivers/net: dsa/mv88e6xxx.c files need linux/module.h
stmmac: added PCI identifiers
llc: Fix race condition in llc_ui_recvmsg
stmmac: fix phy naming inconsistency
dsa: Add reporting of silicon revision for Marvell 88E6123/88E6161/88E6165 switches.
tg3: fix ipv6 header length computation
skge: add byte queue limit support
mv643xx_eth: Add Rx Discard and Rx Overrun statistics
bnx2x: fix compilation error with SOE in fw_dump
bnx2x: handle CHIP_REVISION during init_one
bnx2x: allow user to change ring size in ISCSI SD mode
bnx2x: fix Big-Endianess in ethtool -t
bnx2x: fixed ethtool statistics for MF modes
bnx2x: credit-leakage fixup on vlan_mac_del_all
macvlan: fix a possible use after free
...
This patch changes event message behaviour to send only updated records
instead of whole list. This fixes bug on which userspace receives non-actual
data in case multiple events occur in row.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix new kernel-doc warning:
Warning(include/linux/usb.h:1251): No description found for parameter 'num_mapped_sgs'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Pass information that quota is stored in system file to userspace
ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls
jbd: Issue cache flush after checkpointing
Export new attributes sensb_res for tech B and sensf_res
for tech F in the target info (returned as a response to
NFC_CMD_GET_TARGET).
The max size of the attributes nfcid1, sensb_res and sensf_res
is exported to user space though include/linux/nfc.
Signed-off-by: Ilan Elias <ilane@ti.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We already extract some basic info but it's incomplete, reads info
about the first core only. Used data structure doesn't allow easy
adding of more cores.
This patch adds new struct and array for storing power info. The plan
is to: switch all extractors (including the ones using NVRAM) to new
struct, switch drivers, then deprecate and finally drop old SSB fields.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix new kernel-doc warnings:
Warning(include/linux/device.h:299): No description found for parameter 'name'
Warning(include/linux/device.h:299): No description found for parameter 'subsys'
Warning(include/linux/device.h:299): No description found for parameter 'node'
Warning(include/linux/device.h:299): No description found for parameter 'add_dev'
Warning(include/linux/device.h:299): No description found for parameter 'remove_dev'
Warning(include/linux/device.h:685): No description found for parameter 'id'
Warning(include/linux/device.h:1009): No description found for parameter '__driver'
Warning(include/linux/device.h:1009): No description found for parameter '__register'
Warning(include/linux/device.h:1009): No description found for parameter '__unregister'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The way this driver was added by f0ae849 (usb: Add Intel Langwell USB
OTG Transceiver Driver) never even compiled together with langwell_udc,
and that's the only way for it to be useful.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: stable@vger.kernel.org # v2.6.31+
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@linux.intel.com>
Cc: linux-usb@vger.kernel.org
Signed-off-by: Felipe Balbi <balbi@ti.com>
so move its include into fs.h inside the __KERNEL__ protection.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two fixes for regressions introduced during the merge window, one fix for
a long-standing obscure issue in the computation of hibernate image size
and two small PM documentation fixes.
Namhyung Kim (1):
PM / Hibernate: Correct additional pages number calculation
Srivatsa S. Bhat (1):
PM / Hibernate: Rewrite unlock_system_sleep() to fix s2disk regression
Tetsuo Handa (1):
PM / Sleep: Fix read_unlock_usermodehelper() call.
Viresh Kumar (2):
PM / Documentation: Fix spelling mistake in basic-pm-debugging.txt
PM / Documentation: Fix minor issue in freezing_of_tasks.txt
Documentation/power/basic-pm-debugging.txt | 2 +-
Documentation/power/freezing-of-tasks.txt | 8 ++++----
drivers/base/firmware_class.c | 3 +--
include/linux/suspend.h | 19 +++++++++++++++++--
kernel/power/snapshot.c | 3 ++-
5 files changed, 25 insertions(+), 10 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJPHdAnAAoJEKhOf7ml8uNsMOgQAJVw9lUK1grMRoWILq/oAIAy
jWQfcVmLq04T32ZTAqohocMsJHTRRgHzhP60zWmiUtpiikO7UJT/sMPt1vSIMxOg
83dT0s4VLCrHjpKW2J3UbyUq9CGlXjMchkwt5ZwCtQtGoNNBg3Q8ZCtS6ksUh7+c
1xnUANBdHXzoz8gp9zemoW05vZu7KFX5SWuWYv+5FLd7gb0GpOFZksBAgQ7KAOwT
jk2sx20yeXoCFshWTVIaDQa9ydQxGyfWu7ZoTst3fNJvpwXwozEFCSro/WsF4FFQ
fwOl8hU6Ptaogy6VuzQZ8B9XNKxgzcSDfLqvmynUAzWbwZkgW7V9EA0cRmnyBI4C
Mmus5npRq+zC09e82noPDI8Y+gOMe70u/pHPL/9aSiK8D97baQd7JCwV9iCf3bxr
h+woLfEF3maMV0vdfzFz6Zh435oO668ZGUUxqSpFIKyJ2uVcCKRrpEo47N1xKChr
msH8sSLl4yrVYg0XUhN62qOrPiMvWw0i6httrXXOBMs4jVEBNrdSUSeAjn269gsx
xaDZS9teZNnn3Vf/lJdPvPC7dvz394mOkWOq707S/D/hHdA96dYii5PET366FNjE
O0jgCrlbRLnFmcngeBdklxu2Zf1GIhSw+Lk+LdwnEloSZ4RKMjUy0vf4l5shDS02
N/e4vEbwR68dN8Hg5evl
=dgnH
-----END PGP SIGNATURE-----
Merge tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Power management fixes for 3.3
Two fixes for regressions introduced during the merge window, one fix for
a long-standing obscure issue in the computation of hibernate image size
and two small PM documentation fixes.
* tag 'pm-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Sleep: Fix read_unlock_usermodehelper() call.
PM / Hibernate: Rewrite unlock_system_sleep() to fix s2disk regression
PM / Hibernate: Correct additional pages number calculation
PM / Documentation: Fix minor issue in freezing_of_tasks.txt
PM / Documentation: Fix spelling mistake in basic-pm-debugging.txt
The usual kernel-doc fixups from Randy. Some of them David acked as
merged in his tree, this is the random left-overs.
* kernel-doc:
docbook: fix sched source file names in device-drivers book
docbook: change iomap source filename in deviceiobook
docbook: don't use serial_core.h in device-drivers book
kernel-doc: fix kernel-doc warnings in sched
kernel-doc: fix new warnings in cfg80211.h
kernel-doc: fix new warning in usb.h
kernel-doc: fix new warnings in device.h
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warning in regulator core
kernel-doc: fix new warnings in pci
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in auditsc.c
scripts/kernel-doc: fix fatal error caused by cfg80211.h
Fix new kernel-doc notation warnings:
Warning(include/linux/sched.h:2094): No description found for parameter 'p'
Warning(include/linux/sched.h:2094): Excess function parameter 'tsk' description in 'is_idle_task'
Warning(kernel/sched/cpupri.c:139): No description found for parameter 'newpri'
Warning(kernel/sched/cpupri.c:139): Excess function parameter 'pri' description in 'cpupri_set'
Warning(kernel/sched/cpupri.c:208): Excess function parameter 'bootmem' description in 'cpupri_init'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix new kernel-doc warning:
Warning(include/linux/usb.h:1251): No description found for parameter 'num_mapped_sgs'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix new kernel-doc warnings:
Warning(include/linux/device.h:299): No description found for parameter 'name'
Warning(include/linux/device.h:299): No description found for parameter 'subsys'
Warning(include/linux/device.h:299): No description found for parameter 'node'
Warning(include/linux/device.h:299): No description found for parameter 'add_dev'
Warning(include/linux/device.h:299): No description found for parameter 'remove_dev'
Warning(include/linux/device.h:685): No description found for parameter 'id'
Warning(include/linux/device.h:1009): No description found for parameter '__driver'
Warning(include/linux/device.h:1009): No description found for parameter '__register'
Warning(include/linux/device.h:1009): No description found for parameter '__unregister'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit cc39c6a9bb ("mm: account skipped entries to avoid looping in
find_get_pages") correctly fixed an infinite loop; but left a problem
that find_get_pages() on shmem would return 0 (appearing to callers to
mean end of tree) when it meets a run of nr_pages swap entries.
The only uses of find_get_pages() on shmem are via pagevec_lookup(),
called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
scan_mapping_unevictable_pages(). The first is already commented, and
not worth worrying about; but the second can leave pages on the
Unevictable list after an unusual sequence of swapping and locking.
Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
swap) instead of pagevec_lookup().
But I don't want to contaminate vmscan.c with shmem internals, nor
shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into
shmem.c, renaming it shmem_unlock_mapping(); and rename
check_move_unevictable_page() to check_move_unevictable_pages(), looping
down an array of pages, oftentimes under the same lock.
Leave out the "rotate unevictable list" block: that's a leftover from
when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
handling involved looking at pages at tail of LRU.
Was there significance to the sequence first ClearPageUnevictable, then
test page_evictable, then SetPageUnevictable here? I think not, we're
under LRU lock, and have no barriers between those.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: <stable@vger.kernel.org> [back to 3.1 but will need respins]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kdump only allocates memory for the prstatus ELF note. For s390x,
besides of prstatus multiple ELF notes for various different register
types are stored. Therefore the currently allocated memory is not
sufficient. With this patch the KEXEC_NOTE_BYTES macro can be defined
by architecture code and for s390x it is set to the correct size now.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparc64 allmodconfig:
In file included from include/linux/compat.h:15,
from /usr/src/25/arch/sparc/include/asm/siginfo.h:19,
from include/linux/signal.h:5,
from include/linux/sched.h:73,
from arch/sparc/kernel/asm-offsets.c:13:
include/linux/fs.h:618: warning: parameter has incomplete type
It seems that my sparc64 compiler (gcc-3.4.5) doesn't like the forward
declaration of enums.
Fix this by moving the "enum migrate_mode" definition into its own header
file.
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It doesn't seem right for the thermal subsystem to export a symbol
named generate_netlink_event. This function is thermal-specific and
its name should reflect that fact. Rename it to
thermal_generate_netlink_event.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: R.Durgadoss <durgadoss.r@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Now that all users of 'struct sysdev' are removed from the kernel, we
can safely remove the .h and .c files for this code, to ensure that no
one accidentally starts to use it again.
Many thanks for Kay who did all the hard work here on making this
happen.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There is a case in __sk_mem_schedule(), where an allocation
is beyond the maximum, but yet we are allowed to proceed.
It happens under the following condition:
sk->sk_wmem_queued + size >= sk->sk_sndbuf
The network code won't revert the allocation in this case,
meaning that at some point later it'll try to do it. Since
this is never communicated to the underlying res_counter
code, there is an inbalance in res_counter uncharge operation.
I see two ways of fixing this:
1) storing the information about those allocations somewhere
in memcg, and then deducting from that first, before
we start draining the res_counter,
2) providing a slightly different allocation function for
the res_counter, that matches the original behavior of
the network code more closely.
I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the memcg sock code, we'll need to register allocations
that are temporarily over limit. Let's make sure that margin
is 0 in this case.
I am keeping this as a separate patch, so that if any weirdness
interaction appears in the future, we can now exactly what caused
it.
Suggested by Johannes Weiner
Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Michal Hocko <mhocko@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correctly implement a loss detection heuristic: New sequences (above
high_seq) sent during the fast recovery are deemed lost when higher
sequences are SACKed.
Current code does not catch these losses, because tcp_mark_head_lost()
does not check packets beyond high_seq. The fix is straight-forward by
checking packets until the highest sacked packet. In addition, all the
FLAG_DATA_LOST logic are in-effective and redundant and can be removed.
Update the loss heuristic comments. The algorithm above is documented
as heuristic B, but it is redundant too because heuristic A already
covers B.
Note that this change only marks some forward-retransmitted packets LOST.
It does NOT forbid TCP performing further CWR on new losses. A potential
follow-up patch under preparation is to perform another CWR on "new"
losses such as
1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
2) retransmission is lost.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In native mode display all available staticstics.
In SRIOV mode on VF display only SW counters statistics,
in SRIOV mode on hypervisor display SW counters and errors (got from FW)
statistics.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
mcp_priv() does unexpected things when passed a void pointer. Make it
a typed inline function, which ensures that it works correctly in
these cases.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This reverts commit 5dd7bf59e0.
Conflicts:
scripts/mod/file2alias.c
This change is wrong on many levels. First and foremost, it causes a
regression. On boot on Assabet, which this patch gives a codec id of
'ucb1x00', it gives:
ucb1x00 ID not found: 1005
0x1005 is a valid ID for the UCB1300 device.
Secondly, this patch is way over the top in terms of complexity. The
only device which has been seen to be connected with this MCP code is
the UCB1x00 (UCB1200, UCB1300 etc) devices, and they all use the same
driver. Adding a match table, requiring the codec string to match the
hardware ID read out of the ID register, etc is completely over the top
when we can just read the hardware ID register.
Commit 33e638b, "PM / Sleep: Use the freezer_count() functions in
[un]lock_system_sleep() APIs" introduced an undesirable change in the
behaviour of unlock_system_sleep() since freezer_count() internally calls
try_to_freeze() - which we don't need in unlock_system_sleep().
And commit bcda53f, "PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with
[un]lock_system_sleep()" made these APIs wide-spread. This caused a
regression in suspend-to-disk where snapshot_read() and snapshot_write()
were getting frozen due to the try_to_freeze embedded in
unlock_system_sleep(), since these functions were invoked when the freezing
condition was still in effect.
Fix this by rewriting unlock_system_sleep() by open-coding freezer_count()
and dropping the try_to_freeze() part. Not only will this fix the
regression but this will also ensure that the API only does what it is
intended to do, and nothing more, under the hood.
While at it, make the code more correct and robust by ensuring that the
PF_FREEZER_SKIP flag gets cleared with pm_mutex held, to avoid a race with
the freezer.
Also, to be on the safer side, open-code freezer_do_not_count() as well
(inside lock_system_sleep()), to ensure that any unrelated modification to
freezer[_do_not]_count() does not break things again!
Reported-and-tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Permit key_serial() to be called with a const key pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This includes initial support for the recently published ACPI 5.0 spec.
In particular, support for the "hardware-reduced" bit that eliminates
the dependency on legacy hardware.
APEI has patches resulting from testing on real hardware.
Plus other random fixes.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits)
acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
intel_idle: Split up and provide per CPU initialization func
ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2
ACPI processor: Remove unneeded cpuidle_unregister_driver call
intel idle: Make idle driver more robust
intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
intel_idle: remove redundant local_irq_disable() call
ACPI processor: Fix error path, also remove sysdev link
ACPI: processor: fix acpi_get_cpuid for UP processor
intel_idle: fix API misuse
ACPI APEI: Convert atomicio routines
ACPI: Export interfaces for ioremapping/iounmapping ACPI registers
ACPI: Fix possible alignment issues with GAS 'address' references
ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
ACPI: Store SRAT table revision
ACPI, APEI, Resolve false conflict between ACPI NVS and APEI
ACPI, Record ACPI NVS regions
ACPI, APEI, EINJ, Refine the fix of resource conflict
...
* git://git.infradead.org/users/willy/linux-nvme: (105 commits)
NVMe: Set number of queues correctly
NVMe: Version 0.8
NVMe: Set queue flags correctly
NVMe: Simplify nvme_unmap_user_pages
NVMe: Mark the end of the sg list
NVMe: Fix DMA mapping for admin commands
NVMe: Rename IO_TIMEOUT to NVME_IO_TIMEOUT
NVMe: Merge the nvme_bio and nvme_prp data structures
NVMe: Change nvme_completion_fn to take a dev
NVMe: Change get_nvmeq to take a dev instead of a namespace
NVMe: Simplify completion handling
NVMe: Update Identify Controller data structure
NVMe: Implement doorbell stride capability
NVMe: Version 0.7
NVMe: Don't probe namespace 0
Fix calculation of number of pages in a PRP List
NVMe: Create nvme_identify and nvme_get_features functions
NVMe: Fix memory leak in nvme_dev_add()
NVMe: Fix calls to dma_unmap_sg
NVMe: Correct sg list setup in nvme_map_user_pages
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
tg3: Fix single-vector MSI-X code
openvswitch: Fix multipart datapath dumps.
ipv6: fix per device IP snmp counters
inetpeer: initialize ->redirect_genid in inet_getpeer()
net: fix NULL-deref in WARN() in skb_gso_segment()
net: WARN if skb_checksum_help() is called on skb requiring segmentation
caif: Remove bad WARN_ON in caif_dev
caif: Fix typo in Vendor/Product-ID for CAIF modems
bnx2x: Disable AN KR work-around for BCM57810
bnx2x: Remove AutoGrEEEn for BCM84833
bnx2x: Remove 100Mb force speed for BCM84833
bnx2x: Fix PFC setting on BCM57840
bnx2x: Fix Super-Isolate mode for BCM84833
net: fix some sparse errors
net: kill duplicate included header
net: sh-eth: Fix build error by the value which is not defined
net: Use device model to get driver name in skb_gso_segment()
bridge: BH already disabled in br_fdb_cleanup()
net: move sock_update_memcg outside of CONFIG_INET
mwl8k: Fixing Sparse ENDIAN CHECK warning
...
Function split up, should have no functional change.
Provides entry point for physically hotplugged CPUs
to initialize and activate cpuidle.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
CC: Shaohua Li <shaohua.li@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits)
audit: no leading space in audit_log_d_path prefix
audit: treat s_id as an untrusted string
audit: fix signedness bug in audit_log_execve_info()
audit: comparison on interprocess fields
audit: implement all object interfield comparisons
audit: allow interfield comparison between gid and ogid
audit: complex interfield comparison helper
audit: allow interfield comparison in audit rules
Kernel: Audit Support For The ARM Platform
audit: do not call audit_getname on error
audit: only allow tasks to set their loginuid if it is -1
audit: remove task argument to audit_set_loginuid
audit: allow audit matching on inode gid
audit: allow matching on obj_uid
audit: remove audit_finish_fork as it can't be called
audit: reject entry,always rules
audit: inline audit_free to simplify the look of generic code
audit: drop audit_set_macxattr as it doesn't do anything
audit: inline checks for not needing to collect aux records
audit: drop some potentially inadvisable likely notations
...
Use evil merge to fix up grammar mistakes in Kconfig file.
Bad speling and horrible grammar (and copious swearing) is to be
expected, but let's keep it to commit messages and comments, rather than
expose it to users in config help texts or printouts.
It was reported that DIGSIG is confusing name for digital signature
module. It was suggested to rename DIGSIG to SIGNATURE.
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: James Morris <jmorris@namei.org>
Define rcu_assign_keypointer(), which uses the key payload.rcudata instead
of payload.data, to resolve the CONFIG_SPARSE_RCU_POINTER message:
"incompatible types in comparison expression (different address spaces)"
Replace the rcu_assign_pointer() calls in encrypted/trusted keys with
rcu_assign_keypointer().
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
This set of build failures just started appearing on parisc:
In file included from drivers/input/serio/serio_raw.c:12:
include/linux/kref.h: In function 'kref_get':
include/linux/kref.h:40: error: 'TAINT_WARN' undeclared (first use in this function)
include/linux/kref.h:40: error: (Each undeclared identifier is reported only once
include/linux/kref.h:40: error: for each function it appears in.)
include/linux/kref.h: In function 'kref_sub':
include/linux/kref.h:65: error: 'TAINT_WARN' undeclared (first use in this function)
It happens because TAINT_WARN is defined in kernel.h and this particular
compile doesn't seem to include it (no idea why it's just manifesting ..
probably some #include file untangling exposed it).
Fix by adding
#include <linux/kernel.h>
to linux/kref.h
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows audit to specify rules in which we compare two fields of a
process. Such as is the running process uid != to the running process
euid?
Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
This completes the matrix of interfield comparisons between uid/gid
information for the current task and the uid/gid information for inodes.
aka I can audit based on differences between the euid of the process and
the uid of fs objects.
Signed-off-by: Peter Moody <pmoody@google.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
We wish to be able to audit when a uid=500 task accesses a file which is
uid=0. Or vice versa. This patch introduces a new audit filter type
AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields
should be compared. At this point we only define the task->uid vs
inode->uid, but other comparisons can be added.
Signed-off-by: Eric Paris <eparis@redhat.com>
The function always deals with current. Don't expose an option
pretending one can use it for something. You can't.
Signed-off-by: Eric Paris <eparis@redhat.com>
Much like the ability to filter audit on the uid of an inode collected, we
should be able to filter on the gid of the inode.
Signed-off-by: Eric Paris <eparis@redhat.com>
Allow syscall exit filter matching based on the uid of the owner of an
inode used in a syscall. aka:
auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa
Signed-off-by: Eric Paris <eparis@redhat.com>
Audit entry,always rules are not allowed and are automatically changed in
exit,always rules in userspace. The kernel refuses to load such rules.
Thus a task in the middle of a syscall (and thus in audit_finish_fork())
can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every
going to actually use the code in audit_finish_fork() since it will
return without doing anything. Thus drop the code.
Signed-off-by: Eric Paris <eparis@redhat.com>
A number of audit hooks make function calls before they determine that
auxilary records do not need to be collected. Do those checks as static
inlines since the most common case is going to be that records are not
needed and we can skip the function call overhead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Every arch calls:
if (unlikely(current->audit_context))
audit_syscall_entry()
which requires knowledge about audit (the existance of audit_context) in
the arch code. Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.
Signed-off-by: Eric Paris <eparis@redhat.com>
The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.
We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.
The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.
In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].
For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]
The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.
Signed-off-by: Eric Paris <eparis@redhat.com>
This field is unused since 2.6.28 (commit fe6e29fdb1a7: "tty: simplify
ktermios allocation", to be exact)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now the low-level driver actually gets informed that it is getting suspended and resumed.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Export remapping and unmapping interfaces - acpi_os_map_generic_address()
and acpi_os_unmap_generic_address() - for ACPI generic registers that are
backed by memory mapped I/O (MMIO).
The acpi_os_map_generic_address() and acpi_os_unmap_generic_address()
declarations may more properly belong in include/acpi/acpiosxf.h next to
acpi_os_read_memory() but I believe that would require the ACPI CA making
them an official part of the ACPI CA - OS interface.
ACPI Generic Address Structure (GAS) reference (ACPI's fixed/generic
hardware registers use the GAS format):
ACPI Specification, Revision 4.0, Section 5.2.3.1, "Generic Address
Structure"
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Some firmware will access memory in ACPI NVS region via APEI. That
is, instructions in APEI ERST/EINJ table will read/write ACPI NVS
region. The original resource conflict checking in APEI code will
check memory/ioport accessed by APEI via general resource management
mechanism. But ACPI NVS region is marked as busy already, so that the
false resource conflict will prevent APEI ERST/EINJ to work.
To fix this, this patch record ACPI NVS regions, so that we can avoid
request resources for memory region inside it.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch fixes a build warning in -next due to a const pointer being
passed to is_idle_task(). Because is_idle_task() does not modify anything,
this commit adds the "const" to is_idle_task()'s argument declaration.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Merge reason: Add these commits so that fixes on this branch do not
conflict with already-mainlined code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This patch partially reverts:
3d058d7 netfilter: rework user-space expectation helper support
that was applied during the 3.2 development cycle.
After this patch, the tree remains just like before patch bc01bef,
that initially added the preliminary infrastructure.
I decided to partially revert this patch because the approach
that I proposed to resolve this problem is broken in NAT setups.
Moreover, a new infrastructure will be submitted for the 3.3.x
development cycle that resolve the existing issues while
providing a neat solution.
Since nobody has been seriously using this infrastructure in
user-space, the removal of this feature should affect any know
FOSS project (to my knowledge).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (655 commits)
[media] revert patch: HDIC HD29L2 DMB-TH USB2.0 reference design driver
mb86a20s: Add a few more register settings at the init seq
mb86a20s: Group registers into the same line
[media] [PATCH] don't reset the delivery system on DTV_CLEAR
[media] [BUG] it913x-fe fix typo error making SNR levels unstable
[media] cx23885: Query the CX25840 during enum_input for status
[media] cx25840: Add support for g_input_status
[media] rc-videomate-m1f.c Rename to match remote controler name
[media] drivers: media: au0828: Fix dependency for VIDEO_AU0828
[media] convert drivers/media/* to use module_platform_driver()
[media] drivers: video: cx231xx: Fix dependency for VIDEO_CX231XX_DVB
[media] Exynos4 JPEG codec v4l2 driver
[media] doc: v4l: selection: choose pixels as units for selection rectangles
[media] v4l: s5p-tv: mixer: fix setup of VP scaling
[media] v4l: s5p-tv: mixer: add support for selection API
[media] v4l: emulate old crop API using extended crop/compose API
[media] doc: v4l: add documentation for selection API
[media] doc: v4l: add binary images for selection API
[media] v4l: add support for selection api
[media] hd29l2: fix review findings
...
* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
Revert "block: recursive merge requests"
block: Stop using macro stubs for the bio data integrity calls
blockdev: convert some macros to static inlines
fs: remove unneeded plug in mpage_readpages()
block: Add BLKROTATIONAL ioctl
block: Introduce blk_set_stacking_limits function
block: remove WARN_ON_ONCE() in exit_io_context()
block: an exiting task should be allowed to create io_context
block: ioc_cgroup_changed() needs to be exported
block: recursive merge requests
block, cfq: fix empty queue crash caused by request merge
block, cfq: move icq creation and rq->elv.icq association to block core
block, cfq: restructure io_cq creation path for io_context interface cleanup
block, cfq: move io_cq exit/release to blk-ioc.c
block, cfq: move icq cache management to block core
block, cfq: move io_cq lookup to blk-ioc.c
block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
block, cfq: reorganize cfq_io_context into generic and cfq specific parts
block: remove elevator_queue->ops
block: reorder elevator switch sequence
...
Fix up conflicts in:
- block/blk-cgroup.c
Switch from can_attach_task to can_attach
- block/cfq-iosched.c
conflict with now removed cic index changes (we now use q->id instead)
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
perf tools: Fix compile error on x86_64 Ubuntu
perf report: Fix --stdio output alignment when --showcpuutilization used
perf annotate: Get rid of field_sep check
perf annotate: Fix usage string
perf kmem: Fix a memory leak
perf kmem: Add missing closedir() calls
perf top: Add error message for EMFILE
perf test: Change type of '-v' option to INCR
perf script: Add missing closedir() calls
tracing: Fix compile error when static ftrace is enabled
recordmcount: Fix handling of elf64 big-endian objects.
perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again
perf tools: Add support for guest/host-only profiling
perf kvm: Do guest-only counting by default
perf top: Don't update total_period on process_sample
perf hists: Stop using 'self' for struct hist_entry
perf hists: Rename total_session to total_period
x86: Add counter when debug stack is used with interrupts enabled
x86: Allow NMIs to hit breakpoints in i386
x86: Keep current stack in NMI breakpoints
...
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
capabilities: remove __cap_full_set definition
security: remove the security_netlink_recv hook as it is equivalent to capable()
ptrace: do not audit capability check when outputing /proc/pid/stat
capabilities: remove task_ns_* functions
capabitlies: ns_capable can use the cap helpers rather than lsm call
capabilities: style only - move capable below ns_capable
capabilites: introduce new has_ns_capabilities_noaudit
capabilities: call has_ns_capability from has_capability
capabilities: remove all _real_ interfaces
capabilities: introduce security_capable_noaudit
capabilities: reverse arguments to security_capable
capabilities: remove the task from capable LSM hook entirely
selinux: sparse fix: fix several warnings in the security server cod
selinux: sparse fix: fix warnings in netlink code
selinux: sparse fix: eliminate warnings for selinuxfs
selinux: sparse fix: declare selinux_disable() in security.h
selinux: sparse fix: move selinux_complete_init
selinux: sparse fix: make selinux_secmark_refcount static
SELinux: Fix RCU deref check warning in sel_netport_insert()
Manually fix up a semantic mis-merge wrt security_netlink_recv():
- the interface was removed in commit fd77846152 ("security: remove
the security_netlink_recv hook as it is equivalent to capable()")
- a new user of it appeared in commit a38f7907b9 ("crypto: Add
userspace configuration API")
causing no automatic merge conflict, but Eric Paris pointed out the
issue.
Main features:
- Handle percpu memory allocations (only scanning them, not actually
reporting).
- Memory hotplug support.
Usability improvements:
- Show the origin of early allocations.
- Report previously found leaks even if kmemleak has been disabled by
some error.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQIcBAABAgAGBQJPDCI0AAoJEGvWsS0AyF7x+MUQALEQTnREqBgpqa+95Wk8WaEB
F/00mbwRpLVKl1jsfCn4wxFPUGuXS/oaxYztDSTP8BrEzZ5E0Kq+Ejsby9yPLs5r
9nwsoRrxBerUjFHqXjx2xrTkAZQomLesNw5ZkaKFVgBzNo7O63Co4TGuP5J8s03G
7hyewcZvbmzkX1SpqMvPItdUTpK+vwABBHGvYta6NS89Bt9GuexC/NS3o2qy2q6c
2BXhUXSJyYsalxvsYYw+hNOyVWrFJ/TWJKsksg9ANxzcbkLKUat9IpvcR3CTRUpu
L/72GXGCDyMw3YgXs8MBlOk3KXRcobISYCVMsDuVz6tITP7RHCB6rG/Hg55YWxeS
1N2P0kMFkDGVui4pzPZZENUH1QfuwoZ5RpgJ2OCaVnfguLOgGM9k665KT9OScWeC
tpxoS82jGd5RezrgF30yvpLz2CivvjRiEpIXL8o47pg/kESgY1PFnDwTW8imoikt
dTQFZXYeFzjcHkN1YNUXgjNfh+CqCkUXLQ5k+8vQ+9TFWh21thwuzg5AGcK28xTc
6mGzSsJzx2w7IKTCjZ3BGN+IXt/KpC4iKyIEFeNsgy9Z8gU0I0GaMVixQtZFxeEt
asqNBaQGngJ86BeO1bjRB/YKO+F+ZIchJiGN4PNgtc4BGz45LGfKOfRjlku4rmsZ
8OJRqGx5qZykxYhNSHXq
=5lb1
-----END PGP SIGNATURE-----
Merge tag 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux
Kmemleak patches
Main features:
- Handle percpu memory allocations (only scanning them, not actually
reporting).
- Memory hotplug support.
Usability improvements:
- Show the origin of early allocations.
- Report previously found leaks even if kmemleak has been disabled by
some error.
* tag 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux:
kmemleak: Add support for memory hotplug
kmemleak: Handle percpu memory allocation
kmemleak: Report previously found leaks even after an error
kmemleak: When the early log buffer is exceeded, report the actual number
kmemleak: Show where early_log issues come from
* 'for-next' of git://git.infradead.org/users/dhowells/linux-headers:
UAPI: Split trivial #if defined(__KERNEL__) && X conditionals
UAPI: Don't have a #elif clause in a __KERNEL__ guard in linux/soundcard.h
UAPI: Fix AHZ multiple inclusion when __KERNEL__ is removed
UAPI: Make linux/patchkey.h easier to parse
UAPI: Fix nested __KERNEL__ guards in video/edid.h
UAPI: Alter the S390 asm include guards to be recognisable by the UAPI splitter
UAPI: Guard linux/cuda.h
UAPI: Guard linux/pmu.h
UAPI: Guard linux/isdn_divertif.h
UAPI: Guard linux/sound.h
UAPI: Rearrange definition of HZ in asm-generic/param.h
UAPI: Make FRV use asm-generic/param.h
UAPI: Make M32R use asm-generic/param.h
UAPI: Make MN10300 use asm-generic/param.h
UAPI: elf_read_implies_exec() is a kernel-only feature - so hide from userspace
UAPI: Don't include linux/compat.h in sparc's asm/siginfo.h
UAPI: Fix arch/mips/include/asm/Kbuild to have separate header-y lines
* 'fbdev-next' of git://github.com/schandinat/linux-2.6: (175 commits)
module_param: make bool parameters really bool (drivers/video/i810)
Revert "atmel_lcdfb: Adjust HFP calculation so it matches the manual."
OMAPDSS: HDMI: Disable DDC internal pull up
OMAPDSS: HDMI: Move duplicate code from boardfile
OMAPDSS: add OrtusTech COM43H4M10XTC display support
OMAP: DSS2: Support for UMSH-8173MD TFT panel
ASoC: OMAP: HDMI: Move HDMI codec trigger function to generic HDMI driver
OMAPDSS: HDMI: Create function to enable HDMI audio
ASoC: OMAP: HDMI: Correct signature of ASoC functions
ASoC: OMAP: HDMI: Introduce driver data for audio codec
grvga: fix section mismatch warnings
video: s3c-fb: Don't keep device runtime active when open
video: s3c-fb: Hold runtime PM references when touching registers
video: s3c-fb: Take a runtime PM reference when unblanked
video: s3c-fb: Disable runtime PM in error paths from probe
video: s3c-fb: Use s3c_fb_enable() to enable the framebuffer
video: s3c-fb: Make runtime PM functional again
drivers/video: fsl-diu-fb: merge fsl_diu_alloc() into map_video_memory()
drivers/video: fsl-diu-fb: add default platform ops functions
drivers/video: fsl-diu-fb: remove broken reference count enabling the display
...
Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device. This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.
This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice. Still, I'm treating it specially to avoid spamming the logs.
In principle, this restriction should include programs running with
CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities. However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls. Their actions will still be logged.
This patch does not affect the non-libata IDE driver. That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a wrapper around scsi_cmd_ioctl that takes a block device.
The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPD2aFAAoJENkgDmzRrbjxNzsQAIeYbbrXYLjr6kQzUSngj/eC
FzjaTEfYTQIeuQCFJHcHthyc5lXV4sQbo3jOezW+Bp5yuDJL2aWIHesSfWZe7imu
zQdM4VshOYdAmUR9Q0AW5zhB8Smbs7/AyABiF2jm4p0ZPOuyMDSlei9sjvE9Vjvt
B7g5ht7L6kz0JbDnwwy0u5gs+tEitwpXYId9Y4ysZIBzIbL0qkPX8veOddGTMy0N
8xhWXaKtufpjvxFD2ORLDsw3AkoF1xXSNuFd/5nzCNpbeE7TW931jfkPoqJumuAO
7GLxcU9kKYl+IICobC6wBtsj/RrB7w+cBXMvPGwdBliam1qaRhUcJZi5FLM/Ha5d
2A9QDYNUpoXiO8JbPXrV9Z+Y0+Co8RilsQj7R/rjZh6AbbYCWt9nxzx2Svl/RfTr
xfiimHuB2P3rHjOvpCXULwOOuE5c8MzPuWncpdjiD3uGXOY/aY+X1m+if/quJw9D
grPlKL0+YiRakEYUeGG4M77KCqyKFZaF7L7UQPbqfZcj8V/9AW3/7U5I/B9RlAjs
idsr4fcf5s0N+oKUyTCW1ncpUDQNiwbU2NyJQqeu1ZxaRGj72AgyvsaNeyIPDyK+
f6x95Bi7i8KLjXc9Z1KvJwh2Nxt25gNUiTYVha/9H2NpJGd1cfI15kTOGXrgddVv
1pvuGcJDZwYiwfiXr3FL
=HHrh
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/rustyrussell/linux
Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1
* tag 'for-linus' of git://github.com/rustyrussell/linux:
module_param: check that bool parameters really are bool.
intelfbdrv.c: bailearly is an int module_param
paride/pcd: fix bool verbose module parameter.
module_param: make bool parameters really bool (drivers & misc)
module_param: make bool parameters really bool (arch)
module_param: make bool parameters really bool (core code)
kernel/async: remove redundant declaration.
printk: fix unnecessary module_param_name.
lirc_parallel: fix module parameter description.
module_param: avoid bool abuse, add bint for special cases.
module_param: check type correctness for module_param_array
modpost: use linker section to generate table.
modpost: use a table rather than a giant if/else statement.
modules: sysfs - export: taint, coresize, initsize
kernel/params: replace DEBUGP with pr_debug
module: replace DEBUGP with pr_debug
module: struct module_ref should contains long fields
module: Fix performance regression on modules with large symbol tables
module: Add comments describing how the "strmap" logic works
Fix up conflicts in scripts/mod/file2alias.c due to the new linker-
generated table approach to adding __mod_*_device_table entries. The
ARM sa11x0 mcp bus needed to be converted to that too.
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: nfsd4_create_clid_dir return value is unused
NFSD: Change name of extended attribute containing junction
svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
svcrpc: fix double-free on shutdown of nfsd after changing pool mode
nfsd4: be forgiving in the absence of the recovery directory
nfsd4: fix spurious 4.1 post-reboot failures
NFSD: forget_delegations should use list_for_each_entry_safe
NFSD: Only reinitilize the recall_lru list under the recall lock
nfsd4: initialize special stateid's at compile time
NFSd: use network-namespace-aware cache registering routines
SUNRPC: create svc_xprt in proper network namespace
svcrpc: update outdated BKL comment
nfsd41: allow non-reclaim open-by-fh's in 4.1
svcrpc: avoid memory-corruption on pool shutdown
svcrpc: destroy server sockets all at once
svcrpc: make svc_delete_xprt static
nfsd: Fix oops when parsing a 0 length export
nfsd4: Use kmemdup rather than duplicating its implementation
nfsd4: add a separate (lockowner, inode) lookup
nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
...
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (59 commits)
rtc: max8925: Add function to work as wakeup source
mfd: Add pm ops to max8925
mfd: Convert aat2870 to dev_pm_ops
mfd: Still check other interrupts if we get a wm831x touchscreen IRQ
mfd: Introduce missing kfree in 88pm860x probe routine
mfd: Add S5M series configuration
mfd: Add s5m series irq driver
mfd: Add S5M core driver
mfd: Improve mc13xxx dt binding document
mfd: Fix stmpe section mismatch
mfd: Fix stmpe build warning
mfd: Fix STMPE I2c build failure
mfd: Constify aat2870-core i2c_device_id table
gpio: Add support for stmpe variant 801
mfd: Add support for stmpe variant 801
mfd: Add support for stmpe variant 610
mfd: Add support for STMPE SPI interface
mfd: Separate out STMPE controller and interface specific code
misc: Remove max8997-muic sysfs attributes
mfd: Remove unused wm831x_irq_data_to_mask_reg()
...
Fix up trivial conflict in drivers/leds/Kconfig due to addition of
LEDS_MAX8997 and LEDS_TCA6507 next to each other.
Core:
* Support for the HS200 high-speed eMMC mode.
* Support SDIO 3.0 Ultra High Speed cards.
* Kill pending block requests immediately if card is removed.
* Enable the eMMC feature for locking boot partitions read-only
until next power on, exposed via sysfs.
Drivers:
* Runtime PM support for Intel Medfield SDIO.
* Suspend/resume support for sdhci-spear.
* sh-mmcif now processes requests asynchronously.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPD0P3AAoJEHNBYZ7TNxYMLBQP/1nwYE0Y1prrKsiv6PWQlcDu
LZl6kylgkq1Nd7hEX1E+kXjv+ec5Z+7RxqmyMerRJIMsfBtowLL0r9rzAu+BsNjZ
t7YSODLCzCp80WGMDSxhgQEOiAXMlpTzD46hXC2hOjNyTXCsGz1smUfryLPCP6v/
QbWZi43cDN+Ok5AEw8sdNIfdJYreIAPnziO6Mttvi9iFgozpxdmSJXco+kjJKd2U
lYEG/kcfug2eJRxmsQP9v4W1Y274tFrV9VLP4mgBabFMvKlXN3uU9tqf2S9JDPv8
Y95OzKsyk6pwOTKKL8LLS/FVd3pUoHWkf5CwlH7QPR/NgiWV8DcC7bGCyBKdTAwZ
14yolNVAKdLday7+DoKCk4Eac69exmBNnED9Ky37B+STiob/5qd3iLqspFBITUQU
F9bUSyw9HkdWfD4zeUD4qj9iyc6Xgzk5HbxOyJb8Wgg3VlhVESkIBeOQ/ZLT35EC
ihYStdu/No/p9r5XLClHAUuo55vT5ypGWFHAeJeUtt/BMKB3ZrHcqoihge2PAjkz
pnudMf5TXD4ZUYanIT8SFfmuGisoax0Qg99+xyuYZ1c8foJ7LsMSDqEicwrxi07e
0TT8Z37lrR+FRkusRA+nQ5Ba9/iOdVGq2DdiDUGlaYIaIWYVWNXEgjOPDzS5kfoA
PVuJRYhR4fIeFb0aqa/p
=LJnN
-----END PGP SIGNATURE-----
Merge tag 'mmc-merge-for-3.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
MMC highlights for 3.3:
Core:
* Support for the HS200 high-speed eMMC mode.
* Support SDIO 3.0 Ultra High Speed cards.
* Kill pending block requests immediately if card is removed.
* Enable the eMMC feature for locking boot partitions read-only
until next power on, exposed via sysfs.
Drivers:
* Runtime PM support for Intel Medfield SDIO.
* Suspend/resume support for sdhci-spear.
* sh-mmcif now processes requests asynchronously.
* tag 'mmc-merge-for-3.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (58 commits)
mmc: fix a deadlock between system suspend and MMC block IO
mmc: sdhci: restore the enabled dma when do reset all
mmc: dw_mmc: miscaculated the fifo-depth with wrong bit operation
mmc: host: Adds support for eMMC 4.5 HS200 mode
mmc: core: HS200 mode support for eMMC 4.5
mmc: dw_mmc: fixed wrong bit operation for SDMMC_GET_FCNT()
mmc: core: Separate the timeout value for cache-ctrl
mmc: sdhci-spear: Fix compilation error
mmc: sdhci: Deal with failure case in sdhci_suspend_host
mmc: dw_mmc: Clear the DDR mode for non-DDR
mmc: sd: Fix SDR12 timing regression
mmc: sdhci: Fix tuning timer incorrect setting when suspending host
mmc: core: Add option to prevent eMMC sleep command
mmc: omap_hsmmc: use threaded irq handler for card-detect.
mmc: sdhci-pci: enable runtime PM for Medfield SDIO
mmc: sdhci: Always pass clock request value zero to set_clock host op
mmc: sdhci-pci: remove SDHCI_QUIRK2_OWN_CARD_DETECTION
mmc: sdhci-pci: get gpio numbers from platform data
mmc: sdhci-pci: add platform data
mmc: sdhci: prevent card detection activity for non-removable cards
...
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: Fix nlink setting on inode creation
GFS2: fail mount if journal recovery fails
GFS2: let spectator mount do read only recovery
GFS2: Fix a use-after-free that coverity spotted
GFS2: dlm based recovery coordination
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: ensure prealloc_blob is in place when removing xattr
rbd: initialize snap_rwsem in rbd_add()
ceph: enable/disable dentry complete flags via mount option
vfs: export symbol d_find_any_alias()
ceph: always initialize the dentry in open_root_dentry()
libceph: remove useless return value for osd_client __send_request()
ceph: avoid iput() while holding spinlock in ceph_dir_fsync
ceph: avoid useless dget/dput in encode_fh
ceph: dereference pointer after checking for NULL
crush: fix force for non-root TAKE
ceph: remove unnecessary d_fsdata conditional checks
ceph: Use kmemdup rather than duplicating its implementation
Fix up conflicts in fs/ceph/super.c (d_alloc_root() failure handling vs
always initialize the dentry in open_root_dentry)
By adding some module aliases, programs (or users) won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.
Also:
- use C99 style initialization.
- add missing entry in documentation for loop-control
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace preprocessor macro stubs with real function declarations to
prevent warnings when CONFIG_BLK_DEV_INTEGRITY is disabled.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Andrew explains:
- various misc stuff
- Most of the rest of MM: memcg, threaded hugepages, others.
- cpumask
- kexec
- kdump
- some direct-io performance tweaking
- radix-tree optimisations
- new selftests code
A note on this: often people will develop a new userspace-visible
feature and will develop userspace code to exercise/test that
feature. Then they merge the patch and the selftest code dies.
Sometimes we paste it into the changelog. Sometimes the code gets
thrown into Documentation/(!).
This saddens me. So this patch creates a bare-bones framework which
will henceforth allow me to ask people to include their test apps in
the kernel tree so we can keep them alive. Then when people enhance
or fix the feature, I can ask them to update the test app too.
The infrastruture is terribly trivial at present - let's see how it
evolves.
- checkpoint/restart feature work.
A note on this: this is a project by various mad Russians to perform
c/r mainly from userspace, with various oddball helper code added
into the kernel where the need is demonstrated.
So rather than some large central lump of code, what we have is
little bits and pieces popping up in various places which either
expose something new or which permit something which is normally
kernel-private to be modified.
The overall project is an ongoing thing. I've judged that the size
and scope of the thing means that we're more likely to be successful
with it if we integrate the support into mainline piecemeal rather
than allowing it all to develop out-of-tree.
However I'm less confident than the developers that it will all
eventually work! So what I'm asking them to do is to wrap each piece
of new code inside CONFIG_CHECKPOINT_RESTORE. So if it all
eventually comes to tears and the project as a whole fails, it should
be a simple matter to go through and delete all trace of it.
This lot pretty much wraps up the -rc1 merge for me.
* akpm: (96 commits)
unlzo: fix input buffer free
ramoops: update parameters only after successful init
ramoops: fix use of rounddown_pow_of_two()
c/r: prctl: add PR_SET_MM codes to set up mm_struct entries
c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4
c/r: introduce CHECKPOINT_RESTORE symbol
selftests: new x86 breakpoints selftest
selftests: new very basic kernel selftests directory
radix_tree: take radix_tree_path off stack
radix_tree: remove radix_tree_indirect_to_ptr()
dio: optimize cache misses in the submission path
vfs: cache request_queue in struct block_device
fs/direct-io.c: calculate fs_count correctly in get_more_blocks()
drivers/parport/parport_pc.c: fix warnings
panic: don't print redundant backtraces on oops
sysctl: add the kernel.ns_last_pid control
kdump: add udev events for memory online/offline
include/linux/crash_dump.h needs elf.h
kdump: fix crash_kexec()/smp_send_stop() race in panic()
kdump: crashk_res init check for /sys/kernel/kexec_crash_size
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
pptp: Accept packet with seq zero
RDS: Remove some unused iWARP code
net: fsl: fec: handle 10Mbps speed in RMII mode
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap
drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap
ksz884x: fix mtu for VLAN
net_sched: sfq: add optional RED on top of SFQ
dp83640: Fix NOHZ local_softirq_pending 08 warning
gianfar: Fix invalid TX frames returned on error queue when time stamping
gianfar: Fix missing sock reference when processing TX time stamps
phylib: introduce mdiobus_alloc_size()
net: decrement memcg jump label when limit, not usage, is changed
net: reintroduce missing rcu_assign_pointer() calls
inet_diag: Rename inet_diag_req_compat into inet_diag_req
inet_diag: Rename inet_diag_req into inet_diag_req_v2
bond_alb: don't disable softirq under bond_alb_xmit
mac80211: fix rx->key NULL pointer dereference in promiscuous mode
nl80211: fix old station flags compatibility
mdio-octeon: use an unique MDIO bus name.
mdio-gpio: use an unique MDIO bus name.
...
When we restore a task we need to set up text, data and data heap sizes
from userspace to the values a task had at checkpoint time. This patch
adds auxilary prctl codes for that.
While most of them have a statistical nature (their values are involved
into calculation of /proc/<pid>/statm output) the start_brk and brk values
are used to compute an allowed size of program data segment expansion.
Which means an arbitrary changes of this values might be dangerous
operation. So to restrict access the following requirements applied to
prctl calls:
- The process has to have CAP_SYS_ADMIN capability granted.
- For all opcodes except start_brk/brk members an appropriate
VMA area must exist and should fit certain VMA flags,
such as:
- code segment must be executable but not writable;
- data segment must not be executable.
start_brk/brk values must not intersect with data segment and must not
exceed RLIMIT_DATA resource limit.
Still the main guard is CAP_SYS_ADMIN capability check.
Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support
otherwise these prctl calls will return -EINVAL.
[akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is not used anymore, remove it
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes it possible to get from the inode to the request_queue with one
less cache miss. Used in followon optimization.
The livetime of the pointer is the same as the gendisk.
This assumes that the queue will always stay the same in the gendisk while
it's visible to block_devices. I think that's safe correct?
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building an ARM target we get the following warnings:
CC arch/arm/kernel/setup.o
In file included from arch/arm/kernel/setup.c:39:
arch/arm/include/asm/elf.h:102:1: warning: "vmcore_elf64_check_arch" redefined
In file included from arch/arm/kernel/setup.c:24:
include/linux/crash_dump.h:30:1: warning: this is the location of the previous definition
Quoting Russell King:
"linux/crash_dump.h makes no attempt to include asm/elf.h, but it depends
on stuff in asm/elf.h to determine how stuff inside this file is defined
at parse time.
So, if asm/elf.h is included after linux/crash_dump.h or not at all, you
get a different result from the situation where asm/elf.h is included
before."
So add elf.h header to crash_dump.h to avoid this problem.
The original discussion about this can be found at:
http://www.spinics.net/lists/arm-kernel/msg154113.html
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: <stable@vger.kernel.org> [3.2.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
del_page_from_lru() repeats del_page_from_lru_list(), also working out
which LRU the page was on, clearing the relevant bits. Decouple those
functions: remove del_page_from_lru() and add page_off_lru().
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mostly we use "enum lru_list lru": change those few "l"s to "lru"s.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What's so special about ____pagevec_lru_add() that it needs four leading
underscores? Nothing, it just helped to distinguish from
__pagevec_lru_add() in 2.6.28 development. Cut two leading underscores.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace pagevecs in putback_lru_pages() and move_active_pages_to_lru()
by lists of pages_to_free: then apply Konstantin Khlebnikov's
free_hot_cold_page_list() to them instead of pagevec_release().
Which simplifies the flow (no need to drop and retake lock whenever
pagevec fills up) and reduces stale addresses in stack backtraces
(which often showed through the pagevecs); but more importantly,
removes another 120 bytes from the deepest stacks in page reclaim.
Although I've not recently seen an actual stack overflow here with
a vanilla kernel, move_active_pages_to_lru() has often featured in
deep backtraces.
However, free_hot_cold_page_list() does not handle compound pages
(nor need it: a Transparent HugePage would have been split by the
time it reaches the call in shrink_page_list()), but it is possible
for putback_lru_pages() or move_active_pages_to_lru() to be left
holding the last reference on a THP, so must exclude the unlikely
compound case before putting on pages_to_free.
Remove pagevec_strip(), its work now done in move_active_pages_to_lru().
The pagevec in scan_mapping_unevictable_pages() remains in mm/vmscan.c,
but that is never on the reclaim path, and cannot be replaced by a list.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
mode that avoids writing back pages to backing storage. Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.
This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
->writepages.
[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
is was meaningless to pick the page and re-add it to the LRU list. This
had to be partially reverted because some dirty pages can be migrated by
compaction without blocking.
This patch updates "mm: compaction: make isolate_lru_page" by skipping
over pages that migration has no possibility of migrating to minimise LRU
disruption.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time. Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates. Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check;
if (PageDirty(page) && !sync &&
mapping->a_ops->migratepage != migrate_page)
rc = -EBUSY;
This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It is
the responsibility of the callback to fail gracefully if migration would
block.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have tlb_remove_tlb_entry to indicate a pte tlb flush entry should be
flushed, but not a corresponding API for pmd entry. This isn't a
problem so far because THP is only for x86 currently and tlb_flush()
under x86 will flush entire TLB. But this is confusion and could be
missed if thp is ported to other arch.
Also convert tlb->need_flush = 1 to a VM_BUG_ON(!tlb->need_flush) in
__tlb_remove_page() as suggested by Andrea Arcangeli. The
__tlb_remove_page() function is supposed to be called after
tlb_remove_xxx_tlb_entry() and we can catch any misuse.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, at LRU handling, memory cgroup needs to do complicated works to see
valid pc->mem_cgroup, which may be overwritten.
This patch is for relaxing the protocol. This patch guarantees
- when pc->mem_cgroup is overwritten, page must not be on LRU.
By this, LRU routine can believe pc->mem_cgroup and don't need to check
bits on pc->flags. This new rule may adds small overheads to swapin. But
in most case, lru handling gets faster.
After this patch, PCG_ACCT_LRU bit is obsolete and removed.
[akpm@linux-foundation.org: remove unneeded VM_BUG_ON(), restore hannes's christmas tree]
[akpm@linux-foundation.org: clean up code comment]
[hughd@google.com: fix NULL mem_cgroup_try_charge]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a preparation before removing a flag PCG_ACCT_LRU in page_cgroup
and reducing atomic ops/complexity in memcg LRU handling.
In some cases, pages are added to lru before charge to memcg and pages
are not classfied to memory cgroup at lru addtion. Now, the lru where
the page should be added is determined a bit in page_cgroup->flags and
pc->mem_cgroup. I'd like to remove the check of flag.
To handle the case pc->mem_cgroup may contain stale pointers if pages
are added to LRU before classification. This patch resets
pc->mem_cgroup to root_mem_cgroup before lru additions.
[akpm@linux-foundation.org: fix CONFIG_CGROUP_MEM_CONT=n build]
[hughd@google.com: fix CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n build]
[akpm@linux-foundation.org: ksm.c needs memcontrol.h, per Michal]
[hughd@google.com: stop oops in mem_cgroup_reset_owner()]
[hughd@google.com: fix page migration to reset_owner]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are multiple places which need to get the swap_cgroup address, so
add a helper function:
static struct swap_cgroup *swap_cgroup_getsc(swp_entry_t ent,
struct swap_cgroup_ctrl **ctrl);
to simplify the code.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In split_huge_page(), mem_cgroup_split_huge_fixup() is called to handle
page_cgroup modifcations. It takes move_lock_page_cgroup() and modifies
page_cgroup and LRU accounting jobs and called HPAGE_PMD_SIZE - 1 times.
But thinking again,
- compound_lock() is held at move_accout...then, it's not necessary
to take move_lock_page_cgroup().
- LRU is locked and all tail pages will go into the same LRU as
head is now on.
- page_cgroup is contiguous in huge page range.
This patch fixes mem_cgroup_split_huge_fixup() as to be called once per
hugepage and reduce costs for spliting.
[akpm@linux-foundation.org: fix typo, per Michal]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To find the page corresponding to a certain page_cgroup, the pc->flags
encoded the node or section ID with the base array to compare the pc
pointer to.
Now that the per-memory cgroup LRU lists link page descriptors directly,
there is no longer any code that knows the struct page_cgroup of a PFN
but not the struct page.
[hughd@google.com: remove unused node/section info from pc->flags fix]
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that all code that operated on global per-zone LRU lists is
converted to operate on per-memory cgroup LRU lists instead, there is no
reason to keep the double-LRU scheme around any longer.
The pc->lru member is removed and page->lru is linked directly to the
per-memory cgroup LRU lists, which removes two pointers from a
descriptor that exists for every page frame in the system.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Ying Han <yinghan@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Having a unified structure with a LRU list set for both global zones and
per-memcg zones allows to keep that code simple which deals with LRU
lists and does not care about the container itself.
Once the per-memcg LRU lists directly link struct pages, the isolation
function and all other list manipulations are shared between the memcg
case and the global LRU case.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory cgroup limit reclaim and traditional global pressure reclaim will
soon share the same code to reclaim from a hierarchical tree of memory
cgroups.
In preparation of this, move the two right next to each other in
shrink_zone().
The mem_cgroup_hierarchical_reclaim() polymath is split into a soft
limit reclaim function, which still does hierarchy walking on its own,
and a limit (shrinking) reclaim function, which relies on generic
reclaim code to walk the hierarchy.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page(). This function replaces a page in the
radix-tree with a new page. WHen doing this, memory cgroup needs to fix
up the accounting information. memcg need to check PCG_USED bit etc.
In some(many?) cases, 'newpage' is on LRU before calling
replace_page_cache(). So, memcg's LRU accounting information should be
fixed, too.
This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
races with LRU handling.
Background:
replace_page_cache_page() is called by FUSE code in its splice() handling.
Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
because rmdir() checks the whole LRU is empty and there is no account leak.
If a page is on the other LRU than it should be, rmdir() will fail.
This bug was added in March 2011, but no bug report yet. I guess there
are not many people who use memcg and FUSE at the same time with upstream
kernels.
The result of this bug is that admin cannot destroy a memcg because of
account leak. So, no panic, no deadlock. And, even if an active cgroup
exist, umount can succseed. So no problem at shutdown.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current epoll code can be tickled to run basically indefinitely in
both loop detection path check (on ep_insert()), and in the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely. A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.
To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited. Thus, the loop detection
becomes proportional to the number of epoll file descriptor and links.
This dramatically decreases the run-time of the loop check algorithm. In
one diabolical case I tried it reduced the run-time from 15 mintues (all
in kernel time) to .3 seconds.
Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but the
complexity is harder, since there can be multiple wakeups on different
cpus...Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.
This is accomplished, by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link), are
actually the sources for wakeup events. I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'. In the current implemetation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5. Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links. This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.
In terms of the path limit selection, I think its first worth noting that
the most common case for epoll, is probably the model where you have 1
epoll file descriptor that is monitoring n number of 'source file
descriptors'. In this case, each 'source file descriptor' has a 1 path of
length 1. Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous. Thus, I'm hoping that the
proposed limits will not prevent any workloads that currently work to
fail.
In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations. Currently its only used in a subset
of the add paths. I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths. I believe that
this additional locking is probably ok, since its in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead. Also, worth noting is that the epmuex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.
Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:
/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1). However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order. Thus, this limit is currently easily bypassed. The
newly added check for wakeup paths, stricly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.
Thus far, I've tested this, using the sample programs previously
mentioned, which now either return quickly or return -EINVAL. I've also
testing using the piptest.c epoll tester, which showed no difference in
performance. I've also created a number of different epoll networks and
tested that they behave as expectded.
I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Nelson Elhage <nelhage@ksplice.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While implementing cmpxchg_double() on s390 I realized that we don't set
CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it.
However setting that option will increase the size of struct page by
eight bytes on 64 bit, which we certainly do not want. Also, it doesn't
make sense that a present cpu feature should increase the size of struct
page.
Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and
that it should depend on CMPXCHG_DOUBLE instead.
This patch:
If an architecture supports CMPXCHG_LOCAL this shouldn't result
automatically in larger struct pages if the SLUB allocator is used.
Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which
can be selected if a double word aligned struct page is required. Also
update x86 Kconfig so that it should work as before.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The uses have been renamed so delete the unused macro.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only use in kernel.h is gone so remove the macro.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use __printf macro.
Convert NORET_AND to ATTRIB_NORET.
Use the normal kernel style for pointer arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds an optional Random Early Detection on each SFQ flow queue.
Traditional SFQ limits count of packets, while RED permits to also
control number of bytes per flow, and adds ECN capability as well.
1) We dont handle the idle time management in this RED implementation,
since each 'new flow' begins with a null qavg. We really want to address
backlogged flows.
2) if headdrop is selected, we try to ecn mark first packet instead of
currently enqueued packet. This gives faster feedback for tcp flows
compared to traditional RED [ marking the last packet in queue ]
Example of use :
tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \
limit 3000 headdrop flows 512 divisor 16384 \
redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn
qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
ewma 6 min 8000b max 60000b probability 0.2 ecn
prob_mark 0 prob_mark_head 4876 prob_drop 6131
forced_mark 0 forced_mark_head 0 forced_drop 0
Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007
requeues 0)
rate 99483Kbit 8219pps backlog 689392b 456p requeues 0
In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled
flows, we can see number of packets CE marked is smaller than number of
drops (for non ECN flows)
If same test is run, without RED, we can check backlog is much bigger.
qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop
flows 512/16384 divisor 16384
Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0)
rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Dave Taht <dave.taht@gmail.com>
Tested-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce function mdiobus_alloc_size() as an alternative to mdiobus_alloc().
Most callers of mdiobus_alloc() also allocate a private data structure, and
then manually point bus->priv to this object. mdiobus_alloc_size()
combines the two operations into one, which simplifies memory management.
The original mdiobus_alloc() now just calls mdiobus_alloc_size(0).
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
This tightens the check (you'll get a warning about incompatible
return type) but still allows it. Next kernel version, we'll remove
it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For historical reasons, we allow module_param(bool) to take an int (or
an unsigned int). That's going away.
A few drivers really want an int: they set it to -1 and a parameter
will set it to 0 or 1. This sucks: reading them from sysfs will give
'Y' for both -1 and 1, but if we change it to an int, then the users
might be broken (if they did "param" instead of "param=1").
Use a new 'bint' parser for them.
(ntfs has a different problem: it needs an int for debug_msgs because
it's also exposed via sysctl.)
Cc: Steve Glendinning <steve.glendinning@smsc.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux390@de.ibm.com
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: lm-sensors@lm-sensors.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: alsa-devel@alsa-project.org
Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part)
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>