Commit Graph

252983 Commits

Author SHA1 Message Date
Will Deacon eca5dc2a00 ARM: 6988/1: multi-cpu: remove arguments from CPU proc macros
The macros for invoking functions via the processor struct in the
MULTI_CPU case define the arguments as part of the macros, making it
impossible to take the address of those functions.

This patch removes the arguments from the macro definitions so that we
can take the address of these functions like we can for the !MULTI_CPU
case.

Reported-by: Frank Hofmann <frank.hofmann@tomtom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Linus Walleij f022e4e41b ARM: 6986/1: mach-realview: add TCM support for PB1176
Enable TCM support on the RealView PB1176 - we have now taken
the precautions necessary to support even multi-board builds of
RealView systems with TCM enabled.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Linus Walleij 201043f227 ARM: 6985/1: export functions to determine the presence of I/DTCM
By allowing code to detect whether DTCM or ITCM is present, code paths
involving TCM can be avoided when running on platforms that lack it.
This is good for creating single kernels across several archs, if some
of them utilize TCM but others don't.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Linus Walleij 9715efb8dc ARM: 6984/1: enhance TCM robustness
The PB11MPCore reports "3" DTCM banks, but anything above 2 is an
"undefined" value, so push this to become 0. Further add some checks
if code is compiled to TCM even if there is no D/ITCM present in the
system, and if we can really fit the compiled code. We don't do the
BUG() since it's not helpful, it's better to deal with non-present
TCM dynamically. If there is nothing compiled to the TCM and no TCM
is detected, it will now just shut up even if TCM support is enabled.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Mark Rutland e4b6381009 ARM: 6977/1: pmu: add platform_device_id table support
This patch adds support for platform_device_id tables, allowing new
PMU types to be registered with the correct type, without requiring
new platform_driver shims to provide the type. An single entry for
existing devices is provided.

Macros matching functionality of the of_device_id table macros are
provided for convenience.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-06-29 10:27:09 +01:00
Mark Rutland e73c34c3d5 ARM: 6976/1: pmu: add OF probing support
This is based on an earlier patch from Rob Herring <rob.herring@calxeda.com>

> Add OF match table to enable OF style driver binding. The dts entry is like
> this:
>
> pmu {
> 	compatible = "arm,cortex-a9-pmu";
> 	interrupts = <100 101>;
> };
>
> The use of pdev->id as an index breaks with OF device binding, so set the type
> based on the OF compatible string.

This modification sets the PMU hardware type based on data embedded in the
binding, allowing easy addition of new PMU types in future.

Support for new PMU types not provided by devicetree can be added later using
platform_device_id tables in a similar fashion.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-06-29 10:27:08 +01:00
Mark Rutland ae0c3751ab ARM: 6975/1: pmu: reject duplicate PMU registrations
Currently, the PMU reservation framework allows for multiple PMUs of
the same type to register themselves. This can lead to a bug with the
sequence:

register_pmu(pmu1);
reserve_pmu(pmu_type);
register_pmu(pmu2);
release_pmu(pmu1);

Here, pmu1 cannot be released, and pmu2 cannot be reserved.

This patch modifies register_pmu to reject registrations where a PMU is
already present, preventing this problem. PMUs which can have multiple
instances should not use the PMU reservation framework.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-06-29 10:27:08 +01:00
Mark Rutland f12482c939 ARM: 6974/1: pmu: refactor reservation
Currently, PMU platform_device reservation relies on some minor abuse
of the platform_device::id field for determining the type of PMU. This
is problematic for device tree based probing, where the ID cannot be
controlled.

This patch removes reliance on the id field, and depends on each PMU's
platform driver to figure out which type it is. As all PMUs handled by
the current platform_driver name "arm-pmu" are CPU PMUs, this
convention is hardcoded. New PMU types can be supported through the use
of {of,platform}_device_id tables

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-06-29 10:27:08 +01:00
Simon Horman 090ab3ff8e ARM: 6886/1: mmc, Add zboot from eSD support for SuperH Mobile ARM
This allows a ROM-able zImage to be written to eSD and for SuperH Mobile
ARM to boot directly from the SDHI hardware block.

This is achieved by the MaskROM loading the first portion of the image into
MERAM and then jumping to it.  This portion contains loader code which
copies the entire image to SDRAM and jumps to it. From there the zImage
boot code proceeds as normal, uncompressing the image into its final
location and then jumping to it.

Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Magnus Damm <magnus.damm@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-06-29 10:00:52 +01:00
Russell King 74facffeca ARM: Allow SoCs to enable scatterlist chaining
Allow SoCs to enable the scatterlist chaining support, which allows
scatterlist tables to be broken up into smaller allocations.

As support for this feature depends on the implementation details of
the users of the scatterlists, we can't enable this globally without
auditing all the users, which is a very big task.  Instead, let SoCs
progressively switch over to using this.

SoC drivers using scatterlists and SoC DMA implementations need
auditing before this option can be enabled for the SoC.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-06-02 11:16:22 +01:00
Linus Torvalds 1fa7b6a29c Revert "mm: fail GFP_DMA allocations when ZONE_DMA is not configured"
This reverts commit a197b59ae6.

As rmk says:
 "Commit a197b59ae6 (mm: fail GFP_DMA allocations when ZONE_DMA is not
  configured) is causing regressions on ARM with various drivers which
  use GFP_DMA.

  The behaviour up until now has been to silently ignore that flag when
  CONFIG_ZONE_DMA is not enabled, and to allocate from the normal zone.
  However, as a result of the above commit, such allocations now fail
  which causes drivers to fail.  These are regressions compared to the
  previous kernel version."

so just revert it.

Requested-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-02 06:11:24 +09:00
Linus Torvalds f0f52a9463 Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
  intel-iommu: Fix off-by-one in RMRR setup
  intel-iommu: Add domain check in domain_remove_one_dev_info
  intel-iommu: Remove Host Bridge devices from identity mapping
  intel-iommu: Use coherent DMA mask when requested
  intel-iommu: Dont cache iova above 32bit
  intel-iommu: Speed up processing of the identity_mapping function
  intel-iommu: Check for identity mapping candidate using system dma mask
  intel-iommu: Only unlink device domains from iommu
  intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
  intel-iommu: Flush unmaps at domain_exit
  intel-iommu: Remove obsolete comment from detect_intel_iommu
  intel-iommu: fix VT-d PMR disable for TXT on S3 resume
2011-06-02 05:48:50 +09:00
Linus Torvalds 0f48f26009 block: fix mismerge of the DISK_EVENT_MEDIA_CHANGE removal
Jens' back-merge commit 698567f3fa ("Merge commit 'v2.6.39' into
for-2.6.40/core") was incorrectly done, and re-introduced the
DISK_EVENT_MEDIA_CHANGE lines that had been removed earlier in commits

 - 9fd097b149 ("block: unexport DISK_EVENT_MEDIA_CHANGE for
   legacy/fringe drivers")

 - 7eec77a181 ("ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd
   and ide-cd")

because of conflicts with the "g->flags" updates near-by by commit
d4dc210f69 ("block: don't block events on excl write for non-optical
devices")

As a result, we re-introduced the hanging behavior due to infinite disk
media change reports.

Tssk, tssk, people! Don't do back-merges at all, and *definitely* don't
do them to hide merge conflicts from me - especially as I'm likely
better at merging them than you are, since I do so many merges.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-02 05:29:19 +09:00
Linus Torvalds 3f303103b8 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  mtd: fix physmap.h warnings
2011-06-01 21:47:39 +09:00
David Woodhouse 70e535d1e5 intel-iommu: Fix off-by-one in RMRR setup
We were mapping an extra byte (and hence usually an extra page):
iommu_prepare_identity_map() expects to be given an 'end' argument which
is the last byte to be mapped; not the first byte *not* to be mapped.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:48:21 +01:00
Mike Habeck 8519dc4401 intel-iommu: Add domain check in domain_remove_one_dev_info
The comment in domain_remove_one_dev_info() states "No need to compare
PCI domain; it has to be the same". But for the si_domain that isn't
going to be true, as it consists of all the PCI devices that are
identity mapped thus multiple PCI domains can be in si_domain.  The
code needs to validate the PCI domain too.

Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:48 +01:00
Mike Travis 825507d6d0 intel-iommu: Remove Host Bridge devices from identity mapping
When using the 1:1 (identity) PCI DMA remapping, PCI Host Bridge devices
that do not use the IOMMU causes a kernel panic.  Fix that by not
inserting those devices into the si_domain.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:46 +01:00
Mike Travis c681d0ba12 intel-iommu: Use coherent DMA mask when requested
The __intel_map_single function is not honoring the passed in DMA mask.
This results in not using the coherent DMA mask when called from
intel_alloc_coherent().

Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:45 +01:00
Chris Wright 1c9fc3d11b intel-iommu: Dont cache iova above 32bit
Mike Travis and Mike Habeck reported an issue where iova allocation
would return a range that was larger than a device's dma mask.

https://lkml.org/lkml/2011/3/29/423

The dmar initialization code will reserve all PCI MMIO regions and copy
those reservations into a domain specific iova tree.  It is possible for
one of those regions to be above the dma mask of a device.  It is typical
to allocate iovas with a 32bit mask (despite device's dma mask possibly
being larger) and cache the result until it exhausts the lower 32bit
address space.  Freeing the iova range that is >= the last iova in the
lower 32bit range when there is still an iova above the 32bit range will
corrupt the cached iova by pointing it to a region that is above 32bit.
If that region is also larger than the device's dma mask, a subsequent
allocation will return an unusable iova and cause dma failure.

Simply don't cache an iova that is above the 32bit caching boundary.

Reported-by: Mike Travis <travis@sgi.com>
Reported-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Acked-by: Mike Travis <travis@sgi.com>
Tested-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:40 +01:00
Mike Travis cb452a4040 intel-iommu: Speed up processing of the identity_mapping function
When there are a large count of PCI devices, and the pass through
option for iommu is set, much time is spent in the identity_mapping
function hunting though the iommu domains to check if a specific
device is "identity mapped".

Speed up the function by checking the cached info to see if
it's mapped to the static identity domain.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:36 +01:00
Chris Wright 8fcc5372fb intel-iommu: Check for identity mapping candidate using system dma mask
The identity mapping code appears to make the assumption that if the
devices dma_mask is greater than 32bits the device can use identity
mapping.  But that is not true: take the case where we have a 40bit
device in a 44bit architecture. The device can potentially receive a
physical address that it will truncate and cause incorrect addresses
to be used.

Instead check to see if the device's dma_mask is large enough
to address the system's dma_mask.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:34 +01:00
Alex Williamson 9b4554b21e intel-iommu: Only unlink device domains from iommu
Commit a97590e5 added unlinking domains from iommus to reciprocate the
iommu from domains unlinking that was already done.  We actually want
to only do this for device domains and never for the static
identity map domain or VM domains.  The SI domain is special and
never freed, while VM domain->id lives in their own special address
space, separate from iommu->domain_ids.

In the current code, a VM can get domain->id zero, then mark that
domain unused when unbound from pci-stub.  This leads to DMAR
write faults when the device is re-bound to the host driver.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:47:29 +01:00
Youquan Song 6dd9a7c737 intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
There are no externally-visible changes with this. In the loop in the
internal __domain_mapping() function, we simply detect if we are mapping:
  - size >= 2MiB, and
  - virtual address aligned to 2MiB, and
  - physical address aligned to 2MiB, and
  - on hardware that supports superpages.

(and likewise for larger superpages).

We automatically use a superpage for such mappings. We never have to
worry about *breaking* superpages, since we trust that we will always
*unmap* the same range that was mapped. So all we need to do is ensure
that dma_pte_clear_range() will also cope with superpages.

Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
it can return a PTE at the appropriate level rather than always
extending the page tables all the way down to level 1. Again, this is
simplified by the fact that we should never encounter existing small
pages when we're creating a mapping; any old mapping that used the same
virtual range will have been entirely removed and its obsolete page
tables freed.

Provide an 'intel_iommu=sp_off' argument on the command line as a
chicken bit. Not that it should ever be required.

==

The original commit seen in the iommu-2.6.git was Youquan's
implementation (and completion) of my own half-baked code which I'd
typed into an email. Followed by half a dozen subsequent 'fixes'.

I've taken the unusual step of rewriting history and collapsing the
original commits in order to keep the main history simpler, and make
life easier for the people who are going to have to backport this to
older kernels. And also so I can give it a more coherent commit comment
which (hopefully) gives a better explanation of what's going on.

The original sequence of commits leading to identical code was:

Youquan Song (3):
      intel-iommu: super page support
      intel-iommu: Fix superpage alignment calculation error
      intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

David Woodhouse (4):
      intel-iommu: Precalculate superpage support for dmar_domain
      intel-iommu: Fix hardware_largepage_caps()
      intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
      intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:26:35 +01:00
Randy Dunlap 63da029015 mtd: fix physmap.h warnings
Fix build warnings in physmap.h:

include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 11:36:49 +01:00
Linus Torvalds 5c6cce92bc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  AppArmor: fix oops in apparmor_setprocattr
2011-06-01 16:35:37 +09:00
Mike Frysinger 603d04b201 kgdbts: only use new asm-generic/ptrace.h api when needed
The new instruction_pointer_set helper is defined for people who have
converted to asm-generic/ptrace.h, so don't use it generally unless
the arch needs it (in which case it has been converted).  This should
fix building of kgdb tests for arches not yet converted.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-01 16:35:01 +09:00
Kees Cook a5b2c5b2ad AppArmor: fix oops in apparmor_setprocattr
When invalid parameters are passed to apparmor_setprocattr a NULL deref
oops occurs when it tries to record an audit message. This is because
it is passing NULL for the profile parameter for aa_audit. But aa_audit
now requires that the profile passed is not NULL.

Fix this by passing the current profile on the task that is trying to
setprocattr.

Signed-off-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Cc: stable@kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
2011-06-01 13:07:03 +10:00
Linus Torvalds e12ca23d41 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio_net: delay TX callbacks
  virtio: add api for delayed callbacks
  virtio_test: support event index
  vhost: support event index
  virtio_ring: support event idx feature
  virtio ring: inline function to check for events
  virtio: event index interface
  virtio: add full three-clause BSD text to headers.
  virtio balloon: kill tell-host-first logic
  virtio console: don't manually set or finalize VIRTIO_CONSOLE_F_MULTIPORT.
  drivers, block: virtio_blk: Replace cryptic number with the macro
  virtio_blk: allow re-reading config space at runtime
  lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY.
  lguest: fix up compilation after move
  lguest: fix timer interrupt setup
2011-06-01 06:45:08 +09:00
Linus Torvalds 850761b2b1 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] wire up sendmmsg() syscall for Itanium
2011-06-01 06:44:10 +09:00
Tony Luck 83caba8436 [IA64] wire up sendmmsg() syscall for Itanium
Add entries in unistd.h and entry.S to make this new syscall visible.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-05-31 10:09:24 -07:00
Linus Torvalds af0d6a0a3a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix mwait_play_dead() faulting on mwait-incapable cpus
  x86 idle: Fix mwait deprecation warning message

Evil merge to remove extra quote noticed by Joe Perches
2011-06-01 02:07:22 +09:00
Linus Torvalds 07ef3c3b44 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: Cure load woes
2011-06-01 02:03:22 +09:00
Linus Torvalds 643d2d7992 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Put back -pg to tsc.o and add no GCOV to vread_tsc_64.o
2011-05-31 20:32:54 +09:00
Linus Torvalds 89c122236e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  autofs4: bogus dentry_unhash() added in ->unlink()
  vfs: shrink_dcache_parent before rmdir, dir rename
2011-05-31 20:30:59 +09:00
Benjamin Herrenschmidt 339dedf709 powerpc/pmac: Don't register pmac PIC syscore ops when HW not present
The Apple custom PIC only exist in some earlier machine models,
anything with an MPIC will crash on suspend if we register those
syscore ops unconditionally.

This is a regression caused by commit f5a592f7d7 ("PM / PowerPC: Use
struct syscore_ops instead of sysdevs for PM")

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-31 20:30:20 +09:00
Peter Zijlstra d72bce0e67 rcu: Cure load woes
Commit cc3ce5176d (rcu: Start RCU kthreads in TASK_INTERRUPTIBLE
state) fudges a sleeping task' state, resulting in the scheduler seeing
a TASK_UNINTERRUPTIBLE task going to sleep, but a TASK_INTERRUPTIBLE
task waking up. The result is unbalanced load calculation.

The problem that patch tried to address is that the RCU threads could
stay in UNINTERRUPTIBLE state for quite a while and triggering the hung
task detector due to on-demand wake-ups.

Cure the problem differently by always giving the tasks at least one
wake-up once the CPU is fully up and running, this will kick them out of
the initial UNINTERRUPTIBLE state and into the regular INTERRUPTIBLE
wait state.

[ The alternative would be teaching kthread_create() to start threads as
  INTERRUPTIBLE but that needs a tad more thought. ]

Reported-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul E. McKenney <paul.mckenney@linaro.org>
Link: http://lkml.kernel.org/r/1306755291.1200.2872.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-31 10:01:48 +02:00
Avi Kivity 4f3c125c74 x86: Fix mwait_play_dead() faulting on mwait-incapable cpus
A logic error in mwait_play_dead() causes the kernel to use
mwait even on cpus which don't support it, such as KVM virtual
cpus.

Introduced by:

  349c004e3d31: x86: A fast way to check capabilities of the current cpu

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=36222
Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1306758237-9327-1-git-send-email-avi@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-30 14:37:54 +02:00
Borislav Petkov 598e887d8b x86 idle: Fix mwait deprecation warning message
Fix:

  arch/x86/kernel/process.c:645:1: warning: unknown escape sequence '\i'

due to missing escape backslash, introduced by this commit:

  5d4c47e0195b: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param

Signed-off-by: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1306748286-24701-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-30 13:02:04 +02:00
Al Viro c7427d23f7 autofs4: bogus dentry_unhash() added in ->unlink()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-30 01:50:53 -04:00
Sage Weil 3cebde2413 vfs: shrink_dcache_parent before rmdir, dir rename
The dentry_unhash push-down series missed that shink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
allow efficient dentry reclaim.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-30 01:48:27 -04:00
Michael S. Tsirkin 7a66f78437 virtio_net: delay TX callbacks
Ask for delayed callbacks on TX ring full, to give the
other side more of a chance to make progress.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:16 +09:30
Michael S. Tsirkin 7ab358c23c virtio: add api for delayed callbacks
Add an API that tells the other side that callbacks
should be delayed until a lot of work has been done.
Implement using the new event_idx feature.

Note: it might seem advantageous to let the drivers
ask for a callback after a specific capacity has
been reached. However, as a single head can
free many entries in the descriptor table,
we don't really have a clue about capacity
until get_buf is called. The API is the simplest
to implement at the moment, we'll see what kind of
hints drivers can pass when there's more than one
user of the feature.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:16 +09:30
Michael S. Tsirkin 4423fe40b0 virtio_test: support event index
Add ability to test the new event idx feature,
enable by default.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:15 +09:30
Michael S. Tsirkin 8ea8cf89e1 vhost: support event index
Support the new event index feature. When acked,
utilize it to reduce the # of interrupts sent to the guest.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:15 +09:30
Michael S. Tsirkin a5c262c5fd virtio_ring: support event idx feature
Support for the new event idx feature:
1. When enabling interrupts, publish the current avail index
   value to the host to get interrupts on the next update.
2. Use the new avail_event feature to reduce the number
   of exits from the guest.

Simple test with the simulator:

[virtio]# time ./virtio_test
spurious wakeus: 0x7

real    0m0.169s
user    0m0.140s
sys     0m0.019s
[virtio]# time ./virtio_test --no-event-idx
spurious wakeus: 0x11

real    0m0.649s
user    0m0.295s
sys     0m0.335s

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:15 +09:30
Michael S. Tsirkin bf7035bf20 virtio ring: inline function to check for events
With the new used_event and avail_event and features, both
host and guest need similar logic to check whether events are
enabled, so it helps to put the common code in the header.

Note that Xen has similar logic for notification hold-off
in include/xen/interface/io/ring.h with req_event and req_prod
corresponding to event_idx + 1 and new_idx respectively.
+1 comes from the fact that req_event and req_prod in Xen start at 1,
while event index in virtio starts at 0.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:14 +09:30
Michael S. Tsirkin 770b31a85e virtio: event index interface
Define a new feature bit for the guest and host to utilize
an event index (like Xen) instead if a flag bit to enable/disable
interrupts and kicks.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:14 +09:30
Rusty Russell a1b383870a virtio: add full three-clause BSD text to headers.
It's unclear to me if it's important, but it's obviously causing my
technical colleages some headaches and I'd hate such imprecision to
slow virtio adoption.

I've emailed this to all non-trivial contributors for approval, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Acked-by: john cooper <john.cooper@redhat.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
2011-05-30 11:14:14 +09:30
Dave Hansen bf50e69f63 virtio balloon: kill tell-host-first logic
The virtio balloon driver has a VIRTIO_BALLOON_F_MUST_TELL_HOST
feature bit.  Whenever the bit is set, the guest kernel must
always tell the host before we free pages back to the allocator.
Without this feature, we might free a page (and have another
user touch it) while the hypervisor is unprepared for it.

But, if the bit is _not_ set, we are under no obligation to
reverse the order; we're under no obligation to do _anything_.
As of now, qemu-kvm defines the bit, but doesn't set it.

This patch makes the "tell host first" logic the only case.  This
should make everybody happy, and reduce the amount of untested or
untestable code in the kernel.

This _also_ means that we don't have to preserve a pfn list
after the pages are freed, which should let us get rid of some
temporary storage (vb->pfns) eventually.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:13 +09:30
Rusty Russell 177dbd9563 virtio console: don't manually set or finalize VIRTIO_CONSOLE_F_MULTIPORT.
That's already been done by the virtio infrastructure before the probe
function is called.

Reported-by: alexey.kardashevskiy@au1.ibm.com
Acked-by: Amit Shah <amit.shah@redhat.com>
Tested-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-30 11:14:13 +09:30