For irq associated with hierarchy irqdomains, there will be multiple
irq_datas for one irq_desc. So enhance irq_data_to_desc() to support
hierarchy irqdomain. Also export irq_data_to_desc() as an inline
function for later reuse.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/1433145945-789-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Although the extended tables are theoretically a completely orthogonal
feature to PASID and anything else that *uses* the newly-available bits,
some of the early hardware has problems even when all we do is enable
them and use only the same bits that were in the old context tables.
For now, there's no motivation to support extended tables unless we're
going to use PASID support to do SVM. So just don't use them unless
PASID support is advertised too. Also add a command-line bailout just in
case later chips also have issues.
The equivalent problem for PASID support has already been fixed with the
upcoming VT-d spec update and commit bd00c606a ("iommu/vt-d: Change
PASID support to bit 40 of Extended Capability Register"), because the
problematic platforms use the old definition of the PASID-capable bit,
which is now marked as reserved and meaningless.
So with this change, we'll magically start using ECS again only when we
see the new hardware advertising "hey, we have PASID support and we
actually tested it this time" on bit 40.
The VT-d hardware architect has promised that we are not going to have
any reason to support ECS *without* PASID any time soon, and he'll make
sure he checks with us before changing that.
In the future, if hypothetical new features also use new bits in the
context tables and can be seen on implementations *without* PASID support,
we might need to add their feature bits to the ecs_enabled() macro.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
MD_RECOVERY_DONE is normally cleared by md_check_recovery after a
resync etc finished. However it is possible for raid5_start_reshape
to race and start a reshape before MD_RECOVERY_DONE is cleared. This
can lean to multiple reshapes running at the same time, which isn't
good.
To make sure it is cleared before starting a reshape, and also clear
it when reaping a thread, just to be safe.
Signed-off-by: NeilBrown <neilb@suse.de>
Checking ->sync_thread without holding the mddev_lock()
isn't really safe, even after flushing the workqueue which
ensures md_start_sync() has been run.
While this code is waiting for the lock, md_check_recovery could reap
the thread itself, and then start another thread (e.g. recovery might
finish, then reshape starts). When this thread gets the lock
md_start_sync() hasn't run so it doesn't get reaped, but
MD_RECOVERY_RUNNING gets cleared. This allows two threads to start
which leads to confusion.
So don't both if MD_RECOVERY_RUNNING isn't set, but if it is do
the flush and the test and the reap all under the mddev_lock to
avoid any race with md_check_recovery.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 6791875e2e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
Returning zero from a 'store' function is bad.
The return value should be either len length of the string
or an error.
So use 'len' if 'err' is zero.
Fixes: 6791875e2e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel (v4.0+)
Some drivers implement only pause operation (no resuming). Example is
pl330 where pause is needed for getting residuum. pl330 does not support
resume operation, transfer must be stopped after pause.
However for slaves this is exposed always as "pause and resume" which
introduces subtle errors on Odroid U3 board (Exynos4412 with pl330).
After adding pause function to pl330 driver the audio playback
(utilizing DMA) gets choppy after some time (approximately 24 hours).
Fix this by exposing "cmd_pause" if and only if pause and resume are
implemented.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reported-by: gabriel@unseen.is
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: <stable@vger.kernel.org>
Fixes: 88987d2c75 ("dmaengine: pl330: add DMA_PAUSE feature")
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Add a new interface irq_remapping_cap() to detect whether irq
remapping supports new features, such as VT-d Posted-Interrupts.
Export the function, so that KVM code can check this and use this
mechanism properly.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-10-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the interrupt is configured in posted mode, the destination of
the interrupt is set in the Posted-Interrupts Descriptor and the
migration of these interrupts happens during vCPU scheduling.
We still update the cached irte, which will be used when changing back
to remapping mode, but we avoid writing the table entry as this would
overwrite the posted mode entry.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-7-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Interrupt chip callback to set the VCPU affinity for posted interrupts.
[ tglx: Use the helper function to copy from the remap irte instead of
open coding it. Massage the comment as well ]
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: joro@8bytes.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-5-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The IRTE (Interrupt Remapping Table Entry) is either an entry for
remapped or for posted interrupts. The hardware distiguishes between
remapped and posted entries by bit 15 in the low 64 bit of the
IRTE. If cleared the entry is remapped, if set it's posted.
The entries have common fields and dependent on the posted bit fields
with different meanings.
Extend struct irte to handle the differences between remap and posted
mode by having three structs in the unions:
- Shared
- Remapped
- Posted
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: jiang.liu@linux.intel.com
Cc: iommu@lists.linux-foundation.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-3-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add a new member 'capability' to struct irq_remap_ops for storing
information about available capabilities such as VT-d
Posted-Interrupts.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/1433827237-3382-2-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Prarit reported an issue w/ timers around the leapsecond, where a
timer set for Midnight UTC (00:00:00) might fire a second early right
before the leapsecond (23:59:60 - though it appears as a repeated
23:59:59) is applied.
So I've updated the leap-a-day.c test to integrate a similar test,
where we set a timer and check if it triggers at the right time, and
if the ntp state transition is managed properly.
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since the leapsecond is applied at tick-time, this means there is a
small window of time at the start of a leap-second where we cross into
the next second before applying the leap.
This patch modified adjtimex so that the leap-second is applied on the
second edge. Providing more correct leapsecond behavior.
This does make it so that adjtimex()'s returned time values can be
inconsistent with time values read from gettimeofday() or
clock_gettime(CLOCK_REALTIME,...) for a brief period of one tick at
the leapsecond. However, those other interfaces do not provide the
TIME_OOP time_state return that adjtimex() provides, which allows the
leapsecond to be properly represented. They instead only see a time
discontinuity, and cannot tell the first 23:59:59 from the repeated
23:59:59 leap second.
This seems like a reasonable tradeoff given clock_gettime() /
gettimeofday() cannot properly represent a leapsecond, and users
likely care more about performance, while folks who are using
adjtimex() more likely care about leap-second correctness.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond was applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather then exactly on the
second edge.
This was in part historical from back when we were always tick based,
but correcting this since has been avoided since it adds extra
conditional checks in the gettime fastpath, which has performance
overhead.
However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.
This isn't quite as bad as it sounds, since behaviorally it is similar
to what is possible w/ ntpd made leapsecond adjustments done w/o using
the kernel discipline. Where due to latencies, timers may fire just
prior to the settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be careful,
since they are prone to quirks from settimeofday() disturbances.)
However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.
So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic so that we keep
enough state so that the update_base_offsets_now accessor, which
provides the hrtimer core the current time, can check and apply the
leapsecond adjustment on the second edge. This prevents the hrtimer
core from expiring timers too early.
This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.
Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.
While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.
Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently the leapsecond logic uses what looks like magic values.
Improve this by defining SECS_PER_DAY and using that macro
to make the logic more clear.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It was reported that 868a3e915f (hrtimer: Make offset
update smarter) was causing timer problems after suspend/resume.
The problem with that change is the modification to
clock_was_set_seq in timekeeping_update is done prior to
mirroring the time state to the shadow-timekeeper. Thus the
next time we do update_wall_time() the updated sequence is
overwritten by whats in the shadow copy.
This patch moves the shadow-timekeeper mirroring to the end
of the function, after all updates have been made, so all data
is kept in sync.
(This patch also affects the update_fast_timekeeper calls which
were also problematically done prior to the mirroring).
Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I noticed that my MPX tracepoints were producing garbage for the
lower and upper bounds:
mpx_bounds_register_exception: address referenced: 0x00007fffffffccb7 bounds: lower: 0x0 ~upper: 0xffffffffffffffff
mpx_bounds_register_exception: address referenced: 0x00007fffffffccbf bounds: lower: 0x0 ~upper: 0xffffffffffffffff
This is, of course, bogus because 0x00007fffffffccbf is *within*
the bounds. I assumed that my instruction decoder was bad and
went looking at it. But I eventually realized that I was
getting a '0' offset back from xstate_offsets[BNDREGS].
It was being skipped in the initialization, which is obviously
bogus, so remove the extra leaf++.
This also goes an initializes xstate_offsets/sizes[] to -1 so
so that bugs like this will oops instead of silently failing
in interesting ways.
This was introduced by:
39f1acd ("x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Link: http://lkml.kernel.org/r/20150611193400.2E0B00DB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The previous patch tried to continue the probe if i915 binding fails.
For for simplicity reason, we haven't implemented abort even for
controller chips that are dedicated for HDMI/DP on HSW and BDW.
However, Mengdong suggested that this can be dangerous; BIOS may
disable gfx power well although the PCI entry for HD-audio is left,
and this may result in the unexpected behavior, kernel errors, etc.
For avoiding this situation, abort the probe at i915 binding failure
only for HSW/BDW chips selectively. For other chips, it still
continues.
Fixes: bf06848bdb ('ALSA: hda - Continue probing even if i915 binding fails')
Reported-by: Mengdong Lin <mengdong.lin@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Building perf out of kernel tree is currently broken because the
MANIFEST file refers to kernel files that have been removed. With this
patch make perf-targz-src-pkg succeeds as does building perf using the
generated tarfile.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Link: http://lkml.kernel.org/r/1433526173-172332-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull drm fixes from Dave Airlie:
"i915 and radeon fixes:
i915:
fix for connector oops regression
DDC probing fix
radeon:
two radeon reverts, along with a freeze workaround and a fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: Make sure radeon_vm_bo_set_addr always unreserves the BO
Revert "drm/radeon: adjust pll when audio is not enabled"
Revert "drm/radeon: don't share plls if monitors differ in audio support"
drm/radeon: fix freeze for laptop with Turks/Thames GPU.
drm/i915: Fix DDC probe for passive adapters
drm/i915: Properly initialize SDVO analog connectors
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f76858
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.
alloc_skb_with_frags is the same.
The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix for the regression Linus called out, and another for probing
dongles.
* tag 'drm-intel-fixes-2015-06-11' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix DDC probe for passive adapters
drm/i915: Properly initialize SDVO analog connectors
Two regression reverts, and two fixes, one for a dpm boot freeze.
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: Make sure radeon_vm_bo_set_addr always unreserves the BO
Revert "drm/radeon: adjust pll when audio is not enabled"
Revert "drm/radeon: don't share plls if monitors differ in audio support"
drm/radeon: fix freeze for laptop with Turks/Thames GPU.
If a device is renamed and the original name is subsequently reused
for a new device, the following warning is generated:
sysctl duplicate entry: /net/mpls/conf/veth0//input
CPU: 3 PID: 1379 Comm: ip Not tainted 4.1.0-rc4+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
0000000000000000 0000000000000000 ffffffff81566aaf 0000000000000000
ffffffff81236279 ffff88002f7d7f00 0000000000000000 ffff88000db336d8
ffff88000db33698 0000000000000005 ffff88002e046000 ffff8800168c9280
Call Trace:
[<ffffffff81566aaf>] ? dump_stack+0x40/0x50
[<ffffffff81236279>] ? __register_sysctl_table+0x289/0x5a0
[<ffffffffa051a24f>] ? mpls_dev_notify+0x1ff/0x300 [mpls_router]
[<ffffffff8108db7f>] ? notifier_call_chain+0x4f/0x70
[<ffffffff81470e72>] ? register_netdevice+0x2b2/0x480
[<ffffffffa0524748>] ? veth_newlink+0x178/0x2d3 [veth]
[<ffffffff8147f84c>] ? rtnl_newlink+0x73c/0x8e0
[<ffffffff8147f27a>] ? rtnl_newlink+0x16a/0x8e0
[<ffffffff81459ff2>] ? __kmalloc_reserve.isra.30+0x32/0x90
[<ffffffff8147ccfd>] ? rtnetlink_rcv_msg+0x8d/0x250
[<ffffffff8145b027>] ? __alloc_skb+0x47/0x1f0
[<ffffffff8149badb>] ? __netlink_lookup+0xab/0xe0
[<ffffffff8147cc70>] ? rtnetlink_rcv+0x30/0x30
[<ffffffff8149e7a0>] ? netlink_rcv_skb+0xb0/0xd0
[<ffffffff8147cc64>] ? rtnetlink_rcv+0x24/0x30
[<ffffffff8149df17>] ? netlink_unicast+0x107/0x1a0
[<ffffffff8149e4be>] ? netlink_sendmsg+0x50e/0x630
[<ffffffff8145209c>] ? sock_sendmsg+0x3c/0x50
[<ffffffff81452beb>] ? ___sys_sendmsg+0x27b/0x290
[<ffffffff811bd258>] ? mem_cgroup_try_charge+0x88/0x110
[<ffffffff811bd5b6>] ? mem_cgroup_commit_charge+0x56/0xa0
[<ffffffff811d7700>] ? do_filp_open+0x30/0xa0
[<ffffffff8145336e>] ? __sys_sendmsg+0x3e/0x80
[<ffffffff8156c3f2>] ? system_call_fastpath+0x16/0x75
Fix this by unregistering the previous sysctl table (registered for
the path containing the original device name) and re-registering the
table for the path containing the new device name.
Fixes: 37bde79979 ("mpls: Per-device enabling of packet input")
Reported-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When programming the start of a periodic output, the code wrongly places
the seconds value into the "low" register and the nanoseconds into the
"high" register. Even though this is backwards, it slipped through my
testing, because the re-arming code in the interrupt service routine is
correct, and the signal does appear starting with the second edge.
This patch fixes the issue by programming the registers correctly.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all architectures have io memory.
Fixes:
drivers/block/pmem.c: In function ‘pmem_alloc’:
drivers/block/pmem.c:146:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration]
pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
^
drivers/block/pmem.c:146:18: warning: assignment makes pointer from integer without a cast [enabled by default]
pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
^
drivers/block/pmem.c:182:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
iounmap(pmem->virt_addr);
^
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
ring buffer benchmark, where the produce_fifo was being ignored
and the producer thread's priority was being set with the consumer_fifo
parameter.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVeZEmAAoJEEjnJuOKh9ldrUYIAK9enlP7qdri5w3Urb9pNH81
gXqGINkEZWqbzwawb/b9avEXtcUB+pGGLE+ThB+s1DaEw4piLqaGyFRxlGXzU0F/
sFO/RxF+cPVtbEh8wAMHJD85g0j9kWB4Iy08rOezQiW9/YoATuk4QbrTlz6T++jD
6s4aqNUEQlxoCfWlkNmUbVIqRXrUuQGGc7bso1XY2/AAlSo1PjCDda/e5nDiCZ2d
pYr3CXiW+1xATZr1oS2aVgFcjIYqm5P3ijah1QlcvXEgD1ZYzsMsxxY7LQWCirZJ
GRFzXjZrCbTx6UnWc7CfcmtZVQpJhiKQ1Grum8/8uhjti7LwVCq99eFe5OsAe80=
=AC0N
-----END PGP SIGNATURE-----
Merge tag 'trace-rb-bm-fix-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ring buffer benchmark buglet fix from Steven Rostedt:
"Wang Long fixed a minor bug in the module parameter for the ring
buffer benchmark, where the produce_fifo was being ignored and the
producer thread's priority was being set with the consumer_fifo
parameter"
* tag 'trace-rb-bm-fix-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer-benchmark: Fix the wrong sched_priority of producer
=================================
[ INFO: inconsistent lock state ]
4.1.0-rc7+ #217 Tainted: G O
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(ext_devt_lock){+.?...}, at: [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810bf6b1>] __lock_acquire+0x461/0x1e70
[<ffffffff810c1947>] lock_acquire+0xb7/0x290
[<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
[<ffffffff8143a07d>] blk_alloc_devt+0x6d/0xd0 <-- take the lock in process context
[..]
[<ffffffff810bf64e>] __lock_acquire+0x3fe/0x1e70
[<ffffffff810c00ad>] ? __lock_acquire+0xe5d/0x1e70
[<ffffffff810c1947>] lock_acquire+0xb7/0x290
[<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
[<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
[<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
[<ffffffff8143a60c>] blk_free_devt+0x3c/0x70 <-- take the lock in softirq
[<ffffffff8143bfec>] part_release+0x1c/0x50
[<ffffffff8158edf6>] device_release+0x36/0xb0
[<ffffffff8145ac2b>] kobject_cleanup+0x7b/0x1a0
[<ffffffff8145aad0>] kobject_put+0x30/0x70
[<ffffffff8158f147>] put_device+0x17/0x20
[<ffffffff8143c29c>] delete_partition_rcu_cb+0x16c/0x180
[<ffffffff8143c130>] ? read_dev_sector+0xa0/0xa0
[<ffffffff810e0e0f>] rcu_process_callbacks+0x2ff/0xa90
[<ffffffff810e0dcf>] ? rcu_process_callbacks+0x2bf/0xa90
[<ffffffff81067e2e>] __do_softirq+0xde/0x600
Neil sees this in his tests and it also triggers on pmem driver unbind
for the libnvdimm tests. This fix is on top of an initial fix by Keith
for incorrect usage of mutex_lock() in this path: 2da78092dd "block:
Fix dev_t minor allocation lifetime". Both this and 2da78092dd are
candidates for -stable.
Fixes: 2da78092dd ("block: Fix dev_t minor allocation lifetime")
Cc: <stable@vger.kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Some error paths didn't unreserve the BO. This resulted in a deadlock
down the road on the next attempt to reserve the (still reserved) BO.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90873
Cc: stable@vger.kernel.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Laptop with Turks/Thames GPU will freeze if dpm is enabled. It seems
the SMC engine is relying on some state inside the CP engine. CP needs
to chew at least one packet for it to get in good state for dynamic
power management.
This patch simply disabled and re-enable DPM after the ring test which
is enough to avoid the freeze.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Failed in 32bit arch build like this:
CC /opt/h00206996/output/perf/arm32/builtin-record.o
util/session.c: In function ‘perf_session__warn_about_errors’:
util/session.c:1304:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’,
but argument 2 has type ‘long long unsigned int’ [-Werror=format=]
builtin-report.c: In function ‘perf_evlist__tty_browse_hists’:
builtin-report.c:323:2: error: format ‘%lu’ expects argument of type ‘long unsigned int’,
but argument 3 has type ‘u64’ [-Werror=format=]
Replace %lu format strings in warning message with PRIu64 for u64
'total_lost_samples' to fix this problem.
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1434026664-71642-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf stat ignores the unsupported event and continue to count supported
event. But if the unsupported event is group leader, perf tool will
crash. After applying this patch, the unsupported group leader will
error out immediately.
Without this patch:
$ perf stat -x, -e '{node-prefetch-refs,cycles}' -- sleep 1
perf: util/evsel.c:1009: get_group_fd: Assertion `!(fd == -1)' failed.
Aborted (core dumped)
With this patch:
$ perf stat -x, -e '{node-prefetch-refs,cycles}' -- sleep 1
Error:
The node-prefetch-refs event is not supported.
Commiter note: Here I got a different output, but no core dump:
[acme@zoo linux]$ perf stat -x, -e '{node-prefetch-refs,cycles}' -- sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (node-prefetch-refs).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/1434004360-8570-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit ab760a0 (ntb: Adding split BAR support for Haswell platforms)
changed ntb_device's mw from a fixed-size array into a pointer that is
allocated based on limits.max_mw; however, on Atom platforms, max_mw
is not initialized until ntb_device_setup(), which happens after the
allocation.
Fill out max_mw in ntb_atom_detect() to match ntb_xeon_detect(); this
happens before the use of max_mw in the ndev->mw allocation.
Fixes a null pointer dereference on Atom platforms with ntb hardware.
v2: fix typo (mw_max should be max_mw)
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Yet another regression by the transition to regmap cache; for better
usability, we had the fake mute control using the zero amp value for
Conexant codecs, and this was forgotten in the new hda core code.
Since the bits 4-7 are unused for the amp registers (as we follow the
syntax of AMP_GET verb), the bit 4 is now used to indicate the fake
mute. For setting this flag, snd_hda_codec_amp_update() becomes a
function from a simple macro. The bonus is that it gained a proper
function description.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
We do not check the return value of enic_dev_stats_dump(). If allocation
fails, we will hit NULL pointer reference.
Return only if memory allocation fails. For other failures, we return the
previously recorded values.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the crash kernel is loaded above 4GiB in memory, the
first kernel allocates only 72MiB of low-memory for the DMA
requirements of the second kernel. On systems with many
devices this is not enough and causes device driver
initialization errors and failed crash dumps. Testing by
SUSE and Redhat has shown that 256MiB is a good default
value for now and the discussion has lead to this value as
well. So set this default value to 256MiB to make sure there
is enough memory available for DMA.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
[ Reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1433500202-25531-4-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>