Commit Graph

602961 Commits

Author SHA1 Message Date
Chanwoo Choi 0d37189e80 PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
This patch sends the DEVFREQ_POSTCHANGE notification when
devfreq->profile->targer() is failed. The PRECHANGE/POSTCHANGE
should be paired.

Fixes: 0fe3a66410 (PM / devfreq: Add new DEVFREQ_TRANSITION_NOTIFIER notifier)
Reported-by: Lin Huang <hl@rock-chips.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-23 23:15:12 +02:00
Mike Galbraith 3c67a829bd cpufreq: pcc-cpufreq: Fix doorbell.access_width
Commit 920de6ebfa (ACPICA: Hardware: Enhance
acpi_hw_validate_register() with access_width/bit_offset awareness)
apparently exposed a latent bug, doorbell.access_width is initialized
to 64, but per Lv Zheng, it should be 4, and indeed, making that
change does bring pcc-cpufreq back to life.

Fixes: 920de6ebfa (ACPICA: Hardware: Enhance acpi_hw_validate_register() with access_width/bit_offset awareness)
Suggested-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-23 23:09:51 +02:00
Taeung Song 4a35b3497c perf config: Reimplement show_config() using config_set__for_each
Recently config_set__for_each got added.  In order to let show_config()
be short and clear, rewrite this function using it.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1466691272-24117-4-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 17:23:00 -03:00
Taeung Song 8a0a9c7e91 perf config: Introduce new init() and exit()
Many sub-commands use perf_config() but everytime perf_config() is
called, perf_config() always read config files.  (i.e. user config
'~/.perfconfig' and system config '$(sysconfdir)/perfconfig')

But it is better to use the config set that already contains all config
key-value pairs to avoid this repetitive work reading the config files
in perf_config(). (the config set mean a static variable 'config_set')

In other words, if new perf_config__init() is called, only first time
'config_set' is initialized collecting all configs from the config
files.  And then we could use new perf_config() like old perf_config().
When a sub-command finished, free the config set by perf_config__exit()
at run_builtin().

If we do, 'config_set' can be reused wherever perf_config() is called
and a feature of old perf_config() is the same as new perf_config() work
without the repetitive work that read the config files.

In summary, in order to use features about configuration,
we can call the functions at perf.c and other source files as below.

    # initialize a config set
    perf_config__init()

    # configure actual variables from a config set
    perf_config()

    # eliminate allocated config set
    perf_config__exit()

    # destroy existing config set and initialize a new config set.
    perf_config__refresh()

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1466691272-24117-3-git-send-email-treeze.taeung@gmail.com
[ 'init' counterpart is 'exit', not 'finish' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 17:20:04 -03:00
Adrian Hunter e216708d98 perf script: Add callindent option
Based on patches from Andi Kleen.

When printing PT instruction traces with perf script it is rather useful
to see some indentation for the call tree. This patch adds a new
callindent field to perf script that prints spaces for the function call
stack depth.

We already have code to track the function call stack for PT, that we
can reuse with minor modifications.

The resulting output is not quite as nice as ftrace yet, but a lot
better than what was there before.

Note there are some corner cases when the thread stack gets code
confused and prints incorrect indentation. Even with that it is fairly
useful.

When displaying kernel code traces it is recommended to run as root, as
otherwise perf doesn't understand the kernel addresses properly, and may
not reset the call stack correctly on kernel boundaries.

Example output:

	sudo perf-with-kcore record eg2 -a -e intel_pt// -- sleep 1
	sudo perf-with-kcore script eg2 --ns -F callindent,time,comm,pid,sym,ip,addr,flags,cpu --itrace=cre | less
	...
         swapper     0 [000]  5830.389116586:   call        irq_exit                                                     ffffffff8104d620 smp_call_function_single_interrupt+0x30 => ffffffff8107e720 irq_exit
         swapper     0 [000]  5830.389116586:   call            idle_cpu                                                 ffffffff8107e769 irq_exit+0x49 => ffffffff810a3970 idle_cpu
         swapper     0 [000]  5830.389116586:   return          idle_cpu                                                 ffffffff810a39b7 idle_cpu+0x47 => ffffffff8107e76e irq_exit
         swapper     0 [000]  5830.389116586:   call            tick_nohz_irq_exit                                       ffffffff8107e7bd irq_exit+0x9d => ffffffff810f2fc0 tick_nohz_irq_exit
         swapper     0 [000]  5830.389116919:   call                __tick_nohz_idle_enter                               ffffffff810f2fe0 tick_nohz_irq_exit+0x20 => ffffffff810f28d0 __tick_nohz_idle_enter
         swapper     0 [000]  5830.389116919:   call                    ktime_get                                        ffffffff810f28f1 __tick_nohz_idle_enter+0x21 => ffffffff810e9ec0 ktime_get
         swapper     0 [000]  5830.389116919:   call                        read_tsc                                     ffffffff810e9ef6 ktime_get+0x36 => ffffffff81035070 read_tsc
         swapper     0 [000]  5830.389116919:   return                      read_tsc                                     ffffffff81035084 read_tsc+0x14 => ffffffff810e9efc ktime_get
         swapper     0 [000]  5830.389116919:   return                  ktime_get                                        ffffffff810e9f46 ktime_get+0x86 => ffffffff810f28f6 __tick_nohz_idle_enter
         swapper     0 [000]  5830.389116919:   call                    sched_clock_idle_sleep_event                     ffffffff810f290b __tick_nohz_idle_enter+0x3b => ffffffff810a7380 sched_clock_idle_sleep_event
         swapper     0 [000]  5830.389116919:   call                        sched_clock_cpu                              ffffffff810a738b sched_clock_idle_sleep_event+0xb => ffffffff810a72e0 sched_clock_cpu
         swapper     0 [000]  5830.389116919:   call                            sched_clock                              ffffffff810a734d sched_clock_cpu+0x6d => ffffffff81035750 sched_clock
         swapper     0 [000]  5830.389116919:   call                                native_sched_clock                   ffffffff81035754 sched_clock+0x4 => ffffffff81035640 native_sched_clock
         swapper     0 [000]  5830.389116919:   return                              native_sched_clock                   ffffffff8103568c native_sched_clock+0x4c => ffffffff81035759 sched_clock
         swapper     0 [000]  5830.389116919:   return                          sched_clock                              ffffffff8103575c sched_clock+0xc => ffffffff810a7352 sched_clock_cpu
         swapper     0 [000]  5830.389116919:   return                      sched_clock_cpu                              ffffffff810a7356 sched_clock_cpu+0x76 => ffffffff810a7390 sched_clock_idle_sleep_event
         swapper     0 [000]  5830.389116919:   return                  sched_clock_idle_sleep_event                     ffffffff810a7391 sched_clock_idle_sleep_event+0x11 => ffffffff810f2910 __tick_nohz_idle_enter
	...

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466689258-28493-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 17:04:26 -03:00
Adrian Hunter 50f736372d perf auxtrace: Add option to feed branches to the thread stack
In preparation for using the thread stack to print an indent
representing the stack depth in perf script, add an option to tell
decoders to feed branches to the thread stack. Add support for that
option to Intel PT and Intel BTS.

The advantage of using the decoder to feed the thread stack is that it
happens before branch filtering and so can be used with different itrace
options (e.g. it still works when only showing calls, even though the
thread stack needs to see calls and returns). Also it does not conflict
with using the thread stack to get callchains.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466689258-28493-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 17:02:59 -03:00
Adrian Hunter 055cd33d93 perf script: Print sample flags more nicely
The flags field is synthesized and may have a value when Instruction
Trace decoding. The flags are "bcrosyiABEx" which stand for branch,
call, return, conditional, system, asynchronous, interrupt, transaction
abort, trace begin, trace end, and in transaction, respectively.

Change the display so that known combinations of flags are printed more
nicely e.g.: "call" for "bc", "return" for "br", "jcc" for "bo", "jmp"
for "b", "int" for "bci", "iret" for "bri", "syscall" for "bcs",
"sysret" for "brs", "async" for "by", "hw int" for "bcyi", "tx abrt" for
"bA", "tr strt" for "bB", "tr end" for "bE".

However the "x" flag will be displayed separately in those cases e.g.
"jcc (x)" for a condition branch within a transaction.

Example:

    perf record -e intel_pt//u ls
    perf script --ns -F comm,cpu,pid,tid,time,ip,addr,sym,dso,symoff,flags
    ...
    ls  3689/3689  [001]  2062.020965237:   jcc          7f06a958847a _dl_sysdep_start+0xfa (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a9588450 _dl_sysdep_start+0xd0 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020965237:   jmp          7f06a9588461 _dl_sysdep_start+0xe1 (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a95885a0 _dl_sysdep_start+0x220 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020965237:   jmp          7f06a95885a4 _dl_sysdep_start+0x224 (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a9588470 _dl_sysdep_start+0xf0 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020965904:   call         7f06a95884c3 _dl_sysdep_start+0x143 (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a9589140 brk+0x0 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020965904:   syscall      7f06a958914a brk+0xa (/lib/x86_64-linux-gnu/ld-2.19.so) =>                0 [unknown] ([unknown])
    ls  3689/3689  [001]  2062.020966237:   tr strt                 0 [unknown] ([unknown]) =>     7f06a958914c brk+0xc (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020966237:   return       7f06a9589165 brk+0x25 (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a95884c8 _dl_sysdep_start+0x148 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020966237:   jcc          7f06a95884d7 _dl_sysdep_start+0x157 (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a95885f0 _dl_sysdep_start+0x270 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020966237:   call         7f06a95885f0 _dl_sysdep_start+0x270 (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a958ac50 strlen+0x0 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ls  3689/3689  [001]  2062.020966237:   jcc          7f06a958ac6e strlen+0x1e (/lib/x86_64-linux-gnu/ld-2.19.so) =>     7f06a958ac60 strlen+0x10 (/lib/x86_64-linux-gnu/ld-2.19.so)
    ...

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1466689258-28493-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 16:36:59 -03:00
Linus Torvalds da01e18a37 x86: avoid avoid passing around 'thread_info' in stack dumping code
None of the code actually wants a thread_info, it all wants a
task_struct, and it's just converting to a thread_info pointer much too
early.

No semantic change.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23 12:20:01 -07:00
Linus Torvalds 6720a305df locking: avoid passing around 'thread_info' in mutex debugging code
None of the code actually wants a thread_info, it all wants a
task_struct, and it's just converting back and forth between the two
("ti->task" to get the task_struct from the thread_info, and
"task_thread_info(task)" to go the other way).

No semantic change.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23 12:11:17 -07:00
Chandan Rajendra b7f67055d2 Btrfs: Force stripesize to the value of sectorsize
Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.

This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:42 -07:00
Wang Xiaoguang c0d2f6104e btrfs: fix disk_i_size update bug when fallocate() fails
When doing truncate operation, btrfs_setsize() will first call
truncate_setsize() to set new inode->i_size, but if later
btrfs_truncate() fails, btrfs_setsize() will call
"i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the
inmemory inode size, now bug occurs. It's because for truncate
case btrfs_ordered_update_i_size() directly uses inode->i_size
to update BTRFS_I(inode)->disk_i_size, indeed we should use the
"offset" argument to update disk_i_size. Here is the call graph:
==>btrfs_truncate()
====>btrfs_truncate_inode_items()
======>btrfs_ordered_update_i_size(inode, last_size, NULL);
Here btrfs_ordered_update_i_size()'s offset argument is last_size.

And below test case can reveal this bug:

dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs  -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint

echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
	i=$((i + 1))
	dst_offset=$((blocksize * i))
	xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
		testfile > /dev/null
done
sync

truncate --size 0 testfile
ls -l testfile
du -sh testfile
exit

In this case, truncate operation will fail for enospc reason and
"du -sh testfile" returns value greater than 0, but testfile's
size is 0, we need to reflect correct inode->i_size.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:41 -07:00
Liu Bo 415b35a55b Btrfs: fix error handling in map_private_extent_buffer
map_private_extent_buffer() can return -EINVAL in two different cases,
1. when the requested contents span two pages if nodesize is larger
   than pagesize,
2. when it detects something insane.

The 2nd one used to be only a WARN_ON(1), and we decided to return a error
to callers, but we didn't fix up all its callers, which will be
addressed by this patch.

Without this, btrfs may end up with 'general protection', ie.
reading invalid memory.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:40 -07:00
Wei Yongjun 04e1b65af2 Btrfs: fix error return code in btrfs_init_test_fs()
Fix to return a negative error code from the kern_mount() error handling
case instead of 0(ret is set to 0 by register_filesystem), as done
elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23 10:44:39 -07:00
Doug Ledford 9903fd1374 Merge branches '4.7-rc-misc', 'hfi1-fixes', 'i40iw-rc-fixes' and 'mellanox-rc-fixes' into k.o/for-4.7-rc 2016-06-23 12:22:33 -04:00
Bart Van Assche c0cf4512a3 IB/srpt: Reduce QP buffer size
The memory needed for the send and receive queues associated with
a QP is proportional to the max_sge parameter. The current value
of that parameter is such that with an mlx4 HCA the QP buffer size
is 8 MB. Since DMA is used for communication between HCA and CPU
that buffer either has to be allocated coherently or map_single()
must succeed for that buffer. Since large contiguous allocations
are fragile and since the maximum segment size for e.g. swiotlb
is 256 KB, reduce the max_sge parameter. This patch avoids that
the following text appears on the console after SRP logout and
relogin on a system equipped with multiple IB HCAs:

mlx4_core 0000:05:00.0: swiotlb buffer is full (sz: 8388608 bytes)
swiotlb: coherent allocation failed for device 0000:05:00.0 size=8388608
CPU: 11 PID: 148 Comm: kworker/11:1 Not tainted 4.7.0-rc4-dbg+ #1
Call Trace:
 [<ffffffff812c6d35>] dump_stack+0x67/0x92
 [<ffffffff812efe71>] swiotlb_alloc_coherent+0x141/0x150
 [<ffffffff810458be>] x86_swiotlb_alloc_coherent+0x3e/0x50
 [<ffffffffa03861fa>] mlx4_buf_direct_alloc.isra.5+0x9a/0x120 [mlx4_core]
 [<ffffffffa0386545>] mlx4_buf_alloc+0x165/0x1a0 [mlx4_core]
 [<ffffffffa035053d>] create_qp_common.isra.29+0x57d/0xff0 [mlx4_ib]
 [<ffffffffa03510da>] mlx4_ib_create_qp+0x12a/0x3f0 [mlx4_ib]
 [<ffffffffa031154a>] ib_create_qp+0x3a/0x250 [ib_core]
 [<ffffffffa055dd4b>] srpt_cm_handler+0x4bb/0xcad [ib_srpt]
 [<ffffffffa02c1ab0>] cm_process_work+0x20/0xf0 [ib_cm]
 [<ffffffffa02c3640>] cm_work_handler+0x1ac0/0x2059 [ib_cm]
 [<ffffffff810737ed>] process_one_work+0x19d/0x490
 [<ffffffff81073b29>] worker_thread+0x49/0x490
 [<ffffffff8107a0ea>] kthread+0xea/0x100
 [<ffffffff815b25af>] ret_from_fork+0x1f/0x40

Fixes: b99f8e4d7b ("IB/srpt: convert to the generic RDMA READ/WRITE API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 12:04:09 -04:00
Arnaldo Carvalho de Melo 10daf4d01b perf intlist: Rename for_each() macros to for_each_entry()
To match the semantics for list.h in the kernel, that are the
interface we use in them.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-mdp1heu9xjjc12zebh91232l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 11:39:19 -03:00
Shiraz Saleem 7748e4990d i40iw: Enable level-1 PBL for fast memory registration
Set the chunk_size to enable level-1 PBL support when the fast memory
page count is more than one.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:35:34 -04:00
Faisal Latif 0477e18145 i40iw: Return correct max_fast_reg_page_list_len
Return correct value for max_fast_reg_page_list_len from
i40iw_query_device().

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:35:34 -04:00
Faisal Latif ee23abd75c i40iw: Correct status check on i40iw_get_pble
i40iw_get_pble returns 0 on success. Correct the check on return
code.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:35:34 -04:00
Shiraz Saleem 747f1c6d9b i40iw: Correct CQ arming
CQ is armed for solicited events only, ignoring other notification
flags. Correct this by arming for next and arming for solicited
event if IB_CQ_SOLICITED is set. Also protect CQ shadow area update
with spinlock.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:35:34 -04:00
Arnaldo Carvalho de Melo 98a91837dd perf rb_resort: Rename for_each() macros to for_each_entry()
To match the semantics for list.h in the kernel, that are the
interface we use in them.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-iaxuq2yu43mtb504j96q0axs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 11:35:07 -03:00
Arnaldo Carvalho de Melo 602a1f4daa perf tools: Rename strlist_for_each() macros to for_each_entry()
To match the semantics for list.h in the kernel, that are the
interface we use in them.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-0b5i2ki9c3di6706fxpticsb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 11:35:01 -03:00
Arnaldo Carvalho de Melo e5cadb93d0 perf evlist: Rename for_each() macros to for_each_entry()
To match the semantics for list.h in the kernel, that are used to
implement those macros.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qbcjlgj0ffxquxscahbpddi3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 11:26:15 -03:00
Mike Marciniszyn c755f4afa6 IB/rdmavt: Correct qp_priv_alloc() return value test
The current drivers return errors from this calldown
wrapped in an ERR_PTR().

The rdmavt code incorrectly tests for NULL.

The code is fixed to use IS_ERR() and change ret according
to the driver return value.

Cc: Stable <stable@vger.kernel.org> # 4.6+
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:16:15 -04:00
Ashutosh Dixit 8ae84f7c56 IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
Since rvt_reset_qp already zero's out qp->s_ack_queue head and tail
pointers, there is no need to zero out qp->s_ack_queue itself.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:16:15 -04:00
Mike Marciniszyn 2aee309d3e IB/hfi1: Fix deadlock with txreq allocation slow path
A failure in the get_txreq() inline will result in a
slow path retry using __get_txreq().

__get_txreq() attempts to procure the qp s_lock, which
is already held in all callers.

Fix by deleting the s_lock maintenance in __get_txreq()
and add sparse syntax hooks to future proof the code.

Cc: Stable <stable@vger.kernel.org> # 4.6+
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:16:15 -04:00
Chuck Lever cbc9355a93 IB/mlx4: Prevent cross page boundary allocation
Prevent cross page boundary allocation by allocating
new page, this is required to be aligned with ConnectX-3 HW
requirements.

Not doing that might cause to "RDMA read local protection" error.

Fixes: 1b2cd0fc67 ('IB/mlx4: Support the new memory registration API')
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:08:25 -04:00
Dotan Barak 5b420d9cf7 IB/mlx4: Fix memory leak if QP creation failed
When RC, UC, or RAW QPs are created, a qp object is allocated (kzalloc).
If at a later point (in procedure create_qp_common) the qp creation fails,
this qp object must be freed.

Fixes: 1ffeb2eb8b ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:08:25 -04:00
Yishai Hadas 5533c18ab0 IB/mlx4: Verify port number in flow steering create flow
In procedure mlx4_ib_create_flow, passing an invalid port number
will cause an out-of-bounds array access. Data passed to this procedure
can come from user-space.  Therefore, need to validate port number
before proceeding onwards.

Note that we check against the number of physical ports declared at
the verbs (ib core) level; When bonding is active, the verbs level
sees one physical port, even though the low-level driver sees two ports.

Fixes: f77c0162a3 ("IB/mlx4: Add receive flow steering support")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:07:04 -04:00
Yishai Hadas a6100603a4 IB/mlx4: Fix error flow when sending mads under SRIOV
Fix mad send error flow to prevent double freeing address handles,
and leaking tx_ring entries when SRIOV is active.

If ib_mad_post_send fails, the address handle pointer in the tx_ring entry
must be set to NULL (or there will be a double-free) and tx_tail must be
incremented (or there will be a leak of tx_ring entries).
The tx_ring is handled the same way in the send-completion handler.

Fixes: 37bfc7c1e8 ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:07:03 -04:00
Yishai Hadas f2940e2c76 IB/mlx4: Fix the SQ size of an RC QP
When calculating the required size of an RC QP send queue, leave
enough space for masked atomic operations, which require more space than
"regular" atomic operation.

Fixes: 6fa8f71984 ("IB/mlx4: Add support for masked atomic operations")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:06:54 -04:00
Talat Batheesh 00bf534fce IB/mlx5: Fix wrong naming of port_rcv_data counter
port_xmit_data is written instead of port_rcv_data.

Fixes: 3efd9a1121 ('IB/mlx5: Modify MAD reading counters method to use counter registers')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:03:57 -04:00
Eli Cohen c9b254955b IB/mlx5: Fix post send fence logic
If the caller specified IB_SEND_FENCE in the send flags of the work
request and no previous work request stated that the successive one
should be fenced, the work request would be executed without a fence.
This could result in RDMA read or atomic operations failure due to a MR
being invalidated. Fix this by adding the mlx5 enumeration for fencing
RDMA/atomic operations and fix the logic to apply this.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:03:57 -04:00
Maor Gottlieb b57141c1ab IB/uverbs: Initialize ib_qp_init_attr with zeros
Initialize ib_qp_init_attr with zeros in order to avoid from garbage
in fields that won't be set with user values.

Fixes: a060b5629a ('IB/core: generic RDMA READ/WRITE API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:03:57 -04:00
Eli Cohen b3556005c5 IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID
When virtualziation is supported, VFs may send SA MADs to a GID formed
by the concatenation of the subnet prefix with the
IB_SA_WELL_KNOWN_GUID. When a response is required, the current code
will search the local HCA's port for the received GID to figure out the
GID index of the entry containing this GID. However, since this is not a
real GID it will not be found and error will be printed.

We change the logic to check if the destination GID is this special GID
and avoid lookup in this case and use GID index 0.

Fixes: a0c1b2a350 ('IB/core: Support accessing SA in virtualized environment')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:03:57 -04:00
Alex Vesker c65f6c5a36 IB/core: Fix RoCE v1 multicast join logic issue
During multicast join of RoCEv1, IGMP join state and max hop limit
were updated incorrectly. IGMP join should be sent and marked as
joined only on RoCEv2 after a successful join. Max hops should be
updated to the hop limit on RoCEv2 regardless of the join state.

Fixes: bee3c3c918 ('IB/cma: Join and leave multicast groups...')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:03:57 -04:00
Talat Batheesh f336ae0314 IB/core: Fix no default GIDs when netdevice reregisters
Currently, when the netdevice returned by get_netdev is unregistered,
we delete all GIDs (including the default GIDs) and reset their
attributes. Therefore, when we re-register it, no default GIDs
will be assigned (as their "default GID") attribute will be reset.
Fixing this by keeping "default GID" attribute.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:03:57 -04:00
David Vrabel d6b186c1e2 x86/xen: avoid m2p lookup when setting early page table entries
When page tables entries are set using xen_set_pte_init() during early
boot there is no page fault handler that could handle a fault when
performing an M2P lookup.

In 64 bit guests (usually dom0) early_ioremap() would fault in
xen_set_pte_init() because an M2P lookup faults because the MFN is in
MMIO space and not mapped in the M2P.  This lookup is done to see if
the PFN in in the range used for the initial page table pages, so that
the PTE may be set as read-only.

The M2P lookup can be avoided by moving the check (and clear of RW)
earlier when the PFN is still available.

Reported-by: Kevin Moraga <kmoragas@riseup.net>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
2016-06-23 14:50:30 +01:00
He Kuang 3bd03c9583 perf unwind: Fix wrongly used regs for aarch64 unwind
By default, "unwind-libunwind-local.c" gets SP/IP register number
according to the host platform, for remote unwind, we should use
register number for target platform. Fix this by define
LIBUNWIND_ARCH_REG_SP/IP in the wrapper file of aarch64 platform.

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1466578626-92406-4-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 10:30:31 -03:00
He Kuang 5dafea097a perf unwind: Fix wrongly used regs for x86_32 unwind
By default, "unwind-libunwind-local.c" gets SP/IP register number
according to the host platform, for remote unwind, we should use
register number for target platform. Fix this by define
LIBUNWIND_ARCH_REG_SP/IP in the wrapper file of x86_32 platform.

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1466578626-92406-3-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 10:30:21 -03:00
He Kuang 78ff1d6d8b perf unwind: Change macro names of perf register
Use macro name prefixed with "LIBUNWIND_ARCH" for better understanding
that the regs used by callbacks of libunwind are arch specific. The real
regs used should be defined in the wrapper file of
"unwind-libunwind-local.c" for each supported arch.

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1466578626-92406-2-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 10:30:17 -03:00
He Kuang 76c588f1f6 perf tools: Find right DSO taking into account if binary is 32 or 64-bit
There's a problem in machine__findnew_vdso(), vdso buildid generated by a
32-bit machine stores it with the name 'vdso', but when processing buildid on a
64-bit machine with the same 'perf.data', perf will search for vdso named as
'vdso32' and get failed.

This patch tries to find the existing dsos in machine->dsos by thread dso_type.
64-bit thread tries to find vdso with name 'vdso', because all 64-bit vdso is
named as that. 32-bit thread first tries to find vdso with name 'vdso32' if
this thread was run on 64-bit machine, if failed, then it tries 'vdso' which
indicates that the thread was run on 32-bit machine when recording.

Committer note:

Additional explanation by Adrian Hunter:

We match maps to builds ids using the file name - consider
machine__findnew_[v]dso() called in map__new().  So in the context of a perf
data file, we consider the file name to be unique.

A vdso map does not have a file name - all we know is that it is vdso.  We look
at the thread to tell if it is 32-bit, 64-bit or x32.  Then we need to get the
build id which has been recorded using short name "[vdso]" or "[vdso32]" or
"[vdsox32]".

The problem is that on a 32-bit machine, we use the name "[vdso]".  If you take
a 32-bit perf data file to a 64-bit machine, it gets hard to figure out if
"[vdso]" is 32-bit or 64-bit.

This patch solves that problem.

 ----

This also merges a followup patch fixing a problem introduced by the
original submission of this patch, that would crash 'perf record' when
recording samples for a 32-bit app on a 64-bit system.

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1463475894-163531-1-git-send-email-hekuang@huawei.com
Link: http://lkml.kernel.org/r/1466578626-92406-6-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 10:25:58 -03:00
Pali Rohár 5ce91714b0 hwmon: (dell-smm) Cache fan_type() calls and change fan detection
On more Dell machines (e.g. Dell Precision M3800) fan_type() call is too
expensive (CPU is too long in SMM mode) and cause kernel to hang. This is
bug in Dell SMM or BIOS.

This patch caches type for each fan (as it should not change) and changes
the way how fan presense is detected. First it try function fan_status()
as was before commit f989e55452 ("i8k: Add support for fan labels"). And
if that fails fallback to fan_type(). *_status() functions can fail in case
fan is not currently accessible (e.g. present on GPU which is currently
turned off).

Reported-by: Tolga Cakir <cevelnet@gmail.com>
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021
Cc: stable@vger.kernel.org # v4.0+, will need backport
Tested-by: Tolga Cakir <cevelnet@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-06-23 06:24:23 -07:00
Rafael J. Wysocki 492a19ef8b Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq into pm-devfreq
Pull devfreq fixes for v4.7 from  MyungJoo Ham.

* 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq:
  PM / devfreq: fix initialization of current frequency in last status
  PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
  PM / devfreq: remove double put_device
  PM / devfreq: fix double call put_device
  PM / devfreq: fix duplicated kfree on devfreq pointer
  PM / devfreq: devm_kzalloc to have dev pointer more precisely
2016-06-23 14:48:41 +02:00
Taeung Song 41840d211c perf config: Move config declarations from util/cache.h to util/config.h
Lately util/config.h has been added but util/cache.h has declarations of
functions and a global variable for config features.

To manage codes about configuration at one spot, move them to
util/config.h and let source files that need config features include
config.h And if the source files that included previous cache.h need
only config.h, remove including cache.h.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1466672119-4852-2-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23 08:51:41 -03:00
Andrey Grodzovsky 02ef871eca xen/pciback: Fix conf_space read/write overlap check.
Current overlap check is evaluating to false a case where a filter
field is fully contained (proper subset) of a r/w request.  This
change applies classical overlap check instead to include all the
scenarios.

More specifically, for (Hilscher GmbH CIFX 50E-DP(M/S)) device driver
the logic is such that the entire confspace is read and written in 4
byte chunks. In this case as an example, CACHE_LINE_SIZE,
LATENCY_TIMER and PCI_BIST are arriving together in one call to
xen_pcibk_config_write() with offset == 0xc and size == 4.  With the
exsisting overlap check the LATENCY_TIMER field (offset == 0xd, length
== 1) is fully contained in the write request and hence is excluded
from write, which is incorrect.

Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23 11:36:15 +01:00
Juergen Gross 1cf3874130 x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper
bound of the loop setting non-kernel-image entries to zero should not
exceed the size of level2_kernel_pgt.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23 11:36:15 +01:00
Ross Lagerwall 842775f150 xen/balloon: Fix declared-but-not-defined warning
Fix a declared-but-not-defined warning when building with
XEN_BALLOON_MEMORY_HOTPLUG=n. This fixes a regression introduced by
commit dfd74a1edf ("xen/balloon: Fix crash when ballooning on x86 32
bit PAE").

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23 11:36:15 +01:00
Josef Bacik c6887cd111 Btrfs: don't do nocow check unless we have to
Before we write into prealloc/nocow space we have to make sure that there are no
references to the extents we are writing into, which means checking the extent
tree and csum tree in the case of nocow.  So we don't want to do the nocow dance
unless we can't reserve data space, since it's a serious drag on performance.
With the following sequence

fallocate -l10737418240 /mnt/btrfs-test/file
cp --reflink /mnt/btrfs-test/file /mnt/btrfs-test/link
fio --name=randwrite --rw=randwrite --bs=4k --filename=/mnt/btrfs-test/file \
	--end_fsync=1

we get the worst case scenario where we have to fall back on to doing the check
anyway.

Without this patch
lat (usec): min=5, max=111598, avg=27.65, stdev=124.51
write: io=10240MB, bw=126876KB/s, iops=31718, runt= 82646msec

With this patch
lat (usec): min=3, max=91210, avg=14.09, stdev=110.62
write: io=10240MB, bw=212753KB/s, iops=53188, runt= 49286msec

We get twice the throughput, half of the runtime, and half of the average
latency.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
[ PAGE_CACHE_ removal related fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:57:14 -07:00
Chris Mason 0f873eca82 btrfs: fix deadlock in delayed_ref_async_start
"Btrfs: track transid for delayed ref flushing" was deadlocking on
btrfs_attach_transaction because its not safe to call from the async
delayed ref start code.  This commit brings back btrfs_join_transaction
instead and checks for a blocked commit.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22 17:54:18 -07:00