Commit Graph

578112 Commits

Author SHA1 Message Date
Jesper Dangaard Brouer 0142eae3ae mm: kmemcheck skip object if slab allocation failed
In the SLAB allocator kmemcheck_slab_alloc() is guarded against being
called in case the object is NULL.  In SLUB allocator this NULL pointer
invocation can happen, which seems like an oversight.

Move the NULL pointer check into kmemcheck code (kmemcheck_slab_alloc)
so the check gets moved out of the fastpath, when not compiled with
CONFIG_KMEMCHECK.

This is a step towards sharing post_alloc_hook between SLUB and SLAB,
because slab_post_alloc_hook() does not perform this check before
calling kmemcheck_slab_alloc().

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Jesper Dangaard Brouer 011eceaf0a slab: use slab_pre_alloc_hook in SLAB allocator shared with SLUB
Deduplicate code in SLAB allocator functions slab_alloc() and
slab_alloc_node() by using the slab_pre_alloc_hook() call, which is now
shared between SLUB and SLAB.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Jesper Dangaard Brouer fab9963a69 mm: fault-inject take over bootstrap kmem_cache check
Remove the SLAB specific function slab_should_failslab(), by moving the
check against fault-injection for the bootstrap slab, into the shared
function should_failslab() (used by both SLAB and SLUB).

This is a step towards sharing alloc_hook's between SLUB and SLAB.

This bootstrap slab "kmem_cache" is used for allocating struct
kmem_cache objects to the allocator itself.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Jesper Dangaard Brouer 11c7aec2a9 mm/slab: move SLUB alloc hooks to common mm/slab.h
First step towards sharing alloc_hook's between SLUB and SLAB
allocators.  Move the SLUB allocators *_alloc_hook to the common
mm/slab.h for internal slab definitions.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Jesper Dangaard Brouer 376bf125ac slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
This change is primarily an attempt to make it easier to realize the
optimizations the compiler performs in-case CONFIG_MEMCG_KMEM is not
enabled.

Performance wise, even when CONFIG_MEMCG_KMEM is compiled in, the
overhead is zero.  This is because, as long as no process have enabled
kmem cgroups accounting, the assignment is replaced by asm-NOP
operations.  This is possible because memcg_kmem_enabled() uses a
static_key_false() construct.

It also helps readability as it avoid accessing the p[] array like:
p[size - 1] which "expose" that the array is processed backwards inside
helper function build_detached_freelist().

Lastly this also makes the code more robust, in error case like passing
NULL pointers in the array.  Which were previously handled before commit
033745189b ("slub: add missing kmem cgroup support to
kmem_cache_free_bulk").

Fixes: 033745189b ("slub: add missing kmem cgroup support to kmem_cache_free_bulk")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Arnd Bergmann dec63a4dec paride: make 'verbose' parameter an 'int' again
gcc-6.0 found an ancient bug in the paride driver, which had a
"module_param(verbose, bool, 0);" since before 2.6.12, but actually uses
it to accept '0', '1' or '2' as arguments:

  drivers/block/paride/pd.c: In function 'pd_init_dev_parms':
  drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare]
   #define DBMSG(msg) ((verbose>1)?(msg):NULL)

In 2012, Rusty did a cleanup patch that also changed the type of the
variable to 'bool', which introduced what is now a gcc warning.

This changes the type back to 'int' and adapts the module_param() line
instead, so it should work as documented in case anyone ever cares about
running the ancient driver with debugging.

Fixes: 90ab5ee941 ("module_param: make bool parameters really bool (drivers & misc)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Rusty Russell <rusty@rustcorp.com.au>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
San Mehat 0d9c51a6e1 block: partition: add partition specific uevent callbacks for partition info
This patch has been carried in the Android tree for quite some time and
is one of the few patches required to get a mainline kernel up and
running with an exsiting Android userspace.  So I wanted to submit it
for review and consideration if it should be merged.

For partitions, add new uevent parameters 'PARTN' which specifies the
partitions index in the table, and 'PARTNAME', which specifies PARTNAME
specifices the partition name of a partition device.

Android's userspace uses this for creating device node links from the
partition name and number, ie:

    /dev/block/platform/soc/by-name/system
or
    /dev/block/platform/soc/by-num/p1

One can see its usage here:
    https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#355
and
    https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#494

[john.stultz@linaro.org: dropped NPARTS and reworded commit message for context]
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <harald@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Jun Piao 8d67d3c244 ocfs2/dlm: fix a variable overflow problem in dlmdomain.c
In dlm_send_join_cancels(), node is defined with type unsigned int, but
initialized with -1, this will lead variable overflow.  Although this
won't cause any runtime problem, the code looks a little uncoordinated.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Jiufei Xue 814ce69432 ocfs2: fix a tiny race that leads file system read-only
when o2hb detect a node down, it first set the dead node to recovery map
and create ocfs2rec which will replay journal for dead node.  o2hb
thread then call dlm_do_local_recovery_cleanup() to delete the lock for
dead node.  After the lock of dead node is gone, locks for other nodes
can be granted and may modify the meta data without replaying journal of
the dead node.  The detail is described as follows.

     N1                         N2                   N3(master)
modify the extent tree of
inode, and commit
dirty metadata to journal,
then goes down.
                                                 o2hb thread detects
                                                 N1 goes down, set
                                                 recovery map and
                                                 delete the lock of N1.

                                                 dlm_thread flush ast
                                                 for the lock of N2.
                        do not detect the death
                        of N1, so recovery map is
                        empty.

                        read inode from disk
                        without replaying
                        the journal of N1 and
                        modify the extent tree
                        of the inode that N1
                        had modified.
                                                 ocfs2rec recover the
                                                 journal of N1.
                                                 The modification of N2
                                                 is lost.

The modification of N1 and N2 are not serial, and it will lead to
read-only file system.  We can set recovery_waiting flag to the lock
resource after delete the lock for dead node to prevent other node from
getting the lock before dlm recovery.  After dlm recovery, the recovery
map on N2 is not empty, ocfs2_inode_lock_full_nested() will wait for ocfs2
recovery.

Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
xuejiufei d277f33eda ocfs2/dlm: return EINVAL when the lockres on migration target is in DROPPING_REF state
If master migrate this lock resource to node when it happened to purge
it, a new lock resource will be created and inserted into hash list.  If
then master goes down, the lock resource being purged is recovered, so
there exist two lock resource with different owner.  So return error to
master if the lock resource is in DROPPING state, master will retry to
migrate this lock resource.

Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
xuejiufei 8c03439681 ocfs2/dlm: clear DROPPING_REF flag when the master goes down
If the master goes down after return in-progress for deref message.  The
lock resource on non-master node can not be purged.  Clear the
DROPPING_REF flag and recovery it.

Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
xuejiufei 842b90b624 ocfs2/dlm: return in progress if master can not clear the refmap bit right now
Master returns in-progress to non-master node when it can not clear the
refmap bit right now.  And non-master node will not purge the lock
resource until receiving deref done message.

Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
xuejiufei 60d663cb52 ocfs2/dlm: add DEREF_DONE message
This series of patches is to fix the dis-order issue of setting/clearing
refmap bit described below.

Node 1                               Node 2(master)
dlmlock
dlm_do_master_request
                                dlm_master_request_handler
                                -> dlm_lockres_set_refmap_bit
dlmlock succeed
dlmunlock succeed

dlm_purge_lockres
                                dlm_deref_handler
                                -> find lock resource is in
                                   DLM_LOCK_RES_SETREF_INPROG state,
                                   so dispatch a deref work
dlm_purge_lockres succeed.

call dlmlock again
dlm_do_master_request
                                dlm_master_request_handler
                                -> dlm_lockres_set_refmap_bit

                                deref work trigger, call
                                dlm_lockres_clear_refmap_bit
                                to clear Node 1 from refmap

                                dlm_purge_lockres succeed

dlm_send_remote_lock_request
                                return DLM_IVLOCKID because
                                the lockres is not exist
BUG if the lockres is $RECOVERY

This series of patches add a new message to keep the order of set and
clear.  Other nodes can purge the lock resource only after the refmap bit
on master is cleared.

This patch is to add DEREF_DONE message and corresponding handler.  Node
can purge the lock resource after receiving this message.  As a new
message is added, so increase the minor number of dlm protocol version.

Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Joseph Qi 39b29af030 ocfs2/dlm: fix a typo in dlmcommon.h
Refer to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo for
O2NET_MAX_PAYLOAD_BYTES.

Since currently DLM_MIG_LOCKRES_RESERVED is not actually used, it won't
cause any problem.  But we'd better correct it for further use.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
jiangyiwen bfd97a0320 ocfs2: use spinlock_irqsave() to downconvert lock in ocfs2_osb_dump()
Commit a75e9ccabd ("ocfs2: use spinlock irqsave for downconvert lock")
missed an unmodified place in ocfs2_osb_dump(), so it still exists a
deadlock scenario.

    ocfs2_wake_downconvert_thread
    ocfs2_rw_unlock
    ocfs2_dio_end_io
    dio_complete
    .....
    bio_endio
    req_bio_endio
    ....
    scsi_io_completion
    blk_done_softirq
    __do_softirq
    do_softirq
    irq_exit
    do_IRQ
    ocfs2_osb_dump
    cat /sys/kernel/debug/ocfs2/${uuid}/fs_state

This patch still uses spin_lock_irqsave() - replace spin_lock() to solve
this situation.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
jiangyiwen 4d548f61d6 ocfs2/cluster: replace the interrupt safe spinlocks with common ones
There actually no hardware or software interrupts in the context which
using o2hb_live_lock, so we don't need to worry about race conditions
caused by irq/softirq with spinlock held.  Turning off irq is not good
for system performance after all.  Just replace them with a non
interrupt safe function.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Sudip Mukherjee e928f35061 blackfin: define dummy pgprot_writecombine for !MMU
blackfin allmodconfig build fails with the error:

  ../sound/core/pcm_native.c: In function 'snd_pcm_lib_default_mmap':
  ../sound/core/pcm_native.c:3386:24: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
     area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
                          ^
  ../sound/core/pcm_native.c:3386:22: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
     area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
                        ^

When !MMU, asm-generic will not define default pgprot_writecombine, so
blackfin needs to define it by itself.

The patch idea is from commit 65b9ab888c ("arch/c6x/include/asm/pgtable.h:
define dummy pgprot_writecombine for !MMU")

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Cc: Steven Miao <realmz6@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Sudip Mukherjee 3701dc8151 m32r: mm: fix build warning
While building we are getting warnings:

  arch/m32r/mm/init.c:63:17: warning: unused variable 'low'
  arch/m32r/mm/init.c:62:17: warning: unused variable 'max_dma'

max_dma and low are only used if CONFIG_MMU is defined.  Lets declare
the variables inside the #ifdef.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Peter Zijlstra 25528213fe tags: Fix DEFINE_PER_CPU expansions
$ make tags
  GEN     tags
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

Which are all the result of the DEFINE_PER_CPU pattern:

  scripts/tags.sh:200:	'/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
  scripts/tags.sh:201:	'/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'

The below cures them. All except the workqueue one are within reasonable
distance of the 80 char limit. TJ do you have any preference on how to
fix the wq one, or shall we just not care its too long?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
Geliang Tang e6fd1fb3b5 init/main.c: use list_for_each_entry()
Use list_for_each_entry() instead of list_for_each() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15 16:55:16 -07:00
James Bottomley a7dee8f45f Merge branch 'fixes' into misc 2016-03-15 15:24:44 -07:00
Linus Torvalds 710d60cbf1 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug updates from Thomas Gleixner:
 "This is the first part of the ongoing cpu hotplug rework:

   - Initial implementation of the state machine

   - Runs all online and prepare down callbacks on the plugged cpu and
     not on some random processor

   - Replaces busy loop waiting with completions

   - Adds tracepoints so the states can be followed"

More detailed commentary on this work from an earlier email:
 "What's wrong with the current cpu hotplug infrastructure?

   - Asymmetry

     The hotplug notifier mechanism is asymmetric versus the bringup and
     teardown.  This is mostly caused by the notifier mechanism.

   - Largely undocumented dependencies

     While some notifiers use explicitely defined notifier priorities,
     we have quite some notifiers which use numerical priorities to
     express dependencies without any documentation why.

   - Control processor driven

     Most of the bringup/teardown of a cpu is driven by a control
     processor.  While it is understandable, that preperatory steps,
     like idle thread creation, memory allocation for and initialization
     of essential facilities needs to be done before a cpu can boot,
     there is no reason why everything else must run on a control
     processor.  Before this patch series, bringup looks like this:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

       bring the rest up

   - All or nothing approach

     There is no way to do partial bringups.  That's something which is
     really desired because we waste e.g.  at boot substantial amount of
     time just busy waiting that the cpu comes to life.  That's stupid
     as we could very well do preparatory steps and the initial IPI for
     other cpus and then go back and do the necessary low level
     synchronization with the freshly booted cpu.

   - Minimal debuggability

     Due to the notifier based design, it's impossible to switch between
     two stages of the bringup/teardown back and forth in order to test
     the correctness.  So in many hotplug notifiers the cancel
     mechanisms are either not existant or completely untested.

   - Notifier [un]registering is tedious

     To [un]register notifiers we need to protect against hotplug at
     every callsite.  There is no mechanism that bringup/teardown
     callbacks are issued on the online cpus, so every caller needs to
     do it itself.  That also includes error rollback.

  What's the new design?

     The base of the new design is a symmetric state machine, where both
     the control processor and the booting/dying cpu execute a well
     defined set of states.  Each state is symmetric in the end, except
     for some well defined exceptions, and the bringup/teardown can be
     stopped and reversed at almost all states.

     So the bringup of a cpu will look like this in the future:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

                                       bring itself up

     The synchronization step does not require the control cpu to wait.
     That mechanism can be done asynchronously via a worker or some
     other mechanism.

     The teardown can be made very similar, so that the dying cpu cleans
     up and brings itself down.  Cleanups which need to be done after
     the cpu is gone, can be scheduled asynchronously as well.

  There is a long way to this, as we need to refactor the notion when a
  cpu is available.  Today we set the cpu online right after it comes
  out of the low level bringup, which is not really correct.

  The proper mechanism is to set it to available, i.e. cpu local
  threads, like softirqd, hotplug thread etc. can be scheduled on that
  cpu, and once it finished all booting steps, it's set to online, so
  general workloads can be scheduled on it.  The reverse happens on
  teardown.  First thing to do is to forbid scheduling of general
  workloads, then teardown all the per cpu resources and finally shut it
  off completely.

  This patch series implements the basic infrastructure for this at the
  core level.  This includes the following:

   - Basic state machine implementation with well defined states, so
     ordering and prioritization can be expressed.

   - Interfaces to [un]register state callbacks

     This invokes the bringup/teardown callback on all online cpus with
     the proper protection in place and [un]installs the callbacks in
     the state machine array.

     For callbacks which have no particular ordering requirement we have
     a dynamic state space, so that drivers don't have to register an
     explicit hotplug state.

     If a callback fails, the code automatically does a rollback to the
     previous state.

   - Sysfs interface to drive the state machine to a particular step.

     This is only partially functional today.  Full functionality and
     therefor testability will be achieved once we converted all
     existing hotplug notifiers over to the new scheme.

   - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
     processor:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu
       wait for boot
                                       bring itself up

                                       Signal completion to control cpu

     In a previous step of this work we've done a full tree mechanical
     conversion of all hotplug notifiers to the new scheme.  The balance
     is a net removal of about 4000 lines of code.

     This is not included in this series, as we decided to take a
     different approach.  Instead of mechanically converting everything
     over, we will do a proper overhaul of the usage sites one by one so
     they nicely fit into the symmetric callback scheme.

     I decided to do that after I looked at the ugliness of some of the
     converted sites and figured out that their hotplug mechanism is
     completely buggered anyway.  So there is no point to do a
     mechanical conversion first as we need to go through the usage
     sites one by one again in order to achieve a full symmetric and
     testable behaviour"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  cpu/hotplug: Document states better
  cpu/hotplug: Fix smpboot thread ordering
  cpu/hotplug: Remove redundant state check
  cpu/hotplug: Plug death reporting race
  rcu: Make CPU_DYING_IDLE an explicit call
  cpu/hotplug: Make wait for dead cpu completion based
  cpu/hotplug: Let upcoming cpu bring itself fully up
  arch/hotplug: Call into idle with a proper state
  cpu/hotplug: Move online calls to hotplugged cpu
  cpu/hotplug: Create hotplug threads
  cpu/hotplug: Split out the state walk into functions
  cpu/hotplug: Unpark smpboot threads from the state machine
  cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
  cpu/hotplug: Implement setup/removal interface
  cpu/hotplug: Make target state writeable
  cpu/hotplug: Add sysfs state interface
  cpu/hotplug: Hand in target state to _cpu_up/down
  cpu/hotplug: Convert the hotplugged cpu work to a state machine
  cpu/hotplug: Convert to a state machine for the control processor
  cpu/hotplug: Add tracepoints
  ...
2016-03-15 13:50:29 -07:00
Linus Torvalds df2e37c814 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The 4.6 pile of irq updates contains:

   - Support for IPI irqdomains to support proper integration of IPIs to
     and from coprocessors.  The first user of this new facility is
     MIPS.  The relevant MIPS patches come with the core to avoid merge
     ordering issues and have been acked by Ralf.

   - A new command line option to set the default interrupt affinity
     mask at boot time.

   - Support for some more new ARM and MIPS interrupt controllers:
     tango, alpine-msix and bcm6345-l1

   - Two small cleanups for x86/apic which we merged into irq/core to
     avoid yet another branch in x86 with two tiny commits.

   - The usual set of updates, cleanups in drivers/irqchip.  Mostly in
     the area of ARM-GIC, arada-37-xp and atmel chips.  Nothing
     outstanding here"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  irqchip/irq-alpine-msi: Release the correct domain on error
  irqchip/mxs: Fix error check of of_io_request_and_map()
  irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
  genirq: Export IRQ functions for module use
  irqchip/gic/realview: Support more RealView DCC variants
  Documentation/bindings: Document the Alpine MSIX driver
  irqchip: Add the Alpine MSIX interrupt controller
  irqchip/gic-v3: Always return IRQ_SET_MASK_OK_DONE in gic_set_affinity
  irqchip/gic-v3-its: Mark its_init() and its children as __init
  irqchip/gic-v3: Remove gic_root_node variable from the ITS code
  irqchip/gic-v3: ACPI: Add redistributor support via GICC structures
  irqchip/gic-v3: Add ACPI support for GICv3/4 initialization
  irqchip/gic-v3: Refactor gic_of_init() for GICv3 driver
  x86/apic: Deinline _flat_send_IPI_mask, save ~150 bytes
  x86/apic: Deinline __default_send_IPI_*, save ~200 bytes
  dt-bindings: interrupt-controller: Add SoC-specific compatible string to Marvell ODMI
  irqchip/mips-gic: Add new DT property to reserve IPIs
  MIPS: Delete smp-gic.c
  MIPS: Make smp CMP, CPS and MT use the new generic IPI functions
  MIPS: Add generic SMP IPI support
  ...
2016-03-15 12:48:48 -07:00
Linus Torvalds 8a284c062e Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer department delivers this time:

   - Support for cross clock domain timestamps in the core code plus a
     first user.  That allows more precise timestamping for PTP and
     later for audio and other peripherals.

     The ptp/e1000e patches have been acked by the relevant maintainers
     and are carried in the timer tree to avoid merge ordering issues.

   - Support for unregistering the current clocksource watchdog.  That
     lifts a limitation for switching clocksources which has been there
     from day 1

   - The usual pile of fixes and updates to the core and the drivers.
     Nothing outstanding and exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  time/timekeeping: Work around false positive GCC warning
  e1000e: Adds hardware supported cross timestamp on e1000e nic
  ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
  x86/tsc: Always Running Timer (ART) correlated clocksource
  hrtimer: Revert CLOCK_MONOTONIC_RAW support
  time: Add history to cross timestamp interface supporting slower devices
  time: Add driver cross timestamp interface for higher precision time synchronization
  time: Remove duplicated code in ktime_get_raw_and_real()
  time: Add timekeeping snapshot code capturing system time and counter
  time: Add cycles to nanoseconds translation
  jiffies: Use CLOCKSOURCE_MASK instead of constant
  clocksource: Introduce clocksource_freq2mult()
  clockevents/drivers/exynos_mct: Implement ->set_state_oneshot_stopped()
  clockevents/drivers/arm_global_timer: Implement ->set_state_oneshot_stopped()
  clockevents/drivers/arm_arch_timer: Implement ->set_state_oneshot_stopped()
  clocksource/drivers/arm_global_timer: Register delay timer
  clocksource/drivers/lpc32xx: Support timer-based ARM delay
  clocksource/drivers/lpc32xx: Support periodic mode
  clocksource/drivers/lpc32xx: Don't use the prescaler counter for clockevents
  clocksource/drivers/rockchip: Add err handle for rk_timer_init
  ...
2016-03-15 12:13:56 -07:00
Linus Torvalds 208de21477 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Miscellaneous fixes, cleanups, restructuring.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Export rcu_gp_is_normal()
  rcu: Remove rcu_user_hooks_switch
  rcu: Catch up rcu_report_qs_rdp() comment with reality
  rcu: Document unique-name limitation for DEFINE_STATIC_SRCU()
  rcu: Make rcu/tiny_plugin.h explicitly non-modular
  irq: Privatize irq_common_data::state_use_accessors
  RCU: Privatize rcu_node::lock
  sparse: Add __private to privatize members of structs
  rcu: Remove useless rcu_data_p when !PREEMPT_RCU
  rcutorture: Correct no-expedite console messages
  rcu: Set rdp->gpwrap when CPU is idle
  rcu: Stop treating in-kernel CPU-bound workloads as errors
  rcu: Update rcu_report_qs_rsp() comment
  rcu: Assign false instead of 0 for ->core_needs_qs
  rcutorture: Check for self-detected stalls
  rcutorture: Don't keep empty console.log.diags files
  rcutorture: Add checks for rcutorture writer starvation
2016-03-15 11:38:05 -07:00
Linus Torvalds ae465beeff Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer update from Ingo Molnar:
 "A single simplification of the x86 TSC code"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Use topology functions
2016-03-15 11:29:24 -07:00
Linus Torvalds 8ab84ef699 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core platform updates from Ingo Molnar:
 "Intel Quark and Geode SoC platform updates"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/intel/quark: Drop IMR lock bit support
  x86/platform/intel/mid: Remove dead code
  x86/platform: Make platform/geode/net5501.c explicitly non-modular
  x86/platform: Make platform/geode/alix.c explicitly non-modular
  x86/platform: Make platform/geode/geos.c explicitly non-modular
  x86/platform: Make platform/intel-quark/imr_selftest.c explicitly non-modular
  x86/platform: Make platform/intel-quark/imr.c explicitly non-modular
2016-03-15 11:20:44 -07:00
Linus Torvalds 13c76ad872 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Enable full ASLR randomization for 32-bit programs (Hector
     Marco-Gisbert)

   - Add initial minimal INVPCI support, to flush global mappings (Andy
     Lutomirski)

   - Add KASAN enhancements (Andrey Ryabinin)

   - Fix mmiotrace for huge pages (Karol Herbst)

   - ... misc cleanups and small enhancements"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/32: Enable full randomization on i386 and X86_32
  x86/mm/kmmio: Fix mmiotrace for hugepages
  x86/mm: Avoid premature success when changing page attributes
  x86/mm/ptdump: Remove paravirt_enabled()
  x86/mm: Fix INVPCID asm constraint
  x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()
  x86/mm: If INVPCID is available, use it to flush global mappings
  x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
  x86/mm: Add INVPCID helpers
  x86/kasan: Write protect kasan zero shadow
  x86/kasan: Clear kasan_zero_page after TLB flush
  x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug()
  x86/mm/numa: Clean up numa_clear_kernel_node_hotplug()
  x86/mm: Make kmap_prot into a #define
  x86/mm/32: Set NX in __supported_pte_mask before enabling paging
  x86/mm: Streamline and restore probe_memory_block_size()
2016-03-15 10:45:39 -07:00
Linus Torvalds 9cf8d6360c Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Ingo Molnar:
 "The biggest change in this cycle was the separation of the microcode
  loading mechanism from the initrd code plus the support of built-in
  microcode images.

  There were also lots cleanups and general restructuring (by Borislav
  Petkov)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/microcode/intel: Drop orig_sum from ext signature checksum
  x86/microcode/intel: Improve microcode sanity-checking error messages
  x86/microcode/intel: Merge two consecutive if-statements
  x86/microcode/intel: Get rid of DWSIZE
  x86/microcode/intel: Change checksum variables to u32
  x86/microcode: Use kmemdup() rather than duplicating its implementation
  x86/microcode: Remove unnecessary paravirt_enabled check
  x86/microcode: Document builtin microcode loading method
  x86/microcode/AMD: Issue microcode updated message later
  x86/microcode/intel: Cleanup get_matching_model_microcode()
  x86/microcode/intel: Remove unused arg of get_matching_model_microcode()
  x86/microcode/intel: Rename mc_saved_in_initrd
  x86/microcode/intel: Use *wrmsrl variants
  x86/microcode/intel: Cleanup apply_microcode_intel()
  x86/microcode/intel: Move the BUG_ON up and turn it into WARN_ON
  x86/microcode/intel: Rename mc_intel variable to mc
  x86/microcode/intel: Rename mc_saved_count to num_saved
  x86/microcode/intel: Rename local variables of type struct mc_saved_data
  x86/microcode/AMD: Drop redundant printk prefix
  x86/microcode: Issue update message only once
  ...
2016-03-15 10:39:22 -07:00
Linus Torvalds ecc026bff6 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The biggest change in terms of impact is the changing of the FPU
  context switch model to 'eagerfpu' for all CPU types, via: commit
  58122bf1d856: "x86/fpu: Default eagerfpu=on on all CPUs"

  This makes all FPU saves and restores synchronous and makes the FPU
  code a lot more obvious to read.  In the next cycle, if this change is
  problem free, we'll remove the old lazy FPU restore code altogether.

  This change flushed out some old bugs, which should all be fixed by
  now, BYMMV"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Default eagerfpu=on on all CPUs
  x86/fpu: Speed up lazy FPU restores slightly
  x86/fpu: Fold fpu_copy() into fpu__copy()
  x86/fpu: Fix FNSAVE usage in eagerfpu mode
  x86/fpu: Fix math emulation in eager fpu mode
2016-03-15 10:23:56 -07:00
Linus Torvalds fa53c48939 Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build update from Ingo Molnar:
 "A single adjustment of a defconfig value"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/defconfigs/32: Set CONFIG_FRAME_WARN to the Kconfig default
2016-03-15 10:16:48 -07:00
Linus Torvalds 42576bee6e Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Early command line options parsing enhancements from Dave Hansen, plus
  minor cleanups and enhancements"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Remove unused 'is_big_kernel' variable
  x86/boot: Use proper array element type in memset() size calculation
  x86/boot: Pass in size to early cmdline parsing
  x86/boot: Simplify early command line parsing
  x86/boot: Fix early command-line parsing when partial word matches
  x86/boot: Fix early command-line parsing when matching at end
  x86/boot: Simplify kernel load address alignment check
  x86/boot: Micro-optimize reset_early_page_tables()
2016-03-15 10:02:25 -07:00
Linus Torvalds ba33ea811e Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "This is another big update. Main changes are:

   - lots of x86 system call (and other traps/exceptions) entry code
     enhancements.  In particular the complex parts of the 64-bit entry
     code have been migrated to C code as well, and a number of dusty
     corners have been refreshed.  (Andy Lutomirski)

   - vDSO special mapping robustification and general cleanups (Andy
     Lutomirski)

   - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

   - lots of other changes ..."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/cpufeature: Enable new AVX-512 features
  x86/entry/traps: Show unhandled signal for i386 in do_trap()
  x86/entry: Call enter_from_user_mode() with IRQs off
  x86/entry/32: Change INT80 to be an interrupt gate
  x86/entry: Improve system call entry comments
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/asm-offsets: Remove PARAVIRT_enabled
  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
  uprobes: __create_xol_area() must nullify xol_mapping.fault
  x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
  ...
2016-03-15 09:32:27 -07:00
Bjorn Helgaas 6e6f498b03 Merge branch 'pci/resource' into next
* pci/resource:
  PCI: Simplify pci_create_attr() control flow
  PCI: Don't leak memory if sysfs_create_bin_file() fails
  PCI: Simplify sysfs ROM cleanup
  PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
  MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
  MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
  ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
  ia64/PCI: Use ioremap() instead of open-coded equivalent
  ia64/PCI: Use temporary struct resource * to avoid repetition
  PCI: Clean up pci_map_rom() whitespace
  PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
  PCI: Set ROM shadow location in arch code, not in PCI core
  PCI: Don't enable/disable ROM BAR if we're using a RAM shadow copy
  PCI: Don't assign or reassign immutable resources
  PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED
  x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs
  PCI: Disable IO/MEM decoding for devices with non-compliant BARs
2016-03-15 08:56:28 -05:00
Bjorn Helgaas cfeb8139a1 Merge branch 'pci/host-hv' into next
* pci/host-hv:
  PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs
  PCI: Look up IRQ domain by fwnode_handle
  PCI: Add fwnode_handle to x86 pci_sysdata
2016-03-15 08:56:16 -05:00
Bjorn Helgaas 562df5c852 Merge branch 'pci/host-designware' into next
* pci/host-designware:
  PCI: designware: Add driver for prototyping kits based on ARC SDP
  PCI: designware: Add default link up check if sub-driver doesn't override
  PCI: designware: Add generic dw_pcie_wait_for_link()
  ARC: Add PCI support
2016-03-15 08:55:52 -05:00
Bjorn Helgaas c334f9c89e Merge branches 'pci/host-altera', 'pci/host-imx6', 'pci/host-keystone', 'pci/host-rcar', 'pci/host-tegra', 'pci/host-thunder', 'pci/host-vmd', 'pci/host-xilinx' and 'pci/host-xilinx-nwl' into next
* pci/host-altera:
  PCI: altera: Fix altera_pcie_link_is_up()

* pci/host-imx6:
  PCI: imx6: Add DT bindings to configure PHY Tx driver settings

* pci/host-keystone:
  PCI: keystone: Defer probing if devm_phy_get() returns -EPROBE_DEFER

* pci/host-rcar:
  PCI: rcar: Depend on ARCH_RENESAS, not ARCH_SHMOBILE

* pci/host-tegra:
  PCI: tegra: Remove misleading PHYS_OFFSET
  PCI: tegra: Track bus -> CPU mapping
  PCI: tegra: Remove unused struct tegra_pcie.num_ports field
  PCI: tegra: Implement ->{add,remove}_bus() callbacks
  PCI: Add pci_ops.{add,remove}_bus() callbacks

* pci/host-thunder:
  PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices
  PCI: thunder: Add PCIe host driver for ThunderX processors
  PCI: generic: Expose pci_host_common_probe() for use by other drivers
  PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe()
  PCI: generic: Move structure definitions to separate header file

* pci/host-vmd:
  x86/PCI: VMD: Attach VMD resources to parent domain's resource tree
  x86/PCI: VMD: Set bus resource start to 0
  x86/PCI: VMD: Document code for maintainability

* pci/host-xilinx:
  microblaze/PCI: Support generic Xilinx AXI PCIe Host Bridge IP driver
  PCI: xilinx: Update Zynq binding with Microblaze node
  PCI: xilinx: Don't call pci_fixup_irqs() on Microblaze
  PCI: xilinx: Remove dependency on ARM-specific struct hw_pci
  PCI: xilinx: Use of_pci_get_host_bridge_resources() to parse DT

* pci/host-xilinx-nwl:
  PCI: xilinx-nwl: Add support for Xilinx NWL PCIe Host Controller
2016-03-15 08:55:19 -05:00
Bjorn Helgaas 18e5e6913b Merge branches 'pci/aer', 'pci/enumeration', 'pci/kconfig', 'pci/misc', 'pci/virtualization' and 'pci/vpd' into next
* pci/aer:
  PCI/AER: Log aer_inject error injections
  PCI/AER: Log actual error causes in aer_inject
  PCI/AER: Use dev_warn() in aer_inject
  PCI/AER: Fix aer_inject error codes

* pci/enumeration:
  PCI: Fix broken URL for Dell biosdevname

* pci/kconfig:
  PCI: Cleanup pci/pcie/Kconfig whitespace
  PCI: Include pci/hotplug Kconfig directly from pci/Kconfig
  PCI: Include pci/pcie/Kconfig directly from pci/Kconfig

* pci/misc:
  PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
  PCI: Add QEMU top-level IDs for (sub)vendor & device
  unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition
  PCI: Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h
  PCI: Move pci_dma_* helpers to common code
  frv/PCI: Remove stray pci_{alloc,free}_consistent() declaration

* pci/virtualization:
  PCI: Wait for up to 1000ms after FLR reset
  PCI: Support SR-IOV on any function type

* pci/vpd:
  PCI: Prevent VPD access for buggy devices
  PCI: Sleep rather than busy-wait for VPD access completion
  PCI: Fold struct pci_vpd_pci22 into struct pci_vpd
  PCI: Rename VPD symbols to remove unnecessary "pci22"
  PCI: Remove struct pci_vpd_ops.release function pointer
  PCI: Move pci_vpd_release() from header file to pci/access.c
  PCI: Move pci_read_vpd() and pci_write_vpd() close to other VPD code
  PCI: Determine actual VPD size on first access
  PCI: Use bitfield instead of bool for struct pci_vpd_pci22.busy
  PCI: Allow access to VPD attributes with size 0
  PCI: Update VPD definitions
2016-03-15 08:55:02 -05:00
Heikki Krogerus 7b78f48a04 PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
PCI-SIG has defined Interface FEh for Base Class 0Ch, Sub-Class 03h as "USB
Device (not host controller)".  It is already being used in various USB
device controller drivers for matching, so add PCI_CLASS_SERIAL_USB_DEVICE
and use it.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-15 08:52:28 -05:00
Joao Pinto 5a3aa2a8fa PCI: designware: Add driver for prototyping kits based on ARC SDP
Add a reference platform driver for PCI RC IP Protoyping Kits based on the
ARC SDP.

[bhelgaas: changelog, split patch up, MAINTAINERS update]
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Pratyush Anand <pratyush.anand@gmail.com>
2016-03-15 08:50:45 -05:00
Joao Pinto dac29e6c54 PCI: designware: Add default link up check if sub-driver doesn't override
Add a default DesignWare "link_up" test for use when a sub-driver doesn't
supply its own pcie_host_ops.link_up() method.

[bhelgaas: changelog, split into its own patch]
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Pratyush Anand <pratyush.anand@gmail.com>
2016-03-15 08:50:45 -05:00
Joao Pinto 886bc5ceb5 PCI: designware: Add generic dw_pcie_wait_for_link()
Several DesignWare-based drivers (dra7xx, exynos, imx6, keystone, qcom, and
spear13xx) had similar loops waiting for the link to come up.

Add a generic dw_pcie_wait_for_link() for use by all these drivers so the
waiting is done consistently, e.g., always using usleep_range() rather than
mdelay() and using similar timeouts and retry counts.

Note that this changes the Keystone link training/wait for link strategy,
so we initiate link training, then wait longer for the link to come up
before re-initiating link training.

[bhelgaas: changelog, split into its own patch, update pci-keystone.c, pcie-qcom.c]
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Pratyush Anand <pratyush.anand@gmail.com>
2016-03-15 08:50:45 -05:00
Andreas Ziegler cc73176cc9 PCI: Cleanup pci/pcie/Kconfig whitespace
Clean up style issues in drivers/pci/pcie/Kconfig, in particular all
indentation is now done using tabs, not spaces, and the definition of
PCIEASPM_DEBUG is now separated from the definition of PCIEASPM with a
newline.

Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-15 08:46:57 -05:00
Mauro Carvalho Chehab 8331c055b2 Merge commit '840f5b0572ea' into v4l_for_linus
* commit '840f5b0572ea': (381 commits)
  media: au0828 disable tuner to demod link in au0828_media_device_register()
  [media] touptek: cast char types on %x printk
  [media] touptek: don't DMA at the stack
  [media] mceusb: use %*ph for small buffer dumps
  [media] v4l: exynos4-is: Drop unneeded check when setting up fimc-lite links
  [media] v4l: vsp1: Check if an entity is a subdev with the right function
  [media] hide unused functions for !MEDIA_CONTROLLER
  [media] em28xx: fix Terratec Grabby AC97 codec detection
  [media] media: add prefixes to interface types
  [media] media: rc: nuvoton: switch attribute wakeup_data to text
  [media] v4l2-ioctl: fix YUV422P pixel format description
  [media] media: fix null pointer dereference in v4l_vb2q_enable_media_source()
  [media] v4l2-mc.h: fix yet more compiler errors
  [media] staging/media: add missing TODO files
  [media] media.h: always start with 1 for the audio entities
  [media] sound/usb: Use meaninful names for goto labels
  [media] v4l2-mc.h: fix compiler warnings
  [media] media: au0828 audio mixer isn't connected to decoder
  [media] sound/usb: Use Media Controller API to share media resources
  [media] dw2102: add support for TeVii S662
  ...
2016-03-15 07:48:28 -03:00
Ingo Molnar 670191572e Merge commit 'torture.2015.02.23a' into core/rcu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-15 09:01:17 +01:00
Ingo Molnar 8bc6782fe2 Merge commit 'fixes.2015.02.23a' into core/rcu
Conflicts:
	kernel/rcu/tree.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-15 09:01:06 +01:00
Linus Torvalds e23604edac Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ updates from Ingo Molnar:
 "NOHZ enhancements, by Frederic Weisbecker, which reorganizes/refactors
  the NOHZ 'can the tick be stopped?' infrastructure and related code to
  be data driven, and harmonizes the naming and handling of all the
  various properties"

[ This makes the ugly "fetch_or()" macro that the scheduler used
  internally a new generic helper, and does a bad job at it.

  I'm pulling it, but I've asked Ingo and Frederic to get this
  fixed up ]

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched-clock: Migrate to use new tick dependency mask model
  posix-cpu-timers: Migrate to use new tick dependency mask model
  sched: Migrate sched to use new tick dependency mask model
  sched: Account rr tasks
  perf: Migrate perf to use new tick dependency mask model
  nohz: Use enum code for tick stop failure tracing message
  nohz: New tick dependency mask
  nohz: Implement wide kick on top of irq work
  atomic: Export fetch_or()
2016-03-14 19:44:38 -07:00
Linus Torvalds d4e796152a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Make schedstats a runtime tunable (disabled by default) and
     optimize it via static keys.

     As most distributions enable CONFIG_SCHEDSTATS=y due to its
     instrumentation value, this is a nice performance enhancement.
     (Mel Gorman)

   - Implement 'simple waitqueues' (swait): these are just pure
     waitqueues without any of the more complex features of full-blown
     waitqueues (callbacks, wake flags, wake keys, etc.).  Simple
     waitqueues have less memory overhead and are faster.

     Use simple waitqueues in the RCU code (in 4 different places) and
     for handling KVM vCPU wakeups.

     (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
     Marcelo Tosatti)

   - sched/numa enhancements (Rik van Riel)

   - NOHZ performance enhancements (Rik van Riel)

   - Various sched/deadline enhancements (Steven Rostedt)

   - Various fixes (Peter Zijlstra)

   - ... and a number of other fixes, cleanups and smaller enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  sched/cputime: Fix steal_account_process_tick() to always return jiffies
  sched/deadline: Remove dl_new from struct sched_dl_entity
  Revert "kbuild: Add option to turn incompatible pointer check into error"
  sched/deadline: Remove superfluous call to switched_to_dl()
  sched/debug: Fix preempt_disable_ip recording for preempt_disable()
  sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
  time, acct: Drop irq save & restore from __acct_update_integrals()
  acct, time: Change indentation in __acct_update_integrals()
  sched, time: Remove non-power-of-two divides from __acct_update_integrals()
  sched/rt: Kick RT bandwidth timer immediately on start up
  sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
  sched/debug: Move sched_domain_sysctl to debug.c
  sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
  sched/rt: Fix PI handling vs. sched_setscheduler()
  sched/core: Remove duplicated sched_group_set_shares() prototype
  sched/fair: Consolidate nohz CPU load update code
  sched/fair: Avoid using decay_load_missed() with a negative value
  sched/deadline: Always calculate end of period on sched_yield()
  sched/cgroup: Fix cgroup entity load tracking tear-down
  rcu: Use simple wait queues where possible in rcutree
  ...
2016-03-14 19:14:06 -07:00
Linus Torvalds d88bfe1d68 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "Various RAS updates:

   - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
     MCA) error decoding support (Aravind Gopalakrishnan)

   - x86 memcpy_mcsafe() support, to enable smart(er) hardware error
     recovery in NVDIMM drivers, based on an extension of the x86
     exception handling code.  (Tony Luck)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/sb_edac: Fix computation of channel address
  x86/mm, x86/mce: Add memcpy_mcsafe()
  x86/mce/AMD: Document some functionality
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
  x86/mce: Move MCx_CONFIG MSR definitions
  x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
  x86/mm: Expand the exception table logic to allow new handling options
  x86/mce/AMD: Set MCAX Enable bit
  x86/mce/AMD: Carve out threshold block preparation
  x86/mce/AMD: Fix LVT offset configuration for thresholding
  x86/mce/AMD: Reduce number of blocks scanned per bank
  x86/mce/AMD: Do not perform shared bank check for future processors
  x86/mce: Fix order of AMD MCE init function call
2016-03-14 18:43:51 -07:00
Linus Torvalds e71c2c1eeb Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Main kernel side changes:

   - Big reorganization of the x86 perf support code.  The old code grew
     organically deep inside arch/x86/kernel/cpu/perf* and its naming
     became somewhat messy.

     The new location is under arch/x86/events/, using the following
     cleaner hierarchy of source code files:

       perf/x86: Move perf_event.c .................. => x86/events/core.c
       perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
       perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
       perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
       perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
       perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
       perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
       perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
       perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
       perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
       perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
       perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
       perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
       perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
       perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
       perf/x86: Move perf_event_intel_uncore_snb.c   => x86/events/intel/uncore_snb.c
       perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
       perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
       perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
       perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
       perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

     (Borislav Petkov)

   - Update various x86 PMU constraint and hw support details (Stephane
     Eranian)

   - Optimize kprobes for BPF execution (Martin KaFai Lau)

   - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
     Gleixner)

   - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

   - Various fixes and smaller cleanups.

  There are lots of perf tooling updates as well.  A few highlights:

  perf report/top:

     - Hierarchy histogram mode for 'perf top' and 'perf report',
       showing multiple levels, one per --sort entry: (Namhyung Kim)

       On a mostly idle system:

         # perf top --hierarchy -s comm,dso

       Then expand some levels and use 'P' to take a snapshot:

         # cat perf.hist.0
         -  92.32%         perf
               58.20%         perf
               22.29%         libc-2.22.so
                5.97%         [kernel]
                4.18%         libelf-0.165.so
                1.69%         [unknown]
         -   4.71%         qemu-system-x86
                3.10%         [kernel]
                1.60%         qemu-system-x86_64 (deleted)
         +   2.97%         swapper
         #

     - Add 'L' hotkey to dynamicly set the percent threshold for
       histogram entries and callchains, i.e.  dynamicly do what the
       --percent-limit command line option to 'top' and 'report' does.
       (Namhyung Kim)

  perf mem:

     - Allow specifying events via -e in 'perf mem record', also listing
       what events can be specified via 'perf mem record -e list' (Jiri
       Olsa)

  perf record:

     - Add 'perf record' --all-user/--all-kernel options, so that one
       can tell that all the events in the command line should be
       restricted to the user or kernel levels (Jiri Olsa), i.e.:

         perf record -e cycles:u,instructions:u

       is equivalent to:

         perf record --all-user -e cycles,instructions

     - Make 'perf record' collect CPU cache info in the perf.data file header:

         $ perf record usleep 1
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
         $ perf report --header-only -I | tail -10 | head -8
         # CPU cache info:
         #  L1 Data                 32K [0-1]
         #  L1 Instruction          32K [0-1]
         #  L1 Data                 32K [2-3]
         #  L1 Instruction          32K [2-3]
         #  L2 Unified             256K [0-1]
         #  L2 Unified             256K [2-3]
         #  L3 Unified            4096K [0-3]

       Will be used in 'perf c2c' and eventually in 'perf diff' to
       allow, for instance running the same workload in multiple
       machines and then when using 'diff' show the hardware difference.
       (Jiri Olsa)

     - Improved support for Java, using the JVMTI agent library to do
       jitdumps that then will be inserted in synthesized
       PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
       ELF files stored in ~/.debug and keyed with build-ids, to allow
       symbol resolution and even annotation with source line info, see
       the changeset comments to see how to use it (Stephane Eranian)

  perf script/trace:

     - Decode data_src values (e.g.  perf.data files generated by 'perf
       mem record') in 'perf script': (Jiri Olsa)

         # perf script
           perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
                                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     - Improve support to 'data_src', 'weight' and 'addr' fields in
       'perf script' (Jiri Olsa)

     - Handle empty print fmts in 'perf script -s' i.e. when running
       python or perl scripts (Taeung Song)

  perf stat:

     - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
       interval mode too.  E.g:

         # perf stat -I 1000 -e instructions,cycles sleep 1
         #         time   counts unit events
            1.000215928  519,620      instructions     #  0.69 insn per cycle
            1.000215928  752,003      cycles
         <SNIP>

     - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

     - Implement CSV metrics output in 'perf stat' (Andi Kleen)

  perf BPF support:

     - Support converting data from bpf events in 'perf data' (Wang Nan)

     - Print bpf-output events in 'perf script': (Wang Nan).

         # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
         # perf script
            usleep  4882 21384.532523:   evt:  ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
         #

     - Add API to set values of map entries in a BPF object, be it
       individual map slots or ranges (Wang Nan)

     - Introduce support for the 'bpf-output' event (Wang Nan)

     - Add glue to read perf events in a BPF program (Wang Nan)

     - Improve support for bpf-output events in 'perf trace' (Wang Nan)

  ... and tons of other changes as well - see the shortlog and git log
  for details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
  perf stat: Add --metric-only support for -A
  perf stat: Implement --metric-only mode
  perf stat: Document CSV format in manpage
  perf hists browser: Check sort keys before hot key actions
  perf hists browser: Allow thread filtering for comm sort key
  perf tools: Add sort__has_comm variable
  perf tools: Recalc total periods using top-level entries in hierarchy
  perf tools: Remove nr_sort_keys field
  perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
  perf tools: Remove hist_entry->fmt field
  perf tools: Fix command line filters in hierarchy mode
  perf tools: Add more sort entry check functions
  perf tools: Fix hist_entry__filter() for hierarchy
  perf jitdump: Build only on supported archs
  tools lib traceevent: Add '~' operation within arg_num_eval()
  perf tools: Omit unnecessary cast in perf_pmu__parse_scale
  perf tools: Pass perf_hpp_list all the way through setup_sort_list
  perf tools: Fix perf script python database export crash
  perf jitdump: DWARF is also needed
  perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
  ...
2016-03-14 17:58:53 -07:00