Commit Graph

3874 Commits

Author SHA1 Message Date
Vikas Shivappa adcbdd7030 x86/intel_rdt: Fix padding when resource is enabled via mount
Currently max width of 'resource name' and 'resource data' is being
initialized based on 'enabled resources' during boot. But the mount can
enable different capable resources at a later time which upsets the
tabular format of schemata. Fix this to be based on 'all capable'
resources.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Prakhya, Sai Praneeth <sai.praneeth.prakhya@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1492645804-17465-2-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-20 15:57:59 +02:00
Borislav Petkov c6a9583fb4 x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only
mce_usable_address() does a bunch of basic sanity checks to verify
whether the address reported with the error is usable for further
processing. However, we do check MCi_STATUS[MISCV] and that is not
needed on AMD as that bit says that there's additional information about
the logged error in the MCi_MISCj banks.

But we don't need that to know whether the address is usable - we only
need to know whether the physical address is valid - i.e., ADDRV.

On Intel the MISCV bit is needed to perform additional checks to determine
whether the reported address is a physical one, etc.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170418183924.6agjkebilwqj26or@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-19 12:04:46 +02:00
Vishal Verma 0dc9c639e6 x86/mce: Make the MCE notifier a blocking one
The NFIT MCE handler callback (for handling media errors on NVDIMMs)
takes a mutex to add the location of a memory error to a list. But since
the notifier call chain for machine checks (x86_mce_decoder_chain) is
atomic, we get a lockdep splat like:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
  in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
  [..]
  Call Trace:
   dump_stack
   ___might_sleep
   __might_sleep
   mutex_lock_nested
   ? __lock_acquire
   nfit_handle_mce
   notifier_call_chain
   atomic_notifier_call_chain
   ? atomic_notifier_call_chain
   mce_gen_pool_process

Convert the notifier to a blocking one which gets to run only in process
context.

Boris: remove the notifier call in atomic context in print_mce(). For
now, let's print the MCE on the atomic path so that we can make sure
they go out and get logged at least.

Fixes: 6839a6d96f ("nfit: do an ARS scrub on hitting a latent media error")
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-18 22:23:48 +02:00
Borislav Petkov 415601b191 x86/mce: Update notifier priority check
Update the check which enforces the registration of MCE decoder notifier
callbacks with valid priority only, to include mcelog's priority.

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20170418073820.i6kl5tggcntwlisa@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-18 10:27:52 +02:00
Vikas Shivappa 64e8ed3d4a x86/intel_rdt/mba: Add schemata file support for MBA
Add support to update the MBA bandwidth values for the domains via the
schemata file.

 - Verify that the bandwidth value is valid

 - Round to the next control step depending on the bandwidth granularity of
   the hardware

 - Convert the bandwidth to delay values and write the delay values to
   the corresponding domain PQOS_MSRs.

[ tglx: Massaged changelog ]

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-9-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:09 +02:00
Vikas Shivappa c6ea67de52 x86/intel_rdt: Make schemata file parsers resource specific
The schemata files are the user space interface to update resource
controls. The parser is hardwired to support only cache resources, which do
not fit the requirements of memory resources.

Add a function pointer for a parser to the struct rdt_resource and switch
the cache parsing over.

[ tglx: Massaged changelog ]

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-8-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:09 +02:00
Vikas Shivappa db69ef6563 x86/intel_rdt/mba: Add info directory files for Memory Bandwidth Allocation
The files in the info directory for MBA are as follows:

 num_closids
 	The maximum number of CLOSids available for MBA

 min_bandwidth
 	The minimum memory bandwidth percentage value

 bandwidth_gran
 	The granularity of the bandwidth control in percent for the
	particular CPU SKU. Intermediate values entered are rounded off
	to the previous control step available. Available bandwidth
	control steps are minimum_bandwidth + N * bandwidth_gran.

 delay_linear
 	When set, the OS writes a linear percentage based value to the
	control MSRs ranging from minimum_bandwidth to 100 percent.

	This value is informational and has no influence on the values
	written to the schemata files. The values written to the
	schemata are always bandwidth percentage that is requested.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-7-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:08 +02:00
Vikas Shivappa 6a507a6ad8 x86/intel_rdt: Make information files resource specific
Cache allocation and memory bandwidth allocation require different
information files in the resctrl/info directory, but the current
implementation does not allow to have files per resource.

Add the necessary fields to the resource struct and assign the files
dynamically depending on the resource type.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-6-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:08 +02:00
Vikas Shivappa 05b93417ce x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)
The MBA feature details like minimum bandwidth supported, bandwidth
granularity etc are obtained via executing CPUID with EAX=10H ,ECX=3.

Setup and initialize the MBA specific extensions to data structures like
global list of RDT resources, RDT resource structure and RDT domain
structure.

[ tglx: Split out the seperate structure and the CBM related parts ]

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-5-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:08 +02:00
Vikas Shivappa ab66a33b03 x86/intel_rdt/mba: Memory bandwith allocation feature detect
Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
Add supporting data structures to detect feature details which is done
in later patch using CPUID with EAX=10H, ECX= 3.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:07 +02:00
Thomas Gleixner 0921c54769 x86/intel_rdt: Add resource specific msr update function
Updating of Cache and Memory bandwidth QOS MSRs is different.

Add a function pointer to struct rdt_resource and convert the cache part
over.

Based on Vikas all in one patch^Wmess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:07 +02:00
Thomas Gleixner d3e11b4d6f x86/intel_rdt: Move CBM specific data into a struct
Memory bandwidth allocation requires different information than cache
allocation.

To avoid a lump of data in struct rdt_resource, move all cache related
information into a seperate structure and add that to struct rdt_resource.

Sanitize the data types while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:07 +02:00
Vikas Shivappa 2545e9f51e x86/intel_rdt: Cleanup namespace to support multiple resource types
Lot of data structures and functions are named after cache specific
resources(named after cbm, cache etc). In many cases other non cache
resources may need to share the same data structures/functions.

Generalize such naming to prepare to add more resources like memory
bandwidth.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1491611637-20417-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 16:10:07 +02:00
Thomas Gleixner 70a1ee9256 x86/intel_rdt: Organize code properly
Having init functions at random places in the middle of the code is
unintuitive.

Move them close to the init routine and mark them __init.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:06 +02:00
Thomas Gleixner 06b57e4550 x86/intel_rdt: Init padding only if a device exists
If no device exists it's pointless to calculate the padding data for the
schemata files.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
2017-04-14 16:10:06 +02:00
Piotr Luc 9ea74f7c70 x86/mce: Enable PPIN for Knights Landing/Mill
Intel Xeon Phi processors (KNL and KNM) support PPIN as well, so add their
CPUIDs to the whitelist of supported processors.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170408172004.8463-1-piotr.luc@intel.com
Link: http://lkml.kernel.org/r/20170413201056.10525-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14 10:46:12 +02:00
Ingo Molnar b6466d53af Merge branch 'x86/urgent' into x86/cpu, to resolve conflict
Conflicts:
	arch/x86/kernel/cpu/intel_rdt_schemata.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 10:47:28 +02:00
Jiri Olsa 7f00f38871 x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
The schemata lock is released before freeing the resource's temporary
tmp_cbms allocation. That's racy versus another write which allocates and
uses new temporary storage, resulting in memory leaks, freeing in use
memory, double a free or any combination of those.

Move the unlock after the release code.

Fixes: 60ec2440c6 ("x86/intel_rdt: Add schemata file")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170411071446.15241-1-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-11 09:48:12 +02:00
Ingo Molnar e5185a76a2 Merge branch 'x86/boot' into x86/mm, to avoid conflict
There's a conflict between ongoing level-5 paging support and
the E820 rewrite. Since the E820 rewrite is essentially ready,
merge it into x86/mm to reduce tree conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:56:05 +02:00
Ingo Molnar 4729277156 Merge branch 'WIP.x86/boot' into x86/boot, to pick up ready branch
The E820 rework in WIP.x86/boot has gone through a couple of weeks
of exposure in -tip, merge it in a wider fashion.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 08:49:31 +02:00
Jiri Olsa 4ffa3c977b x86/intel_rdt: Add cpus_list rdtgroup file
The resource control filesystem provides only a bitmask based cpus file for
assigning CPUs to a resource group. That's cumbersome with large cpumasks
and non-intuitive when modifying the file from the command line.

Range based cpu lists are commonly used along with bitmask based cpu files
in various subsystems throughout the kernel.

Add 'cpus_list' file which is CPU range based.

  # cd /sys/fs/resctrl/
  # echo 1-10 > krava/cpus_list
  # cat krava/cpus_list
  1-10
  # cat krava/cpus
  0007fe
  # cat cpus
  fffff9
  # cat cpus_list
  0,3-23

[ tglx: Massaged changelog and replaced "bitmask lists" by "CPU ranges" ]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20170410145232.GF25354@krava
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-10 19:10:25 +02:00
K. Y. Srinivasan a33fd4c27b Drivers: hv: Issue explicit EOI when autoeoi is not enabled
When auto EOI is not enabled; issue an explicit EOI for hyper-v
interrupts.

Fixes: 6c248aad81 ("Drivers: hv: Base autoeoi enablement based on hypervisor hints")

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 18:07:51 +02:00
Vikas Shivappa de016df88f x86/intel_rdt: Update schemata read to show data in tabular format
The schemata file displays data from different resources on all
domains. Its cumbersome to read since they are not tabular and data/names
could be of different widths.  Make the schemata file to display data in a
tabular format thereby making it nice and simple to read.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: vikas.shivappa@intel.com
Cc: h.peter.anvin@intel.com
Link: http://lkml.kernel.org/r/1491255857-17213-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05 17:22:31 +02:00
Tony Luck c4026b7b95 x86/intel_rdt: Implement "update" mode when writing schemata file
The schemata file can have multiple lines and it is cumbersome to update
all lines.

Remove code that requires that the user provides values for every resource
(in the right order).  If the user provides values for just a few
resources, update them and leave the rest unchanged.

Side benefit: we now check which values were updated and only send IPIs to
cpus that actually have updates.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: ravi.v.shankar@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: vikas.shivappa@intel.com
Cc: h.peter.anvin@intel.com
Link: http://lkml.kernel.org/r/1491255857-17213-3-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-05 17:22:31 +02:00
Linus Torvalds 3ccfcdc9ef Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fix from Thomas Gleixner:
 "Prevent dmesg from being spammed when MCE logging is active"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Don't print MCEs when mcelog is active
2017-04-03 08:36:24 -07:00
Ingo Molnar 7f75540ff2 Linux 4.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY4ZYkAAoJEHm+PkMAQRiGsq4H/R4PMXDoe2XhSSk7IoT97pXV
 /A8np/scAPjzEgYUidbb54OSqWwsPRuPGWONTFeSrE2u0L4wln/REI91jg7QetLq
 IisncExlYeJ/XQ+iO0ZZh9fLbqwIlEJFdSXmyIFr3m/TBxe8a61C8j93oNgM1tHT
 yuwzlq7c3sLq2hsmUG2HyL2kJsEfRasv4Rk0yhFuti12zVsBoTW4qmZuMauq+gdf
 f7cSYgiHhPTdb2o+azg5O7uYNHaQQBxdUMlIuhhYtVOUq+pFDO23SLHSFIW2NwOm
 Zn5R6CFSrLsCw0Bx0v8Xlc151QUbaRK4h9lhUhkBr6d3uNShU1NQ9JojpSvYwBo=
 =vP6E
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03 16:36:32 +02:00
Yazen Ghannam 29f72ce3e4 x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name.
However, MCA bank 3 is defined on Fam17h systems and can be accessed
using legacy MSRs. Without a name we get a stack trace on Fam17h systems
when trying to register sysfs files for bank 3 on kernels that don't
recognize Scalable MCA.

Call MCA bank 3 "decode_unit" since this is what it represents on
Fam17h. This will allow kernels without SMCA support to see this bank on
Fam17h+ and prevent the stack trace. This will not affect older systems
since this bank is reserved on them, i.e. it'll be ignored.

Tested on AMD Fam15h and Fam17h systems.

  WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal
  kobject: (ffff88085bb256c0): attempted to be registered with empty name!
  ...
  Call Trace:
   kobject_add_internal
   kobject_add
   kobject_create_and_add
   threshold_create_device
   threshold_init_device

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-31 10:09:44 +02:00
Ingo Molnar 73fa1362a7 Merge branch 'x86/cpu' into x86/mm, before applying dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:07:54 +02:00
Borislav Petkov 32b40a82e8 x86/mce: Do not register notifiers with invalid prio
This is just a defensive precaution: do not register notifiers with a
priority which would disrupt the error handling in the notifiers with
prio higher than MCE_PRIO_EDAC.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:55:15 +02:00
Tony Luck 5de97c9f6d x86/mce: Factor out and deprecate the /dev/mcelog driver
Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de
[ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:55:01 +02:00
Borislav Petkov 011d826111 RAS: Add a Corrected Errors Collector
Introduce a simple data structure for collecting correctable errors
along with accessors. More detailed description in the code itself.

The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first and the CEC decides
whether to log it and then the rest of the chain doesn't hear about it -
basically the main reason for the CE collector - or to continue running
the notifiers.

When the CEC hits the action threshold, it will try to soft-offine the
page containing the ECC and then the whole decoding chain gets to see
the error.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:48 +02:00
Borislav Petkov e64edfcce9 x86/mce: Rename mce_log to mce_log_buffer
It is confusing when staring at "struct mce_log mcelog" and then there's
also a function called mce_log(). So call the buffer what it is.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:42 +02:00
Borislav Petkov fe3ed20fdd x86/mce: Rename mce_log()'s argument
We call it everywhere "struct mce *m". Adjust that here too to avoid
confusion.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:38 +02:00
Ingo Molnar 4a96d1a5f0 Merge branch 'ras/urgent' into ras/core, to pick up fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:54:25 +02:00
Andi Kleen cc66afea58 x86/mce: Don't print MCEs when mcelog is active
Since:

  cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")

all MCEs are printed even when mcelog is running. Fix the regression to
not print to dmesg when mcelog is running as it is a consumer too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.10..
Fixes: cd9c57cad3 ("x86/MCE: Dump MCE to dmesg if no consumers")
Link: http://lkml.kernel.org/r/20170327093304.10683-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-28 08:53:52 +02:00
Ingo Molnar 1f9ca18404 Merge branch 'x86/process' into x86/mm, to create new base for further patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:28:19 +01:00
Andy Lutomirski b23adb7d3f x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for the GDT fixup
Xen imposes special requirements on the GDT.  Rather than using a
global variable for the pgprot, just use an explicit special case
for Xen -- this makes it clearer what's going on.  It also debloats
64-bit kernels very slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e9ea96abbfd6a8c87753849171bb5987ecfeb523.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:08 +01:00
Andy Lutomirski aa4ea67552 x86/gdt: Fix setup_fixmap_gdt() to use the correct PA
__pa() cannot be used on percpu pointers because they may be
virtually mapped.  Use per_cpu_ptr_to_phys() instead.

This fixes a boot crash on a some 32-bit configurations.  I assume
this is related to which allocation strategy is chosen by the percpu
core.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 69218e4799 x86: ("Remap GDT tables in the fixmap section")
Link: http://lkml.kernel.org/r/22e0069c29fba31998f193201e359eebfdac4960.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:07 +01:00
Kyle Huey e9ea1e7f53 x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.

When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991

Implement a new pair of arch_prctls, available on both x86-32 and x86-64.

ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
    is enabled (and thus the CPUID instruction is not available) or 1 if
    CPUID faulting is not enabled.

ARCH_SET_CPUID: Set the CPUID state to the second argument. If
    cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
    be deactivated. Returns ENODEV if CPUID faulting is not supported on
    this system.

The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:34 +01:00
Kyle Huey 90218ac77d x86/cpufeature: Detect CPUID faulting support
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. This will allow a ptracer to emulate the CPUID
instruction.

Bit 31 of MSR_PLATFORM_INFO advertises support for this feature. It is
documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991

Detect support for this feature and expose it as X86_FEATURE_CPUID_FAULT.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-8-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:34 +01:00
Kyle Huey ab6d946863 x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES
This matches the only public Intel documentation of this MSR, in the
"Virtualization Technology FlexMigration Application Note"
(preserved at https://bugzilla.kernel.org/attachment.cgi?id=243991)

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-2-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:32 +01:00
Yazen Ghannam 5204bf1703 x86/mce: Init some CPU features early
When the MCA banks in __mcheck_cpu_init_generic() are polled for leftover
errors logged during boot or from the previous boot, its required to have
CPU features detected sufficiently so that the reading out and handling of
those early errors is done correctly.

If those features are not available, the decoding may miss some information
and get incomplete errors logged. For example, on SMCA systems the MCA_IPID
and MCA_SYND registers are not logged and MCA_ADDR is not masked
appropriately.

To cure that, do a subset of the basic feature detection early while the
rest happens in its usual place in __mcheck_cpu_init_vendor().

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1489599055-20756-1-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message and simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-18 13:03:44 +01:00
Colin Ian King cf8178f786 x86/microcode/AMD: Remove redundant NULL check on mc
mc is a pointer to the static u8 array amd_ucode_patch and
therefore can never be null, so the check is redundant. Remove it.

Detected by CoverityScan, CID#1372871 ("Logically Dead Code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: kernel-janitors@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20170315171010.17536-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-18 12:31:58 +01:00
Thomas Garnier 45fc8757d1 x86: Make the GDT remapping read-only on 64-bit
This patch makes the GDT remapped pages read-only, to prevent accidental
(or intentional) corruption of this key data structure.

This change is done only on 64-bit, because 32-bit needs it to be writable
for TSS switches.

The native_load_tr_desc function was adapted to correctly handle a
read-only GDT. The LTR instruction always writes to the GDT TSS entry.
This generates a page fault if the GDT is read-only. This change checks
if the current GDT is a remap and swap GDTs as needed. This function was
tested by booting multiple machines and checking hibernation works
properly.

KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the
per-cpu variable was removed for functions to fetch the original GDT.
Instead of reloading the previous GDT, VMX will reload the fixmap GDT as
expected. For testing, VMs were started and restored on multiple
configurations.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:06:35 +01:00
Thomas Garnier 69218e4799 x86: Remap GDT tables in the fixmap section
Each processor holds a GDT in its per-cpu structure. The sgdt
instruction gives the base address of the current GDT. This address can
be used to bypass KASLR memory randomization. With another bug, an
attacker could target other per-cpu structures or deduce the base of
the main memory section (PAGE_OFFSET).

This patch relocates the GDT table for each processor inside the
fixmap section. The space is reserved based on number of supported
processors.

For consistency, the remapping is done by default on 32 and 64-bit.

Each processor switches to its remapped GDT at the end of
initialization. For hibernation, the main processor returns with the
original GDT and switches back to the remapping at completion.

This patch was tested on both architectures. Hibernation and KVM were
both tested specially for their usage of the GDT.

Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
recommending changes for Xen support.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:06:35 +01:00
Jiri Olsa 49ec8f5b6a x86/intel_rdt: Put group node in rdtgroup_kn_unlock
The rdtgroup_kn_unlock waits for the last user to release and put its
node. But it's calling kernfs_put on the node which calls the
rdtgroup_kn_unlock, which might not be the group's directory node, but
another group's file node.

This race could be easily reproduced by running 2 instances
of following script:

  mount -t resctrl resctrl /sys/fs/resctrl/
  pushd /sys/fs/resctrl/
  mkdir krava
  echo "krava" > krava/schemata
  rmdir krava
  popd
  umount  /sys/fs/resctrl

It triggers the slub debug error message with following command
line config: slub_debug=,kernfs_node_cache.

Call kernfs_put on the group's node to fix it.

Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1489501253-20248-1-git-send-email-jolsa@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-14 21:51:58 +01:00
Xunlei Pang 5bc329503e x86/mce: Handle broadcasted MCE gracefully with kexec
When we are about to kexec a crash kernel and right then and there a
broadcasted MCE fires while we're still in the first kernel and while
the other CPUs remain in a holding pattern, the #MC handler of the
first kernel will timeout and then panic due to never completing MCE
synchronization.

Handle this in a similar way as to when the CPUs are offlined when that
broadcasted MCE happens.

[ Boris: rewrote commit message and comments. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: kexec@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487857012-9059-1-git-send-email-xlpang@redhat.com
Link: http://lkml.kernel.org/r/20170313095019.19351-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13 20:18:07 +01:00
Mathias Krause 6415813bae x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86
Remove the wp_works_ok member of struct cpuinfo_x86. It's an
optimization back from Linux v0.99 times where we had no fixup support
yet and did the CR0.WP test via special code in the page fault handler.
The < 0 test was an optimization to not do the special casing for each
NULL ptr access violation but just for the first one doing the WP test.
Today it serves no real purpose as the test no longer needs special code
in the page fault handler and the only call side -- mem_init() -- calls
it just once, anyway. However, Xen pre-initializes it to 1, to skip the
test.

Doing the test again for Xen should be no issue at all, as even the
commit introducing skipping the test (commit d560bc6157 ("x86, xen:
Suppress WP test on Xen")) mentioned it being ban aid only. And, in
fact, testing the patch on Xen showed nothing breaks.

The pre-fixup times are long gone and with the removal of the fallback
handling code in commit a5c2a893db ("x86, 386 removal: Remove
CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway.
So just get rid of the "optimization" and do the test unconditionally.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/1486933932-585-3-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-11 14:30:24 +01:00
Linus Torvalds ec3b93ae0b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes and minor updates all over the place:

   - an SGI/UV fix
   - a defconfig update
   - a build warning fix
   - move the boot_params file to the arch location in debugfs
   - a pkeys fix
   - selftests fix
   - boot message fixes
   - sparse fixes
   - a resume warning fix
   - ioapic hotplug fixes
   - reboot quirks

  ... plus various minor cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build/x86_64_defconfig: Enable CONFIG_R8169
  x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk
  x86/hpet: Prevent might sleep splat on resume
  x86/boot: Correct setup_header.start_sys name
  x86/purgatory: Fix sparse warning, symbol not declared
  x86/purgatory: Make functions and variables static
  x86/events: Remove last remnants of old filenames
  x86/pkeys: Check against max pkey to avoid overflows
  x86/ioapic: Split IOAPIC hot-removal into two steps
  x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC
  x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h
  x86/vmware: Remove duplicate inclusion of asm/timer.h
  x86/hyperv: Hide unused label
  x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk
  x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
  x86/selftests: Add clobbers for int80 on x86_64
  x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()
  x86/apic: Fix a warning message in logical CPU IDs allocation
  x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/
2017-03-07 14:47:24 -08:00
Linus Torvalds 609b07b72d Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A fix for KVM's scheduler clock which (erroneously) was always marked
  unstable, a fix for RT/DL load balancing, plus latency fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
  sched/core: Fix pick_next_task() for RT,DL
  sched/fair: Make select_idle_cpu() more aggressive
2017-03-07 14:42:34 -08:00
Ingo Molnar 68e21be291 sched/headers: Move task->mm handling methods to <linux/sched/mm.h>
Move the following task->mm helper APIs into a new header file,
<linux/sched/mm.h>, to further reduce the size and complexity
of <linux/sched.h>.

Here are how the APIs are used in various kernel files:

  # mm_alloc():
  arch/arm/mach-rpc/ecard.c
  fs/exec.c
  include/linux/sched/mm.h
  kernel/fork.c

  # __mmdrop():
  arch/arc/include/asm/mmu_context.h
  include/linux/sched/mm.h
  kernel/fork.c

  # mmdrop():
  arch/arm/mach-rpc/ecard.c
  arch/m68k/sun3/mmu_emu.c
  arch/x86/mm/tlb.c
  drivers/gpu/drm/amd/amdkfd/kfd_process.c
  drivers/gpu/drm/i915/i915_gem_userptr.c
  drivers/infiniband/hw/hfi1/file_ops.c
  drivers/vfio/vfio_iommu_spapr_tce.c
  fs/exec.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  fs/proc/task_nommu.c
  fs/userfaultfd.c
  include/linux/mmu_notifier.h
  include/linux/sched/mm.h
  kernel/fork.c
  kernel/futex.c
  kernel/sched/core.c
  mm/khugepaged.c
  mm/ksm.c
  mm/mmu_context.c
  mm/mmu_notifier.c
  mm/oom_kill.c
  virt/kvm/kvm_main.c

  # mmdrop_async_fn():
  include/linux/sched/mm.h

  # mmdrop_async():
  include/linux/sched/mm.h
  kernel/fork.c

  # mmget_not_zero():
  fs/userfaultfd.c
  include/linux/sched/mm.h
  mm/oom_kill.c

  # mmput():
  arch/arc/include/asm/mmu_context.h
  arch/arc/kernel/troubleshoot.c
  arch/frv/mm/mmu-context.c
  arch/powerpc/platforms/cell/spufs/context.c
  arch/sparc/include/asm/mmu_context_32.h
  drivers/android/binder.c
  drivers/gpu/drm/etnaviv/etnaviv_gem.c
  drivers/gpu/drm/i915/i915_gem_userptr.c
  drivers/infiniband/core/umem.c
  drivers/infiniband/core/umem_odp.c
  drivers/infiniband/core/uverbs_main.c
  drivers/infiniband/hw/mlx4/main.c
  drivers/infiniband/hw/mlx5/main.c
  drivers/infiniband/hw/usnic/usnic_uiom.c
  drivers/iommu/amd_iommu_v2.c
  drivers/iommu/intel-svm.c
  drivers/lguest/lguest_user.c
  drivers/misc/cxl/fault.c
  drivers/misc/mic/scif/scif_rma.c
  drivers/oprofile/buffer_sync.c
  drivers/vfio/vfio_iommu_type1.c
  drivers/vhost/vhost.c
  drivers/xen/gntdev.c
  fs/exec.c
  fs/proc/array.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  fs/proc/task_nommu.c
  fs/userfaultfd.c
  include/linux/sched/mm.h
  kernel/cpuset.c
  kernel/events/core.c
  kernel/events/uprobes.c
  kernel/exit.c
  kernel/fork.c
  kernel/ptrace.c
  kernel/sys.c
  kernel/trace/trace_output.c
  kernel/tsacct.c
  mm/memcontrol.c
  mm/memory.c
  mm/mempolicy.c
  mm/migrate.c
  mm/mmu_notifier.c
  mm/nommu.c
  mm/oom_kill.c
  mm/process_vm_access.c
  mm/rmap.c
  mm/swapfile.c
  mm/util.c
  virt/kvm/async_pf.c

  # mmput_async():
  include/linux/sched/mm.h
  kernel/fork.c
  mm/oom_kill.c

  # get_task_mm():
  arch/arc/kernel/troubleshoot.c
  arch/powerpc/platforms/cell/spufs/context.c
  drivers/android/binder.c
  drivers/gpu/drm/etnaviv/etnaviv_gem.c
  drivers/infiniband/core/umem.c
  drivers/infiniband/core/umem_odp.c
  drivers/infiniband/hw/mlx4/main.c
  drivers/infiniband/hw/mlx5/main.c
  drivers/infiniband/hw/usnic/usnic_uiom.c
  drivers/iommu/amd_iommu_v2.c
  drivers/iommu/intel-svm.c
  drivers/lguest/lguest_user.c
  drivers/misc/cxl/fault.c
  drivers/misc/mic/scif/scif_rma.c
  drivers/oprofile/buffer_sync.c
  drivers/vfio/vfio_iommu_type1.c
  drivers/vhost/vhost.c
  drivers/xen/gntdev.c
  fs/proc/array.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  include/linux/sched/mm.h
  kernel/cpuset.c
  kernel/events/core.c
  kernel/exit.c
  kernel/fork.c
  kernel/ptrace.c
  kernel/sys.c
  kernel/trace/trace_output.c
  kernel/tsacct.c
  mm/memcontrol.c
  mm/memory.c
  mm/mempolicy.c
  mm/migrate.c
  mm/mmu_notifier.c
  mm/nommu.c
  mm/util.c

  # mm_access():
  fs/proc/base.c
  include/linux/sched/mm.h
  kernel/fork.c
  mm/process_vm_access.c

  # mm_release():
  arch/arc/include/asm/mmu_context.h
  fs/exec.c
  include/linux/sched/mm.h
  include/uapi/linux/sched.h
  kernel/exit.c
  kernel/fork.c

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-03 01:43:28 +01:00
Peter Zijlstra f94c8d1169 sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
Wanpeng Li reported that since the following commit:

  acb04058de ("sched/clock: Fix hotplug crash")

... KVM always runs with unstable sched-clock even though KVM's
kvm_clock _is_ stable.

The problem is that we've tied clear_sched_clock_stable() to the TSC
state, and overlooked that sched_clock() is a paravirt function.

Solve this by doing two things:

 - tie the sched_clock() stable state more clearly to the TSC stable
   state for the normal (!paravirt) case.

 - only call clear_sched_clock_stable() when we mark TSC unstable
   when we use native_sched_clock().

The first means we can actually run with stable sched_clock in more
situations then before, which is good. And since commit:

  12907fbb1a ("sched/clock, clocksource: Add optional cs::mark_unstable() method")

... this should be reliable. Since any detection of TSC fail now results
in marking the TSC unstable.

Reported-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: acb04058de ("sched/clock: Fix hotplug crash")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:50:49 +01:00
Ingo Molnar 9164bb4a18 sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>
Update all usage sites first.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:38 +01:00
Ingo Molnar 299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar 5b825c3af1 sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because <linux/sched.h> is included in over
2,200 files ...

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:31 +01:00
Ingo Molnar 3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar e601757102 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Masanari Iida e86a2d2d34 x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Cc: fenghua.yu@intel.com
Link: http://lkml.kernel.org/r/20170227130703.26968-1-standby24x7@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-01 10:51:41 +01:00
Masanari Iida aa5ec3f715 x86/vmware: Remove duplicate inclusion of asm/timer.h
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Cc: akataria@vmware.com
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170227122922.26230-1-standby24x7@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-01 10:51:40 +01:00
Ingo Molnar 0871d5a66d Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up updates
Conflicts:
	arch/x86/xen/setup.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 09:02:26 +01:00
Linus Torvalds f89db789de Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two documentation updates, plus a debugging annotation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/crash: Update the stale comment in reserve_crashkernel()
  x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
  Documentation, x86, resctrl: Recommend locking for resctrlfs
2017-02-28 11:46:00 -08:00
Vegard Nossum f1f1007644 mm: add new mmgrab() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Linus Torvalds e30aee9e10 char/misc driver patches for 4.11-rc1
Here is the big char/misc driver patchset for 4.11-rc1.
 
 Lots of different driver subsystems updated here.  Rework for the hyperv
 subsystem to handle new platforms better, mei and w1 and extcon driver
 updates, as well as a number of other "minor" driver updates.  Full
 details are in the shortlog below.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2iRQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynhFACguVE+/ixj5u5bT5DXQaZNai/6zIAAmgMWwd/t
 YTD2cwsJsGbTT1fY3SUe
 =CiSI
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc driver patchset for 4.11-rc1.

  Lots of different driver subsystems updated here: rework for the
  hyperv subsystem to handle new platforms better, mei and w1 and extcon
  driver updates, as well as a number of other "minor" driver updates.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits)
  goldfish: Sanitize the broken interrupt handler
  x86/platform/goldfish: Prevent unconditional loading
  vmbus: replace modulus operation with subtraction
  vmbus: constify parameters where possible
  vmbus: expose hv_begin/end_read
  vmbus: remove conditional locking of vmbus_write
  vmbus: add direct isr callback mode
  vmbus: change to per channel tasklet
  vmbus: put related per-cpu variable together
  vmbus: callback is in softirq not workqueue
  binder: Add support for file-descriptor arrays
  binder: Add support for scatter-gather
  binder: Add extra size to allocator
  binder: Refactor binder_transact()
  binder: Support multiple /dev instances
  binder: Deal with contexts in debugfs
  binder: Support multiple context managers
  binder: Split flat_binder_object
  auxdisplay: ht16k33: remove private workqueue
  auxdisplay: ht16k33: rework input device initialization
  ...
2017-02-22 11:38:22 -08:00
Linus Torvalds a25a1d6c24 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Ingo Molnar:
 "The main changes are further simplification and unification of the
  code between the AMD and Intel microcode loaders, plus other
  simplifications - by Borislav Petkov"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Remove struct cont_desc.eq_id
  x86/microcode/AMD: Remove AP scanning optimization
  x86/microcode/AMD: Simplify saving from initrd
  x86/microcode/AMD: Unify load_ucode_amd_ap()
  x86/microcode/AMD: Check patch level only on the BSP
  x86/microcode: Remove local vendor variable
  x86/microcode/AMD: Use find_microcode_in_initrd()
  x86/microcode/AMD: Get rid of global this_equiv_id
  x86/microcode: Decrease CPUID use
  x86/microcode/AMD: Rework container parsing
  x86/microcode/AMD: Extend the container struct
  x86/microcode/AMD: Shorten function parameter's name
  x86/microcode/AMD: Clean up find_equiv_id()
  x86/microcode: Convert to bare minimum MSR accessors
  x86/MSR: Carve out bare minimum accessors
2017-02-20 15:30:51 -08:00
Linus Torvalds 280d7a1ede Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The main changes relate to fixes between (lack of) CPUID and FPU
  detection that should only affect old or weird CPUs, by Andy
  Lutomirski"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Fix the "Giving up, no FPU found" test
  x86/fpu: Fix CPUID-less FPU detection
  x86/fpu: Fix "x86/fpu: Legacy x87 FPU detected" message
  x86/cpu: Re-apply forced caps every time CPU caps are re-read
  x86/cpu: Factor out application of forced CPU caps
  x86/cpu: Add X86_FEATURE_CPUID
  x86/fpu/xstate: Move XSAVES state init to a function
2017-02-20 15:03:51 -08:00
Linus Torvalds 8a9365a472 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "The main changes in this cycle were related to enable ring-3
  MONITOR/MWAIT instructions support on supported CPUs, by Grzegorz
  Andrejczuk and Piotr Luc"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpufeature: Move RING3MWAIT feature to avoid conflicts
  x86/cpufeature: Enable RING3MWAIT for Knights Mill
  x86/cpufeature: Enable RING3MWAIT for Knights Landing
  x86/cpufeature: Add RING3MWAIT to CPU features
  x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
  x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit
  x86/cpufeature: Add AVX512_VPOPCNTDQ feature
2017-02-20 14:37:08 -08:00
Linus Torvalds 828cad8ea0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this (fairly busy) cycle were:

   - There was a class of scheduler bugs related to forgetting to update
     the rq-clock timestamp which can cause weird and hard to debug
     problems, so there's a new debug facility for this: which uncovered
     a whole lot of bugs which convinced us that we want to keep the
     debug facility.

     (Peter Zijlstra, Matt Fleming)

   - Various cputime related updates: eliminate cputime and use u64
     nanoseconds directly, simplify and improve the arch interfaces,
     implement delayed accounting more widely, etc. - (Frederic
     Weisbecker)

   - Move code around for better structure plus cleanups (Ingo Molnar)

   - Move IO schedule accounting deeper into the scheduler plus related
     changes to improve the situation (Tejun Heo)

   - ... plus a round of sched/rt and sched/deadline fixes, plus other
     fixes, updats and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
  sched/core: Remove unlikely() annotation from sched_move_task()
  sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
  sched/topology: Split out scheduler topology code from core.c into topology.c
  sched/core: Remove unnecessary #include headers
  sched/rq_clock: Consolidate the ordering of the rq_clock methods
  delayacct: Include <uapi/linux/taskstats.h>
  sched/core: Clean up comments
  sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
  sched/clock: Add dummy clear_sched_clock_stable() stub function
  sched/cputime: Remove generic asm headers
  sched/cputime: Remove unused nsec_to_cputime()
  s390, sched/cputime: Remove unused cputime definitions
  powerpc, sched/cputime: Remove unused cputime definitions
  s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
  ia64, sched/cputime: Remove unused cputime definitions
  ia64: Convert vtime to use nsec units directly
  ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
  sched/cputime: Remove jiffies based cputime
  sched/cputime, vtime: Return nsecs instead of cputime_t to account
  sched/cputime: Complete nsec conversion of tick based accounting
  ...
2017-02-20 12:52:55 -08:00
Linus Torvalds 60c906bab1 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

  - Assign notifier chain priorities for all RAS related handlers to
    make the ordering explicit (Borislav Petkov)

  - Improve the AMD MCA banks sysfs output (Yazen Ghannam)

  - Various cleanups and restructuring of the x86 RAS code (Borislav
    Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
  x86/ras: Get rid of mce_process_work()
  EDAC/mce/amd: Dump TSC value
  EDAC/mce/amd: Unexport amd_decode_mce()
  x86/ras/amd/inj: Change dependency
  x86/ras: Flip the TSC-adding logic
  x86/ras/amd: Make sysfs names of banks more user-friendly
  x86/ras/therm_throt: Do not log a fake MCE for thermal events
  x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
2017-02-20 12:47:44 -08:00
Ingo Molnar 8312593a55 Merge branches 'x86/cache', 'x86/debug' and 'x86/irq' into x86/urgent
Pick up simple singular commits from their topic branches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-20 14:16:58 +01:00
Greg Kroah-Hartman 17fa87fe5a Merge 4.10-rc7 into char-misc-next
We want the hv and other fixes in here as well to handle merge and
testing issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-06 09:39:13 +01:00
Yazen Ghannam 08b259631b x86/CPU/AMD: Fix Zen SMT topology
After:

  a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")

our  SMT scheduling topology for Fam17h systems is broken, because
the ThreadId is included in the ApicId when SMT is enabled.

So, without further decoding cpu_core_id is unique for each thread
rather than the same for threads on the same core. This didn't affect
systems with SMT disabled. Make cpu_core_id be what it is defined to be.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170205105022.8705-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-05 12:18:45 +01:00
Borislav Petkov 79a8b9aa38 x86/CPU/AMD: Bring back Compute Unit ID
Commit:

  a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")

restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.

Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.

That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.

When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.

Reported-by: Yves Dionne <yves.dionne@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-05 12:18:45 +01:00
Piotr Luc 4d8bb00604 x86/cpufeature: Enable RING3MWAIT for Knights Mill
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi codenamed Knights Mill. We
can't guarantee that this (KNM) will be the last CPU model that needs this
hack.  But, we do recognize that this is far from optimal, and there is an
effort to ensure we don't keep doing extending this hack forever.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-6-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-05 00:19:52 +01:00
Grzegorz Andrejczuk e16fd002af x86/cpufeature: Enable RING3MWAIT for Knights Landing
Enable ring 3 MONITOR/MWAIT for Intel Xeon Phi x200 codenamed Knights
Landing.

Presence of this feature cannot be detected automatically (by reading any
other MSR) therefore it is required to explicitly check for the family and
model of the CPU before attempting to enable it.

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-5-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 08:51:09 +01:00
Grzegorz Andrejczuk 0274f9551e x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the
ring 3 MONITOR/MWAIT.

HWCAP variables contain bitmasks which can be used by userspace
applications to detect which instruction sets are supported by CPU.  On x86
architecture information about CPU capabilities can be checked via CPUID
instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot
be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is
set to CPUID[1].EDX which means that all bits are reserved there.

HWCAP2 approach was chosen because it reuses existing solution present
in other architectures, so only minor modifications are required to the
kernel and userspace applications. When ELF_HWCAP2 is defined
kernel maps it to AT_HWCAP2 during the start of the application.
This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval()
API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent
with x86 ELF_HWCAP type.

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 08:51:09 +01:00
Ingo Molnar ed5c8c854f Merge branch 'linus' into sched/core, to pick up fixes and refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:12:25 +01:00
Thomas Gleixner 0becc0ae5b x86/mce: Make timer handling more robust
Erik reported that on a preproduction hardware a CMCI storm triggers the
BUG_ON in add_timer_on(). The reason is that the per CPU MCE timer is
started by the CMCI logic before the MCE CPU hotplug callback starts the
timer with add_timer_on(). So the timer is already queued which triggers
the BUG.

Using add_timer_on() is pretty pointless in this code because the timer is
strictlty per CPU, initialized as pinned and all operations which arm the
timer happen on the CPU to which the timer belongs.

Simplify the whole machinery by using mod_timer() instead of add_timer_on()
which avoids the problem because mod_timer() can handle already queued
timers. Use __start_timer() everywhere so the earliest armed expiry time is
preserved.

Reported-by: Erik Veijola <erik.veijola@intel.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701310936080.3457@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-31 21:47:58 +01:00
Ingo Molnar f26483eaed Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts
Conflicts:
  arch/x86/kernel/cpu/microcode/amd.c
  arch/x86/kernel/cpu/microcode/core.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-31 08:38:17 +01:00
Borislav Petkov 24c2503255 x86/microcode: Do not access the initrd after it has been freed
When we look for microcode blobs, we first try builtin and if that
doesn't succeed, we fallback to the initrd supplied to the kernel.

However, at some point doing boot, that initrd gets jettisoned and we
shouldn't access it anymore. But we do, as the below KASAN report shows.
That's because find_microcode_in_initrd() doesn't check whether the
initrd is still valid or not.

So do that.

  ==================================================================
  BUG: KASAN: use-after-free in find_cpio_data
  Read of size 1 by task swapper/1/0
  page:ffffea0000db9d40 count:0 mapcount:0 mapping:          (null) index:0x1
  flags: 0x100000000000000()
  raw: 0100000000000000 0000000000000000 0000000000000001 00000000ffffffff
  raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
  page dumped because: kasan: bad access detected
  CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.10.0-rc5-debug-00075-g2dbde22 #3
  Hardware name: Dell Inc. XPS 13 9360/0839Y6, BIOS 1.2.3 12/01/2016
  Call Trace:
   dump_stack
   ? _atomic_dec_and_lock
   ? __dump_page
   kasan_report_error
   ? pointer
   ? find_cpio_data
   __asan_report_load1_noabort
   ? find_cpio_data
   find_cpio_data
   ? vsprintf
   ? dump_stack
   ? get_ucode_user
   ? print_usage_bug
   find_microcode_in_initrd
   __load_ucode_intel
   ? collect_cpu_info_early
   ? debug_check_no_locks_freed
   load_ucode_intel_ap
   ? collect_cpu_info
   ? trace_hardirqs_on
   ? flat_send_IPI_mask_allbutself
   load_ucode_ap
   ? get_builtin_firmware
   ? flush_tlb_func
   ? do_raw_spin_trylock
   ? cpumask_weight
   cpu_init
   ? trace_hardirqs_off
   ? play_dead_common
   ? native_play_dead
   ? hlt_play_dead
   ? syscall_init
   ? arch_cpu_idle_dead
   ? do_idle
   start_secondary
   start_cpu
  Memory state around the buggy address:
   ffff880036e74f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ffff880036e74f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  >ffff880036e75000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                     ^
   ffff880036e75080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ffff880036e75100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ==================================================================

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170126165833.evjemhbqzaepirxo@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 09:32:42 +01:00
Ingo Molnar 09821ff1d5 x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_"
So there's a number of constants that start with "E820" but which
are not types - these create a confusing mixture when seen together
with 'enum e820_type' values:

	E820MAP
	E820NR
	E820_X_MAX
	E820MAX

To better differentiate the 'enum e820_type' values prefix them
with E820_TYPE_.

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 22:55:22 +01:00
Ingo Molnar ab6bc04cfd x86/boot/e820: Create coherent API function names for E820 range operations
We have these three related functions:

 extern void e820_add_region(u64 start, u64 size, int type);
 extern u64  e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type);
 extern u64  e820_remove_range(u64 start, u64 size, unsigned old_type, int checktype);

But it's not clear from the naming that they are 3 operations based around the
same 'memory range' concept. Rename them to better signal this, and move
the prototypes next to each other:

 extern void e820__range_add   (u64 start, u64 size, int type);
 extern u64  e820__range_update(u64 start, u64 size, unsigned old_type, unsigned new_type);
 extern u64  e820__range_remove(u64 start, u64 size, unsigned old_type, int checktype);

Note that this improved organization of the functions shows another problem that was easy
to miss before: sometimes the E820 entry type is 'int', sometimes 'unsigned int' - but this
will be fixed in a separate patch.

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 14:42:32 +01:00
Ingo Molnar 6464d294d2 x86/boot/e820: Rename update_e820() to e820__update_table()
update_e820() should have 'e820' as a prefix as most of the other E820
functions have - but it's also a bit unclear about its purpose, as
it's unclear what is updated - the whole table, or an entry?

Also, the name does not express that it's a trivial wrapper
around sanitize_e820_table() that also prints out the resulting
table.

So rename it to e820__update_table_print(). This also makes it
harmonize with the e820__update_table_firmware() function which
has a very similar purpose.

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 14:42:30 +01:00
Ingo Molnar 66441bd3cf x86/boot/e820: Move asm/e820.h to asm/e820/api.h
In line with asm/e820/types.h, move the e820 API declarations to
asm/e820/api.h and update all usage sites.

This is just a mechanical, obviously correct move & replace patch,
there will be subsequent changes to clean up the code and to make
better use of the new header organization.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 09:31:13 +01:00
Andy Lutomirski 60d3450167 x86/cpu: Re-apply forced caps every time CPU caps are re-read
Calling get_cpu_cap() will reset a bunch of CPU features.  This will
cause the system to lose track of force-set and force-cleared
features in the words that are reset until the end of CPU
initialization.  This can cause X86_FEATURE_FPU, for example, to
change back and forth during boot and potentially confuse CPU setup.

To minimize the chance of confusion, re-apply forced caps every time
get_cpu_cap() is called.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/c817eb373d2c67c2c81413a70fc9b845fa34a37e.1484705016.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 10:12:41 +01:00
Andy Lutomirski 8bf1ebca21 x86/cpu: Factor out application of forced CPU caps
There are multiple call sites that apply forced CPU caps.  Factor
them into a helper.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/623ff7555488122143e4417de09b18be2085ad06.1484705016.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 10:12:40 +01:00
Borislav Petkov 78d1b29684 x86/cpu: Add X86_FEATURE_CPUID
Add a synthetic CPUID flag denoting whether the CPU sports the CPUID
instruction or not. This will come useful later when accomodating
CPUID-less CPUs.

Signed-off-by: Borislav Petkov <bp@suse.de>
[ Slightly prettified. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/dcb355adae3ab812c79397056a61c212f1a0c7cc.1484705016.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 10:12:39 +01:00
Borislav Petkov 9026cc82b6 x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority
Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:57 +01:00
Borislav Petkov cff4c0391a x86/ras: Get rid of mce_process_work()
Make mce_gen_pool_process() the workqueue function directly and save us
an indirection.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-9-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:56 +01:00
Borislav Petkov 669c00f099 x86/ras: Flip the TSC-adding logic
Add the TSC value to the MCE record only when the MCE being logged is
precise, i.e., it is logged as an exception or an MCE-related interrupt.

So it doesn't look particularly easy to do without touching/changing a
bunch of places. That's why I'm trying tricks first.

For example, the mce-apei.c case I'm addressing by setting ->tsc only
for errors of panic severity. The idea there is, that, panic errors will
have raised an #MC and not polled.

And then instead of propagating a flag to mce_setup(), it seems
easier/less code to set ->tsc depending on the call sites, i.e.,
are we polling or are we preparing an MCE record in an exception
handler/thresholding interrupt.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:54 +01:00
Yazen Ghannam 0b737a9c2a x86/ras/amd: Make sysfs names of banks more user-friendly
Currently, we append the MCA_IPID[InstanceId] to the bank name to create
the sysfs filename. The InstanceId field uniquely identifies a bank
instance but it doesn't look very nice for most banks.

Replace the InstanceId with a simpler, ascending (0, 1, ..) value.
Only use this in the sysfs name when there is more than 1 instance.
Otherwise, just use the bank's name as the sysfs name.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1484322741-41884-3-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/20170123183514.13356-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:53 +01:00
Borislav Petkov 9b052ea4ce x86/ras/therm_throt: Do not log a fake MCE for thermal events
We log a fake bank 128 MCE to note that we're handling a CPU thermal
event. However, this confuses people into thinking that their hardware
generates MCEs. Hijacking MCA for logging thermal events is a gross
misuse anyway and it shouldn't have been done in the first place. And
besides we have other means for dealing with thermal events which are
much more suitable.

So let's kill the MCE logging part.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170105213846.GA12024@gmail.com
Link: http://lkml.kernel.org/r/20170123183514.13356-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:53 +01:00
Borislav Petkov d4b2ac63b0 x86/ras/inject: Make it depend on X86_LOCAL_APIC=y
... and get rid of the annoying:

  arch/x86/kernel/cpu/mcheck/mce-inject.c:97:13: warning: ‘mce_irq_ipi’ defined but not used [-Wunused-function]

when doing randconfig builds.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-24 09:14:52 +01:00
Borislav Petkov da0aa3dde0 x86/microcode/AMD: Remove struct cont_desc.eq_id
The equivalence ID was needed outside of the container scanning logic
but now, after this has been cleaned up, not anymore. Now, cont_desc.mc
is used to denote whether the container we're looking at has the proper
microcode patch for this CPU or not.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-17-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:51 +01:00
Borislav Petkov 69f5f98300 x86/microcode/AMD: Remove AP scanning optimization
The idea was to not scan the microcode blob on each AP (Application
Processor) during boot and thus save us some milliseconds. However, on
architectures where the microcode engine is shared between threads, this
doesn't work. Here's why:

The microcode on CPU0, i.e., the first thread, gets updated. The second
thread, i.e., CPU1, i.e., the first AP walks into load_ucode_amd_ap(),
sees that there's no container cached and goes and scans for the proper
blob.

It finds it and as a last step of apply_microcode_early_amd(), it tries
to apply the patch but that core has already the updated microcode
revision which it has received through CPU0's update. So it returns
false and we do desc->size = -1 to prevent other APs from scanning.

However, the next AP, CPU2, has a different microcode engine which
hasn't been updated yet. The desc->size == -1 test prevents it from
scanning the blob anew and we fail to update it.

The fix is much more straight-forward than it looks: the BSP
(BootStrapping Processor), i.e., CPU0, caches the microcode patch
in amd_ucode_patch. We use that on the AP and try to apply it.
In the 99.9999% of cases where we have homogeneous cores - *not*
mixed-steppings - the application will be successful and we're good to
go.

In the remaining small set of systems, we will simply rescan the blob
and find (or not, if none present) the proper patch and apply it then.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-16-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:51 +01:00
Borislav Petkov 72edfe950b x86/microcode/AMD: Simplify saving from initrd
No need to use the previously stashed info in the container - simply go
ahead and parse the initrd once more. It simplifies and streamlines the
code a whole lot.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-15-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:50 +01:00
Borislav Petkov e71bb4ec07 x86/microcode/AMD: Unify load_ucode_amd_ap()
Use a version for both bitness by adding a helper which does the actual
container finding and parsing which can be used on any CPU - BSP or AP.
Streamlines the paths more.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-14-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:50 +01:00
Borislav Petkov f3ad136d6e x86/microcode/AMD: Check patch level only on the BSP
Check final patch levels for AMD only on the BSP. This way, we decide
early and only once whether to continue loading or to leave the loader
disabled on such systems.

Simplify a lot.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-13-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:50 +01:00
Borislav Petkov 7a93a40be2 x86/microcode: Remove local vendor variable
Use x86_cpuid_vendor() directly.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-12-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:49 +01:00
Borislav Petkov 8cc26e0b4c x86/microcode/AMD: Use find_microcode_in_initrd()
Use the generic helper instead of semi-open-coding the procedure.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-11-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:48 +01:00
Borislav Petkov 3da9b41794 x86/microcode/AMD: Get rid of global this_equiv_id
We have a container which we update/prepare each time before applying a
microcode patch instead of using a global.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-10-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:48 +01:00
Borislav Petkov 309aac7776 x86/microcode: Decrease CPUID use
Get CPUID(1).EAX value once per CPU and propagate value into the callers
instead of conveniently calling it every time.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-9-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:47 +01:00
Borislav Petkov 8801b3fcb5 x86/microcode/AMD: Rework container parsing
It was pretty clumsy before and the whole work of parsing the microcode
containers was spread around the functions wrongly.

Clean it up so that there's a main scan_containers() function which
iterates over the microcode blob and picks apart the containers glued
together. For each container, it calls a parse_container() helper which
concentrates on one container only: sanity-checking, parsing, counting
microcode patches in there, etc.

It makes much more sense now and it is actually very readable. Oh, and
we luvz a diffstat removing more crap than adding.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-8-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:47 +01:00
Borislav Petkov f454177f73 x86/microcode/AMD: Extend the container struct
Make it into a container descriptor which is being passed around and
stores important info like the matching container and the patch for the
current CPU. Make it static too.

Later patches will use this and thus get rid of a double container
parsing.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-7-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:47 +01:00
Borislav Petkov ef901dc33d x86/microcode/AMD: Shorten function parameter's name
The whole driver calls this "mc", do that here too.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-6-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:46 +01:00
Borislav Petkov 1f02ac0682 x86/microcode/AMD: Clean up find_equiv_id()
No need to have it marked "inline" - let gcc decide. Also, shorten the
argument name and simplify while-test.

While at it, make it into a proper for-loop and simplify it even more,
as tglx suggests.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 10:02:46 +01:00
Borislav Petkov c26665ab5c x86/microcode/intel: Drop stashed AP patch pointer optimization
This was meant to save us the scanning of the microcode containter in
the initrd since the first AP had already done that but it can also hurt
us:

Imagine a single hyperthreaded CPU (Intel(R) Atom(TM) CPU N270, for
example) which updates the microcode on the BSP but since the microcode
engine is shared between the two threads, the update on CPU1 doesn't
happen because it has already happened on CPU0 and we don't find a newer
microcode revision on CPU1.

Which doesn't set the intel_ucode_patch pointer and at initrd
jettisoning time we don't save the microcode patch for later
application.

Now, when we suspend to RAM, the loaded microcode gets cleared so we
need to reload but there's no patch saved in the cache.

Removing the optimization fixes this issue and all is fine and dandy.

Fixes: 06b8534cb7 ("x86/microcode: Rework microcode loading")
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170120202955.4091-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-23 09:39:55 +01:00
K. Y. Srinivasan 8de8af7e08 Drivers: hv: vmbus: Move the extracting of Hypervisor version information
As part of the effort to separate out architecture specific code,
extract hypervisor version information in an architecture specific
file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-20 14:48:03 +01:00
K. Y. Srinivasan 63ed4e0c67 Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code
As part of the effort to separate out architecture specific code,
consolidate all Hyper-V specific clocksource code to an architecture
specific code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-20 14:48:03 +01:00
Peter Zijlstra acb04058de sched/clock: Fix hotplug crash
Mike reported that he could trigger the WARN_ON_ONCE() in
set_sched_clock_stable() using hotplug.

This exposed a fundamental problem with the interface, we should never
mark the TSC stable if we ever find it to be unstable. Therefore
set_sched_clock_stable() is a broken interface.

The reason it existed is that not having it is a pain, it means all
relevant architecture code needs to call clear_sched_clock_stable()
where appropriate.

Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64
and parisc are trivial in that they never called
set_sched_clock_stable(), so add an unconditional call to
clear_sched_clock_stable() to them.

For x86 the story is a lot more involved, and what this patch tries to
do is ensure we preserve the status quo. So even is Cyrix or Transmeta
have usable TSC they never called set_sched_clock_stable() so they now
get an explicit mark unstable.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9881b024b7 ("sched/clock: Delay switching sched_clock to stable")
Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-20 02:38:46 +01:00
K. Y. Srinivasan 8730046c14 Drivers: hv vmbus: Move Hypercall page setup out of common code
As part of the effort to separate out architecture specific code, move the
hypercall page setup to an architecture specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 11:42:07 +01:00
Junichi Nomura 2e86222c67 x86/microcode/intel: Use correct buffer size for saving microcode data
In generic_load_microcode(), curr_mc_size is the size of the last
allocated buffer and since we have this performance "optimization"
there to vmalloc a new buffer only when the current one is bigger,
curr_mc_size ends up becoming the size of the biggest buffer we've seen
so far.

However, we end up saving the microcode patch which matches our CPU
and its size is not curr_mc_size but the respective mc_size during the
iteration while we're staring at it.

So save that mc_size into a separate variable and use it to store the
previously found microcode buffer.

Without this fix, we could get oops like this:

  BUG: unable to handle kernel paging request at ffffc9000e30f000
  IP: __memcpy+0x12/0x20
  ...
  Call Trace:
  ? kmemdup+0x43/0x60
  __alloc_microcode_buf+0x44/0x70
  save_microcode_patch+0xd4/0x150
  generic_load_microcode+0x1b8/0x260
  request_microcode_user+0x15/0x20
  microcode_write+0x91/0x100
  __vfs_write+0x34/0x120
  vfs_write+0xc1/0x130
  SyS_write+0x56/0xc0
  do_syscall_64+0x6c/0x160
  entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 06b8534cb7 ("x86/microcode: Rework microcode loading")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/4f33cbfd-44f2-9bed-3b66-7446cd14256f@ce.jp.nec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09 23:11:15 +01:00
Junichi Nomura 9fcf5ba2ef x86/microcode/intel: Fix allocation size of struct ucode_patch
We allocate struct ucode_patch here. @size is the size of microcode data
and used for kmemdup() later in this function.

Fixes: 06b8534cb7 ("x86/microcode: Rework microcode loading")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/7a730dc9-ac17-35c4-fe76-dfc94e5ecd95@ce.jp.nec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09 23:11:14 +01:00
Borislav Petkov 4167709bbf x86/microcode/intel: Add a helper which gives the microcode revision
Since on Intel we're required to do CPUID(1) first, before reading
the microcode revision MSR, let's add a special helper which does the
required steps so that we don't forget to do them next time, when we
want to read the microcode revision.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170109114147.5082-4-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09 23:11:14 +01:00
Borislav Petkov f3e2a51f56 x86/microcode: Use native CPUID to tickle out microcode revision
Intel supplies the microcode revision value in MSR 0x8b
(IA32_BIOS_SIGN_ID) after CPUID(1) has been executed. Execute it each
time before reading that MSR.

It used to do sync_core() which did do CPUID but

  c198b121b1 ("x86/asm: Rewrite sync_core() to use IRET-to-self")

changed the sync_core() implementation so we better make the microcode
loading case explicit, as the SDM documents it.

Reported-and-tested-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170109114147.5082-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09 23:11:14 +01:00
Borislav Petkov a33d331761 x86/CPU/AMD: Fix Bulldozer topology
The following commit:

  8196dab4fc ("x86/cpu: Get rid of compute_unit_id")

... broke the initial strategy for Bulldozer-based cores' topology,
where we consider each thread of a compute unit a standalone core
and not a HT or SMT thread.

Revert to the firmware-supplied core_id numbering and do not make
them thread siblings as we don't consider them for such even if they
technically are, more or less.

Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.6+
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8196dab4fc ("x86/cpu: Get rid of compute_unit_id")
Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-06 08:37:41 +01:00
Daniel Bristot de Oliveira c4158ff536 x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
This patch adds the __irq_entry annotation to the default x86
platform IRQ handlers. ftrace's function_graph tracer uses the
__irq_entry annotation to notify the entry and return of IRQ
handlers.

For example, before the patch:
  354549.667252 |   3)  d..1              |  default_idle_call() {
  354549.667252 |   3)  d..1              |    arch_cpu_idle() {
  354549.667253 |   3)  d..1              |      default_idle() {
  354549.696886 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
  354549.696886 |   3)  d..1              |          irq_enter() {
  354549.696886 |   3)  d..1              |            rcu_irq_enter() {

After the patch:
  366416.254476 |   3)  d..1              |    arch_cpu_idle() {
  366416.254476 |   3)  d..1              |      default_idle() {
  366416.261566 |   3)  d..1  ==========> |
  366416.261566 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
  366416.261566 |   3)  d..1              |          irq_enter() {
  366416.261566 |   3)  d..1              |            rcu_irq_enter() {

KASAN also uses this annotation. The smp_apic_timer_interrupt()
was already annotated.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-05 08:58:49 +01:00
Lukasz Odzioba dd853fd216 x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
A negative number can be specified in the cmdline which will be used as
setup_clear_cpu_cap() argument. With that we can clear/set some bit in
memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel
to misbehave. This patch adds lower bound check to setup_disablecpuid().

Boris Petkov reproduced a crash:

  [    1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540
  [    1.236535] IP: memcpy_erms+0x6/0x10

Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andi.kleen@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: slaoub@gmail.com
Fixes: ac72e7888a ("x86: add generic clearcpuid=... option")
Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-05 08:54:34 +01:00
Thomas Gleixner 0dad3a3014 x86/mce/AMD: Make the init code more robust
If mce_device_init() fails then the mce device pointer is NULL and the
AMD mce code happily dereferences it.

Add a sanity check.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-26 17:30:24 -08:00
Thomas Gleixner a5a1d1c291 clocksource: Use a plain u64 instead of cycle_t
There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
2016-12-25 11:04:12 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds 6ac3bb167f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "There's a number of fixes:

   - a round of fixes for CPUID-less legacy CPUs
   - a number of microcode loader fixes
   - i8042 detection robustization fixes
   - stack dump/unwinder fixes
   - x86 SoC platform driver fixes
   - a GCC 7 warning fix
   - virtualization related fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  Revert "x86/unwind: Detect bad stack return address"
  x86/paravirt: Mark unused patch_default label
  x86/microcode/AMD: Reload proper initrd start address
  x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
  x86/platform/intel-mid: Switch MPU3050 driver to IIO
  x86/alternatives: Do not use sync_core() to serialize I$
  x86/topology: Document cpu_llc_id
  x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
  x86/asm: Rewrite sync_core() to use IRET-to-self
  x86/microcode/intel: Replace sync_core() with native_cpuid()
  Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
  x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
  x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
  x86/tools: Fix gcc-7 warning in relocs.c
  x86/unwind: Dump stack data on warnings
  x86/unwind: Adjust last frame check for aligned function stacks
  x86/init: Fix a couple of comment typos
  x86/init: Remove i8042_detect() from platform ops
  Input: i8042 - Trust firmware a bit more when probing on X86
  x86/init: Add i8042 state to the platform data
  ...
2016-12-23 16:54:46 -08:00
Linus Torvalds eb254f323b Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache allocation interface from Thomas Gleixner:
 "This provides support for Intel's Cache Allocation Technology, a cache
  partitioning mechanism.

  The interface is odd, but the hardware interface of that CAT stuff is
  odd as well.

  We tried hard to come up with an abstraction, but that only allows
  rather simple partitioning, but no way of sharing and dealing with the
  per package nature of this mechanism.

  In the end we decided to expose the allocation bitmaps directly so all
  combinations of the hardware can be utilized.

  There are two ways of associating a cache partition:

   - Task

     A task can be added to a resource group. It uses the cache
     partition associated to the group.

   - CPU

     All tasks which are not member of a resource group use the group to
     which the CPU they are running on is associated with.

     That allows for simple CPU based partitioning schemes.

  The main expected user sare:

   - Virtualization so a VM can only trash only the associated part of
     the cash w/o disturbing others

   - Real-Time systems to seperate RT and general workloads.

   - Latency sensitive enterprise workloads

   - In theory this also can be used to protect against cache side
     channel attacks"

[ Intel RDT is "Resource Director Technology". The interface really is
  rather odd and very specific, which delayed this pull request while I
  was thinking about it. The pull request itself came in early during
  the merge window, I just delayed it until things had calmed down and I
  had more time.

  But people tell me they'll use this, and the good news is that it is
  _so_ specific that it's rather independent of anything else, and no
  user is going to depend on the interface since it's pretty rare. So if
  push comes to shove, we can just remove the interface and nothing will
  break ]

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/intel_rdt: Implement show_options() for resctrlfs
  x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
  x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
  x86/intel_rdt: Fix setting of closid when adding CPUs to a group
  x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
  x86/intel_rdt: Reset per cpu closids on unmount
  x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A
  x86/intel_rdt: Prevent deadlock against hotplug lock
  x86/intel_rdt: Protect info directory from removal
  x86/intel_rdt: Add info files to Documentation
  x86/intel_rdt: Export the minimum number of set mask bits in sysfs
  x86/intel_rdt: Propagate error in rdt_mount() properly
  x86/intel_rdt: Add a missing #include
  MAINTAINERS: Add maintainer for Intel RDT resource allocation
  x86/intel_rdt: Add scheduler hook
  x86/intel_rdt: Add schemata file
  x86/intel_rdt: Add tasks files
  x86/intel_rdt: Add cpus file
  x86/intel_rdt: Add mkdir to resctrl file system
  x86/intel_rdt: Add "info" files to resctrl file system
  ...
2016-12-22 09:25:45 -08:00
Borislav Petkov 8877ebdd3f x86/microcode/AMD: Reload proper initrd start address
When we switch to virtual addresses and, especially after
reserve_initrd()->relocate_initrd() have run, we have the updated initrd
address in initrd_start. Use initrd_start then instead of the address
which has been passed to us through boot params. (That still gets used
when we're running the very early routines on the BSP).

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-21 10:50:04 +01:00
Vitaly Kuznetsov 59107e2f48 x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects NMI to the guest. We may want to crash the guest and do kdump
on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
allow the kdump kernel to re-establish VMBus connection so it will see
VMBus devices (storage, network,..).

To properly unload VMBus making it possible to start over during kdump we
need to do the following:

 - Send an 'unload' message to the hypervisor. This can be done on any CPU
   so we do this the crashing CPU.

 - Receive the 'unload finished' reply message. WS2012R2 delivers this
   message to the CPU which was used to establish VMBus connection during
   module load and this CPU may differ from the CPU sending 'unload'.

Receiving a VMBus message means the following:

 - There is a per-CPU slot in memory for one message. This slot can in
   theory be accessed by any CPU.

 - We get an interrupt on the CPU when a message was placed into the slot.

 - When we read the message we need to clear the slot and signal the fact
   to the hypervisor. In case there are more messages to this CPU pending
   the hypervisor will deliver the next message. The signaling is done by
   writing to an MSR so this can only be done on the appropriate CPU.

To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
function which checks message slots for all CPUs in a loop waiting for the
'unload finished' messages. However, there is an issue which arises when
these conditions are met:

 - We're crashing on a CPU which is different from the one which was used
   to initially contact the hypervisor.

 - The CPU which was used for the initial contact is blocked with interrupts
   disabled and there is a message pending in the message slot.

In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.

The suggested solution is to handle unknown NMIs for Hyper-V guests on the
first CPU which gets them only. This will allow us to rely on VMBus
interrupt handler being able to receive the 'unload finish' message in
case it is delivered to a different CPU.

The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
boot CPU only, WS2012R2 and earlier Hyper-V versions are affected.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-20 09:31:48 +01:00
Andy Lutomirski 484d0e5c79 x86/microcode/intel: Replace sync_core() with native_cpuid()
The Intel microcode driver is using sync_core() to mean "do CPUID
with EAX=1".  I want to rework sync_core(), but first the Intel
microcode driver needs to stop depending on its current behavior.

Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/535a025bb91fed1a019c5412b036337ad239e5bb.1481307769.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 11:54:21 +01:00
Andy Lutomirski 3df8d92085 x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
A typo (or mis-merge?) resulted in leaf 6 only being probed if
cpuid_level >= 7.

Fixes: 2ccd71f1b2 ("x86/cpufeature: Move some of the scattered feature bits to x86_capability")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 11:50:24 +01:00
Boris Ostrovsky 2b4c91569a x86/microcode/AMD: Use native_cpuid() in load_ucode_amd_bsp()
When CONFIG_PARAVIRT is selected, cpuid() becomes a call. Since
for 32-bit kernels load_ucode_amd_bsp() is executed before paging
is enabled the call cannot be completed (as kernel virtual addresses
are not reachable yet).

Use native_cpuid() instead which is an asm wrapper for the CPUID
instruction.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jürgen Gross <jgross@suse.com>
Link: http://lkml.kernel.org/r/1481906392-3847-1-git-send-email-boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/20161218164414.9649-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 10:46:20 +01:00
Borislav Petkov a15a753539 x86/microcode/AMD: Do not load when running on a hypervisor
Doing so is completely void of sense for multiple reasons so prevent
it. Set dis_ucode_ldr to true and thus disable the microcode loader by
default to address xen pv guests which execute the AP path but not the
BSP path.

By having it turned off by default, the APs won't run into the loader
either.

Also, check CPUID(1).ECX[31] which hypervisors set. Well almost, not the
xen pv one. That one gets the aforementioned "fix".

Also, improve the detection method by caching the final decision whether
to continue loading in dis_ucode_ldr and do it once on the BSP. The APs
then simply test that value.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Juergen Gross <jgross@suse.com>
Link: http://lkml.kernel.org/r/20161218164414.9649-4-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 10:46:20 +01:00
Borislav Petkov 200d355316 x86/microcode/AMD: Sanitize apply_microcode_early_amd()
Make it simply return bool to denote whether it found a container or not
and return the pointer to the container and its size in the handed-in
container pointer instead, as returning a struct was just silly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jürgen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/20161218164414.9649-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 10:46:20 +01:00
Borislav Petkov 8feaa64a9a x86/microcode/AMD: Make find_proper_container() sane again
Fixup signature and retvals, return the container struct through the
passed in pointer, not as a function return value.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jürgen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/20161218164414.9649-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 10:46:19 +01:00
Linus Torvalds 1bbb05f520 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes and cleanups from Thomas Gleixner:
 "This set of updates contains:

   - Robustification for the logical package managment. Cures the AMD
     and virtualization issues.

   - Put the correct start_cpu() return address on the stack of the idle
     task.

   - Fixups for the fallout of the nodeid <-> cpuid persistent mapping
     modifciations

   - Move the x86/MPX specific mm_struct member to the arch specific
     mm_context where it belongs

   - Cleanups for C89 struct initializers and useless function
     arguments"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/floppy: Use designated initializers
  x86/mpx: Move bd_addr to mm_context_t
  x86/mm: Drop unused argument 'removed' from sync_global_pgds()
  ACPI/NUMA: Do not map pxm to node when NUMA is turned off
  x86/acpi: Use proper macro for invalid node
  x86/smpboot: Prevent false positive out of bounds cpumask access warning
  x86/boot/64: Push correct start_cpu() return address
  x86/boot/64: Use 'push' instead of 'call' in start_cpu()
  x86/smpboot: Make logical package management more robust
2016-12-18 11:12:53 -08:00
Linus Torvalds c11a6cfb01 Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Mostly patches to initialize workqueue subsystem earlier and get rid
  of keventd_up().

  The patches were headed for the last merge cycle but got delayed due
  to a bug found late minute, which is fixed now.

  Also, to help debugging, destroy_workqueue() is more chatty now on a
  sanity check failure."

* 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: move wq_numa_init() to workqueue_init()
  workqueue: remove keventd_up()
  debugobj, workqueue: remove keventd_up() usage
  slab, workqueue: remove keventd_up() usage
  power, workqueue: remove keventd_up() usage
  tty, workqueue: remove keventd_up() usage
  mce, workqueue: remove keventd_up() usage
  workqueue: make workqueue available early during boot
  workqueue: dump workqueue state on sanity check failures in destroy_workqueue()
2016-12-13 12:59:57 -08:00
Linus Torvalds 098c30557a Driver core patches for 4.10-rc1
Here's the new driver core patches for 4.10-rc1.
 
 Big thing here is the nice addition of "functional dependencies" to the
 driver core.  The idea has been talked about for a very long time, great
 job to Rafael for stepping up and implementing it. It's been tested for
 longer than the 4.9-rc1 date, we held off on merging it earlier in order
 to feel more comfortable about it.
 
 Other than that, it's just a handful of small other patches, some good
 cleanups to the mess that is the firmware class code, and we have a test
 driver for the deferred probe logic.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWFAvPQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym3NgCgmhFeWEkp9SDt17YGGavmnzQUlBQAoJlUipJp
 PHeQkq15ZWw3wWC9FEvM
 =91M1
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the new driver core patches for 4.10-rc1.

  Big thing here is the nice addition of "functional dependencies" to
  the driver core. The idea has been talked about for a very long time,
  great job to Rafael for stepping up and implementing it. It's been
  tested for longer than the 4.9-rc1 date, we held off on merging it
  earlier in order to feel more comfortable about it.

  Other than that, it's just a handful of small other patches, some good
  cleanups to the mess that is the firmware class code, and we have a
  test driver for the deferred probe logic.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
  firmware: Correct handling of fw_state_wait() return value
  driver core: Silence device links sphinx warning
  firmware: remove warning at documentation generation time
  drivers: base: dma-mapping: Fix typo in dmam_alloc_non_coherent comments
  driver core: test_async: fix up typo found by 0-day
  firmware: move fw_state_is_done() into UHM section
  firmware: do not use fw_lock for fw_state protection
  firmware: drop bit ops in favor of simple state machine
  firmware: refactor loading status
  firmware: fix usermode helper fallback loading
  driver core: firmware_class: convert to use class_groups
  driver core: devcoredump: convert to use class_groups
  driver core: class: add class_groups support
  kernfs: Declare two local data structures static
  driver-core: fix platform_no_drv_owner.cocci warnings
  drivers/base/memory.c: Remove unused 'first_page' variable
  driver core: add CLASS_ATTR_WO()
  drivers: base: cacheinfo: support DT overrides for cache properties
  drivers: base: cacheinfo: add pr_fmt logging
  drivers: base: cacheinfo: fix boot error message when acpi is enabled
  ...
2016-12-13 11:42:18 -08:00
Thomas Gleixner 9d85eb9119 x86/smpboot: Make logical package management more robust
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg, These are obviously wrong.

 - Virtual machines have their own interesting of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is not longer
required.

Fixes: d49597fd3b ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: M. Vefa Bicakci <m.v.b@runbox.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Cc: Charles (Chas) Williams <ciwillia@brocade.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-13 10:22:39 +01:00
Linus Torvalds e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Linus Torvalds f797484c26 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "Two changes:

   - implement various VMWare guest OS improvements/fixes (Alexey
     Makhalov)

   - unexport a spurious export from the intel-mid platform driver
     (Lukas Wunner)"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vmware: Add paravirt sched clock
  x86/vmware: Add basic paravirt ops support
  x86/vmware: Use tsc_khz value for calibrate_cpu()
  x86/platform/intel-mid: Unexport intel_mid_pci_set_power_state()
  x86/vmware: Read tsc_khz only once at boot time
2016-12-12 15:29:06 -08:00
Linus Torvalds 991bc36254 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode update from Ingo Molnar:
 "The biggest change (by Borislav Petkov) is a thorough rewrite of the
  Intel microcode loader and its interactions with the core code.

  The biggest conceptual change is the decoupling of the microcode
  loading on boot and application processors (which load the microcode
  in different scenarios), so that both parse the input patches with as
  few assumptions as possible - this also fixes various kernel address
  space randomization bugs. (The AP side then goes on and caches the
  result to improve boot performance.)

  Since the AMD side already did this, this change also opened up the
  path towards more unification/simplification of the core microcode
  loading infrastructure:

     10 files changed, 647 insertions(+), 940 deletions(-)

  which speaks for itself"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Bump driver version, update copyrights
  x86/microcode: Rework microcode loading
  x86/microcode/intel: Remove intel_lib.c
  x86/microcode/amd: Move private inlines to .c and mark local functions static
  x86/microcode: Collect CPU info on resume
  x86/microcode: Issue the debug printk on resume only on success
  x86/microcode/amd: Hand down the CPU family
  x86/microcode: Export the microcode cache linked list
  x86/microcode: Remove one #ifdef clause
  x86/microcode/intel: Simplify generic_load_microcode()
  x86/microcode: Move driver authors to CREDITS
  x86/microcode: Run the AP-loading routine only on the application processors
2016-12-12 15:23:02 -08:00
Linus Torvalds 212f30008a Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 idle updates from Ingo Molnar:
 "There were two bigger changes in this development cycle:

   - remove idle notifiers:

       32 files changed, 74 insertions(+), 803 deletions(-)

     These notifiers were of questionable value and the main usecase,
     the i7300 driver, was essentially unmaintained and can be removed,
     plus modern power management concepts don't need the callback - so
     use this golden opportunity and get rid of this opaque and fragile
     callback from a latency sensitive code path.

     (Len Brown, Thomas Gleixner)

   - improve the AMD Erratum 400 workaround that used high overhead MSR
     polling in the idle loop (Borisla Petkov, Thomas Gleixner)"

* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove empty idle.h header
  x86/amd: Simplify AMD E400 aware idle routine
  x86/amd: Check for the C1E bug post ACPI subsystem init
  x86/bugs: Separate AMD E400 erratum and C1E bug
  x86/cpufeature: Provide helper to set bugs bits
  x86/idle: Remove enter_idle(), exit_idle()
  x86: Remove x86_test_and_clear_bit_percpu()
  x86/idle: Remove is_idle flag
  x86/idle: Remove idle_notifier
  i7300_idle: Remove this driver
2016-12-12 14:55:04 -08:00
Linus Torvalds 535b2f73f6 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU updates from Ingo Molnar:
 "The changes in this development cycle were:

   - AMD CPU topology enhancements that are cleanups on current CPUs but
     which enable future Fam17 hardware. (Yazen Ghannam)

   - unify bugs.c and bugs_64.c (Borislav Petkov)

   - remove the show_msr= boot option (Borislav Petkov)

   - simplify a boot message (Borislav Petkov)"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature
  x86/cpu: Get rid of the show_msr= boot option
  x86/cpu: Merge bugs.c and bugs_64.c
  x86/cpu: Remove the printk format specifier in "CPU0: "
2016-12-12 14:25:21 -08:00
Linus Torvalds 5645688f9d Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this development cycle were:

   - a large number of call stack dumping/printing improvements: higher
     robustness, better cross-context dumping, improved output, etc.
     (Josh Poimboeuf)

   - vDSO getcpu() performance improvement for future Intel CPUs with
     the RDPID instruction (Andy Lutomirski)

   - add two new Intel AVX512 features and the CPUID support
     infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela,
     He Chen)

   - more copy-user unification (Borislav Petkov)

   - entry code assembly macro simplifications (Alexander Kuleshov)

   - vDSO C/R support improvements (Dmitry Safonov)

   - misc fixes and cleanups (Borislav Petkov, Paul Bolle)"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  scripts/decode_stacktrace.sh: Fix address line detection on x86
  x86/boot/64: Use defines for page size
  x86/dumpstack: Make stack name tags more comprehensible
  selftests/x86: Add test_vdso to test getcpu()
  x86/vdso: Use RDPID in preference to LSL when available
  x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl()
  x86/cpufeatures: Enable new AVX512 cpu features
  x86/cpuid: Provide get_scattered_cpuid_leaf()
  x86/cpuid: Cleanup cpuid_regs definitions
  x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants
  x86/unwind: Ensure stack grows down
  x86/vdso: Set vDSO pointer only after success
  x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE
  x86/unwind: Detect bad stack return address
  x86/dumpstack: Warn on stack recursion
  x86/unwind: Warn on bad frame pointer
  x86/decoder: Use stderr if insn sanity test fails
  x86/decoder: Use stdout if insn decoder test is successful
  mm/page_alloc: Remove kernel address exposure in free_reserved_area()
  x86/dumpstack: Remove raw stack dump
  ...
2016-12-12 13:49:57 -08:00
Linus Torvalds df5f0f0a02 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Ingo Molnar:
 "The main changes in this development cycle were:

   - more AMD northbridge support work, mostly in preparation for Fam17h
     CPUs (Yazen Ghannam, Borislav Petkov)

   - cleanups/refactorings and fixes (Borislav Petkov, Tony Luck,
     Yinghai Lu)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Include the PPIN in MCE records when available
  x86/mce/AMD: Add system physical address translation for AMD Fam17h
  x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h
  x86/amd_nb: Add Fam17h Data Fabric as "Northbridge"
  x86/amd_nb: Make all exports EXPORT_SYMBOL_GPL
  x86/amd_nb: Make amd_northbridges internal to amd_nb.c
  x86/mce/AMD: Reset Threshold Limit after logging error
  x86/mce/AMD: Fix HWID_MCATYPE calculation by grouping arguments
  x86/MCE: Correct TSC timestamping of error records
  x86/RAS: Hide SMCA bank names
  x86/RAS: Rename smca_bank_names to smca_names
  x86/RAS: Simplify SMCA HWID descriptor struct
  x86/RAS: Simplify SMCA bank descriptor struct
  x86/MCE: Dump MCE to dmesg if no consumers
  x86/RAS: Add TSC timestamp to the injected MCE
  x86/MCE: Do not look at panic_on_oops in the severity grading
2016-12-12 12:58:50 -08:00
Thomas Gleixner 34bc3560c6 x86: Remove empty idle.h header
One include less is always a good thing(tm). Good riddance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:22 +01:00
Borislav Petkov 07c94a3812 x86/amd: Simplify AMD E400 aware idle routine
Reorganize the E400 detection now that we have everything in place:
switch the CPUs to broadcast mode after the LAPIC has been initialized
and remove the facilities that were used previously on the idle path.

Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine
because alternatives have been applied when the actual detection happens,
so the static switching does not take effect and the test will stay
false. Use boot_cpu_has_bug() instead which is definitely an improvement
over the RDMSR and the cpumask handling.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:21 +01:00
Thomas Gleixner 3344ed3079 x86/bugs: Separate AMD E400 erratum and C1E bug
The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E
state) is a two step process:

 - Selection of the E400 aware idle routine

 - Detection whether the platform is affected

The idle routine selection happens for possibly affected CPUs depending on
family/model/stepping information. These range of CPUs is not necessarily
affected as the decision whether to enable the C1E feature is made by the
firmware. Unfortunately there is no way to query this at early boot.

The current implementation polls a MSR in the E400 aware idle routine to
detect whether the CPU is affected. This is inefficient on non affected
CPUs because every idle entry has to do the MSR read.

There is a better way to detect this before going idle for the first time
which requires to seperate the bug flags:

  X86_BUG_AMD_E400 	- Selects the E400 aware idle routine and
  			  enables the detection
			  
  X86_BUG_AMD_APIC_C1E  - Set when the platform is affected by E400

Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400
bug bit to select the idle routine which currently does an unconditional
detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches
to remove the MSR polling and simplify the handling of this misfeature.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:20 +01:00
Shaohua Li 76ae054c69 x86/intel_rdt: Implement show_options() for resctrlfs
Implement show_options() callback for intel resource control filesystem
to expose the active mount options in /proc/

Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/7dce7c1886ac9289442d254ea18322c92bd968da.1480717072.git.shli@fb.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 14:12:18 +01:00
Fenghua Yu 74fcdae1a7 x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
intel_rdt_sched_in() must be called with preemption disabled because the
function accesses percpu variables (pqr_state and closid).

If a task moves itself via move_myself() preemption is enabled, which
violates the calling convention and can result in incorrect closid
selection when the task gets preempted or migrated.

Add the required protection and a comment about the calling convention.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Marcelo Tosatti" <mtosatti@redhat.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1480625714-54246-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-02 01:13:02 +01:00
Fenghua Yu 0efc89be94 x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
When removing a sub directory/rdtgroup by rmdir or umount, closid in a
task in the sub directory is set to default rdtgroup's closid which is 0.
If the task is running on a CPU, the PQR_ASSOC MSR is only updated
when the task runs through a context switch. Up to the context switch,
the task runs with the wrong closid.

Make the change immediately effective by invoking a smp function call on
all CPUs which are running moved task. If one of the affected tasks was
moved or scheduled out before the function call is executed on the CPU the
only damage is the extra interruption of the CPU.

[ tglx: Reworked it to avoid blindly interrupting all CPUs and extra loops ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1479511084-59727-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-28 11:07:50 +01:00
Fenghua Yu 2659f46da8 x86/intel_rdt: Fix setting of closid when adding CPUs to a group
There was a cut & paste error when adding code to update the per-cpu
closid when changing the bitmask of CPUs to an rdt group.

The update erronously assigns the closid of the default group to the CPUs
which are moved to a group instead of assigning the closid of their new
group. Use the proper closid.

Fixes: f410770293 ("x86/intel_rdt: Update percpu closid immeditately on CPUs affected by change")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1479511084-59727-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-28 11:07:50 +01:00
Tony Luck 3f5a7896a5 x86/mce: Include the PPIN in MCE records when available
Intel Xeons from Ivy Bridge onwards support a processor identification
number set in the factory. To the user this is a handy unique number to
identify a particular CPU. Intel can decode this to the fab/production
run to track errors. On systems that have it, include it in the machine
check record. I'm told that this would be helpful for users that run
large data centers with multi-socket servers to keep track of which CPUs
are seeing errors.

Boris:
* Add some clarifying comments and spacing.
* Mask out [63:2] in the disabled-but-not-locked case
* Call the MSR variable "val" for more readability.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20161123114855.njguoaygp3qnbkia@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-23 16:51:52 +01:00
Thomas Gleixner 33d97302eb x86/mce/therm_throt: Move hotplug callbacks to online
No point to have the sysfs files around before the cpu is online and no
point to have them around until the cpu is dead. Get rid of the explicit
state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
2016-11-22 23:34:38 +01:00
Sebastian Andrzej Siewior d6526e73db x86/mce/therm_throt: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linuxtronix.de
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161117183541.8588-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-22 23:34:38 +01:00
Yazen Ghannam f5382de9d4 x86/mce/AMD: Add system physical address translation for AMD Fam17h
The Unified Memory Controllers (UMCs) on Fam17h log a normalized address
in their MCA_ADDR registers. We need to convert that normalized address
to a system physical address in order to support a few facilities:

1) To offline poisoned pages in DRAM proactively in the deferred error
   handler.

2) To print sysaddr and page info for DRAM ECC errors in EDAC.

[ Boris: fixes/cleanups ontop:

  * hi_addr_offset = 0 - no need for that branch. Stick it all under the
    HiAddrOffsetEn case. It confines hi_addr_offset's declaration too.

  * Move variables to the innermost scope they're used at so that we save
    on stack and not blow it up immediately on function entry.

  * Do not modify *sys_addr prematurely - we want to not exit early and
    have modified *sys_addr some, which callers get to see. We either
    convert to a sys_addr or we don't do anything. And we signal that with
    the retval of the function.

  * Rename label out -> out_err - because it is the error path.

  * No need to pr_err of the conversion failed case: imagine a
    sparsely-populated machine with UMCs which don't have DIMMs. Callers
    should look at the retval instead and issue a printk only when really
    necessary. No need for useless info in dmesg.

  * s/temp_reg/tmp/ and other variable names shortening => shorter code.

  * Use BIT() everywhere.

  * Make error messages more informative.

  *  Small build fix for the !CONFIG_X86_MCE_AMD case.

  * ... and more minor cleanups.
]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic
[ Typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22 12:30:16 +01:00
Borislav Petkov 254fe9c7a4 x86/MCE/AMD: Fix thinko about thresholding_en
So adding thresholding_en et al was a good thing for removing the
per-CPU thresholding callback, i.e., threshold_cpu_callback.

But, in order for it to work and especially that test in
mce_threshold_create_device() so that all thresholding banks get
properly created and not the whole thing to fail with a NULL ptr
dereference at mce_cpu_pre_down() when we offline the CPUs, we need to
set the thresholding_en flag *before* we start creating the devices.

Yap, it failed because thresholding_en wasn't set at the time
we were creating the banks so we didn't create any and then at
mce_cpu_pre_down() -> mce_threshold_remove_device() time, we would blow
up.

And the fix is actually easy: we have thresholding on the system when we
have managed to set the thresholding vector to amd_threshold_interrupt()
earlier in mce_amd_feature_init() while we were picking apart the
thresholding banks and what is set and what not.

So let's do that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Fixes: 4d7b02d58c ("x86/mcheck: Split threshold_cpu_callback into two callbacks")
Link: http://lkml.kernel.org/r/20161119103402.5227-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 11:02:12 +01:00
Thomas Gleixner 7ce7f35b33 Merge branch 'x86/cpufeature' into x86/cache
Resolve the cpu/scattered conflict.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 14:19:34 +01:00
He Chen 47bdf3378d x86/cpuid: Provide get_scattered_cpuid_leaf()
Sparse populated CPUID leafs are collected in a software provided leaf to
avoid bloat of the x86_capability array, but there is no way to rebuild the
real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID
leaf from the CPU. While this is possible it is problematic as it does not
take software disabled features into account. If a feature is disabled on
the host it should not be exposed to a guest either.

Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered
cpuid table information and the active CPU features.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Piotr Luc <Piotr.Luc@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 11:13:09 +01:00
He Chen 47f10a3600 x86/cpuid: Cleanup cpuid_regs definitions
cpuid_regs is defined multiple times as structure and enum. Rename the enum
and move all of it to processor.h so we don't end up with more instances.

Rename the misnomed register enumeration from CR_* to the obvious CPUID_*.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Piotr Luc <Piotr.Luc@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1478856336-9388-2-git-send-email-he.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 11:13:09 +01:00
Yazen Ghannam 18807ddb7f x86/mce/AMD: Reset Threshold Limit after logging error
The error count field in MCA_MISC does not get reset by hardware when the
threshold has been reached. Software is expected to reset it. Currently,
the threshold limit only gets reset during init or when a user writes to
sysfs.

If the user is not monitoring threshold interrupts and resetting
the limit then the user will only see 1 interrupt when the limit is first
hit. So if, for example, the limit is set to 10 then only 1 interrupt will
be recorded after 10 errors even if 100 errors have occurred. The user may
then assume that only 10 errors have occurred.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479244433-69267-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:57:11 +01:00
Sebastian Andrzej Siewior 0e285d36bd x86/mcheck: Move CPU_DEAD to hotplug state machine
This moves the last piece of the old hotplug notifier code in MCE to the
new hotplug state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-8-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:18 +01:00
Sebastian Andrzej Siewior 8c0eeac819 x86/mcheck: Move CPU_ONLINE and CPU_DOWN_PREPARE to hotplug state machine
The CPU_ONLINE and CPU_DOWN_PREPARE look fully symmetrical and could be move
to the hotplug state machine.
On a failure during registration we have the tear down callback invoked
(mce_cpu_pre_down()) so there should be no timer around and so no need to need
keep notifier installed (this was the reason according to the comment why the
notifier was registered despite of errors).

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:18 +01:00
Sebastian Andrzej Siewior 39f152ffbf x86/mcheck: Reorganize the hotplug callbacks
Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let it
become an independent early hotplug callback. The main problem here was that
the init on the boot CPU may happen too late
(device_initcall_sync(mcheck_init_device)) and nobody wanted to risk receiving
and MCE event at boot time leading to a shutdown (if the MCE feature is not yet
enabled).

Here is attempt two: the timming stays as-is but the ordering of the functions
is changed:
- mcheck_cpu_init() (which is run from identify_cpu()) will setup the timer
  struct but won't fire the timer. This is moved to CPU_ONLINE since its
  cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop the timer early
  in the shutdown phase, it should be okay to start it late in the bring up phase.

- CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in
  mce_disable_cpu(). If a failure occures it would be re-enabled on all vendor
  CPUs (including Intel where it was not disabled during shutdown). To keep this
  working I am moving it to CPU_ONLINE. smp_call_function_single() is dropped
  beause the notifier runs nowdays on the target CPU.

- CPU_ONLINE is invoking mce_device_create() + mce_threshold_create_device()
  but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and
  mce_device_remove()). In order to keep this symmetrical I am moving the clean
  up from CPU_DEAD to CPU_DOWN_PREPARE.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-6-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:18 +01:00
Sebastian Andrzej Siewior 4d7b02d58c x86/mcheck: Split threshold_cpu_callback into two callbacks
The threshold_cpu_callback callbacks looks like one of the notifier and
its arguments are almost the same. Split this out and have one ONLINE
and one DEAD callback. This will come handy later once the main code
gets changed to use the callback mechanism.
Also, handle threshold_cpu_callback_online() return value so we don't
continue if the function fails.

Boris Petkov removed the callback pointer and replaced it with proper
functions.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-5-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior 7f34b935e8 x86/mcheck: Be prepared for a rollback back to the ONLINE state
If we try a CPU down and fail in the middle then we roll back to the
online state. This means we would perform CPU_ONLINE / mce_device_create()
without invoking CPU_DEAD / mce_device_remove() for the cleanup of what was
allocated in CPU_ONLINE.

Be prepared for this and don't allocate the struct if we have it
already.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-4-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior ec553abb31 x86/mcheck: Explicit cleanup on failure in mce_amd
If the ONLINE callback fails, the driver does not any clean up right
away instead it waits to get to the DEAD stage to do it. Yes, it waits.
Since we don't pass the error code back to the caller, no one knows.

Do the clean up right away so it does not look like a leak.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-3-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior 0943637293 x86/mcheck: Move threshold_create_device()
Move the threshold_create_device() so it can use
threshold_remove_device() without a forward declaration.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:16 +01:00
Fenghua Yu f410770293 x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
If CPUs are moved to or removed from a rdtgroup, the percpu closid storage
is updated. If tasks running on an affected CPU use the percpu closid then
the PQR_ASSOC MSR is only updated when the task runs through a context
switch. Up to the context switch the CPUs operate on the wrong closid. This
state is potentially unbound.
    
Make the change immediately effective by invoking a smp function call on
the affected CPUs which stores the new closid in the perpu storage and
calls the rdt_sched_in() function which updates the MSR, if the current
task uses the percpu closid.

[ tglx: Made it work and massaged changelog once more ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1478912558-55514-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15 18:35:50 +01:00
Fenghua Yu c7cc0cc10c x86/intel_rdt: Reset per cpu closids on unmount
All CPUs in a rdtgroup are given back to the default rdtgroup before the
rdtgroup is removed during umount. After umount, the default rdtgroup
contains all online CPUs, but the per cpu closids are not cleared. As a
result the stale closid value will be used immediately after the next
mount.

Move all cpus to the default group and update the percpu closid storage.

[ tglx: Massaged changelong ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1478912558-55514-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15 18:35:50 +01:00
Thomas Gleixner a2584e1d5a x86/intel_rdt: Prevent deadlock against hotplug lock
The cpu online/offline callbacks of intel_rdt lock rdtgroup_mutex nested
inside of cpu hotplug lock. rdtgroup_cpus_write() does it in reverse order.

Remove the get/put_online_cpus() calls from rdtgroup_cpus_write(). This is
safe against cpu hotplug as the resource group cpumasks are protected by
rdtgroup_mutex.

Found by review, but should have been found if authors would have bothered
to test cpu hotplug with lockdep enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
2016-11-15 18:35:49 +01:00
Fenghua Yu f57b308728 x86/intel_rdt: Protect info directory from removal
The info directory and the per-resource subdirectories of the info
directory have no reference to a struct rdtgroup in kn->priv. An attempt to
remove one of those directories results in a NULL pointer dereference.

Protect the directories from removal and return -EPERM instead of -ENOENT.

[ tglx: Massaged changelog ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1478912558-55514-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15 18:35:49 +01:00
Borislav Petkov 54467353a9 x86/MCE: Correct TSC timestamping of error records
We did have logic in the MCE code which would TSC-timestamp an error
record only when it is exact - i.e., when it wasn't detected by polling.
This isn't the case anymore. So let's fix that:

We have a valid TSC timestamp in the error record only when it has been
a precise detection, i.e., either in the #MC handler or in one of the
interrupt handlers (thresholding, deferred, ...).

All other error records still have mce.time which contains the wall
time in order to be able to place the error record in time at least
approximately.

Also, this fixes another bug where machine_check_poll() would clear
mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
logging.

The proper fix would be to generate timestamp only when it has been
requested and not always. But that would require a more thorough code
audit of all mce_gather_info/mce_setup() users. Add a FIXME for now.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony <tony.luck@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: kernel test robot <xiaolong.ye@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-11 08:08:24 +01:00
Sudeep Holla fac5148257 drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
With CONFIG_OF enabled on x86, we get the following error on boot:
"
	Failed to find cpu0 device node
 	Unable to detect cache hierarchy from DT for CPU 0
"
and the cacheinfo fails to get populated in the corresponding sysfs
entries. This is because cache_setup_of_node looks for of_node for
setting up the shared cpu_map without checking that it's already
populated in the architecture specific callback.

In order to indicate that the shared cpu_map is already populated, this
patch introduces a boolean `cpu_map_populated` in struct cpu_cacheinfo
that can be used by the generic code to skip cache_shared_cpu_map_setup.

This patch also sets that boolean for x86.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10 17:30:53 +01:00
Thomas Gleixner d49597fd3b x86/cpu: Deal with broken firmware (VMWare/XEN)
Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used printout purposes.

If the physical package IDs differ yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>,
Reported-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 21:05:01 +01:00
Yazen Ghannam b6a50cddbc x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature
These changes do not affect current hw - just a cleanup:

Currently, we assume that a system has a single Last Level Cache (LLC)
per node, and that the cpu_llc_id is thus equal to the node_id. This no
longer applies since Fam17h can have multiple last level caches within a
node.

So group the cpu_llc_id assignment by topology feature and family in
order to make the computation of cpu_llc_id on the different families
more clear.

Here is how the LLC ID is being computed on the different families:

The NODEID_MSR feature only applies to Fam10h in which case the LLC is
at the node level.

The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only
see multiple last level caches if L3 caches are available. Otherwise,
the cpu_llc_id will default to be the phys_proc_id.

We have L3 caches only on families 15h and 17h:

 - on Fam15h, the LLC is at the node level.

 - on Fam17h, the LLC is at the core complex level and can be found by
   right shifting the APIC ID. Also, keep the family checks explicit so that
   new families will fall back to the default, which will be node_id for
   TOPOEXT systems.

Single node systems in families 10h and 15h will have a Node ID of 0
which will be the same as the phys_proc_id, so we don't need to check
for multiple nodes before using the node_id.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Rewrote the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:07:43 +01:00
Ingo Molnar ca4b2df651 Merge branch 'x86/urgent' into x86/cpu, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:07:11 +01:00
Yazen Ghannam b0b6e86846 x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.

The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v4.4..
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3849e91f57 ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:06:08 +01:00
Borislav Petkov c09a8c40e0 x86/RAS: Hide SMCA bank names
Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:15 +01:00
Borislav Petkov a9a1c0ee04 x86/RAS: Rename smca_bank_names to smca_names
Make it differ more from struct smca_bank_name for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov 1ce9cd7f9f x86/RAS: Simplify SMCA HWID descriptor struct
Call it simply smca_hwid and call local variables "hwid". More readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov 79349f529a x86/RAS: Simplify SMCA bank descriptor struct
Call the struct simply smca_bank, it's instance ID can be simply ->id.
Makes the code much more readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov cd9c57cad3 x86/MCE: Dump MCE to dmesg if no consumers
When there are no error record consumers registered with the kernel, the
only thing that appears in dmesg is something like:

  [  300.000326] mce: [Hardware Error]: Machine check events logged

and the error records are gone. Which is seriously counterproductive.

So let's dump them to dmesg instead, in such a case.

Requested-by: Eric Morton <Eric.Morton@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20161101120911.13163-4-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:13 +01:00
Yinghai Lu f5e886ef9b x86/MCE: Do not look at panic_on_oops in the severity grading
The MCE tolerance levels control whether we panic on a machine check or do
something else like generating a signal and logging error information. This
is controlled by the mce=<level> command line parameter.

However, if panic_on_oops is set, it will force a panic for such an MCE
even though the user didn't want to.

So don't check panic_on_oops in the severity grading anymore.

One of the use cases for that is recovery from uncorrectable errors with
mce=2.

 [ Boris: rewrite commit message. ]

Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20160916202325.4972-1-yinghai@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:12 +01:00
Shaohua Li 53a114a690 x86/intel_rdt: Export the minimum number of set mask bits in sysfs
The minimum number of bits set for a cache mask is checked by the kernel
when writing a mask, but there is no way for the user to retrieve this
information.

Add a new file 'min_cbm_bits' to the info directory and export the
information to user space.

[ tglx: Massaged changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/e69b1ffa206d0353eea58101e1bf9b677d9732f7.1478207143.git.shli@fb.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07 12:20:52 +01:00
Shaohua Li 7bff0af510 x86/intel_rdt: Propagate error in rdt_mount() properly
gcc complains:
"warning: ‘dentry’ may be used uninitialized in this function"

The error exit path in rdt_mount(), which deals with a failure in
rdtgroup_create_info_dir(), does not set the error code in dentry and
returns the uninitialized dentry value.

Add the missing error propagation.

[tglx: Massaged changelog ]
Fixes: 4e978d06de ("x86/intel_rdt: Add "info" files to resctrl file system")
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/a56a556f6768dc12cadbf97f49e000189056f90e.1478207143.git.shli@fb.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07 12:20:52 +01:00
Fenghua Yu 4f341a5e48 x86/intel_rdt: Add scheduler hook
Hook the x86 scheduler code to update closid based on whether the current
task is assigned to a specific closid or running on a CPU assigned to a
specific closid.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:16 -06:00
Tony Luck 60ec2440c6 x86/intel_rdt: Add schemata file
Last of the per resource group files. Also mode 0644. This one shows
the resources available to the group. Syntax depends on whether the
"cdp" mount option was given. With code/data prioritization disabled
it is simply a list of masks for each cache domain. Initial value
allows access to all of the L3 cache on all domains. E.g. on a 2 socket
Broadwell:
        L3:0=fffff;1=fffff
With CDP enabled, separate masks for data and instructions are provided:
        L3DATA:0=fffff;1=fffff
        L3CODE:0=fffff;1=fffff

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:16 -06:00
Fenghua Yu e02737d5b8 x86/intel_rdt: Add tasks files
The root directory all subdirectories are automatically populated with a
read/write (mode 0644) file named "tasks". When read it will show all the
task IDs assigned to the resource group. Tasks can be added (one at a time)
to a group by writing the task ID to the file.  E.g.

Membership in a resource group is indicated by a new field in the
task_struct "int closid" which holds the CLOSID for each task. The default
resource group uses CLOSID=0 which means that all existing tasks when the
resctrl file system is mounted belong to the default group.

If a group is removed, tasks which are members of that group are moved to
the default group.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:15 -06:00
Tony Luck 12e0110c11 x86/intel_rdt: Add cpus file
Now we populate each directory with a read/write (mode 0644) file
named "cpus". This is used to over-ride the resources available
to processes in the default resource group when running on specific
CPUs.  Each "cpus" file reads as a cpumask showing which CPUs belong
to this resource group. Initially all online CPUs are assigned to
the default group. They can be added to other groups by writing a
cpumask to the "cpus" file in the directory for the resource group
(which will remove them from the previous group to which they were
assigned). CPU online/offline operations will delete CPUs that go
offline from whatever group they are in and add new CPUs to the
default group.

If there are CPUs assigned to a group when the directory is removed,
they are returned to the default group.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:15 -06:00
Fenghua Yu 60cf5e101f x86/intel_rdt: Add mkdir to resctrl file system
Resource control groups are represented as directories in the resctrl
file system. The root directory describes the default resources available
to tasks that have not been assigned specific resources. Other directories
can be created at the root level to make new resource groups. It is not
permitted to make directories within other directories.

Hardware uses a CLOSID (Class of service ID) to determine which resource
limits are currently in effect. The exact number available is enumerated
by CPUID leaf 0x10, but on current implementations it is a small number.
We implement a simple bitmask allocator for CLOSIDs.

Each resource control group uses one CLOSID, which limits the total number
of directories that can be created.

Resource groups can be removed using rmdir.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Fenghua Yu 4e978d06de x86/intel_rdt: Add "info" files to resctrl file system
For the convenience of applications we make the decoded values of some
of the CPUID values available in read-only (0444) files.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Fenghua Yu 5ff193fbde x86/intel_rdt: Add basic resctrl filesystem support
Use kernfs as basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP).  The file system is not populated yet in this patch.

[ tglx: Fixed up a few nits and added cdp handling in case of error ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Tony Luck 2264d9c74d x86/intel_rdt: Build structures for each resource based on cache topology
We use the cpu hotplug notifier to catch each cpu in turn and look at
its cache topology w.r.t each of the resource groups. As we discover
new resources, we initialize the bitmask array for each to the default
(full access) value.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:13 -06:00
Alexey Makhalov 80e9a4f21f x86/vmware: Add paravirt sched clock
The default sched_clock() implementation is native_sched_clock(). It
contains code to handle non constant frequency TSCs, which creates
overhead for systems with constant frequency TSCs.

The vmware hypervisor guarantees a constant frequency TSC, so
native_sched_clock() is not required and slower than a dedicated function
which operates with one time calculated conversion factors.

Calculate the conversion factors at boot time from the tsc frequency and
install an optimized sched_clock() function via paravirt ops.

The paravirtualized clock can be disabled on the kernel command line with
the new 'no-vmw-sched-clock' option.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: linux-doc@vger.kernel.org
Cc: pv-drivers@vmware.com
Cc: corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161028075432.90579-4-amakhalov@vmware.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 08:57:08 +01:00
Alexey Makhalov 91d1e54ebd x86/vmware: Add basic paravirt ops support
Add basic paravirt support:

 1. Set pv_info.name to "VMware hypervisor" to have proper boot log message
	Booting paravirtualized kernel on VMware hypervisor
    instead of "... on bare hardware"

 2. Set pv_cpu_ops.io_delay() to empty function - paravirt_noop() to
    avoid vm-exits on IO delays because io delays they are not required.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: linux-doc@vger.kernel.org
Cc: pv-drivers@vmware.com
Cc: corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161028075432.90579-3-amakhalov@vmware.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 08:57:07 +01:00
Alexey Makhalov 687bca8d66 x86/vmware: Use tsc_khz value for calibrate_cpu()
Commit aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via
CPUID") separated the calibration mechanisms for cpu_khz and tsc_khz.

Since the vmware hypervisor provides a constant frequency TSC to the guest,
this change can lead to divergence between the tsc and the cpu frequency
after vmotion, which might confuse the user.

Solve this by overriding the x86 platform cpu calibration callback with the
vmware specific tsc calibration function.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: linux-doc@vger.kernel.org
Cc: pv-drivers@vmware.com
Cc: corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161028075432.90579-2-amakhalov@vmware.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 08:57:07 +01:00
Borislav Petkov 1c27f646b1 x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
We needed the physical address of the container in order to compute the
offset within the relocated ramdisk. And we did this by doing __pa() on
the virtual address.

However, __pa() does checks whether the physical address is within
PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
which *doesn't* have the randomization offset into a function which uses
PAGE_OFFSET which *does* have that offset.

This makes this check fire:

	VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
			^^^^^^

due to the randomization offset.

The fix is as simple as using __pa_nodebug() because we do that
randomization offset accounting later in that function ourselves.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable@vger.kernel.org # 4.9
Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 10:29:59 +02:00
Fenghua Yu c1c7c3f9d6 x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
Define struct rdt_resource to hold all the parameterized values for an RDT
resource and fill in the CPUID enumerated values from leaf 0x10 if
available. Hard code them for the MSR detected Haswells.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:39 +02:00
Fenghua Yu 113c60970c x86/intel_rdt: Add Haswell feature discovery
Some Haswell generation CPUs support RDT, but they don't enumerate this via
CPUID.  Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on cpu model 63
(INTEL_FAM6_HASWELL_X)

Move the relevant defines into a common header file which is shared between
RDT/CQM and RDT/Allocation to avoid duplication.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:38 +02:00
Fenghua Yu 78e99b4a2b x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.

Simple init() routine just checks which features are present. If they are
pr_info() one line summary for each feature for now.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:38 +02:00
Fenghua Yu 4ab1586488 x86/cpufeature: Add RDT CPUID feature bits
Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Presence of allocation features:
  CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
  CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
  CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
  CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3

[ tglx: Fixed up Borislavs comments and moved the feature bits into a gap ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: "Borislav Petkov" <bp@suse.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:38 +02:00
Fenghua Yu d57e3ab7e3 x86/intel_cacheinfo: Enable cache id in cache info
Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".

Also the documentation of the CPUID instruction in the "Intel 64 and
IA-32 Architectures Software Developer's Manual"

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:37 +02:00
Borislav Petkov 14cfbe55c7 x86/microcode: Bump driver version, update copyrights
Let's increment that number finally: it is long overdue.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-13-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:59 +02:00