Commit Graph

1286 Commits

Author SHA1 Message Date
Mauro Carvalho Chehab 66c222a02f edac: fix kernel-doc tags at the drivers/edac_*.h
Some kernel-doc tags don't provide good descriptions or use
a different style. Adjust them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:58:10 -02:00
Mauro Carvalho Chehab 6634fbb6b6 driver-api: create an edac.rst file with EDAC documentation
Currently, there's no device driver documentation for the EDAC
subsystem at the driver-api book. Fill in the blanks for the
structures and functions that misses documentation, uniform
the word on the existing ones, and add a new edac.rst file at
driver-api, in order to document the EDAC subsystem.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab e01aa14cf2 edac: move documentation from edac_mc.c to edac_core.h
Several functions are documented at edac_mc.c.

As we'll be including edac_core.h at drivers-api book, move
those, in order for the kernel-doc markups be part of the API
documentation book.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:52 -02:00
Mauro Carvalho Chehab fdaf0b3505 edac: move documentation from edac_pci*.c to edac_pci.h
Several functions are documented at edac_pci.c and edac_pci_sysfs.c.

As we'll be including edac_pci.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 5336f75499 edac: move documentation from edac_device to edac_core.h
Several functions are documented at edac_device.c.

As we'll be including edac_core.h at drivers-api book, move those,
in order for the kernel-doc markups be part of the API
documentation book.

As several of those kernel-doc macros are not in the right format,
fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 78d88e8a3d edac: rename edac_core.h to edac_mc.h
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 6d8ef24724 edac: move EDAC device definitions to drivers/edac/edac_device.h
The edac_core.h header contain data structures and function
definitions for both EDAC MC and EDAC device.

Let's move the devices ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:51 -02:00
Mauro Carvalho Chehab 0b892c7177 edac: move EDAC PCI definitions to drivers/edac/edac_pci.h
The edac_core.h header contain data structures and function
definitions for the 3 parts of EDAC: MC, PCI and device.

Let's move the PCI ones to a separate header file, as part
of a header reorganization.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:50 -02:00
Mauro Carvalho Chehab 7230617537 edac: edac_core.h: remove prototype for edac_pci_reset_delay_period()
This function doesn't exist. So, remove its prototype.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:49 -02:00
Mauro Carvalho Chehab a2c223b5ed edac: edac_core.h: get rid of unused kobj_complete
This element of struct edac_pci_ctl_info is never used. So,
get rid of it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15 08:54:49 -02:00
Pan Bian 0de2788447 EDAC, amd64: Fix improper return value
When the call to zalloc_cpumask_var() fails, returning "false" seems
improper. The real value of macro "false" is 0, and 0 means no error.
Return -ENOMEM instead.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189071

Signed-off-by: Pan Bian <bianpan2016@163.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1480831638-5361-1-git-send-email-bianpan201604@163.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-12-04 10:51:42 +01:00
Borislav Petkov 5246c54007 EDAC, amd64: Improve amd64-specific printing macros
Prefix the warn and error macros with the respective string so that
callers don't have to say "Error" or "Warning". We save us string length
this way in the actual calls.

While at it, shorten the calls in reserve_mc_sibling_devs().

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
2016-12-01 11:35:07 +01:00
Yazen Ghannam 95d3af6bd1 EDAC, amd64: Autoload amd64_edac_mod on Fam17h systems
Add Fam17h to the list of families to autoload amd64_edac_mod.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-18-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:49 +01:00
Yazen Ghannam 713ad54675 EDAC, amd64: Define and register UMC error decode function
How we need to decode UMC errors is different from how we decode bus
errors, so let's define a new function for this. We also need a way to
determine the UMC channel since we're not guaranteed that there is a
fixed relation between channel and MCA bank.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480359593-80369-1-git-send-email-Yazen.Ghannam@amd.com
[ Fold in decode_synd_reg(), simplify. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:48 +01:00
Yazen Ghannam d27f3a348e EDAC, amd64: Determine EDAC capabilities on Fam17h systems
We need to determine the EDAC capabilities from all UMCs on the node. We
should only check UMCs that are enabled and make sure they all agree.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-15-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:05:47 +01:00
Yazen Ghannam 2d09d8f301 EDAC, amd64: Determine EDAC MC capabilities on Fam17h
The UMCs on Fam17h are independent memory controllers so we need to
read the capabilities from all UMCs and make sure they agree. Once
we determine what capabilities are available we should save them for
convenience.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480431116-94683-1-git-send-email-Yazen.Ghannam@amd.com
[ Simplify f17h_determine_edac_ctl_cap(), preinit edac_mode in init_csrows(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 18:04:54 +01:00
Yazen Ghannam 07ed82ef93 EDAC, amd64: Add Fam17h debug output
Read a few more UMC registers and provide debug output in order to be as
similar as possible to older AMD systems.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1480344621-14966-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded K8 check and comments, fixup others. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-29 17:16:09 +01:00
Yazen Ghannam 8051c0af3c EDAC, amd64: Add Fam17h scrubber support
Fam17h has new register offsets and fields for setting up the DRAM
scrubber so add support for this.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-17-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:12 +01:00
Yazen Ghannam a6c14dce85 EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4
MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank
since Fam15h except for Fam15h, bank 4 in which case it's defined as
part of the McaStatSubCache bitfield.

Filter out that case.

Reported-by: Dean Liberty <Dean.Liberty@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com
[ Split an almost unparseable ternary conditional, add a comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:12 +01:00
Yazen Ghannam b64ce7cd7f EDAC, amd64: Read MC registers on AMD Fam17h
Fam17h has a different set of registers and bitfields. Most of these
registers are read through SMN (System Management Network) rather
than PCI config space. Also, the derivation of various values is now
different.

Update amd64_edac to read the appropriate registers and extract the
correct values for Fam17h.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-12-git-send-email-Yazen.Ghannam@amd.com
[ Save us the indentation level in read_mc_regs(), add defines ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:50:11 +01:00
Yazen Ghannam 936fc3afaa EDAC, amd64: Reserve correct PCI devices on AMD Fam17h
Fam17h needs PCI device functions 0 and 6 instead of 1 and 2 as on older
systems. Update struct amd64_pvt to hold the new functions and reserve
them if on Fam17h.

Also, allocate an array of UMC structs within our newly allocated PVT
struct.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-11-git-send-email-Yazen.Ghannam@amd.com
[ init_one_instance() error handling, shorten lines, unbreak >80 cols lines. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-28 17:49:40 +01:00
Yazen Ghannam f1cbbec9fc EDAC, amd64: Add AMD Fam17h family type and ops
Add a family type and associated ops for Fam17h. Define a struct to hold
all the UMC registers that we need. Make this a part of struct amd64_pvt
in order to maximize code reuse in the rest of the driver.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-10-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24 21:24:09 +01:00
Yazen Ghannam 196b79fcc8 EDAC, amd64: Extend ecc_enabled() to Fam17h
Update the ecc_enabled() function to work on Fam17h. This entails
reading a different set of registers and using the SMN (System
Management Network) rather than PCI devices.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-9-git-send-email-Yazen.Ghannam@amd.com
[ Fixup ecc_en assignment and get_umc_base(). ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-24 21:07:03 +01:00
Borislav Petkov 627bc29ed9 Merge tip:ras/core to pick up dependent changes
tip:ras/core contains the respective Fam17h x86 RAS bits which
amd64_edac is going to use. So merge it into the EDAC branch.

Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23 21:13:40 +01:00
Yazen Ghannam 044e7a414b EDAC, amd64: Don't force-enable ECC checking on newer systems
It's not recommended for the OS to try and force-enable ECC checking.
This is considered a firmware task since it includes memory training,
etc, so don't change ECC settings on Fam17h or newer systems and inform
the user.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479850816-1595-1-git-send-email-Yazen.Ghannam@amd.com
[ Put the "forcing" message in an else branch. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-23 19:04:11 +01:00
Yazen Ghannam d12a969ebb EDAC, amd64: Add Deferred Error type
Currently, deferred errors are classified as correctable in EDAC. Add a
new error type for deferred errors so that they are correctly reported
to the user.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:57:19 +01:00
Yazen Ghannam e70984d9eb EDAC, amd64: Rename __log_bus_error() to be more specific
We only use __log_bus_error() to log DRAM ECC errors, so let's change
the name to reflect this. We'll also use this function for DRAM ECC
errors on Fam17h, but we'll call it from a different function than
decode_bus_error().

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-6-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:42:55 +01:00
Yazen Ghannam e7934b70d7 EDAC, amd64: Change target of pci_name from F2 to F3
AMD Fam17h will not be using PCI function 2 for EDAC, but will continue
to use function 3. So let's get the name of F3 instead of F2 to support
Fam17h and previous families.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-5-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 10:24:20 +01:00
Yazen Ghannam 5c332202f8 EDAC, mce_amd: Rename nb_bus_decoder to dram_ecc_decoder
nb_bus_decoder() is only used for DRAM ECC errors so rename it so that
the name is more generic and descriptive.

Also, call it for DRAM ECC errors on SMCA systems.

[ Boris: rename it to real function name with a verb in it. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 09:43:15 +01:00
Yanjiang Jin 27bda205ba EDAC, mpc85xx: Implement remove method for the platform driver
If we execute the below steps without this patch:

  modprobe mpc85xx_edac [The first insmod, everything is well.]
  modprobe -r mpc85xx_edac
  modprobe mpc85xx_edac [insmod again, error happens.]

We would get the error messages as below:

  BUG: recent printk recursion!
  Oops: Kernel access of bad area, sig: 11 [#48]
  Modules linked in: mpc85xx_edac edac_core softdog [last unloaded: mpc85xx_edac]
  CPU: 5 PID: 14773 Comm: modprobe Tainted: G D C 4.8.3-rt2
   .vsnprintf
   .vscnprintf
   .vprintk_emit
   .printk
   .edac_pci_add_device
   .mpc85xx_pci_err_probe
   .platform_drv_probe
   .driver_probe_device
   .__driver_attach
   .bus_for_each_dev
   .driver_attach
   .bus_add_driver
   .driver_register
   .__platform_register_drivers
   .mpc85xx_mc_init
   .do_one_initcall
   .do_init_module
   .load_module
   .SyS_finit_module
   system_call

Address this by cleaning up properly when removing the platform driver.

Tested on a T4240QDS board.

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: york.sun@nxp.com
Link: http://lkml.kernel.org/r/1479351380-17109-2-git-send-email-yanjiang.jin@windriver.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-17 11:19:50 +01:00
Colin Ian King 8176170e03 EDAC, xgene: Fix spelling mistake in error messages
Trivial fix to spelling mistake "Mutilple" to "Multiple" in error
messages.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Loc Ho <lho@apm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161114231104.5585-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-15 11:10:02 +01:00
Borislav Petkov c73e8833be EDAC, mc: Fix locking around mc_devices list
When accessing the mc_devices list of memory controller descriptors, we
need to hold mem_ctls_mutex. This was not always the case, fix that.

Make all external callers call a version which grabs the mutex since the
last is local to edac_mc.c.

Reported-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-14 13:26:11 +01:00
Borislav Petkov c09a8c40e0 x86/RAS: Hide SMCA bank names
Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:15 +01:00
Borislav Petkov a9a1c0ee04 x86/RAS: Rename smca_bank_names to smca_names
Make it differ more from struct smca_bank_name for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov 1ce9cd7f9f x86/RAS: Simplify SMCA HWID descriptor struct
Call it simply smca_hwid and call local variables "hwid". More readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Thor Thayer 90e493d7d5 EDAC, altera: Disable IRQs while injecting SDRAM errors
Disable IRQs while injecting SDRAM errors. The RT patches exposed
a spinlock deadlock where the spinlock taken for the regmap write
deadlocked with the IRQ clear regmap write.

Error injection is not normally enabled for ECC but only for testing.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1476906827-9412-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-22 20:14:03 +02:00
Wei Yongjun 240ea9214a EDAC, skx_edac: Fix non static symbol warnings
Fix the following sparse warnings:

  drivers/edac/skx_edac.c:266:25: warning:
   symbol 'skx_cpuids' was not declared. Should it be static?
  drivers/edac/skx_edac.c:1040:12: warning:
   symbol 'skx_init' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477147098-2842-1-git-send-email-weiyj.lk@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-22 20:10:38 +02:00
Piotr Luc 9a9260ca92 EDAC, sb_edac: Add Knights Mill support
Add Knights Mill (KNM) to the list of CPU models supported by sb_edac.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-19 12:37:44 +02:00
Dave Hansen 20f4d69243 EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them
We now have symbolic names for a bunch of Intel CPU models via
asm/intel-family.h. The original conversion missed the EDAC drivers.
Convert them.

Signed-off-by: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com
[ Remove comment, macro name is descriptive enough. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-10-19 12:32:40 +02:00
Linus Torvalds 19fe416532 * Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
buffers (Thor Thayer)
 
 * Split the memory controller part out of mpc85xx and share it with a
 * new Freescale ARM Layerscape driver (York Sun)
 
 * amd64_edac fixes (Yazen Ghannam)
 
 * Misc cleanups, refactoring and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30
 8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb
 6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U
 4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea
 t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1
 I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a
 iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm
 Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq
 1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus
 WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq
 qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6
 dmDAtQbKUUwiltB8QzQd
 =pcI8
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:
 "A lot of movement in the EDAC tree this time around, coarse summary
  below:

   - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO
     buffers (Thor Thayer)

   - split the memory controller part out of mpc85xx and share it with a
     new Freescale ARM Layerscape driver (York Sun)

   - amd64_edac fixes (Yazen Ghannam)

   - misc cleanups, refactoring and fixes all over the place"

* tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits)
  EDAC, altera: Add IRQ Flags to disable IRQ while handling
  EDAC, altera: Correct EDAC IRQ error message
  EDAC, amd64: Autoload module using x86_cpu_id
  EDAC, sb_edac: Remove NULL pointer check on array pci_tad
  EDAC: Remove NO_IRQ from powerpc-only drivers
  EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe()
  EDAC, fsl_ddr: Add entry to MAINTAINERS
  EDAC: Move Doug Thompson to CREDITS
  EDAC, I3000: Orphan driver
  EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul()
  EDAC, layerscape: Add Layerscape EDAC support
  EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed
  EDAC, fsl_ddr: Add support for little endian
  EDAC, fsl_ddr: Add missing DDR DRAM types
  EDAC, fsl_ddr: Rename macros and names
  EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx
  EDAC, mpc85xx: Replace printk() with pr_* format
  EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1
  EDAC, altera: Rename MC trigger to common name
  EDAC, altera: Rename device trigger to common name
  ...
2016-10-04 12:06:26 -07:00
Thor Thayer a29d64a45e EDAC, altera: Add IRQ Flags to disable IRQ while handling
Add the IRQF_ONESHOT and IRQF_TRIGGER_HIGH flags to disable the IRQ
while executing the IRQ handler. Remove the IRQF_SHARED because these
are not shared IRQs in the domain. Exposed when flooding IRQs.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-2-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-23 12:03:34 +02:00
Thor Thayer 3763569f4c EDAC, altera: Correct EDAC IRQ error message
Correct the error message sent out in the case of a single bit error IRQ
allocation.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1474582419-7053-1-git-send-email-tthayer@opensource.altera.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-23 11:52:39 +02:00
Yazen Ghannam d6efab74f6 EDAC, amd64: Autoload module using x86_cpu_id
Reinstate driver autoloading now that PCI dependency is gone.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1473984445-1726-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-21 12:48:15 +02:00
Yazen Ghannam a884675b87 x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly
Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
records. However, broken hardware and software is not something unheard
of so warn about bank 4 errors. They shouldn't be coming from bank 4
naturally but users can still use mce_amd_inj to simulate errors from it
for testing purposed.

Also, avoid special handling in the injector mce_amd_inj like it is
being done on the older families.

[ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Aravind Gopalakrishnan  <aravindksg.lkml@gmail.com>
Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:14 +02:00
Yazen Ghannam 4b711f92c9 x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems
The MCA_SYND and MCA_IPID registers contain valuable information and
should be included in MCE output. The MCA_SYND register contains
syndrome and other error information, and the MCA_IPID register will
uniquely identify the MCA bank's type without having to rely on system
software.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:13 +02:00
Yazen Ghannam 5896820e0a x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types
Scalable MCA defines a number of IP types. An MCA bank on an SMCA
system is defined as one of these IP types. A bank's type is uniquely
identified by the combination of the HWID and MCATYPE values read from
its MCA_IPID register.

Add the required tables in order to be able to lookup error descriptions
based on a bank's type and the error's extended error code.

[ bp: Align comments, simplify a bit. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:10 +02:00
Yazen Ghannam 856095b179 EDAC/mce_amd: Use SMCA prefix for error descriptions arrays
The error descriptions defined for Fam17h can be reused for other SMCA
systems, so their names should reflect this.

Change f17h prefix to smca for error descriptions.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:09 +02:00
Yazen Ghannam c019b951e1 EDAC/mce_amd: Add missing SMCA error descriptions
Add missing SMCA error descriptions to the error descriptions arrays.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:09 +02:00
Yazen Ghannam b300e87300 EDAC/mce_amd: Print syndrome register value on SMCA systems
Print SyndV bit status and print the raw value of the MCA_SYND register.
Further decoding of the syndrome from struct mce.synd can be done in
other places where appropriate, e.g. DRAM ECC.

Boris: make the error stanza more compact by putting the error address
and syndrome on the same line:

  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-|CE|-|PCC|AddrV|-|-|SyndV|CECC]: 0x96204100001e0117
  [Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000
  [Hardware Error]: Invalid IP block specified.
  [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13 15:23:07 +02:00
Colin Ian King c7c35407cd EDAC, sb_edac: Remove NULL pointer check on array pci_tad
pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and
hence cannot be NULL, so the NULL pointer check on pci_tad is redundant.
Remove it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2016-09-12 20:15:43 +02:00