linux

Commit Graph

Author	SHA1	Message	Date
Qiuxu Zhuo	8807e15597	EDAC, {skx,i10nm}: Use CPU stepping macro to pass configurations Use the X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro to pass CPU stepping specific configurations to {skx,i10nm}_init(), so can delete the CPU stepping check from 10nm_init(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200509010822.76331-1-qiuxu.zhuo@intel.com	2020-06-15 14:50:39 -07:00
Zhenzhong Duan	e9ff6636d3	EDAC/mc: Call edac_inc_ue_error() before panic By calling edac_inc_ue_error() before panic, we get a correct UE error count for core dump analysis. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200610065846.3626-2-zhenzhong.duan@gmail.com	2020-06-15 11:19:52 -07:00
Zhenzhong Duan	30bf38e434	EDAC, pnd2: Set MCE_PRIO_EDAC priority for pnd2_mce_dec notifier Avoid giving it MCE_PRIO_LOWEST priority by default. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200610065846.3626-1-zhenzhong.duan@gmail.com	2020-06-15 11:19:39 -07:00
Linus Torvalds	6adc19fd13	Kbuild updates for v5.8 (2nd) - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7lBuYVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGHvIP/3iErjPshpg/phwH8NTCS4SFkiti BZRM+2lupSn7Qs53BTpVzIkXoHBJQZlJxlQ5HY8ScO+fiz28rKZr+b40us+je1Q+ SkvSPfwZzxjEg7lAZutznG4KgItJLWJKmDyh9T8Y8TAuG4f8WO0hKnXoAp3YorS2 zppEIxso8O5spZPjp+fF/fPbxPjIsabGK7Jp2LpSVFR5pVDHI/ycTlKQS+MFpMEx 6JIpdFRw7TkvKew1dr5uAWT5btWHatEqjSR3JeyVHv3EICTGQwHmcHK67cJzGInK T51+DT7/CpKtmRgGMiTEu/INfMzzoQAKl6Fcu+vMaShTN97Hk9DpdtQyvA6P/h3L 8GA4UBct05J7fjjIB7iUD+GYQ0EZbaFujzRXLYk+dQqEJRbhcCwvdzggGp0WvGRs 1f8/AIpgnQv8JSL/bOMgGMS5uL2dSLsgbzTdr6RzWf1jlYdI1i4u7AZ/nBrwWP+Z iOBkKsVceEoJrTbaynl3eoYqFLtWyDau+//oBc2gUvmhn8ioM5dfqBRiJjxJnPG9 /giRj6xRIqMMEw8Gg8PCG7WebfWxWyaIQwlWBbPok7DwISURK5mvOyakZL+Q25/y 6MBr2H8NEJsf35q0GTINpfZnot7NX4JXrrndJH8NIRC7HEhwd29S041xlQJdP0rs E76xsOr3hrAmBu4P =1NIT -----END PGP SIGNATURE----- Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues	2020-06-13 13:29:16 -07:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig' \| xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
Thomas Gleixner	f77d26a9fc	Merge branch 'x86/entry' into ras/core to fixup conflicts in arch/x86/kernel/cpu/mce/core.c so MCE specific follow up patches can be applied without creating a horrible merge conflict afterwards.	2020-06-11 15:17:57 +02:00
Borislav Petkov	2a02ca0428	Merge branches 'edac-i10nm' and 'edac-misc' into edac-updates-for-5.8 Signed-off-by: Borislav Petkov <bp@suse.de>	2020-06-01 11:39:15 +02:00
Colin Ian King	f00eb5ff2f	EDAC/amd64: Remove redundant assignment to variable ret in hw_info_get() The variable ret is being assigned with a value that is never read and it is being updated later with a new value. The initialization is redundant so remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200429154847.287001-1-colin.king@canonical.com	2020-05-29 15:15:02 +02:00
Alexander Monakov	b6bea24d41	EDAC/amd64: Add AMD family 17h model 60h PCI IDs Add support for AMD Renoir (4000-series Ryzen CPUs). Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20200510204842.2603-4-amonakov@ispras.ru	2020-05-22 18:43:13 +02:00
Qiuxu Zhuo	1032095053	EDAC/skx: Use the mcmtr register to retrieve close_pg/bank_xor_enable The skx_edac driver wrongly uses the mtr register to retrieve two fields close_pg and bank_xor_enable. Fix it by using the correct mcmtr register to get the two fields. Cc: <stable@vger.kernel.org> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reported-by: Matthew Riley <mattdr@google.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20200515210146.1337-1-tony.luck@intel.com	2020-05-19 15:11:29 -07:00
Qiuxu Zhuo	ce20670828	EDAC/i10nm: Update driver to support different bus number config register offsets The i10nm_edac driver failed to load on Ice Lake and Tremont/Jacobsville servers if their CPU stepping >= 4 and failed on Ice Lake-D servers from stepping 0. The root cause was that for Ice Lake and Tremont/Jacobsville servers with CPU stepping >=4, the offset for bus number configuration register was updated from 0xcc to 0xd0. For Ice Lake-D servers, all the steppings use the updated 0xd0 offset. Fix the issue by using the appropriate offset for bus number configuration register according to the CPU model number and stepping. Reported-by: Jerry Chen <jerry.t.chen@intel.com> Reported-and-tested-by: Jin Wen <wen.jin@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/linux-edac/20200427084022.GC11036@zn.tnic	2020-04-27 09:40:49 -07:00
Qiuxu Zhuo	ee5340abab	EDAC, {skx,i10nm}: Make some configurations CPU model specific The device ID for configuration agent PCI device and the offset for bus number configuration register can be CPU model specific. So add a new structure res_config to make them configurable and pass res_config to {skx,i10nm}_init() and skx_get_all_bus_mappings() for use. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200427083246.GB11036@zn.tnic	2020-04-27 09:29:41 -07:00
Jason Yan	b2f9fb0d67	EDAC/amd8131: Remove defined but not used bridge_str Fix the following gcc warning: drivers/edac/amd8131_edac.c:47:21: warning: ‘bridge_str’ defined but not used [-Wunused-const-variable=] static char * const bridge_str[] = { ^~~~~~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Robert Richter <rrichter@marvell.com> Link: https://lkml.kernel.org/r/20200415085006.6732-1-yanaijie@huawei.com	2020-04-24 09:08:47 +02:00
Zou Wei	58d66175d4	EDAC/thunderx: Make symbols static Make a couple of symbols static, as reported by sparse. [ bp: Massage. ] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1587624744-97240-1-git-send-email-zou_wei@huawei.com	2020-04-23 12:07:24 +02:00
Tony Luck	7fc0b9b995	EDAC: Drop the EDAC report status checks When acpi_extlog was added, we were worried that the same error would be reported more than once by different subsystems. But in the ensuing years I've seen complaints that people could not find an error log (because this mechanism suppressed the log they were looking for). Rip it all out. People are smart enough to notice the same address from different reporting mechanisms. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-8-tony.luck@intel.com	2020-04-14 16:01:01 +02:00
Tony Luck	23ba710a08	x86/mce: Fix all mce notifiers to update the mce->kflags bitmask If the handler took any action to log or deal with the error, set a bit in mce->kflags so that the default handler on the end of the machine check chain can see what has been done. Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers skip over errors already processed by CEC. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com	2020-04-14 15:59:26 +02:00
Borislav Petkov	3e0fdec858	x86/mce/amd, edac: Remove report_gart_errors ... because no one should be interested in spurious MCEs anyway. Make the filtering unconditional and move it to amd_filter_mce(). Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de	2020-04-14 15:53:46 +02:00
Jason Yan	87a4eca891	EDAC/xgene: Remove set but not used address local var Fix the following gcc warning: drivers/edac/xgene_edac.c:1486:7: warning: variable ‘address’ set but not used [-Wunused-but-set-variable] u32 address; ^~~~~~~ Remove the unused macro RBERRADDR_RD while at it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200409093259.20069-1-yanaijie@huawei.com	2020-04-14 14:35:19 +02:00
Christophe JAILLET	493362dd7b	EDAC/armada_xp: Fix some log messages Fix spelling (s/Aramda/Armada/) in a log message and in a comment. While at it, add a trailing '\n' in messages. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lkml.kernel.org/r/20200413041556.3514-1-christophe.jaillet@wanadoo.fr	2020-04-14 11:28:09 +02:00
Linus Torvalds	9b82f05f86	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel side changes: - A couple of x86/cpu cleanups and changes were grandfathered in due to patch dependencies. These clean up the set of CPU model/family matching macros with a consistent namespace and C99 initializer style. - A bunch of updates to various low level PMU drivers: * AMD Family 19h L3 uncore PMU * Intel Tiger Lake uncore support * misc fixes to LBR TOS sampling - optprobe fixes - perf/cgroup: optimize cgroup event sched-in processing - misc cleanups and fixes Tooling side changes are to: - perf {annotate,expr,record,report,stat,test} - perl scripting - libapi, libperf and libtraceevent - vendor events on Intel and S390, ARM cs-etm - Intel PT updates - Documentation changes and updates to core facilities - misc cleanups, fixes and other enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits) cpufreq/intel_pstate: Fix wrong macro conversion x86/cpu: Cleanup the now unused CPU match macros hwrng: via_rng: Convert to new X86 CPU match macros crypto: Convert to new CPU match macros ASoC: Intel: Convert to new X86 CPU match macros powercap/intel_rapl: Convert to new X86 CPU match macros PCI: intel-mid: Convert to new X86 CPU match macros mmc: sdhci-acpi: Convert to new X86 CPU match macros intel_idle: Convert to new X86 CPU match macros extcon: axp288: Convert to new X86 CPU match macros thermal: Convert to new X86 CPU match macros hwmon: Convert to new X86 CPU match macros platform/x86: Convert to new CPU match macros EDAC: Convert to new X86 CPU match macros cpufreq: Convert to new X86 CPU match macros ACPI: Convert to new X86 CPU match macros x86/platform: Convert to new CPU match macros x86/kernel: Convert to new CPU match macros x86/kvm: Convert to new CPU match macros x86/perf/events: Convert to new CPU match macros ...	2020-03-30 16:40:08 -07:00
Borislav Petkov	41dac9a2ad	Merge branches 'edac-mc-cleanup', 'edac-misc', 'edac-drivers' and 'edac-urgent' into edac-updates-for-5.7 Signed-off-by: Borislav Petkov <bp@suse.de>	2020-03-30 10:07:58 +02:00
Ingo Molnar	629b3df7ec	Merge branch 'x86/cpu' into perf/core, to resolve conflict Conflicts: arch/x86/events/intel/uncore.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-03-25 15:20:44 +01:00
Thomas Gleixner	298426211c	EDAC: Convert to new X86 CPU match macros The new macro set has a consistent namespace and uses C99 initializers instead of the grufty C89 ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200320131509.673579000@linutronix.de	2020-03-24 21:32:28 +01:00
Takashi Iwai	215a423cc0	EDAC/armada_xp: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lkml.kernel.org/r/20200311071728.4541-1-tiwai@suse.de	2020-03-17 19:26:09 +01:00
Sherry Sun	2fb3f6e125	EDAC/synopsys: Do not dump uninitialized pinf->col On the ZynqMP platform, zynqmp_get_error_info() is used to read out error information. In this function, the pinf->col parameter is not used (it is only used by the Zynq platform's zynq_get_error_info()). So there's no need to print pinf->col on ZynqMP. In order to differentiate on which platform handle_error() is executed, use DDR_ECC_INTR_SUPPORT as the check condition to distinguish between Zynq and ZynqMP platforms. [ bp: Massage. ] Fixes: `b500b4a029` ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Manish Narani <manish.narani@xilinx.com> Link: https://lkml.kernel.org/r/1584365679-27443-1-git-send-email-sherry.sun@nxp.com	2020-03-17 14:32:31 +01:00
Sherry Sun	dfc6014e3b	EDAC/synopsys: Do not print an error with back-to-back snprintf() calls handle_error() currently calls snprintf() a couple of times in succession to output the message for a CE/UE, therefore overwriting each part of the message which was formatted with the previous snprintf() call. As a result, only the part of the message from the last snprintf() call will be printed. The simplest and most effective way to fix this problem is to combine the whole string into one which to supply to a single snprintf() call. [ bp: Massage. ] Fixes: `b500b4a029` ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Cc: Manish Narani <manish.narani@xilinx.com> Link: https://lkml.kernel.org/r/1582792452-32575-1-git-send-email-sherry.sun@nxp.com	2020-02-27 16:44:25 +01:00
Lei Wang	1088750d78	EDAC: Add EDAC driver for DMC520 The driver supports error detection and correction on devices with an ARM DMC-520 memory controller. Signed-off-by: Lei Wang <leiwang_git@outlook.com> Signed-off-by: Shiping Ji <shiping.linux@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lkml.kernel.org/r/83b48c70-dc06-d0d4-cae9-a2187fca628b@gmail.com	2020-02-19 21:00:27 +01:00
Prarit Bhargava	52cff04a81	EDAC/mce_amd: Print !SMCA processor warning only once This warning is output for every virtual CPU in a guest on an EPYC 2 system because kvm doesn't enable SMCA. Once is enough too. [ bp: Massage. ] Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200217134627.19765-1-prarit@redhat.com	2020-02-18 17:20:41 +01:00
Robert Richter	4aa92c8646	EDAC/mc: Remove per layer counters Looking at how mci->{ue,ce}_per_layer[EDAC_MAX_LAYERS] is used, it turns out that only the leaves in the memory hierarchy are consumed (in sysfs), but not the intermediate layers, e.g.: count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx]; These unused counters only add complexity, remove them. The error counter values are directly stored in struct dimm_info now. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-11-rrichter@marvell.com	2020-02-17 13:37:00 +01:00
Robert Richter	1853ee7299	EDAC/mc: Remove detail[] string and cleanup error string generation The error descriptor is passed to the error reporting functions, so the error details can be directly generated there. Move string generation from edac_raw_mc_handle_error() to edac_ce_error() and edac_ue_error(). The intermediate detail[] string can be removed then. Also, cleanup the string generation by switching to a single variant only using the ternary operator. [ bp: put ternary operators on a separate line for better readability and use the short-form "inline if" in edac_mc_handle_error(). ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-10-rrichter@marvell.com	2020-02-17 13:36:28 +01:00
Robert Richter	6ab76179ad	EDAC/mc: Pass the error descriptor to error reporting functions Most arguments of error reporting functions are already stored in the struct edac_raw_error_desc error descriptor. Pass the error descriptor to the functions and reduce the functions' argument list. [ bp: Sort function args in reverse fir tree order. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-9-rrichter@marvell.com	2020-02-17 13:35:30 +01:00
Robert Richter	67792cf958	EDAC/mc: Remove enable_per_layer_report function argument Many functions carry the enable_per_layer_report argument. This is a bool value indicating the error information contains some location data where the error occurred. This can easily being determined by checking the pos[] array for values. Negative values indicate there is no location available. So if the top layer is negative, the error location is unknown. Just check if the top layer is negative and remove enable_per_layer_report as function argument and also from struct edac_raw_error_desc. [ bp: Reflow comments to 80 columns, while at it. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-8-rrichter@marvell.com	2020-02-17 13:13:16 +01:00
Robert Richter	65bb4d1af9	EDAC/mc: Report "unknown memory" on too many DIMM labels found There is a limitation to report only EDAC_MAX_LABELS in e->label of the error descriptor. This is to prevent a potential string overflow. The current implementation falls back to "any memory" in this case and also stops all further processing to find a unique row and channel of the possible error location. Reporting "any memory" is wrong as the memory controller reported an error location for one of the layers. Instead, report "unknown memory" and also do not break early in the loop to further check row and channel for uniqueness. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-7-rrichter@marvell.com	2020-02-17 13:10:14 +01:00
Robert Richter	6334dc4e3f	EDAC/mc: Carve out error increment into a separate function Carve out the error_count increment into a separate function edac_inc_csrow(). This better separates code and reduces the indentation level. Implementation note: The function edac_inc_csrow() counts the same as before, ->ce_count is only incremented if row >= 0. This is esp. true for the case of (!e->enable_per_layer_report). Here, a DIMM was not found, variable row still has a value of -1 and ->ce_count is not incremented. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200214141757.8976-1-rrichter@marvell.com	2020-02-17 13:07:50 +01:00
Robert Richter	91b327f672	EDAC/mc: Determine mci pointer from the error descriptor Each struct mci has its own error descriptor. Create a function error_desc_to_mci() to determine the corresponding mci from an error descriptor. This removes @mci from the parameter list of edac_raw_mc_handle_error() as the mci pointer does not need to be passed any longer. [ bp: Massage commit message. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-5-rrichter@marvell.com	2020-02-17 13:05:10 +01:00
Robert Richter	672ef0e568	EDAC: Store error type in struct edac_raw_error_desc Store the error type in struct edac_raw_error_desc. This makes the type parameter of edac_raw_mc_handle_error() obsolete. [ kernel-doc typo ] Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com	2020-02-17 13:02:30 +01:00
Robert Richter	1f27c79062	EDAC/mc: Reorder functions edac_mc_alloc*() Reorder the new created functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further code changes. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-3-rrichter@marvell.com	2020-02-17 12:57:18 +01:00
Robert Richter	aad28c6f6b	EDAC/mc: Split edac_mc_alloc() into smaller functions edac_mc_alloc() is huge. Factor out code by moving it to the two new functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not move code yet for better review. [ bp: sort local args in reversed fir tree order. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-2-rrichter@marvell.com	2020-02-17 12:47:50 +01:00
Robert Richter	bea1bfd5b7	EDAC/mc: Change mci device removal to use put_device() There are dimm and csrow devices linked to the mci device esp. to show up in sysfs. It must be granted that children devices are removed before its mci parent. Thus, the release functions must be called in the correct order and may not miss any child before releasing its parent. In the current implementation this is only granted by the correct order of release functions. A much better approach is to use put_device() that releases the device only after all users are gone. It is the recommended way to release a device and free its memory. The function uses the device's refcount and only frees it if there are no users of it anymore such as children. So implement a mci_release() function to remove mci devices, use put_device() to free them and early initialize the mci device right after its struct has been allocated. Change the release function so that it can be universally used no matter if the device is registered or not. Since subsequent dimm and csrow sysfs links are implemented as children devices, their refcounts will keep the parent mci device from being removed as long as sysfs entries exist and until all users have been unregistered in edac_remove_sysfs_mci_device(). Remove edac_unregister_sysfs() and merge mci sysfs removal into edac_remove_sysfs_mci_device(). There is only a single instance now that removes the sysfs entries. The function can now be used in the error paths for cleanup. Also, create device release functions for all involved devices (dev->release), remove device_type release functions (dev_type-> release) and also use dev->init_name instead of dev_set_name(). [ bp: Massage commit message and comments. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com	2020-02-17 12:32:44 +01:00
Robert Richter	4d59588c09	EDAC/sysfs: Remove csrow objects on errors All created csrow objects must be removed in the error path of edac_create_csrow_objects(). The objects have been added as devices. They need to be removed by doing a device_del() and put_device() call to also free their memory. The missing put_device() leaves a memory leak. Use device_unregister() instead of device_del() which properly unregisters the device doing both. Fixes: `7adc05d2dc` ("EDAC/sysfs: Drop device references properly") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com	2020-02-13 13:29:41 +01:00
Robert Richter	216aa145aa	EDAC/mc: Fix use-after-free and memleaks during device removal A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set, revealed several issues when removing an mci device: 1) Use-after-free: On 27.11.19 17:07:33, John Garry wrote: > [ 22.104498] BUG: KASAN: use-after-free in > edac_remove_sysfs_mci_device+0x148/0x180 The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with `c498afaf7d` ("EDAC: Introduce an mci_for_each_dimm() iterator"). The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already. The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed. This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs. 2) Memory leaks: Following memory leaks have been detected: # grep edac /sys/kernel/debug/kmemleak \| sort \| uniq -c 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn] 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc() All leaks are from memory allocated by edac_mc_alloc(). Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump. The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free(). There are two paths implemented to remove the device depending on device registration, _edac_mc_free() is called if the device is not registered and edac_unregister_sysfs() otherwise. The implemenations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()). [ bp: Massage commit message. ] Fixes: `c498afaf7d` ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: `faa2ad09c0` ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: `7a623c0390` ("edac: rewrite the sysfs code to use struct device") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com	2020-02-13 13:28:52 +01:00
Linus Torvalds	6a1000bd27	ioremap changes for 5.6 - remove ioremap_nocache given that is is equivalent to ioremap everywhere -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl4vKHwLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYMPGBAAuVNUZaZfWYHpiVP2oRcUQUguFiD3NTbknsyzV2oH J9P0GfeENSKwE9OOhZ7XIjnCZAJwQgTK/ppQY5yiQ/KAtYyyXjXEJ6jqqjiTDInr +3+I3t/LhkgrK7tMrb7ylTGa/d7KhaciljnOXC8+b75iddvM9I1z2pbHDbppZMS9 wT4RXL/cFtRb85AfOyPLybcka3f5P2gGvQz38qyimhJYEzHDXZu9VO1Bd20f8+Xf eLBKX0o6yWMhcaPLma8tm0M0zaXHEfLHUKLSOkiOk+eHTWBZ3b/w5nsOQZYZ7uQp 25yaClbameAn7k5dHajduLGEJv//ZjLRWcN3HJWJ5vzO111aHhswpE7JgTZJSVWI ggCVkytD3ESXapvswmACSeCIDMmiJMzvn6JvwuSMVB7a6e5mcqTuGo/FN+DrBF/R IP+/gY/T7zIIOaljhQVkiEIIwiD/akYo0V9fheHTBnqcKEDTHV4WjKbeF6aCwcO+ b8inHyXZSKSMG//UlDuN84/KH/o1l62oKaB1uDIYrrL8JVyjAxctWt3GOt5KgSFq wVz1lMw4kIvWtC/Sy2H4oB+RtODLp6yJDqmvmPkeJwKDUcd/1JKf0KsZ8j3FpGei /rEkBEss0KBKyFAgBSRO2jIpdj2epgcBcsdB/r5mlhcn8L77AS6mHbA173kY4pQ/ Kdg= =TUCJ -----END PGP SIGNATURE----- Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap	2020-01-27 13:03:00 -08:00
Linus Torvalds	30f5a75640	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Misc fixes to the MCE code all over the place, by Jan H. Schönherr. - Initial support for AMD F19h and other cleanups to amd64_edac, by Yazen Ghannam. - Other small cleanups. * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/mce_amd: Make fam_ops static global EDAC/amd64: Drop some family checks for newer systems EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh x86/amd_nb: Add Family 19h PCI IDs EDAC/mce_amd: Always load on SMCA systems x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType x86/mce: Fix use of uninitialized MCE message string x86/mce: Fix mce=nobootlog x86/mce: Take action on UCNA/Deferred errors again x86/mce: Remove mce_inject_log() in favor of mce_log() x86/mce: Pass MCE message to mce_panic() on failed kernel recovery x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused	2020-01-27 09:19:35 -08:00
Linus Torvalds	b62061b82a	A garden variety of small fixes all over the place. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl4uvKUACgkQEsHwGGHe VUoG5RAAoSdq9LvBPWTtJUMckxZ124hAqvtDOqvAbCnde3V2yICZLfOuGZSuWBy0 +ED2Ebwnf6mxzB2vEEI1OQ0+LmmRXvc+UbI2z7SnC/4sLZnS7h9wDrxN41nR0dyB cMx0JB0ClyMW71WJo4Jbp9B+fiB4rBu/gmrXc0kC3L6ShmYvWzGdrPIV31qXPPYn zjqOn+tBENi7J+8omVsFg/KtAK9BTEk7FLuAz2mevLJzZGwycQBk/sigDHTuXJC2 AACPmaQAqZgOYEOWID+F1USgD7x3yfYEZUBQWokqmDJDkN6agvcScQWoYh2PGobR wxJhGZR+Uq3NU0BsdoKlI4OCI/Hf+WYBAhpSdlu1/Wle3Wq4wGc91wisFtnpXU6s 2VfiUtpf9WTROUeyHvhXcKgnjTwmRxSOBmaIhSYSL/aQ34AJaMmaeS8Vj1vpamrp 1dotRIQKmjI8yDRs4JBfsJX3KtMjJ+u1qiz7SfwqBv1dqypSamTkvjpbPhItUrAz Hv9QHWQAOmCLBqWw3Jx2W9gwd7Z1gYmpvo9mqyUy81eF/BDNeJTi1zRkNL2jqElU gu6gaOheYr7UumA9LDsZUqh3cGgGii73pp2Fc1A2QvL9HeD++UVoXOlwjo8VEs1b +u+DRDKNy3OwgCLMGfK0zuWfvTRqgV+OnlPnBH2/2PnYpE9tawU= =fR3M -----END PGP SIGNATURE----- Merge tag 'edac_for_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "A totally boring branch this time around: a garden variety of small fixes all over the place" * tag 'edac_for_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Do not warn when removing instances EDAC/sifive: Fix return value check in ecc_register() EDAC/aspeed: Remove unneeded semicolon EDAC: remove set but not used variable 'ecc_loc' EDAC: skx_common: downgrade message importance on missing PCI device EDAC/Kconfig: Fix Kconfig indentation	2020-01-27 09:16:22 -08:00
Borislav Petkov	7e5d6cf353	EDAC/amd64: Do not warn when removing instances On machines which do not populate all nodes with DIMMs, the driver doesn't initialize an instance there. However, the instance removal remove_one_instance() path will warn unconditionally, which is wrong. Remove the WARN_ON() even if the warning is innocent because it causes a splat in dmesg. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200117115939.5524-1-bp@alien8.de	2020-01-17 13:00:06 +01:00
Wei Yongjun	6cd18453b6	EDAC/sifive: Fix return value check in ecc_register() In case of error, the function edac_device_alloc_ctl_info() returns a NULL pointer, not ERR_PTR(). Replace the IS_ERR() test in the return value check with a NULL test. Fixes: `91abaeaaff` ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200115150303.112627-1-weiyongjun1@huawei.com	2020-01-17 01:37:51 +01:00
Borislav Petkov	86e9f9d60e	EDAC/mce_amd: Make fam_ops static global ... and do not kmalloc a three-pointer struct. Which simplifies mce_amd_init() a bit. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200116163403.GF27148@zn.tnic	2020-01-16 21:52:48 +01:00
Yazen Ghannam	dcd01394ce	EDAC/amd64: Drop some family checks for newer systems In general, "pvt->umc != NULL" is used to check if the system is Family 17h+. However, there are a few places that are using direct family checks. Replace the remaining family checks with a check for "pvt->umc != NULL". Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-6-Yazen.Ghannam@amd.com	2020-01-16 17:09:29 +01:00
Yazen Ghannam	2eb61c91c3	EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh Add family ops to support AMD Family 19h systems. Existing Family 17h functions can be used. Also, add Family 19h to the list of families to automatically load the module. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-5-Yazen.Ghannam@amd.com	2020-01-16 17:09:23 +01:00
Yazen Ghannam	9f6aef8631	EDAC/mce_amd: Always load on SMCA systems MCA error decoding on SMCA systems is not dependent on family. Return success early if the system supports the SMCA feature. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-3-Yazen.Ghannam@amd.com	2020-01-16 17:09:13 +01:00

1 2 3 4 5 ...

1647 Commits