linux_old1

Commit Graph

Author	SHA1	Message	Date
Aristeu Rozanski	cf40f80cbe	sb_edac: use "event" instead of "exception" when MC wasnt signaled Corrected Errors are MC events, not exceptions and reporting as the later might confuse users. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2014-03-13 09:52:01 -03:00
Linus Torvalds	fab5669d55	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure	2014-01-20 12:10:27 -08:00
Linus Torvalds	09854dde94	* mpc85xx PCIe error interrupt support * misc small enhancements/fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJS3PEWAAoJEBLB8Bhh3lVKziAP/3nNouipUvIcRpL3o8U3jX19 Z38tD0KPjIIX46rgexjoXbI/bKEy6whE1eDc0CG9GPpOaN9p9WWe81hfBIQc9DYK epFNdH3Wk83Ml+3+01CBKfLfiIV25c7rc9IOM7g+nMOg65Z++iC1Cea7i7Ul7T5a C+CIDdsmco9VrDBNjHHsrdYVZ1Ufx4NxPb5xhemQAiGCXseBtoD1IDS/gOM1F+aP UBDSJobjDrFq5JasEkp83U62VBTDndEcxb6yHROVahIPRn4miroc0NMHJ3XEd1/H K9u24iXmHgjjG3Euzd3mq8Axsscz8dUksKQT+lL2YIXz+AM8PcrCDrXK6TiVF9/A k5Vj91svnWUDrEU0OF1HysOvn0Z/9Y0+Fwyt6//HMyl/7ehCZBINwxsZ4sc4whtd nVHV2Ddcc/LjZzQEAvHuyWdf2pfxi3M1VRhNvuGyWtdHBPf5P8tqVUteoztELPM+ bKr7wkpvKzKajX6xldNNrVP9xNps8hrYdETjhP5FpUYpF4HKntAe+VJvIue+nOja K2AxB25qGHyK5rgh69DkSuatZmImflG4brKWbOE+4La0E8OajrM790YdqP2hRnDQ 3zpuIRr/htIAy7C2vGbmWITRhxXnGx5FhDI7AQRr1VjvgJprWM5zgIacDuH+5C9K ftouZ1hFTDI1WEEMDxRF =EJ+3 -----END PGP SIGNATURE----- Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: - mpc85xx PCIe error interrupt support - misc small enhancements/fixes all over the place. * tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Don't try to cancel workqueue when it's never setup e752x_edac: Fix pci_dev usage count sb_edac: Mark get_mci_for_node_id as static EDAC: Mark edac_create_debug_nodes as static amd64_edac: Remove "amd64" prefix from static functions amd64_edac: Simplify code around decode_bus_error amd64_edac: Mark amd64_decode_bus_error as static EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro amd64_edac: Fix condition to verify max channels allowed for F15 M30h edac/85xx: Add PCIe error interrupt edac support	2014-01-20 09:26:15 -08:00
Ingo Molnar	014952270e	* Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant, from Gong Chen. * PCIe AER tracepoint severity levels fix, from Rui Wang. * Error path correction for the mce device init, from Levente Kurusa. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJSrCysAAoJEBLB8Bhh3lVK1ikP/0hKY1Kk4tjbSta9A9Z8LdQG 9F5JzEny47DpTrLaKij7MqAlbYFO8sSm7Zw0CEztTF7Ou/H37GAuxhMlB8ECMGOm Dzu53X1rySTna9mB+1gyXXd+pJypp/oe18/o16rw1QKjI9o2Kfgwfj7lKvytR549 kDM1dhxEImQIS5cpJPkOPbcpVlSqYN7BnK9/Qx3h0W70httT/8qrr9xVtVL7wjOT auTA0R5/TkV06FtxyfHUNULEWTSP+2yNP/iJbusR6f4Jk1j0XmyCFr0BYOkPA1UO 9+wC9+2R+r7rJw8MBfMzNmPrRzDJHdaiHPwYqse05yewRHfRHe5cgZWJYbL8Qv0u 2WOX+fY12EfDYlihcOYtlupRzhGfGKRsaRpSuG1zX87ctDxAfNZencv4hnaJvfqG Xk6ggIX6tHKEivO2gmaPsmhoKveh0zcozUs+wgh/tvV5QB6ioFCjzHfSEsix5+BH ryyg1ri7IZnh92g3UuSUpE0OCbAquMfI7XIJo+kFs0u79dZTL/kD3wVu6oYazwdy yTrvIq7Bq5cMWnnni5w7dIU09ef2uvDgyHyAS6+RiqaQxhYFsW8/yx2zJrIloWRs 7txz6t3CVmWFiejIg2gw6KyjaG6pXRBkDkI1XU6T+bKLb31ojx2+i9UKIIUeRZTB iisWAOI6ZSdt4eAkgeaI =r//I -----END PGP SIGNATURE----- Merge tag 'ras_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS updates from Borislav Petkov: * Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant, from Gong Chen. * PCIe AER tracepoint severity levels fix, from Rui Wang. * Error path correction for the mce device init, from Levente Kurusa. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-16 14:33:17 +01:00
Rashika Kheria	8112c0cdf7	sb_edac: Mark get_mci_for_node_id as static This patch marks the function get_mci_for_node_id() as static because it is not used outside of sb_edac.c. Thus, it also eliminates the following warning: drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2013-12-15 18:02:13 +01:00
Chen, Gong	fd52103966	EDAC, sb_edac: Modify H/W event reporting policy Newer Intel platforms support more than one method to report H/W event. On this kind of platform, H/W event report can adopt new method and traditional EDAC method should be disabled. Moreover, if EDAC event report method is set to force, it means event must be reported via EDAC interface. IOW, it overrides the default event report policy. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit and error messages ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-12-11 18:19:25 +01:00
Jingoo Han	ba935f4097	EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-12-06 10:23:41 +01:00
Aristeu Rozanski	bd4b968362	sb_edac: Shut up compiler warning when EDAC_DEBUG is enabled Fix this: In file included from drivers/edac/sb_edac.c:27:0: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(level "EDAC " prefix ": " fmt, ##arg) ^ drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here u64 ch_addr, offset, limit, prv = 0; Limit can be initialized to 0. The only way limit wouldn't be initialized is if there are no DIMMs present (which would be a bug of course) and it'd fail on the next test. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>	2013-11-30 12:26:36 +01:00
Linus Torvalds	cdd278db0e	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC driver updates from Mauro Carvalho Chehab: - sb_edac: add support for Ivy Bridge support - cell_edac: add a missing of_node_put() call * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: cell_edac: fix missing of_node_put sb_edac: add support for Ivy Bridge sb_edac: avoid decoding the same error multiple times sb_edac: rename mci_bind_devs() sb_edac: enable multiple PCI id tables to be used sb_edac: rework sad_pkg sb_edac: allow different interleave lists sb_edac: allow different dram_rule arrays sb_edac: isolate TOHM retrieval sb_edac: rename pci_br sb_edac: isolate TOLM retrieval sb_edac: make RANK_CFG_A value part of sbridge_info	2013-11-18 14:51:52 -08:00
Aristeu Rozanski	4d715a805b	sb_edac: add support for Ivy Bridge Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 17:13:22 -02:00
Aristeu Rozanski	be3036d220	sb_edac: avoid decoding the same error multiple times Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:50:03 -02:00
Aristeu Rozanski	ea779b5a09	sb_edac: rename mci_bind_devs() This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:47 -02:00
Aristeu Rozanski	5153a0f94c	sb_edac: enable multiple PCI id tables to be used This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:27 -02:00
Aristeu Rozanski	cc311991a7	sb_edac: rework sad_pkg This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:04 -02:00
Aristeu Rozanski	ef1ce51e7b	sb_edac: allow different interleave lists This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:48:42 -02:00
Aristeu Rozanski	464f1d829a	sb_edac: allow different dram_rule arrays This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:48:16 -02:00
Aristeu Rozanski	8fd6a43ac9	sb_edac: isolate TOHM retrieval This is preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:43:17 -02:00
Aristeu Rozanski	5f8a1b8a7b	sb_edac: rename pci_br Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:43:06 -02:00
Aristeu Rozanski	fb79a50926	sb_edac: isolate TOLM retrieval This is in preparation for the Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:25:27 -02:00
Aristeu Rozanski	ef1e8d03b1	sb_edac: make RANK_CFG_A value part of sbridge_info This is in preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:20:37 -02:00
Chen, Gong	10ef6b0dff	bitops: Introduce a more generic BITMASK macro GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-21 15:12:01 -07:00
Luck, Tony	de4772c621	edac: sb_edac.c should not require prescence of IMC_DDRIO device The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR space to determine the type of DIMMs (registered or unregistered). But this device does not exist on some single socket Sandy Bridge servers. While the type of DIMMs is nice to know, it is not essential for this driver's other functions. So it seems harsh to have it refuse to load at all when it cannot find this device. Make the check for this device be optional. If it isn't present just report the memory type as "MEM_UNKNOWN". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-04-29 10:32:40 -03:00
Mauro Carvalho Chehab	1339730e73	Linux 3.8-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+ Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M= =jjBc -----END PGP SIGNATURE----- Merge tag 'v3.8-rc7' into next Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...	2013-02-20 15:45:52 -03:00
Greg Kroah-Hartman	9b3c6e85c2	Drivers: edac: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:03 -08:00
Mauro Carvalho Chehab	da14d93d95	sb_edac: add a missing /n on a debug message [ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-12-21 08:01:18 -02:00
Mauro Carvalho Chehab	deb09ddaff	sb_edac: Avoid overflow errors at memory size calculation Sandy bridge EDAC is calculating the memory size with overflow. Basically, the size field and the integer calculation is using 32 bits. More bits are needed, when the DIMM memories have high density. The net result is that memories are improperly reported there, when high-density DIMMs are used: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 As the number of pages value is handled at the EDAC core as unsigned ints, the driver shows the 16 GB memories at sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as unsigned 64-bits integer. After the patch, the memory size (16 GB) is properly detected: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 Cc: stable@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-09-25 07:38:20 -03:00
Mauro Carvalho Chehab	c2078e4c91	Merge branch 'devel' * devel: (33 commits) edac i5000, i5400: fix pointer math in i5000_get_mc_regs() edac: allow specifying the error count with fake_inject edac: add support for Calxeda highbank L2 cache ecc edac: add support for Calxeda highbank memory controller edac: create top-level debugfs directory sb_edac: properly handle error count i7core_edac: properly handle error count edac: edac_mc_handle_error(): add an error_count parameter edac: remove arch-specific parameter for the error handler amd64_edac: Don't pass driver name as an error parameter edac_mc: check for allocation failure in edac_mc_alloc() edac: Increase version to 3.0.0 edac_mc: Cleanup per-dimm_info debug messages edac: Convert debugfX to edac_dbg(X, edac: Use more normal debugging macro style edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs Edac: Add ABI Documentation for the new device nodes edac: move documentation ABI to ABI/testing/sysfs-devices-edac i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy edac: change the mem allocation scheme to make Documentation/kobject.txt happy ...	2012-07-29 21:11:05 -03:00
Mauro Carvalho Chehab	c10538396b	sb_edac: properly handle error count Instead of reporting the error count via driver-specific details, use the new way provided by edac_mc_handle_error. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:49 -03:00
Mauro Carvalho Chehab	9eb07a7fb8	edac: edac_mc_handle_error(): add an error_count parameter In order to avoid loosing error events, it is desirable to group error events together and generate a single trace for several identical errors. The trace API already allows reporting multiple errors. Change the handle_error function to also allow that. The changes at the drivers were made by this small script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:47 -03:00
Mauro Carvalho Chehab	03f7eae80f	edac: remove arch-specific parameter for the error handler Remove the arch-dependent parameter, as it were not used, as the MCE tracepoint weren't implemented. It probably doesn't make sense to have an MCE-specific tracepoint, as this will cost more bytes at the tracepoint, and tracepoint is not free. The changes at the EDAC drivers were done by this small perl script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s$([^\;]+)\,([^\,$]+)\s\)/$1($2)/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:52 -03:00
Joe Perches	956b9ba156	edac: Convert debugfX to edac_dbg(X, Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:49 -03:00
Mauro Carvalho Chehab	dd23cd6eb1	edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs The debug macro already adds that. Most of the work here was made by this small script: $f .=$_ while (<>); $f =~ s/(debugf[0-9]\s$\s)__FILE__\s": /\1"/g; $f =~ s/(debugf[0-9]\s\(\s)__FILE__\s/\1/g; $f =~ s/(debugf[0-9]\s\(\s)__FILE__\s"MC: /\1"/g; $f =~ s/(debugf[0-9]\s\(\")\%s[\:\,\($]\s([^\"]\s[^\)]+)__func__\s\,\s/\1\2/g; $f =~ s/(debugf[0-9]\s$\")\%s[\:\,\($]\s([^\"]\s[^\)]+),\s__func__\s\)/\1\2)/g; $f =~ s/(debugf[0-9]\s$\"MC\:\s)\%s[\:\,\($]\s([^\"]\s[^\)]+)__func__\s\,\s/\1\2/g; $f =~ s/(debugf[0-9]\s$\"MC\:\s)\%s[\:\,\($]\s([^\"]\s[^\)]+),\s__func__\s*\)/\1\2)/g; $f =~ s/\"MC\: \\n\"/"MC:\\n"/g; print $f; After running the script, manual cleanups were done to fix it the remaining places. While here, removed the __LINE__ on most places, as it doesn't actually give useful info on most places. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:47 -03:00
Mauro Carvalho Chehab	fd687502dc	edac: Rename the parent dev to pdev As EDAC doesn't use struct device itself, it created a parent dev pointer called as "pdev". Now that we'll be converting it to use struct device, instead of struct devsys, this needs to be fixed. No functional changes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:56:06 -03:00
Chen Gong	2cbb587d3b	edac: fix the error about memory type detection on SandyBridge On SandyBridge, DDRIOA(Dev: 17 Func: 0 Offset: 328) is used to detect whether DIMM is RDIMM/LRDIMM, not TA(Dev: 15 Func: 0). Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
Chen Gong	e35fca4791	edac: avoid mce decoding crash after edac driver unloaded Some edac drivers register themselves as mce decoders via notifier_chain. But in current notifier_chain implementation logic, it doesn't accept same notifier registered twice. If so, it will be wrong when adding/removing the element from the list. For example, on one SandyBridge platform, remove module sb_edac and then trigger one error, it will hit oops because it has no mce decoder registered but related notifier_chain still points to an invalid callback function. Here is an example: Call Trace: [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8102b936>] mce_log+0x46/0x180 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210 [<ffffffff812e2066>] ghes_proc+0x46/0x70 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36 [<ffffffff81069dc2>] process_one_work+0x132/0x450 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120 [<ffffffff81070aee>] kthread+0x9e/0xb0 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81514720>] ? gs_change+0x13/0x13 Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41 RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80 RSP <ffff88042868fb20> CR2: ffffffffa01af838 ---[ end trace 0100930068e73e6f ]--- BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [<ffffffff810705b0>] kthread_data+0x10/0x20 PGD `1a0d067` PUD 1a0e067 PMD 0 Oops: 0000 [#2] SMP Only i7core_edac and sb_edac have such issues because they have more than one memory controller which means they have to register mce decoder many times. Cc: <stable@vger.kernel.org> # 3.2 and upper Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
Linus Torvalds	87a5af24e5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC internal API changes from Mauro Carvalho Chehab: "This changeset is the first part of a series of patches that fixes the EDAC sybsystem. On this set, it changes the Kernel EDAC API in order to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and Intel E5-xxxx memory controllers. The EDAC core used to assume that: - the DRAM chip select pin is directly accessed by the memory controller - when multiple channels are used, they're all filled with the same type of memory. None of the above premises is true on Intel memory controllers since 2002, when RAMBUS and FB-DIMMs were introduced, and Advanced Memory Buffer or by some similar technologies hides the direct access to the DRAM pins. So, the existing drivers for those chipsets had to lie to the EDAC core, in general telling that just one channel is filled. That produces some hard to understand error messages like: EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1 The location information there (row3 channel 0) is completely bogus: it has no physical meaning, and are just some random values that the driver uses to talk with the EDAC core. The error actually happened at CPU socket 0, channel 0, slot 1, but this is not reported anywhere, as the EDAC core doesn't know anything about the memory layout. So, only advanced users that know how the EDAC driver works and that tests their systems to see how DIMMs are mapped can actually benefit for such error logs. This patch series fixes the error report logic, in order to allow the EDAC to expose the memory architecture used by them to the EDAC core. So, as the EDAC core now understands how the memory is organized, it can provide an useful report: EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4) The location of the DIMM where the error happened is reported by "MC0" (cpu socket #0), at "channel:0 slot:1" location, and matches the physical location of the DIMM. There are two remaining issues not covered by this patch series: - The EDAC sysfs API will still report bogus values. So, userspace tools like edac-utils will still use the bogus data; - Add a new tracepoint-based way to get the binary information about the errors. Those are on a second series of patches (also at -next), but will probably miss the train for 3.5, due to the slow review process." Fix up trivial conflict (due to spelling correction of removed code) in drivers/edac/edac_device.c * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits) i7core: fix ranks information at the per-channel struct i5000: Fix the fatal error handling i5100_edac: Fix a warning when compiled with 32 bits i82975x_edac: Test nr_pages earlier to save a few CPU cycles e752x_edac: provide more info about how DIMMS/ranks are mapped i5000_edac: Fix the logic that retrieves memory information i5400_edac: improve debug messages to better represent the filled memory edac: Cleanup the logs for i7core and sb edac drivers edac: Initialize the dimm label with the known information edac: Remove the legacy EDAC ABI x38_edac: convert driver to use the new edac ABI tile_edac: convert driver to use the new edac ABI sb_edac: convert driver to use the new edac ABI r82600_edac: convert driver to use the new edac ABI ppc4xx_edac: convert driver to use the new edac ABI pasemi_edac: convert driver to use the new edac ABI mv64x60_edac: convert driver to use the new edac ABI mpc85xx_edac: convert driver to use the new edac ABI i82975x_edac: convert driver to use the new edac ABI i82875p_edac: convert driver to use the new edac ABI ...	2012-05-29 18:32:37 -07:00
Mauro Carvalho Chehab	e17a2f42a4	edac: Cleanup the logs for i7core and sb edac drivers Remove some information that it is duplicated at the MCE log, and don't have much usage for the error. Those data will be added again, when creating a trace function that outputs both memory errors and MCE fields. Cc: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:51 -03:00
Mauro Carvalho Chehab	ca0907b9e4	edac: Remove the legacy EDAC ABI Now that all drivers got converted to use the new ABI, we can drop the old one. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab	c36e3e7768	sb_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:48 -03:00
Mauro Carvalho Chehab	a895bf8b1e	edac: move nr_pages to dimm struct The number of pages is a dimm property. Move it to the dimm struct. After this change, it is possible to add sysfs nodes for the DIMM's that will properly represent the DIMM stick properties, including its size. A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when the memory controller represents the memory via chip select rows. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	5e2af0c09e	edac: Don't initialize csrow's first_page & friends when not needed Almost all edac drivers initialize csrow_info->first_page, csrow_info->last_page and csrow_info->page_mask. Those vars are used inside the EDAC core, in order to calculate the csrow affected by an error, by using the routine edac_mc_find_csrow_by_page(). However, very few drivers actually use it: e752x_edac.c e7xxx_edac.c i3000_edac.c i82443bxgx_edac.c i82860_edac.c i82875p_edac.c i82975x_edac.c r82600_edac.c There also a few other drivers that have their own calculus formula internally using those vars. All the others are just wasting time by initializing those data. While initializing data without using them won't cause any troubles, as those information is stored at the wrong place (at csrows structure), it is better to remove what is unused, in order to simplify the next patch. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	084a4fccef	edac: move dimm properties to struct dimm_info On systems based on chip select rows, all channels need to use memories with the same properties, otherwise the memories on channels A and B won't be recognized. However, such assumption is not true for all types of memory controllers. Controllers for FB-DIMM's don't have such requirements. Also, modern Intel controllers seem to be capable of handling such differences. So, we need to get rid of storing the DIMM information into a per-csrow data, storing it, instead at the right place. The first step is to move grain, mtype, dtype and edac_mode to the per-dimm struct. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	a7d7d2e1a0	edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're ridden behind some chip like the AMB's on FBDIMM's, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to create a mess under csrow/channel original's concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Latter patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a latter conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrows-based memory controllers, a csrow/channel pair points to a memory rank. There's a known bug at the EDAC core that allows having different labels for the same DIMM, if it has more than one rank. A latter patch is need to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. The edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by latter patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:57 -03:00
David Mackey	15ed103a98	edac: Fix spelling errors. Signed-off-by: David Mackey <tdmackey@twitter.com> Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-04-30 13:28:41 +02:00
Linus Torvalds	f0f3680e50	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes from Mauro Carvalho Chehab: "A series of EDAC driver fixes. It also has one core fix at the documentation, and a rename patch, fixing the name of the struct that contains the rank information." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: rename channel_info to rank_info i5400_edac: Avoid calling pci_put_device() twice edac: i5100 ack error detection register after each read edac: i5100 fix erroneous define for M1Err edac: sb_edac: Fix a wrong value setting for the previous value edac: sb_edac: Fix a INTERLEAVE_MODE() misuse edac: sb_edac: Let the driver depend on PCI_MMCONFIG edac: Improve the comments to better describe the memory concepts edac/ppc4xx_edac: Fix compilation Fix sb_edac compilation with 32 bits kernels	2012-03-28 14:24:40 -07:00
Hui Wang	7fae0db439	edac: sb_edac: Fix a wrong value setting for the previous value >From the driver design, the variable limit wants to compare with its previous value, we should set the value of limit instead of the value of tmp_mb to the variable prev. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:11 -03:00
Hui Wang	ad9c40b7dd	edac: sb_edac: Fix a INTERLEAVE_MODE() misuse We can identify dram interleave mode from the Dram Rule register rather than Dram Interleave list register. In this context, the reg of INTERLEAVE_MODE(reg) contains the Dram Interleave list register, we can't get interleave mode from the reg, while the variable interleave_mode saves the the mode got from the Dram Rule register, so we use the variable to replace INTERLEAVE_MDDE(reg) here. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:02 -03:00
Mauro Carvalho Chehab	5b889e379f	Fix sb_edac compilation with 32 bits kernels As reported by Josh Boyer <jwboyer@redhat.com>: > drivers/edac/sb_edac.c: In function 'get_memory_error_data': > drivers/edac/sb_edac.c:861:2: warning: left shift count >= width of type > [enabled by default] > <snip> > ERROR: "__udivdi3" [drivers/edac/sb_edac.ko] undefined! > make[1]: * [__modpost] Error 1 > make: * [modules] Error 2 PS.: compile-tested only Reported-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:19:38 -03:00
Lionel Debroux	36c46f31df	EDAC: Make pci_device_id tables __devinitconst. These const tables are currently marked __devinitdata, but Documentation/PCI/pci.txt says: "o The ID table array should be marked __devinitconst; this is done automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()." So use DEFINE_PCI_DEVICE_TABLE(x). Based on PaX and earlier work by Andi Kleen. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 12:04:54 +01:00
Linus Torvalds	edf7c8148e	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: add IRQ context simulation in module mce-inject x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog x86, MCE: Drain mcelog buffer x86, mce: Add wrappers for registering on the decode chain	2012-01-06 15:02:37 -08:00

1 2

55 Commits