linux

Commit Graph

Author	SHA1	Message	Date
Mauro Carvalho Chehab	276b824c30	i7core_edac: some fixes at error injection code Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	17cb7b0cf7	i7core_edac: Some cleanups at displayed info Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	086271a037	i7core: remove some unneeded noisy debug messages Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	3a7dde7fcd	i7core: add socket info at the debug msg Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	ec6df24c15	i7core: better document i7core_get_active_channels() Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	c77720b954	i7core: fix get_devices routine for Xeon55xx i7core_get_devices() was prepared to get just the first found device of each type. Due to that, on Xeon 55xx, only socket 1 was retrieved. Rework i7core_get_devices() to clean it up and to properly support Xeon 55xx. While here, fix a small typo. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	a639539fa2	i7core: enrich error information based on memory transaction type Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	c5d3452869	i7core: check if the memory error is fatal or non-fatal Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	310cbb7284	i7core: fix probing on Xeon55xx Xeon55xx fails to probe with this error message: EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init() EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41 i7core_edac: probe of 0000:00:14.0 failed with error -22 This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has PCI ID 8086:2c40. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	f237fcf2b7	i7core_edac: some fixes at memory error parser m->bank is not related to the memory bank but, instead, to the MCA Error register bank. Fix it accordingly. While here, improves the comments for Nehalem bank. A later fix is needed, in order to get bank/rank information from MCA error log. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	8a2f118e3a	i7core_edac: decode mcelog error and send it via edac interface Enriches mcelog error by using the encoded information at MCE status and misc registers (IA32_MCx_STATUS, IA32_MCx_MISC). Some fixes are still needed here, in order to properly fill the EDAC fields. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	ba6c5c62ee	i7core_edac: maps all sockets as if there is one MC controller Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	67166af4ab	i7core_edac: add support for more than one MC socket Some Nehalem architectures have more than one MC socket. Socket 0 is located at bus 255. Currently, it is using up to 2 sockets, but increasing it to a larger number is just a matter of increasing the MAX_SOCKETS definition. This seems to be required to properly support Xeon 55xx. Still needs testing with Xeon 55xx. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	d1fd4fb69e	i7core_edac: Add a code to probe Xeon 55xx bus This code changes the detection procedure of i7core_edac. Instead of directly probing for MC registers, it probes for another register found on Nehalem. If found, it tries to pick the first MC PCI BUS. This should work fine with Xeon 35xx, but, on Xeon 55xx, this is at buses 254 and 255, which are not properly detected by the non-legacy PCI methods. The new detection code scans specifically at buses 254 and 255 for the Xeon 55xx devices. This code has not been tested yet. Once it works, a change to the code will be needed, since the i7core is not yet ready for working with 2 sets of MC. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	e9bd2e7379	i7core_edac: Adds write unlock to MC registers The public Intel Xeon 5500 volume 2 datasheet describes, on page 53, section 2.6.7, a register that can lock/unlock the Memory Controller configuration register, called MC_CFG_CONTROL. Adds support for it in the hope that software error injection would work. With my tests with Xeon 35xx, there's still something missing. With a program that does sequential bit writes at dev 0.0, sometimes, it produces error injection, after unblocking the MC_CFG_CONTROL (and, sometimes, it just locks my testing machine). I'll try later to discover by trial and error what's the register that solves this issue on Xeon 35xx. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	d5381642ab	i7core_edac: Add edac_mce glue Adds a glue code to allow i7core to work with mcelog. With the glue, i7core registers itself on edac_mce. At mce, when an error is detected, it calls all registered drivers (in this case, i7core), for EDAC error handling. TODO: It currently just prints the MCE error log using about the same format as mce panic messages. The error message should be enhanced with mcelog userspace info and converted into the proper EDAC format, to feed the EDAC error counts. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	963c5ba359	edac/Kconfig: edac_mce can't be module Since mcelog is bool, edac_mce glue should also be bool, or otherwise will not work. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	696e409dbd	edac_mce: Add an interface driver to report mce errors via edac edac_mce module is an interface module that gets mcelog data and forwards to any registered edac module that expects to receive data via mce. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:49 -03:00
Mauro Carvalho Chehab	41fcb7feed	i7core_edac: CodingStyle fixes Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab	eb94fc402f	i7core_edac: fill csrows edac sysfs info csrows is still fake, since we can't identify its representation with Nehalem registers. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab	5566cb7c91	i7core_edac: Memory info fixes and preparation for properly filling csrow data Now, memory size is properly displayed: EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400 EDAC i7core: Memory channel configuration: EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 EDAC i7core: dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 Still, as the way to retrieve csrows info is not known, it does a mapping of what's available to csrows basic unit at edac core. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab	854d334997	i7core_edac: Get more info about the memory DIMMs Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab	7dd6953c5f	i7core_edac: Add more information about each active dimm Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab	b7c761512c	i7core_edac: Improve error handling Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab	1c6fed808f	i7core_edac: Properly fill struct csrow_info Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab	ef708b53b9	i7core_edac: Add additional tests for error detection Properly check the number of channels and improve probing error detection Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab	442305b152	i7core_edac: Add a memory check routine, based on device 3 function 4 This function appears only on Xeon 5500 datasheet. Yet, testing with a Xeon 3503 showed that this is also implemented on other Nehalem processors. At the first read, MC_TEST_ERR_RCV1 and MC_TEST_ERR_RCV0 can contain any value. Modify CE error logic to update the error count only after the second read. An alternative approach would be to do a write at rcv0 and rcv1 registers, but it seemed better to keep them untouched, since BIOS might eventually assume that they are exclusive for their usage. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab	87d1d272ba	i7core_edac: need mci->edac_check, otherwise module removal doesn't work There are some locking troubles with edac_core: if you don't declare an edac_check, module may suffer from soft lock. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab	7b029d03c3	i7core_edac: A few fixes at error injection code Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab	f122a89222	i7core_edac: Show read/write virtual/physical channel association Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab	8f33190757	i7core_edac: Registers all supported MC functions Now, it will try to register on all supported Memory Controller functions. It should be noticed that dev3, function 2 is present only on chips with Registered DIMM's, according to the datasheet. So, the driver doesn't return -ENODEV if all functions but this one were successfully registered and enabled: EDAC i7core: Registered device 8086:2c18 fn=3 0 EDAC i7core: Registered device 8086:2c19 fn=3 1 EDAC i7core: Device not found: PCI ID 8086:2c1a (dev 3, func 2) EDAC i7core: Registered device 8086:2c1c fn=3 4 EDAC i7core: Registered device 8086:2c20 fn=4 0 EDAC i7core: Registered device 8086:2c21 fn=4 1 EDAC i7core: Registered device 8086:2c22 fn=4 2 EDAC i7core: Registered device 8086:2c23 fn=4 3 EDAC i7core: Registered device 8086:2c28 fn=5 0 EDAC i7core: Registered device 8086:2c29 fn=5 1 EDAC i7core: Registered device 8086:2c2a fn=5 2 EDAC i7core: Registered device 8086:2c2b fn=5 3 EDAC i7core: Registered device 8086:2c30 fn=6 0 EDAC i7core: Registered device 8086:2c31 fn=6 1 EDAC i7core: Registered device 8086:2c32 fn=6 2 EDAC i7core: Registered device 8086:2c33 fn=6 3 EDAC i7core: Driver loaded. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:45 -03:00
Mauro Carvalho Chehab	0b2b7b7ec0	i7core_edac: Add more status functions to EDAC driver This patch was co-authored with Aristeu Rozanski. Signed-off-by: Aristeu Sergio <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:45 -03:00
Mauro Carvalho Chehab	194a40feab	i7core_edac: Add error insertion code for Nehalem Implements set_inject_error() with the low-level code needed to inject memory errors at Nehalem, and adds some sysfs nodes to allow error injection The next patch will add an API for error injection. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:45 -03:00
Mauro Carvalho Chehab	a0c36a1f0f	i7core_edac: Add an EDAC memory controller driver for Nehalem chipsets This driver is meant to support i7 core/i7core extreme desktop processors and Xeon 35xx/55xx series with integrated memory controller. It is likely that it can be expanded in the future to work with other processor series based at the same Memory Controller design. For now, it has just a few MCH status reads. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:45 -03:00
Borislav Petkov	35d824b28f	edac, mce: Fix wrong mask and macro usage Correct two mishaps which prevented reporting error type (CECC vs UECC) and extended error description. Cc: <stable@kernel.org> # 32.x, 33.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-04-30 10:15:39 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the following. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Borislav Petkov	5b89d2f9ac	edac, mce: Filter out invalid values Print the CPU associated with the error only when the field is valid. Cc: <stable@kernel.org> # .32.x .33.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-03-22 16:33:31 +01:00
Peter Tyser	8004fd2ad6	edac: e752x: add dram scrubbing support Add support to scrub DRAM using the e752x integrated memory scrubbing engine. The e7320/7520/e7525 chipsets support scrubbing at one rate while the i3100 chipset supports a normal and fast rate. A similar patch was originally sent back in 2008: http://sourceforge.net/mailarchive/forum.php?thread_name=1204835866.25206.70.camel@localhost.localdomain&forum_name=bluesmoke-devel This version has the following updates: - Use 16-bit PCI config cycles to access MCHSCRB register e7320/7520/e7525 docs say register is 16bits wide, i3100 says 8. I tested 16bits on the i3100 to be safe. - Recalculate and round actual scrub rates The changes have been tested on an i3100-based board. Signed-off-by: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-12 15:52:40 -08:00
Konstantin Olifer	8de5c1a165	edac: e752x fsb ecc FSB parity is only supported on the Xeon processor. Previously it was incorrectly enabled for the Celeron as well. Signed-off-by: Konstantin Olifer <kolifer@gmail.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-12 15:52:40 -08:00
H Hartley Sweeten	66ed3f7516	edac: mpc85xx use resource_size instead of raw math Use resource_size() instead of arithmetic. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Acked-by: Dave Jiang <djiang@mvista.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Kumar Gala <galak@gate.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-12 15:52:40 -08:00
Peter Tyser	dcca7c3d00	edac: mpc85xx improve SDRAM error reporting Add the ability to detect the specific data line or ECC line which failed when printing out SDRAM single-bit errors. An example of a single-bit SDRAM ECC error is below: EDAC MPC85xx MC1: Err Detect Register: 0x80000004 EDAC MPC85xx MC1: Faulty data bit: 59 EDAC MPC85xx MC1: Expected Data / ECC: 0x7f80d000_409effa0 / 0x6d EDAC MPC85xx MC1: Captured Data / ECC: 0x7780d000_409effa0 / 0x6d EDAC MPC85xx MC1: Err addr: 0x00031ca0 EDAC MPC85xx MC1: PFN: 0x00000031 Knowing which specific data or ECC line caused an error can be useful in tracking down hardware issues such as improperly terminated signals, loose pins, etc. Note that this feature is only currently enabled for 64-bit wide data buses, 32-bit wide bus support should be added. I don't have any 32-bit wide systems to test on. If someone has one and is willing to give this patch a shot with the check for a 64-bit data bus removed it would be much appreciated and I can re-submit with both 32 and 64 bit buses supported. Signed-off-by: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Dave Jiang <djiang@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-12 15:52:40 -08:00
Peter Tyser	21768639be	edac: mpc85xx mask ecc syndrome correctly With a 64-bit wide data bus only the lowest 8-bits of the ECC syndrome are relevant. With a 32-bit wide data bus only the lowest 16-bits are relevant on most architectures. Without this change, the ECC syndrome displayed can be mildly confusing, eg: EDAC MPC85xx MC1: syndrome: 0x25252525 When in reality the ECC syndrome is 0x25. A variety of Freescale manuals say a variety of different things about how to decode the CAPTURE_ECC (syndrome) register. I don't have a system with a 32-bit bus to test on, but I believe the change is correct. It'd be good to get an ACK from someone at Freescale about this change though. Signed-off-by: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Dave Jiang <djiang@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-12 15:52:40 -08:00
Emese Revfy	52cf25d0ab	Driver core: Constify struct sysfs_ops in struct kobj_type Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-03-07 17:04:49 -08:00
Linus Torvalds	eaa5eec739	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: amd64_edac: Simplify ECC override handling	2010-03-03 09:25:37 -08:00
Linus Torvalds	0a135ba14d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: add __percpu sparse annotations to what's left percpu: add __percpu sparse annotations to fs percpu: add __percpu sparse annotations to core kernel subsystems local_t: Remove leftover local.h this_cpu: Remove pageset_notifier this_cpu: Page allocator conversion percpu, x86: Generic inc / dec percpu instructions local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c module: Use this_cpu_xx to dynamically allocate counters local_t: Remove cpu_local_xx macros percpu: refactor the code in pcpu_[de]populate_chunk() percpu: remove compile warnings caused by __verify_pcpu_ptr() percpu: make accessors check for percpu pointer in sparse percpu: add __percpu for sparse. percpu: make access macros universal percpu: remove per_cpu__ prefix.	2010-03-03 07:34:18 -08:00
Borislav Petkov	d95cf4de6a	amd64_edac: Simplify ECC override handling No need for clearing ecc_enable_override and checking it in two places. Instead, simply check it during probing and act accordingly. Also, rename the flag bitfields according to the functionality they actually represent. What is more, make sure original BIOS ECC settings are restored when the module is unloaded. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-03-01 19:25:12 +01:00
Tejun Heo	a29d8b8e2d	percpu: add __percpu sparse annotations to what's left Add __percpu sparse annotations to places which didn't make it in one of the previous patches. All conversions are trivial. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Neil Brown <neilb@suse.de>	2010-02-17 11:17:38 +09:00
Linus Torvalds	676ad58553	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: amd64_edac: Do not falsely trigger kerneloops	2010-02-11 14:07:13 -08:00
Peter Tyser	f8c63345b4	edac: mpc85xx fix build regression by removing unused debug code Some unused, unsupported debug code existed in the mpc85xx EDAC driver that resulted in a build failure when CONFIG_EDAC_DEBUG was defined: drivers/edac/mpc85xx_edac.c: In function 'mpc85xx_mc_err_probe': drivers/edac/mpc85xx_edac.c:1031: error: implicit declaration of function 'edac_mc_register_mcidev_debug' drivers/edac/mpc85xx_edac.c:1031: error: 'debug_attr' undeclared (first use in this function) drivers/edac/mpc85xx_edac.c:1031: error: (Each undeclared identifier is reported only once drivers/edac/mpc85xx_edac.c:1031: error: for each function it appears in.) Signed-off-by: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-02-11 13:59:42 -08:00
Peter Tyser	cff9279e4e	edac: mpc85xx fix bad page calculation Commit `b484625172` ("edac: mpc85xx add mpc83xx support") accidentally broke how a chip select's first and last page addresses are calculated. The page addresses are being shifted too far right by PAGE_SHIFT. This results in errors such as: EDAC MPC85xx MC1: Err addr: 0x003075c0 EDAC MPC85xx MC1: PFN: 0x00000307 EDAC MPC85xx MC1: PFN out of range! EDAC MC1: INTERNAL ERROR: row out of range (4 >= 4) EDAC MC1: CE - no information available: INTERNAL ERROR The value of PAGE_SHIFT is already being taken into consideration during the calculation of the 'start' and 'end' variables, thus it is not necessary to account for it again when setting a chip select's first and last page address. Signed-off-by: Peter Tyser <ptyser@xes-inc.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Kumar Gala <galak@gate.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-02-11 13:59:42 -08:00
Borislav Petkov	cab4d27764	amd64_edac: Do not falsely trigger kerneloops An unfortunate "WARNING" in the message amd64_edac dumps when the system doesn't support DRAM ECC or ECC checking is not enabled in the BIOS used to trigger kerneloops which qualified the message as an OOPS thus misleading the users. See, e.g. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/422536 http://bugzilla.kernel.org/show_bug.cgi?id=15238 Downgrade the message level to KERN_NOTICE and fix the formulation. Cc: stable@kernel.org # .32.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-02-11 20:32:14 +01:00
Tamas Vincze	118f3e1afd	edac: i5000_edac critical fix panic out of bounds EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4) Kernel panic - not syncing: EDAC MC0: Uncorrected Error (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. This happens because FERR_NF_FBD bit 28 is not updated on i5000. Due to that, both bits 28 and 29 may be equal to one, returning channel = 3. As this value is invalid, EDAC core generates the panic. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14568 Signed-off-by: Tamas Vincze <tom@vincze.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-16 12:15:38 -08:00
Roel Kluin	926311fd7d	amd64_edac: Ensure index stays within bounds in amd64_get_scrub_rate Add a missing iterator variable thus fixing the conditional of the for-loop in amd64_get_scrub_rate(). Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-01-15 10:45:58 +01:00
Borislav Petkov	5213c32f9d	edac, pci: remove pesky debug printk Do not spam the logs needlessly with the sole info that edac_pci_dev_parity_clear is being called. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:09 +01:00
Borislav Petkov	92389102b6	amd64_edac: restrict PCI config space access Do not access F2x19[0,4] on K8 since they're undefined there. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:08 +01:00
Borislav Petkov	43f5e68733	amd64_edac: fix forcing module load/unload Clear the override flag after force-loading the module. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:08 +01:00
Borislav Petkov	56b34b91e2	amd64_edac: make driver loading more robust Currently, the module does not initialize fully when the DIMMs aren't ECC but still remains loaded. Propagate the error when no instance of the driver is properly initialized and prevent further loading. Reorganize and polish error handling in amd64_edac_init() while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:07 +01:00
Borislav Petkov	8f68ed9728	amd64_edac: fix driver instance freeing Fix use-after-free errors by pushing all memory-freeing calls to the end of amd64_remove_one_instance(). Reported-by: Darren Jenkins <darrenrjenkins@gmail.com> LKML-Reference: <1261370306.11354.52.camel@ICE-BOX> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:07 +01:00
Borislav Petkov	603adaf6b3	amd64_edac: fix K8 chip select reporting Fix the case when amd64_debug_display_dimm_sizes() reports only half the amount of DRAM on it because it doesn't account for when the single DCT operates in 128-bit mode and merges chip selects from different DIMMs. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:07 +01:00
Linus Torvalds	661e338f72	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: edac, mce, amd: silence GART TLB errors edac, mce: correct corenum reporting	2009-12-16 10:09:43 -08:00
Borislav Petkov	256f7276af	edac, mce, amd: silence GART TLB errors Although reporting of benign GART TLB errors is disabled in __mcheck_cpu_apply_quirks, those are still being logged, and, as a result, trip up amd64_edac. Pull up reporting check so that machines with loaded edac module bail out early and don't spit fragments into dmesg. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-16 17:48:39 +01:00
Nils Carlson	bbead2104e	edac: i5100 add 6 ranks per channel Add support for 6 ranks per channel to the i5100 chipset. I have tested the patch as far as possible with correctible errors and things appear good. The DIMM mapping is correct for our board, but boards may differ. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Acked-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Nils Carlson	295439f2a3	edac: i5100 add scrubbing Add scrubbing to the i5100 chipset. The i5100 chipset only supports one scrubbing rate, which is not constant but dependent on memory load. The rate returned by this driver is an estimate based on some experimentation, but is substantially closer to the truth than the speed supplied in the documentation. Also, scrubbing is done once, and then a done-bit is set. This means that to accomplish continuous scrubbing a re-enabling mechanism must be used. I have created the simplest possible such mechanism in the form of a work-queue which will check every five minutes. This interval is quite arbitrary but should be sufficient for all sizes of system memory. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Nils Carlson	b18dfd05f9	edac: i5100 clean controller to channel terms The i5100 driver uses the word controller instead of channel in a lot of places, this is simply a cleanup of the patch. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Borislav Petkov	35d8069234	edac, mce: correct corenum reporting Fix core number reporting with NB MCEs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-15 15:52:13 +01:00
Borislav Petkov	505422517d	x86, msr: Add support for non-contiguous cpumasks The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-11 10:59:21 -08:00
Borislav Petkov	df5b1606bd	amd64_edac: bump driver version This was long overdue ... Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:38:14 +01:00
Andrew Morton	18ba54ac12	amd64_edac: fix use-uninitialised bug drivers/edac/amd64_edac.c: In function 'amd64_edac_init': drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:38:13 +01:00
Borislav Petkov	bdc30a0c8c	amd64_edac: correct sys address to chip select mapping The routine does the reverse mapping of the error address of a CECC back to the node id, DRAM controller and chip select of the DIMM which caused the error. We should lookup the channel using the syndromes _only_ when the DCTs are ganged so fix that. Also, add an early exit when there's an error while scanning for the csrow thus decreasing indentation levels for better readability. Finally, fixup comments. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:38:12 +01:00
Borislav Petkov	bfc04aec7d	amd64_edac: add a leaner syndrome decoding algorithm Instead of using the whole syndrome tables for channel decoding, use a set of eigenvectors with which the tables can be generated to search for the syndrome in error. The algorithm operates independently of symbol size and can be used for both x4 and x8 syndromes. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:37:59 +01:00
Borislav Petkov	986a42a250	amd64_edac: remove early hw support check The .probe_valid_hardware low_ops member checked whether the DCTs are in DDR3 mode and bailed out if so. Now that all the needed changes for DDR3 support are in place, remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:31 +01:00
Borislav Petkov	6b4c0bdeb0	amd64_edac: detect DDR3 memory type Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:31 +01:00
Borislav Petkov	239642fe19	edac: add memory types strings for debugging Instead of using deeply-nested conditionals for dumping the DIMM type in debug mode, add a strings array of the supported DIMM types. This is useful in cases where an edac driver supports multiple DRAM types and is only defined in debug builds. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:31 +01:00
Borislav Petkov	cec7924f56	edac, mce: update AMD F10h revD check F10h revD start with model number 8. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:30 +01:00
Borislav Petkov	1f6bcee75e	amd64_edac: remove unneeded extract_error_address wrapper Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:30 +01:00
Borislav Petkov	44e9e2ee21	amd64_edac: rename StinkyIdentifier SystemAddress -> sys_addr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:30 +01:00
Borislav Petkov	ad858bfa14	amd64_edac: remove superfluous dbg printk Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:29 +01:00
Borislav Petkov	1433eb9903	amd64_edac: enhance address to DRAM bank mapping Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10 and all K8 flavors and remove kludgy table of pseudo values. Add a low_ops->dbam_to_cs member which is family-specific and replaces low_ops->dbam_map_to_pages since the pages calculation is a one liner now. Further cleanups, while at it: - shorten family name defines - align amd64_family_types struct members Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:29 +01:00
Borislav Petkov	d16149e8c3	amd64_edac: cleanup f10_early_channel_count Do not read DCLR[01] again since this is done in amd64_read_mc_registers() earlier. There can be more than two physical DIMMs present so clamp the channels value to max 2. Also, do not report DCT data width - it is also done earlier. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:29 +01:00
Borislav Petkov	8566c4df16	amd64_edac: dump DIMM sizes on K8 too Extend f10_debug_display_dimm_sizes to dump the logical DIMMs configuration on K8 revF too. Remove the ganged arg since we print the DCT operating mode (ganged vs unganged) earlier. Also, DCT csrow configuration is relevant therefore dump it as KERN_DEBUG instead of only on debug builds. Remove misleading DIMM output since there's no reliable way of mapping chip selects to actual physical DIMMs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:28 +01:00
Borislav Petkov	8de1d91e62	amd64_edac: cleanup rest of amd64_dump_misc_regs Clarify bitfields description, add PCI config function/offset names to registers for easy reference, simplify code layout, remove unneeded info. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:28 +01:00
Borislav Petkov	68798e1760	amd64_edac: cleanup DRAM cfg low debug output Carve out the register-specific debug statements into a separate function, clarify meanings of the single bitfields in the register, remove irrelevant output and macros. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:28 +01:00
Borislav Petkov	6ba5dcdc44	amd64_edac: wrap-up pci config read error handling Add a pci config read wrapper for signaling pci config space access errors instead of them being visible only on a debug build. This is important on amd64_edac since it uses all those pci config register values to access the DRAM/DIMM configuration of the nodes. In addition, the wrapper makes a _lot_ (look at the diffstat!) of error handling code superfluous and improves much of the overall code readability by removing error handling details out of the way. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Borislav Petkov	f6d6ae9657	amd64_edac: unify MCGCTL ECC switching Unify almost identical code into one function and remove NUMA-specific usage (specifically cpumask_of_node()) in favor of generic topology methods. Remove unused defines, while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Rusty Russell	ba578cb34a	cpumask: use modern cpumask style in drivers/edac/amd64_edac.c cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Borislav Petkov	e97f8bb8ce	amd64_edac: make DRAM regions output more human-readable Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits are only a subset of the 64bit register, the values are correct since the remaining bits are Read-As-Zero and no shifting is needed. Also, cleanup DRAM base/limit debug output. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Borislav Petkov	72381bd55e	amd64_edac: clarify DRAM CTL debug reporting Make debug info formulations about the DRAM and DCT configuration of the machine more human readable. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:26 +01:00
Ingo Molnar	26fb20d008	Merge branch 'perf/mce' into perf/core Merge reason: It's ready for v2.6.33. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 20:11:06 +01:00
Borislav Petkov	17adea01b9	amd64_edac: fix CECCs reporting Shift error type bits properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-11-04 14:04:06 +01:00
Li Hong	a3c4c58085	amd64_edac: fix a wrong goto clause in amd64_edac.c In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is called before pci_register_driver. If it fails, we should exit with err directly. Signed-off-by: Li Hong <lihong.hi@gmail.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-11-04 14:02:32 +01:00
Keith Mannthey	c2494ace99	edac: i5100 fix initialization code Allow csrows to properly initialize when the topology only has active channels on 2 and 3. This new check allows proper detection and initialization in this topology. Only checking the first mrt that represented channels 0 and 1 is not sufficient. I also fixed up the related debug information path. I can submit as a 2nd patch if needed. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Acked-by: Aristeu Rozanski <aris@ruivo.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:30 -07:00
Ira W. Snyder	0616fb003d	edac: i5400 fix missing CONFIG_PCI define When building without CONFIG_PCI the edac_pci_idx variable is unused, causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI, just like the rest of the PCI support. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:30 -07:00
Jeff Roberson	156edd4aaa	edac: i5400 fix csrow mapping The i5400 EDAC driver has several bugs with chip-select row computation which most likely lead to bugs in detailed error reporting. Attempts to contact the authors have gone mostly unanswered so I am presenting my diff here. I do not subscribe to lkml and would appreciate being kept in the cc. The most egregious problem was miscalculating the addresses of MTR registers after register 0 by assuming they are 32bit rather than 16. This caused the driver to miss half of the memories. Most motherboards tend to have only 8 dimm slots and not 16, so this may not have been noticed before. Further, the row calculations multiplied the number of dimms several times, ultimately ending up with a maximum row of 32. The chipset only supports 4 dimms in each of 4 channels, so csrow could not be higher than 4 unless you use a row per-rank with dual-rank dimms. I opted to eliminate this behavior as it is confusing to the user and the error reporting works by slot and not rank. This gives a much clearer view of memory by slot and channel in /sys. Signed-off-by: Jeff Roberson <jroberson@jroberson.net> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:30 -07:00
Borislav Petkov	4997811e3b	amd64_edac: fix DRAM base and limit extraction masks, v2 This is a proper fix as a follow-up to `66216a7` and `916d11b`. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-16 18:51:22 +02:00
Borislav Petkov	fb2531953f	mce, edac: Use an atomic notifier for MCEs decoding Add an atomic notifier which ensures proper locking when conveying MCE info to EDAC for decoding. The actual notifier call overrides a default, negative priority notifier. Note: make sure we register the default decoder only once since mcheck_init() runs on each CPU. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091003065752.GA8935@liondog.tnic> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 12:24:45 +02:00
Linus Torvalds	624235c5b3	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pci: Correct spelling in a comment x86: Simplify bound checks in the MTRR code x86: EDAC: carve out AMD MCE decoding logic initcalls: Add early_initcall() for modules x86: EDAC: MCE: Fix MCE decoding callback logic	2009-10-08 12:06:36 -07:00
Borislav Petkov	94baaee494	amd64_edac: beef up DRAM error injection When injecting DRAM ECC errors (F3xBC_x8), EccVector[15:0] is a bitmask of which bits should be error injected when written to and holds the payload of 16-bit DRAM word when read, respectively. Add /sysfs members to show the DRAM ECC section/word/vector. Fail wrong injection values entered over /sysfs instead of truncating them. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:51:28 +02:00
Borislav Petkov	66216a7a15	amd64_edac: fix DRAM base and limit extraction On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers which specify the destination node of a DRAM address. Those address boundaries are being extracted into ->dram_base[] and ->dram_limit[]. Correct the extraction masks to match the respective address bits. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:51:15 +02:00
Borislav Petkov	9d858bb10a	amd64_edac: fix chip select handling Different processor families support a different number of chip selects. Handle this in a family-dependent way with the proper values assigned at init time (see amd64_set_dct_base_and_mask). Remove _DCSM_COUNT defines since they're used at one place and originate from public documentation. CC: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:50:50 +02:00
Keith Mannthey	2cff18c22c	amd64_edac: simple fix to allow reporting of CECC errors This allows the errors to be further decoded and mapped to csrows. Tested with ECC debug DIMMs and a Rev F CPU-based system. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:49:58 +02:00

418 Commits