linux

Commit Graph

Author	SHA1	Message	Date
Mauro Carvalho Chehab	224e871f36	i7core_edac: Fix oops when trying to inject errors Error injection needs the pci device 0:0. So, we need to revert this changeset: `79daef2099`. Tests need to be made to be sure that refcount won't be wrong as noticed before. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-10-31 15:10:04 -02:00
David Sterba	80b8ce89eb	i7core_edac: fix misuse of logical operation in place of bitop CC: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-10-31 15:10:04 -02:00
Mathias Krause	8cf2d2399a	i7core_edac: fixed typo in error count calculation Based on a patch from the PaX Team, found during a clang analysis pass. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: PaX Team <pageexec@freemail.hu> Cc: stable@kernel.org [v2.6.35+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-18 14:07:15 -07:00
Linus Torvalds	b7c2f03628	Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: gfs2: Drop __TIME__ usage isdn/diva: Drop __TIME__ usage atm: Drop __TIME__ usage dlm: Drop __TIME__ usage wan/pc300: Drop __TIME__ usage parport: Drop __TIME__ usage hdlcdrv: Drop __TIME__ usage baycom: Drop __TIME__ usage pmcraid: Drop __DATE__ usage edac: Drop __DATE__ usage rio: Drop __DATE__ usage scsi/wd33c93: Drop __TIME__ usage scsi/in2000: Drop __TIME__ usage aacraid: Drop __TIME__ usage media/cx231xx: Drop __TIME__ usage media/radio-maxiradio: Drop __TIME__ usage nozomi: Drop __TIME__ usage cyclades: Drop __TIME__ usage	2011-05-26 13:19:00 -07:00
Michal Marek	152ba39422	edac: Drop __DATE__ usage The kernel already prints its build timestamp during boot, no need to repeat it in random drivers and produce different object files each time. Cc: Doug Thompson <dougthompson@xmission.com> Cc: bluesmoke-devel@lists.sourceforge.net Cc: linux-edac@vger.kernel.org Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Michal Marek <mmarek@suse.cz>	2011-04-19 00:23:22 +02:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
David Sterba	e7bf068aa3	i7core_edac: fix typos in comments Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-12-28 01:20:51 +01:00
Mauro Carvalho Chehab	76a7bd8113	i7core_edac: return -ENODEV when devices were already probed Due to the nature of i7core, we need to probe and attach all PCI devices used by this driver during the first time probe is called. However, PCI core will call the probe routine one time for each CPU socket. If we return -EINVAL to those calls, it would seem that the driver fails, when, in fact, there's no more devices left to initialize. Changing the return code to -ENODEV solves this issue. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:36:19 -02:00
Mauro Carvalho Chehab	3c52cc57cc	i7core_edac: properly terminate pci_dev_table pci_xeon_fixup() expects a null-terminated table, while i7core_get_all_devices() just loops over 0..ARRAY_SIZE. As the other tables are zero-terminated, terminate this one with 0 as well. This fixes a bug where the code could run past the end of the table. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:31:50 -02:00
Mauro Carvalho Chehab	a3e1541637	i7core_edac: Avoid PCI refcount to reach zero on successive load/reload That's a nasty bug that took me a lot of time to track, and whose solution took just one line to solve. The best fragrances and the worst poisons are shipped in the smallest bottles. The drivers/pci/quick.c implements the pci_get_device function. The normal behavior is that you call it, the function returns you a pdev pointer and increments pdev->kobj.kref.refcount of the pci device. However, if you want to keep searching for an object, you need to pass the previous pdev pointer to the search. When you use a non-null pointer in the pdev "from" field, pci_get_device will decrement pdev->kobj.kref.refcount, assuming that the driver won't be using the previous pdev. The solution is simple: we just need to call pci_dev_get() manually, for the pdev's that the driver will actually use. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab	79daef2099	i7core_edac: Fix refcount error at PCI devices Probably due to a bug or some testing logic at PCI level, device refcount for <bus>:00.0 device is decremented at the end of the pci_get_device, made by i7core_get_all_devices(). The fact is that the first versions of the driver relied on those devices to probe for Nehalem, but the current versions don't use it at all. So, let's just remove those devices from the driver, making it simpler and fixing the bug. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab	88ef5ea976	i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL i7core_unregister_mci() checks internally when mci=NULL. There's no need to test it outside. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab	6d37d240f2	i7core_edac: Fix an oops at i7core probe changeset c91d57ba9ce5b5c93a7077e2f72510eb1f9131c4 moved the init of the priv pointer to the end of the probe routine. However, we need them before that, otherwise, we hit an OOPS: [ 67.743453] EDAC DEBUG: mci_bind_devs: Associated fn 0.0, dev = ffff88011b46e000, socket 0 [ 67.751861] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 67.759685] IP: [<ffffffffa017e484>] i7core_probe+0x979/0x130c [i7core_edac] [ 67.766721] PGD 10bd38067 PUD 10bd37067 PMD 0 [ 67.771178] Oops: 0000 [#1] SMP [ 67.774414] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map [ 67.782213] CPU 1 [ 67.784042] Modules linked in: i7core_edac(+) edac_core cpufreq_ondemand binfmt_misc dm_multipath video output pci_slot snd_hda_codd Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:41 -02:00
Hidetoshi Seto	21b6806a8c	i7core_edac: Remove unused member channels in i7core_pvt Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:41 -02:00
Hidetoshi Seto	2e5185f7ff	i7core_edac: Remove unused arg csrow from get_dimm_config A local is enough. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:41 -02:00
Hidetoshi Seto	aace42831a	i7core_edac: Reduce args of i7core_register_mci We can check the number of channels in i7core_register_mci. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:40 -02:00
Hidetoshi Seto	1c6edbbe25	i7core_edac: Introduce i7core_unregister_mci In i7core_probe, when setup of mci for the 2nd or a later socket fails, we should clean up the mci prepared for the 1st socket or so before the "put" of all devices. So let's have i7core_unregister_mci that can be shared between here and i7core_remove. While here, fix a typo "hanler". Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:40 -02:00
Hidetoshi Seto	73589c80cd	i7core_edac: Use saved pointers We already have saved pointers. Use shorter ones. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:40 -02:00
Hidetoshi Seto	71fe01706d	i7core_edac: Check probe counter in i7core_remove Prevent i7core_remove from running multiple times. Otherwise the value of probed will become negative and something will go wrong. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:40 -02:00
Hidetoshi Seto	2896637b86	i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:40 -02:00
Hidetoshi Seto	628c5ddfb0	i7core_edac: Fix error path of i7core_register_mci Release resources properly. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:40 -02:00
Hidetoshi Seto	5939813b9c	i7core_edac: Fix order of lines in i7core_register_mci The flag is_registered is not initialized until mci_bind_devs() is called. Refer it properly. The mci->dev and mci->edac_check is required in edac_mc_add_mc(), so prepare them just before the call. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:39 -02:00
Hidetoshi Seto	64c10f6e0e	i7core_edac: Always do get/put for all devices We already do 'get' for all sockets at once. So do 'put' in the same way. And let args of the 'get' function to void since it handles only the single, static and known size table pci_dev_table[]. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:39 -02:00
Hidetoshi Seto	a3aa0a4ab5	i7core_edac: Introduce i7core_pci_ctl_create/release Have a pair of methods. While here, sort out the lines in i7core_register_mci() a bit. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:39 -02:00
Hidetoshi Seto	2aa9be448d	i7core_edac: Introduce free_i7core_dev Have a method to make a couple with alloc_i7core_dev() previously introduced. Using in pair will help proper resource handling. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:39 -02:00
Hidetoshi Seto	848b2f7ed6	i7core_edac: Introduce alloc_i7core_dev It's nice to have a method for a single purpose. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:38 -02:00
Hidetoshi Seto	b197cba071	i7core_edac: Reduce args of i7core_get_onedevice Since we need to pass the index of the entry, pass the table itself instead of passing individual members of the table. While here make it static. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:38 -02:00
Hidetoshi Seto	45b7c981ae	i7core_edac: Fix the logic in i7core_remove() commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 have changed the logic for unexplained reasons. It looks strange that it can release i7core_dev without calling i7core_put_devices() that releases i7core_dev->pdev. Fix the part. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab	54a08ab153	i7core_edac: Don't do the legacy PCI probe by default The legacy PCI probe sometimes cause hangs. Better to have it disabled by default, and have a parameter to enable it. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab	accf74fff3	i7core_edac: don't use a freed mci struct This is a nasty bug. Since kobject count will be reduced by zero by edac_mc_del_mc(), and this triggers the kobj release method, the mci memory will be freed automatically. So, all we have left is ctl_name, as shown by enabling debug: [ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link [ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance [ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing [ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0 [ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs [ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free() [ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj() [ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices() Also, kfree(mci) shouldn't happen at the kobj.release, as it happens when edac_remove_sysfs_mci_device() is called, but the logic is: edac_remove_sysfs_mci_device(mci); edac_printk(KERN_INFO, EDAC_MC, "Removed device %d for %s %s: DEV %s\n", mci->mc_idx, mci->mod_name, mci->ctl_name, edac_dev_name(mci)); So, as the edac_printk() needs the mci struct, this generates an OOPS. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab	bbc560ae67	edac_core: Print debug messages at release calls This is important to track a nasty bug at the free logic. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab	39300e7143	i7core_edac: explicitly remove PCI devices from the devices list Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab	41ba6c1058	i7core_edac: MCE NMI handling should stop first Otherwise, a NMI may happen causing a race condition and a panic. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab	6ee7dd5044	i7core_edac: Initialize all priv vars before start polling Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab	3cfd01468b	i7core_edac: Improve debug to seek for register/remove errors Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab	e9144601d3	i7core_edac: move #if PAGE_SHIFT to edac_core.h Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:36 -02:00
Mauro Carvalho Chehab	1288c18f48	i7core_edac: Properly mark const static vars as such There are two groups of sysfs attributes: one for rdimm and another for udimm. Instead of changing dynamically the unique static struct for handling udimm's, declare two vars and make them constant. This avoids the risk of having two or more memory controllers, each needing a different set of attributes. While here, use const on all places where it is applicable. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> edac_core: use const for constant sysfs arguments Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:14 -02:00
Mauro Carvalho Chehab	18c29002f9	i7core_edac: move static vars to the beginning of the file While here, don't initialize probed with 0. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:12 -02:00
Mauro Carvalho Chehab	939747bd68	i7core_edac: Be sure that the edac pci handler will be properly released With multi-sockets, more than one edac pci handler is enabled. Be sure to un-register all instances. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-10-24 11:20:12 -02:00
Marcin Slusarz	64aab720bd	i7core_edac: fix panic in udimm sysfs attributes registration Array of udimm sysfs attributes was not ended with NULL marker, leading to dereference of random memory. EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0 EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1 EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2 BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4 IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1 Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name RIP: 0010:[<ffffffff81330b36>] [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1 (...) Call Trace: [<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1 [<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2 [<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557 [<ffffffff81428901>] i7core_probe+0xccf/0xec0 RIP [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1 ---[ end trace 20de320855b81d78 ]--- Kernel panic - not syncing: Attempted to kill init! Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-01 10:50:58 -07:00
Daniel J Blueman	ab08937400	quiesce EDAC initialisation on desktop/mobile i7 Don't print failure to detect Core i7 EDAC facilities to the console at boot time, most often occurring on Core i7 desktops and laptops. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-26 08:17:44 -07:00
Mauro Carvalho Chehab	2d95d8158b	i7core_edac: Avoid doing multiple probes for the same card As Nehalem/Nehalem-EP/Westmere devices use several devices for the same functionality (memory controller), the default way of probing devices doesn't work. So, instead of a per-device probe, all devices should be probed at once. This means that we should block any new probe attempt; otherwise, it will try to register the same device several times. Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-07-02 18:04:29 -03:00
Mauro Carvalho Chehab	bda142890e	i7core_edac: Properly discover the first QPI device On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus. The last bus is generally at 0x3f or 0xff, but there are also other systems using different setups. For example, HP Z800 has 0x7f as the last bus. This patch adds a logic to discover the last bus, dynamically detecting it at runtime. Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-07-02 18:04:05 -03:00
Mauro Carvalho Chehab	52707f918c	i7core_edac: Better describe the supported devices Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 20:43:52 -03:00
Vernon Mauery	bd9e19ca46	Add support for Westmere to i7core_edac driver This adds new PCI IDs for the Westmere's memory controller devices and modifies the i7core_edac driver to be able to probe both Nehalem and Westmere processors. Signed-off-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 20:23:56 -03:00
Tony Luck	d4d1ef4515	i7core_edac: don't free on success Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 14:47:31 -03:00
Mauro Carvalho Chehab	ac1ececea9	i7core_edac: Add support for X5670 As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a different register for one of the uncore PCI devices. Add support for it. Those are the PCI ID's on this new chipset: fe:00.0 0600: 8086:2c70 (rev 02) fe:00.1 0600: 8086:2d81 (rev 02) fe:02.0 0600: 8086:2d90 (rev 02) fe:02.1 0600: 8086:2d91 (rev 02) fe:02.2 0600: 8086:2d92 (rev 02) fe:02.3 0600: 8086:2d93 (rev 02) fe:02.4 0600: 8086:2d94 (rev 02) fe:02.5 0600: 8086:2d95 (rev 02) fe:03.0 0600: 8086:2d98 (rev 02) fe:03.1 0600: 8086:2d99 (rev 02) fe:03.2 0600: 8086:2d9a (rev 02) fe:03.4 0600: 8086:2d9c (rev 02) fe:04.0 0600: 8086:2da0 (rev 02) fe:04.1 0600: 8086:2da1 (rev 02) fe:04.2 0600: 8086:2da2 (rev 02) fe:04.3 0600: 8086:2da3 (rev 02) fe:05.0 0600: 8086:2da8 (rev 02) fe:05.1 0600: 8086:2da9 (rev 02) fe:05.2 0600: 8086:2daa (rev 02) fe:05.3 0600: 8086:2dab (rev 02) fe:06.0 0600: 8086:2db0 (rev 02) fe:06.1 0600: 8086:2db1 (rev 02) fe:06.2 0600: 8086:2db2 (rev 02) fe:06.3 0600: 8086:2db3 (rev 02) (as usual, the same PCI devices repeat at ff: bus) The PCI device 8086:2c70 is shown as: fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic Non-core Registers (rev 02) So, for this device to be recognized, it is only a matter of adding this new PCI ID to the driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 13:15:42 -03:00
Vernon Mauery	8a311e179e	Always call i7core_[ur]dimm_check_mc_ecc_err This fixes an error in function i7core_check_error. In commit `ca9c90ba09`, which converts the driver to use double buffering, there is a change in the logic. Before, if mce_count was zero, it skipped over a couple of statements and finished out with a call to the check_mc_ecc_err function. The current code checks to see if mce_count is 0 and then exits. This change reverts the behavior back to the original where if there are no errors to report, we skip to the end and call the check_mc_ecc_err function. This fix allows the driver to work again on my Nehalem based blades. Signed-off-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 12:43:23 -03:00
Alexander Beregalov	2a6fae3267	i7core_edac: fix memory leak of i7core_dev Free already allocated i7core_dev. Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 11:45:20 -03:00
Jiri Slaby	71753e0141	EDAC: add __init to i7core_xeon_pci_fixup It's called only from an __init function and is the only user of pcibios_scan_specific_bus which will be marked as __devinit in the next patch. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 11:45:19 -03:00
Mauro Carvalho Chehab	508fa179f8	i7core_edac: Fix wrong device id for channel 1 devices Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 12:18:31 -03:00
Mauro Carvalho Chehab	f05da2f785	i7core: add support for Lynnfield alternate address Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 12:18:29 -03:00
Mauro Carvalho Chehab	52a2e4fc37	i7core_edac: Add initial support for Lynnfield Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 12:18:28 -03:00
Randy Dunlap	3b918c12df	edac: fix i7core build Fix build warning (missing header file) and build error when CONFIG_SMP=n. drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep' drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:32 -03:00
Alan Cox	486dd09f12	edac: i7core_edac produces undefined behaviour on 32bit Fix the shifts up Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:32 -03:00
Mauro Carvalho Chehab	de06eeef58	i7core_edac: Use a more generic approach for probing PCI devices Currently, only one PCI set of tables is allowed. This prevents using the driver for other devices like Lynnfield, which have a different set of PCI ID's. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	fd3826549d	i7core_edac: PCI device is called NONCORE, instead of NOCORE Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	321ece4dda	i7core_edac: Fix ringbuffer maxsize Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	6e103be1c7	i7core_edac: First store, then increment Fix ringbuffer store logic. While here, add a few comments to the code and remove the undesired printk that could otherwise be called during NMI time. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	4f87fad1d3	i7core_edac: Better parse "any" addrmask Instead of accepting just "any", accept also "any\n" Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab	ca9c90ba09	i7core_edac: Use a lockless ringbuffer Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab	f338d73691	i7core_edac: Convert UDIMM error counters into a proper sysfs group Instead of displaying 3 values at the same var, break it into 3 different sysfs nodes: /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2 For registered dimms, however, the error counters are already being displayed at: /sys/devices/system/edac/mc/mc0/csrow*/ce_count So, there's no need to add any extra sysfs nodes. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab	cc301b3ae3	edac: store/show methods for device groups weren't working Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab	a5538e531f	i7core_edac: Add support for sysfs addrmatch group Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab	4af91889e0	i7core_edac: Avoid printing a warning when debug is disabled Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	4253868034	i7core_edac: We need to use list_for_each_entry_safe to avoid errors Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	22e6bcbdcf	i7core_edac: change remove module strategy The old remove module strategy didn't work on devices with multiple cores, since only one PCI device is used to open all mc's, due to Nehalem nature. Also, it was based on the pdev value. However, this doesn't point to the pci device used at mci->dev. So, instead, it unregisters all devices at once, deleting them from the device list. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	0f062792b4	i7core_edac: remove static counter for max sockets The number of sockets is now fully dynamic. Get rid of this obsolete var. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	13d6e9b653	i7core_edac: at remove, don't remove all pci devices at once Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab	d88b85072f	i7core_edac: Fix a bug when printing error counts with RDIMMs Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab	d4c277957f	i7core_edac: a few fixes for multiple mc's Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab	6c6aa3afdb	i7core_edac: sanity check: print a warning if a mcelog is ignored In theory, the other mc controller should handle it. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	f47429494f	i7core_edac: create one mc per socket/QPI Instead of creating just one memory controller, create one per socket (e. g. per Quick Link Path Interconnect). This better reflects the Nehalem architecture. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	66607706ce	Dynamically allocate memory for PCI devices Instead of using a static table assuming always 2 CPU sockets, allocate space dynamically for Nehalem PCI devs. This patch is part of a series of patches that changes i7core_edac to allow more than 2 sockets and to properly report one memory controller per socket.	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	a55456f344	i7core: temporary workaround to allow it to compile against 2.6.30 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	3a3bb4a647	i7core_edac: Improve corrected_error_counts output for RDIMM Just cosmetics. instead of showing something like: socket 0, channel 2dimm0: 1 dimm1: 0 dimm2: 0 socket 1, channel 2dimm0: 0 dimm1: 0 dimm2: 0 Show: socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0 socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0 This is more synthetic and easier to parse. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Keith Mannthey	bc2d7245ff	i7core_edac: Probe on Xeons earlier On the Xeon 55XX series CPUs the PCI devices are not exposed via ACPI, so we must explicitly probe them to make them usable as Linux PCI devices. This moves the detection of this state to before pci_register_driver is called. Its present position was not working on my systems; the driver would complain about not finding a specific device. This patch allows the driver to load on my systems. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab	14d2c08343	i7core: Use registered memories per processor Instead of assuming that the entire machine has either registered or unregistered memories, decide it per CPU socket. While here, fix a bug at i7core_mce_output_error(), where we were using m->cpu directly as if it would represent a socket. Instead, the proper socket_id is given by cpu_data[m->cpu].phys_proc_id. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab	b4e8f0b6ea	i7core_edac: Use Device 3 function 2 to report errors with RDIMM's Nehalem and upper chipsets provide a special device that has corrected memory error counters detected with registered dimms. This device is only seen if there are registered memories plugged. After this patch, on a machine fully equipped with RDIMM's, it will use the Device 3 function 2 to count corrected errors instead of relying on mcelog. For unregistered DIMMs, it will keep the old behavior, counting errors via mcelog. This patch was developed together with Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Keith Mannthey	61053fdedb	i7core_edac: Fix ecc enable shift From: Keith Mannthey <kmannth@us.ibm.com> Simple correction to a shift value. ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c This correctly identifies the state of the ECC at the machine. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab	3ef288a983	i7core_edac: Print an error message if pci register fails Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab	b990538a78	i7core_edac: CodingStyle fixes/cleanups No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab	4157d9f554	i7core_edac: fix error injection There were two stupid error injection bugs introduced by wrong cut-and-paste: one at socket store, and another at the error inject register. The last one was causing the code to not work at all. While here, add debug messages to allow seeing what registers are being set while sending error injection. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab	2068def56c	i7core_edac: fix error codes for sysfs error injection interface Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab	276b824c30	i7core_edac: some fixes at error injection code Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	17cb7b0cf7	i7core_edac: Some cleanups at displayed info Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	086271a037	i7core: remove some unneeded noisy debug messages Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	3a7dde7fcd	i7core: add socket info at the debug msg Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	ec6df24c15	i7core: better document i7core_get_active_channels() Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	c77720b954	i7core: fix get_devices routine for Xeon55xx i7core_get_devices() was prepared to get just the first found device of each type. Due to that, on Xeon 55xx, only socket 1 was retrieved. Rework i7core_get_devices() to clean it and to properly support Xeon 55xx. While here, fix a small typo. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	a639539fa2	i7core: enrich error information based on memory transaction type Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	c5d3452869	i7core: check if the memory error is fatal or non-fatal Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	310cbb7284	i7core: fix probing on Xeon55xx Xeon55xx fails to probe with this error message: EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init() EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41 i7core_edac: probe of 0000:00:14.0 failed with error -22 This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has PCI ID 8086:2c40. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	f237fcf2b7	i7core_edac: some fixes at memory error parser m->bank is not related to the memory bank but, instead, to the MCA Error register bank. Fix it accordingly. While here, improves the comments for Nehalem bank. A later fix is needed, in order to get bank/rank information from MCA error log. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	8a2f118e3a	i7core_edac: decode mcelog error and send it via edac interface Enriches mcelog error by using the encoded information at MCE status and misc registers (IA32_MCx_STATUS, IA32_MCx_MISC). Some fixes are still needed here, in order to properly fill the EDAC fields. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	ba6c5c62ee	i7core_edac: maps all sockets as if they were one MC controller Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	67166af4ab	i7core_edac: add support for more than one MC socket Some Nehalem architectures have more than one MC socket. Socket 0 is located at bus 255. Currently, it is using up to 2 sockets, but increasing it to a larger number is just a matter of increasing MAX_SOCKETS definition. This seems to be required to properly support Xeon 55xx. Still needs testing with Xeon 55xx. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	d1fd4fb69e	i7core_edac: Add a code to probe Xeon 55xx bus This code changes the detection procedure of i7core_edac. Instead of directly probing for MC registers, it probes for another register found on Nehalem. If found, it tries to pick the first MC PCI BUS. This should work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255 that are not properly detected by the non-legacy PCI methods. The new detection code scans specifically at buses 254 and 255 for the Xeon 55xx devices. This code has not been tested yet. After working, a change at the code will be needed, since the i7core is not yet ready for working with 2 sets of MC. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	e9bd2e7379	i7core_edac: Adds write unlock to MC registers The public Intel Xeon 5500 volume 2 datasheet describes, on page 53, section 2.6.7, a register that can lock/unlock the Memory Controller configuration register, called MC_CFG_CONTROL. Adds support for it in the hope that software error injection would work. With my tests with Xeon 35xx, there's still something missing. With a program that does sequential bit writes at dev 0.0, sometimes, it produces error injection, after unblocking the MC_CFG_CONTROL (and, sometimes, it just locks my testing machine). I'll try later to discover by trial and error what's the register that solves this issue on Xeon 35xx. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	d5381642ab	i7core_edac: Add edac_mce glue Adds a glue code to allow i7core to work with mcelog. With the glue, i7core registers itself on edac_mce. At mce, when an error is detected, it calls all registered drivers (in this case, i7core), for EDAC error handling. TODO: It currently just prints the MCE error log using about the same format as mce panic messages. The error message should be enhanced with mcelog userspace info and converted into the proper EDAC format, to feed the EDAC error counts. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00

166 Commits