Commit Graph

418 Commits

Author SHA1 Message Date
Linus Torvalds 9a9620db07 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits)
  i7core_edac: Better describe the supported devices
  Add support for Westmere to i7core_edac driver
  i7core_edac: don't free on success
  i7core_edac: Add support for X5670
  Always call i7core_[ur]dimm_check_mc_ecc_err
  i7core_edac: fix memory leak of i7core_dev
  EDAC: add __init to i7core_xeon_pci_fixup
  i7core_edac: Fix wrong device id for channel 1 devices
  i7core: add support for Lynnfield alternate address
  i7core_edac: Add initial support for Lynnfield
  i7core_edac: do not export static functions
  edac: fix i7core build
  edac: i7core_edac produces undefined behaviour on 32bit
  i7core_edac: Use a more generic approach for probing PCI devices
  i7core_edac: PCI device is called NONCORE, instead of NOCORE
  i7core_edac: Fix ringbuffer maxsize
  i7core_edac: First store, then increment
  i7core_edac: Better parse "any" addrmask
  i7core_edac: Use a lockless ringbuffer
  edac: Create an unique instance for each kobj
  ...
2010-06-04 15:39:54 -07:00
Anatolij Gustschin a26f95fed3 of/edac: fix build breakage in drivers
Fixes build errors in EDAC drivers caused by the OF
device_node pointer being moved into struct device

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-06-02 21:02:41 -06:00
Joe Perches 63ae96be98 drivers/edac: convert logging messages direct uses of __FILE__ to %s, __FILE
Reduces text by eliminating multiple __FILE__ uses.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Grant Likely cf9b59e9d3 Merge remote branch 'origin' into secretlab/next-devicetree
Merging in current state of Linus' tree to deal with merge conflicts and
build failures in vio.c after merge.

Conflicts:
	drivers/i2c/busses/i2c-cpm.c
	drivers/i2c/busses/i2c-mpc.c
	drivers/net/gianfar.c

Also fixed up one line in arch/powerpc/kernel/vio.c to use the
correct node pointer.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-22 00:36:56 -06:00
Grant Likely 4018294b53 of: Remove duplicate fields from of_platform_driver
.name, .match_table and .owner are duplicated in both of_platform_driver
and device_driver.  This patch is a removes the extra copies from struct
of_platform_driver and converts all users to the device_driver members.

This patch is a pretty mechanical change.  The usage model doesn't change
and if any drivers have been missed, or if anything has been fixed up
incorrectly, then it will fail with a compile time error, and the fixup
will be trivial.  This patch looks big and scary because it touches so
many files, but it should be pretty safe.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Sean MacLennan <smaclennan@pikatech.com>
2010-05-22 00:10:40 -06:00
Mauro Carvalho Chehab 52707f918c i7core_edac: Better describe the supported devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 20:43:52 -03:00
Vernon Mauery bd9e19ca46 Add support for Westmere to i7core_edac driver
This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 20:23:56 -03:00
Roman Fietze ee6583f6e8 PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
This fixes all occurrences of pci_enable_device and pci_disable_device
in all comments. There are no code changes involved.

Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-18 14:59:08 -07:00
Tony Luck d4d1ef4515 i7core_edac: don't free on success
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 14:47:31 -03:00
Mauro Carvalho Chehab ac1ececea9 i7core_edac: Add support for X5670
As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a
different register for one of the uncore PCI devices. Add support for
it.

Those are the PCI ID's on this new chipset:

fe:00.0 0600: 8086:2c70 (rev 02)
fe:00.1 0600: 8086:2d81 (rev 02)
fe:02.0 0600: 8086:2d90 (rev 02)
fe:02.1 0600: 8086:2d91 (rev 02)
fe:02.2 0600: 8086:2d92 (rev 02)
fe:02.3 0600: 8086:2d93 (rev 02)
fe:02.4 0600: 8086:2d94 (rev 02)
fe:02.5 0600: 8086:2d95 (rev 02)
fe:03.0 0600: 8086:2d98 (rev 02)
fe:03.1 0600: 8086:2d99 (rev 02)
fe:03.2 0600: 8086:2d9a (rev 02)
fe:03.4 0600: 8086:2d9c (rev 02)
fe:04.0 0600: 8086:2da0 (rev 02)
fe:04.1 0600: 8086:2da1 (rev 02)
fe:04.2 0600: 8086:2da2 (rev 02)
fe:04.3 0600: 8086:2da3 (rev 02)
fe:05.0 0600: 8086:2da8 (rev 02)
fe:05.1 0600: 8086:2da9 (rev 02)
fe:05.2 0600: 8086:2daa (rev 02)
fe:05.3 0600: 8086:2dab (rev 02)
fe:06.0 0600: 8086:2db0 (rev 02)
fe:06.1 0600: 8086:2db1 (rev 02)
fe:06.2 0600: 8086:2db2 (rev 02)
fe:06.3 0600: 8086:2db3 (rev 02)
(as usual, the same PCI devices repeat at ff: bus)

The PCI device 8086:2c70 is shown as:

fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic
Non-core Registers (rev 02)

So, for this device to be recognized, it is only a matter of adding this
new PCI ID to the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 13:15:42 -03:00
Vernon Mauery 8a311e179e Always call i7core_[ur]dimm_check_mc_ecc_err
This fixes an error in function i7core_check_error

In commit ca9c90ba09 which converts the
driver to use double buffering, there is a change in the logic.  Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function.  The current
code checks to see if mce_count is 0 and then exits.

This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.

This fix allows the driver to work again on my Nehalem based blades
again.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 12:43:23 -03:00
Alexander Beregalov 2a6fae3267 i7core_edac: fix memory leak of i7core_dev
Free already allocated i7core_dev.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 11:45:20 -03:00
Jiri Slaby 71753e0141 EDAC: add __init to i7core_xeon_pci_fixup
It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 11:45:19 -03:00
Mauro Carvalho Chehab 508fa179f8 i7core_edac: Fix wrong device id for channel 1 devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:31 -03:00
Mauro Carvalho Chehab f05da2f785 i7core: add support for Lynnfield alternate address
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:29 -03:00
Mauro Carvalho Chehab 52a2e4fc37 i7core_edac: Add initial support for Lynnfield
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:28 -03:00
Randy Dunlap 3b918c12df edac: fix i7core build
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.

drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:32 -03:00
Alan Cox 486dd09f12 edac: i7core_edac produces undefined behaviour on 32bit
Fix the shifts up

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:32 -03:00
Mauro Carvalho Chehab de06eeef58 i7core_edac: Use a more generic approach for probing PCI devices
Currently, only one PCI set of tables is allowed. This prevents using
the driver for other devices like Lynnfield, with have a different
set of PCI ID's.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab fd3826549d i7core_edac: PCI device is called NONCORE, instead of NOCORE
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab 321ece4dda i7core_edac: Fix ringbuffer maxsize
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab 6e103be1c7 i7core_edac: First store, then increment
Fix ringbuffer store logic.

While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab 4f87fad1d3 i7core_edac: Better parse "any" addrmask
Instead of accepting just "any", accept also "any\n"

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab ca9c90ba09 i7core_edac: Use a lockless ringbuffer
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab b968759ee7 edac: Create an unique instance for each kobj
Current code only works when there's just one memory
controller, since we need one kobj for each instance.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab f338d73691 i7core_edac: Convert UDIMM error counters into a proper sysfs group
Instead of displaying 3 values at the same var, break it into 3
different sysfs nodes:

/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2

For registered dimms, however, the error counters are already being
displayed at:
	/sys/devices/system/edac/mc/mc0/csrow*/ce_count

So, there's no need to add any extra sysfs nodes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab c419d921e6 edac: Don't create csrow entries on instance groups
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab cc301b3ae3 edac: store/show methods for device groups weren't working
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab a5538e531f i7core_edac: Add support for sysfs addrmatch group
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab 9fa2fc2e2d edac_core: Allow the creation of sysfs groups
Currently, all sysfs nodes are stored at /sys/.*/mc. (regex)
However, sometimes it is needed to create attribute groups.

This patch extends edac_core to allow groups creation.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab 4af91889e0 i7core_edac: Avoid printing a warning when debug is disabled
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 4253868034 i7core_edac: We need to use list_for_each_entry_safe to avoid errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 22e6bcbdcf i7core_edac: change remove module strategy
The old remove module stragegy didn't work on devices with multiple
cores, since only one PCI device is used to open all mc's, due to
Nehalem nature.

Also, it were based at pdev value. However, this doesn't point to the
pci device used at mci->dev.

So, instead, it unregisters all devices at once, deleting them from the
device list.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 0f062792b4 i7core_edac: remove static counter for max sockets
The number of sockets is now fully dynamic. Get rid of this obsolete
var.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab 13d6e9b653 i7core_edac: at remove, don't remove all pci devices at once
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab d88b85072f i7core_edac: Fix a bug when printing error counts with RDIMMs
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab d4c277957f i7core_edac: a few fixes for multiple mc's
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab 6c6aa3afdb i7core_edac: sanity check: print a warning if a mcelog is ignored
In thesis, the other mc controller should handle it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab f47429494f i7core_edac: create one mc per socket/QPI
Instead of creating just one memory controller, create one per socket
(e. g. per Quick Link Path Interconnect).

This better reflects the Nehalem architecture.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab 66607706ce Dynamically allocate memory for PCI devices
Instead of using a static table assuming always 2 CPU sockets, allocate
space dynamically for Nehalem PCI devs.

This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab a55456f344 i7core: temporary workaround to allow it to compile against 2.6.30
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab 3a3bb4a647 i7core_edac: Improve corrected_error_counts output for RDIMM
Just cosmetics. instead of showing something like:

socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0

Show:

socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

This is more synthetic and easier to parse.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Keith Mannthey bc2d7245ff i7core_edac: Probe on Xeons eariler
On the Xeon 55XX series cpus the pci deives are not exposed via acpi so
we much explicitly probe them to make the usable as a Linux PCI device.

This moves the detection of this state to before pci_register_driver is
called.  Its present position was not working on my systems, the driver
would complain about not finding a specific device.

This patch allows the driver to load on my systems.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab 14d2c08343 i7core: Use registered memories per processor
Instead of assuming that the entire machine has either registered or
unregistered memories, do it at CPU socket based.

While here, fix a bug at i7core_mce_output_error(), where the we're
using m->cpu directly as if it would represent a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab b4e8f0b6ea i7core_edac: Use Device 3 function 2 to report errors with RDIMM's
Nehalem and upper chipsets provide an special device that has corrected memory
error counters detected with registered dimms. This device is only seen if
there are registered memories plugged.

After this patch, on a machine fully equiped with RDIMM's, it will use the
Device 3 function 2 to count corrected errors instead on relying at mcelog.

For unregistered DIMMs, it will keep the old behavior, counting errors
via mcelog.

This patch were developed together with Keith Mannthey <kmannth@us.ibm.com>

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Keith Mannthey 61053fdedb i7core_edac: Fix ecc enable shift
From: Keith Mannthey <kmannth@us.ibm.com>

Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c

This correctly identifies the state of the ECC at the machine.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab 3ef288a983 i7core_edac: Print an error message if pci register fails
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab b990538a78 i7core_edac: CodingSyle fixes/cleanups
No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab 4157d9f554 i7core_edac: fix error injection
There were two stupid error injection bugs introduced by wrong
cut-and-paste: one at socket store, and another at the error inject
register. The last one were causing the code to not work at all.

While here, adds debug messages to allow seeing what registers are being
set while sending error injection.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab 2068def56c i7core_edac: fix error codes for sysfs error injection interface
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:55 -03:00