Commit Graph

13899 Commits

Author SHA1 Message Date
Cédric Le Goater e3c5c2e0bc powerpc/powernv: convert codes returned by OPAL calls
OPAL has its own list of return codes. The patch provides a translation
of such codes in errnos for the opal_sensor_read call, and possibly
others if needed.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-31 14:50:33 +11:00
Joe Perches acdb66857f powerpc: Use bool function return values of true/false not 1/0
Use the normal return values for bool functions

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-31 14:19:47 +11:00
Benjamin Herrenschmidt d4ed11aa48 Merge branch 'next-eeh' into next-sriov
Merge in Gavin EEH fixes
2015-03-31 13:11:17 +11:00
Gavin Shan 027fa02f84 powerpc/powernv: Don't map M64 segments using M32DT
If M64 has been supported, the prefetchable 64-bits memory resources
shouldn't be mapped to the corresponding PE# via M32DT. Unfortunately,
we're doing that in pnv_ioda_setup_pe_seg() wrongly. The issue was
introduced by commit 262af55 ("powerpc/powernv: Enable M64 aperatus
for PHB3"). The patch fixes the issue by simply skipping M64 resources
when updating to M32DT.

Cc: <stable@vger.kernel.org>  # v3.17+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:10:40 +11:00
Gavin Shan 433185d2b4 powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()
The function eeh_add_parent_pe() is used to create a PE or add one
edev to its parent PE. Current code checks if PE#0 is valid for the
later case. Actually, we should validate PE#0 for both cases when
EEH core regards PE#0 as invalid one (without flag EEH_VALID_PE_ZERO).
Otherwise, not all EEH devices can be added to its parent PE#0 for
EEH on P7IOC.

The patch fixes the issue by validating PE#0 for the two cases. So far,
we don't have PE#0 for EEH on P7IOC, but it will show up when we enable
M64 for P7IOC. The patch also makes the error message more meaningful.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:10:39 +11:00
Wei Yang 250c7b277c powerpc/pci: Remove unused struct pci_dn.pcidev field
In struct pci_dn, the pcidev field is assigned but not used, so remove it.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 02639b0e13 powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
When IOV BAR is big, each is covered by 4 M64 windows.  This leads to
several VF PE sits in one PE in terms of M64.

Group VF PEs according to the M64 allocation.

[bhelgaas: use dev_printk() when possible]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 5b88ec2284 powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported
M64 aperture size is limited on PHB3.  When the IOV BAR is too big, this
will exceed the limitation and failed to be assigned.

Introduce a different mechanism based on the IOV BAR size:

  - if IOV BAR size is smaller than 64MB, expand to total_pe
  - if IOV BAR size is bigger than 64MB, roundup power2

[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 781a868f31 powerpc/powernv: Shift VF resource with an offset
On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR .

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.

Note:

    After doing so, there would be a "hole" in the /proc/iomem when offset
    is a positive value. It looks like the device return some mmio back to
    the system, which actually no one could use it.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:38 +11:00
Wei Yang 5350ab3fd7 powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv
Implement pcibios_iov_resource_alignment() on powernv platform.

On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handle these three cases respectively.

[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang 6e628c7d33 powerpc/powernv: Reserve additional space for IOV BAR according to the number of total_pe
On PHB3, PF IOV BAR will be covered by M64 BAR to have better PE isolation.
M64 BAR is a type of hardware resource in PHB3, which could map a range of
MMIO to PE numbers on powernv platform. And this range is divided equally
by the number of total_pe with each divided range mapping to a PE number.
Also, the M64 BAR must map a MMIO range with power-of-two size.

The total_pe number is usually different from total_VFs, which can lead to
a conflict between MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of
M64 BAR will be part of other PCI device, which may already belong to other
PEs.

This patch prevents the conflict by reserving additional space for the PF
IOV BAR, which is total_pe number of VF's BAR size.

[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang 9e8d4a19ab powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
and was embedded in it. The pnv_ioda_pe was assigned to a PE on the bootup
stage. Since PEs are based on the hardware layout which is static in the
system, they will never get released. This means the iommu_table in the
pnv_ioda_pe will never get released either.

This no longer works for VF PE. VF PEs are created and released dynamically
when VFs are created and released. So we need to assign pnv_ioda_pe to VF
PEs respectively when VFs are enabled and clean up those resources for VF
PE when VFs are disabled. And iommu_table is one of the resources we need
to handle dynamically.

Current iommu_table is a static field in pnv_ioda_pe, which will face a
problem when freeing it. During the disabling of a VF,
pnv_pci_ioda2_release_dma_pe will call iommu_free_table to release the
iommu_table for this PE. A static iommu_table will fail in
iommu_free_table.

According to these requirement, this patch allocates iommu_table
dynamically.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Wei Yang c3b80fb0f2 powerpc/pci: Don't unset PCI resources for VFs
Flag PCI_REASSIGN_ALL_RSRC is used to ignore resources information setup by
firmware, so that kernel would re-assign all resources of pci devices.

On powerpc arch, this happens in a header fixup function
pcibios_fixup_resources(), which will clean up the resources if this flag
is set. This works fine for PFs, since after clean up, kernel will
re-assign the resources in pcibios_resource_survey().

Below is a simple call flow on how it works:

    pcibios_init
      pcibios_scan_phb
        pci_scan_child_bus
          ...
            pci_device_add
              pci_fixup_device(pci_fixup_header)
                pcibios_fixup_resources                     # header fixup
                  for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
                    dev->resource[i].start = 0
      pcibios_resource_survey                               # re-assign
        pcibios_allocate_resources

However, the VF resources won't be re-assigned, since the VF resources are
completely determined by the PF resources, and the PF resources have
already been reassigned. This means we need to leave VF's resources
un-cleared in pcibios_fixup_resources().

In this patch, we skip the resource unset process in
pcibios_fixup_resources(), if the pci_dev is a VF.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Gavin Shan a8b2f8288a powerpc/pci: Create pci_dn for VFs
pci_dn is the extension of PCI device node and is created from device node.
Unfortunately, VFs are enabled dynamically by PF's driver and they don't
have corresponding device nodes and pci_dn, which is required to access
VFs' config spaces.

The patch creates pci_dn for VFs in pcibios_sriov_enable() on their PF,
and removes pci_dn for VFs in pcibios_sriov_disable() on their PF. When
VF's pci_dn is created, it's put to the child list of the pci_dn of PF's
upstream bridge. The pci_dn is linked to pci_dev during early fixup time
to setup the fast path.

[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-31 13:02:37 +11:00
Michael Ellerman 529d235a0e powerpc: Add a proper syscall for switching endianness
We currently have a "special" syscall for switching endianness. This is
syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall
exception entry.

That has a few problems, firstly the syscall number is outside of the
usual range, which confuses various tools. For example strace doesn't
recognise the syscall at all.

Secondly it's handled explicitly as a special case in the syscall
exception entry, which is complicated enough without it.

As a first step toward removing the special syscall, we need to add a
regular syscall that implements the same functionality.

The logic is simple, it simply toggles the MSR_LE bit in the userspace
MSR. This is the same as the special syscall, with the caveat that the
special syscall clobbers fewer registers.

This version clobbers r9-r12, XER, CTR, and CR0-1,5-7.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-28 22:03:40 +11:00
Tyrel Datwyler c03e73740d powerpc/pseries: Simplify check for suspendability during suspend/migration
During suspend/migration operation we must wait for the VASI state reported
by the hypervisor to become Suspending prior to making the ibm,suspend-me
RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable
that exposes the VASI state to the caller. This is unnecessary as the caller
only really cares about the following three conditions; if there is an error
we should bailout, success indicating we have suspended and woken back up so
proceed to device tree update, or we are not suspendable yet so try calling
rtas_ibm_suspend_me again shortly.

This patch removes the extraneous vasi_state variable and simply uses the
return code to communicate how to proceed. We either succeed, fail, or get
-EAGAIN in which case we sleep for a second before trying to call
rtas_ibm_suspend_me again. The behaviour of ppc_rtas() remains the same,
but migrate_store() now returns the propogated error code on failure.
Previously -1 was returned from migrate_store() in the  failure case which
equates to -EPERM and was clearly wrong.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Nathan Fontenont <nfont@linux.vnet.ibm.com>
Cc: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-28 12:20:39 +11:00
Ingo Molnar b381e63b48 Merge branch 'perf/core' into perf/timer, before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 10:10:47 +01:00
Jan Stancek 68de8867ea powerpc/perf: add missing put_cpu_var in power_pmu_event_init
One path in power_pmu_event_init() calls get_cpu_var(), but is
missing matching call to put_cpu_var(), which causes preemption
imbalance and crash in user-space:

  Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280
  NIP = 3fff9bf2cae0  MSR = 900000014280f032
  Oops: Weird page fault, sig: 11 [#23]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: <snip>
  CPU: 43 PID: 10285 Comm: a.out Tainted: G      D         4.0.0-rc5+ #1
  task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000
  NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0
  REGS: c000001fe835fea0 TRAP: 0401   Tainted: G      D          (4.0.0-rc5+)
  MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI>  CR: 22000028  XER: 00000000
  CFAR: 00003fff9bee4894 SOFTE: 1
   GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068
   GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001
   GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70
   GPR12: 0000000052000022 00003fff9c10b700
  NIP [00003fff9bf2cae0] 0x3fff9bf2cae0
  LR [00003fff9bee4898] 0x3fff9bee4898
  Call Trace:
  ---[ end trace 5d3d952b5d4185d4 ]---

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
  in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out
  INFO: lockdep is turned off.
  CPU: 43 PID: 10285 Comm: a.out Tainted: G      D         4.0.0-rc5+ #1
  Call Trace:
  [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable)
  [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0
  [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110
  [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160
  [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70
  [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450
  [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900
  [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30
  note: a.out[10285] exited with preempt_count 1

Reproducer:
  #include <stdio.h>
  #include <unistd.h>
  #include <syscall.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <linux/perf_event.h>
  #include <linux/hw_breakpoint.h>

  static struct perf_event_attr event = {
          .type = PERF_TYPE_RAW,
          .size = sizeof(struct perf_event_attr),
          .sample_type = PERF_SAMPLE_BRANCH_STACK,
          .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN,
  };

  int main()
  {
          syscall(__NR_perf_event_open, &event, 0, -1, -1, 0);
  }

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-27 20:07:01 +11:00
Ingo Molnar 072e5a1cfa Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:03 +01:00
Andre Przywara af669ac6dc KVM: move iodev.h from virt/kvm/ to include/kvm
iodev.h contains definitions for the kvm_io_bus framework. This is
needed both by the generic KVM code in virt/kvm as well as by
architecture specific code under arch/. Putting the header file in
virt/kvm and using local includes in the architecture part seems at
least dodgy to me, so let's move the file into include/kvm, so that a
more natural "#include <kvm/iodev.h>" can be used by all of the code.
This also solves a problem later when using struct kvm_io_device
in arm_vgic.h.
Fixing up the FSF address in the GPL header and a wrong include path
on the way.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:12 +00:00
Nikolay Nikolaev e32edf4fd0 KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.
This is needed in e.g. ARM vGIC emulation, where the MMIO handling
depends on the VCPU that does the access.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-03-26 21:43:11 +00:00
Michael Ellerman df60f57684 Merge branch 'next-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into test
Merge miscellaneous bits from benh. Fix a minor conflict with
OpalMessageType changing names to opal_msg_type.
2015-03-26 20:04:28 +11:00
Preeti U Murthy 605f302053 powerpc/powernv: Avoid explicit endian conversions while parsing device tree
We currently read the information about idle states from the device
tree, so as to find out the CPU idle states supported by the platform.

Use the of_property_read/count_xxx() APIs, which handle endian
conversions for us, and mean we don't need any endian annotations in the
code.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-26 15:23:18 +11:00
Yanjiang Jin e77553cb21 powerpc/mm: Free string after creating kmem cache
kmem_cache_create()->kmem_cache_create_memcg()->kstrdup() allocates new
space and copys name's content, so it is safe to free name memory after
calling kmem_cache_create(). Else kmemleak will report the below
warning:

unreferenced object 0xc0000000f9002160 (size 16):
  comm "swapper/0", pid 0, jiffies 4294892296 (age 1386.640s)
  hex dump (first 16 bytes):
    70 67 74 61 62 6c 65 2d 32 5e 39 00 de ad be ef  pgtable-2^9.....
  backtrace:
    [<c0000000004e03ec>] .kvasprintf+0x5c/0xa0
    [<c0000000004e045c>] .kasprintf+0x2c/0x50
    [<c00000000002e36c>] .pgtable_cache_add+0xac/0x100
    [<c00000000002e3e4>] .pgtable_cache_init+0x24/0x80
    [<c000000000c6c67c>] .start_kernel+0x228/0x4c8
    [<c000000000000594>] .start_here_common+0x24/0x90

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-26 15:23:17 +11:00
Marcelo Tosatti 27bfc6cfda Patch queue for 4.0 - 2015-03-25
A few bug fixes for Book3S HV KVM:
 
   - Fix spinlock ordering
   - Fix idle guests on LE hosts
   - Fix instruction emulation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJVEy9HAAoJECszeR4D/txgPxEQAKMglNb5wHkm0QaQrXZ0sYs0
 w/N6QUM/UWG4x6kFb1JUBwt+Piboxriial9xVUdYwnIZbvWfN6X6+HEs599R4dHm
 m9at/dvOo4/Zd5TRVlV3CJUIkiWtFYAgmBU8oy03bbyiPyT+qk8RPwusH1iTy+iW
 1ul9cuKCZ1EL3zIDe0pkVsF8Z7cB2QO/1ACbuM1LQdn74FoZen9VoKepDV+jG01n
 Dw8zwbdCmCb4aMtYCu42jjvcNlf3qNNNzm31vDXl085lXOOdwVSppUhMIlciNrxK
 MuJ0NhT7zLL2BSLBD9R7Zaiify0Zl/x8ja2g+FIKQRufVFKZkcSBXpUV7uFuJMNA
 BdIZkpKAwLNcUpmOG1eJ1xRbSzhDa3DazbInV2BBaySUgG1cDtWOCVa6rFHA4f5X
 Kgcug1aeB62jgvx69JjM3EOnjwvEzTwMMeCELAsjXgRIUKZj6ietJ8Zz3StrDNRj
 HRsu/yvS/56qOXA4vcMXcqx0Ziztpwv0Ttrk9aqOkwfkTdg5+sFrMqFWbIA+opzX
 Zuw0HF+CpbLzdCqiIalA56WhVfExZq4uApzfKhdPFu2lAznILYbMq+1M+8F6KjXe
 hkUCdXE9J/C2bnrRLR5NlKa/IPJTQcqWttLtphO3+jeZzevN26t268xxO/5IZuZ6
 QUhZ6XGXgbabxYqLIepw
 =fAF8
 -----END PGP SIGNATURE-----

Merge tag 'signed-for-4.0' of git://github.com/agraf/linux-2.6

Patch queue for 4.0 - 2015-03-25

A few bug fixes for Book3S HV KVM:

  - Fix spinlock ordering
  - Fix idle guests on LE hosts
  - Fix instruction emulation
2015-03-25 20:20:31 -03:00
Neelesh Gupta b921e90260 powerpc/powernv: Add OPAL message notifier unregister function
Provide an unregister interface for the opal message notifiers
to be called when not needed like during driver unload/remove.

Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:28 +11:00
Neelesh Gupta 792f96e9a7 powerpc/powernv: Fix the overflow of OPAL message notifiers head array
Fixes the condition check of incoming message type which can
otherwise shoot beyond the message notifiers head array.

Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:28 +11:00
Geert Uytterhoeven 1f8c82ab1b cpufreq/ppc: Add missing #include <asm/smp.h>
If CONFIG_SMP=n, <linux/smp.h> does not include <asm/smp.h>, causing:

drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init':
drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-funcuresh E. Warrier" <warrier@linux.vnet.ibm.com>
X-Patchwork-Id: 443703
Message-Id: <54EE5989.7010800@linux.vnet.ibm.com>
To: linuxppc-dev@ozlabs.org
Date: Wed, 25 Feb 2015 17:23:53 -0600

Export __spin_yield so that the arch_spin_unlock() function can
be invoked from a module. This will be required for modules where
we want to take a lock that is also is acquired in hypervisor
real mode. Because we want to avoid running any lockdep code
(which may not be safe in real mode), this lock needs to be
an arch_spinlock_t instead of a normal spinlock.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:28 +11:00
Geert Uytterhoeven 8f910fd0d9 powerpc/pmac: Fix DT refcount imbalance in pmac_pic_probe_oldstyle
Internally, of_find_node_by_name() calls of_node_put() on its "from"
parameter, which must not be done on "master", as it's still in use, and
will be released manually later.  This may cause a zero kref refcount.

Call of_node_get() before to compensate for this.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:27 +11:00
Benjamin Herrenschmidt 3bf57561d4 powerpc/powernv: Support OPAL requested heartbeat
If OPAL requests it, call it back via opal_poll_events() at a
regular interval. Some versions of OPAL on some machines require
this to operate some internal timeouts properly.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:53:27 +11:00
Vasant Hegde 3f77df7f81 powerpc/powernv: Check image loaded or not before calling flash
Present code checks for update_flash_data in opal_flash_term_callback().
update_flash_data has been statically initialized to zero, and that
is the value of FLASH_IMG_READY. Also code update initialization happens
during subsys init.

So if reboot is issued before the subsys init stage then we endup displaying
"Flashing new firmware" message.. which may confuse end user.

This patch fixes above described issue by initializes update_flash status
to invalid state.

Reported-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-25 16:17:02 +11:00
David Gibson 0eebf9b5d2 powerpc: Remove unused st_le*() and ld_le* functions
The powerpc specific st_le*() and ld_le*() functions in
arch/powerpc/asm/swab.h no longer have any users.  They are also
misleadingly named, since they always byteswap, even on a little-endian
host.

This patch removes them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:33:52 +11:00
David Gibson d078eed35d powerpc: Cleanup KVM emulated load/store endian handling
Sometimes the KVM code on powerpc needs to emulate load or store
instructions from the guest, which can include both normal and byte
reversed forms.

We currently (AFAICT) handle this correctly, but some variable names are
very misleading.  In particular we use "is_bigendian" in several places to
actually mean "is the IO the same endian as the host", but we now support
little-endian powerpc hosts.  This also ties into the misleadingly named
ld_le*() and st_le*() functions, which in fact always byteswap, even on
an LE host.

This patch cleans this up by renaming to more accurate "host_swabbed", and
uses the generic swab*() functions instead of the powerpc specific and
misleadingly named ld_le*() and st_le*() functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:33:51 +11:00
Gavin Shan c6406d8fbb powerpc/eeh: Remove device_node dependency
The patch removes struct eeh_dev::dn and the corresponding helper
functions: eeh_dev_to_of_node() and of_node_to_eeh_dev(). Instead,
eeh_dev_to_pdn() and pdn_to_eeh_dev() should be used to get the
pdn, which might contain device_node on PowerNV platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:53 +11:00
Gavin Shan 0bd785873c powerpc/eeh: Replace device_node with pci_dn in eeh_ops
There are 3 EEH operations whose arguments contain device_node:
read_config(), write_config() and restore_config(). The patch
replaces device_node with pci_dn.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:52 +11:00
Gavin Shan ff57b454dd powerpc/eeh: Do probe on pci_dn
Originally, EEH core probes on device_node or pci_dev to populate
EEH devices and PEs, which conflicts with the fact: SRIOV VFs are
usually enabled and created by PF's driver and they don't have the
corresponding device_nodes. Instead, SRIOV VFs have dynamically
created pci_dn, which can be used for EEH probe.

The patch reworks EEH probe for PowerNV and pSeries platforms to
do probing based on pci_dn, instead of pci_dev or device_node any
more.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:52 +11:00
Gavin Shan e8e9b34cef powerpc/eeh: Create eeh_dev from pci_dn instead of device_node
The patch adds function traverse_pci_dn(), which is similar to
traverse_pci_devices() except it takes pci_dn, not device_node
as parameter. The pci_dev.c has been reworked to create eeh_dev
from pci_dn, instead of device_node.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:51 +11:00
Gavin Shan c035ff1d2e powerpc/pci: Trace more information from pci_dn
Originally, EEH probes on device_node or pci_dev and populates the
corresponding eeh_dev. In the subsequent patches, EEH will probes
on pci_dn and populates the corresponding eeh_dev. So we have to
cache some information in pci_dn, either from device_node or SRIOV
PF's enablement platform hook, to populate the eeh_dev properly.

The motivation to probe pci_dn, instead of device node or pci_dev,
to populate eeh_dev is SRIOV VFs are dynamically created and we
don't have the corresponding device nodes for them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:50 +11:00
Gavin Shan 3532a741f8 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
The PCI config accessors previously relied on device_node.  Unfortunately,
VFs don't have a corresponding device_node, so change the accessors to use
pci_dn instead.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:50 +11:00
Gavin Shan cca87d303c powerpc/pci: Refactor pci_dn
Currently, the PCI config accessors are implemented based on device node.
Unfortunately, SRIOV VFs won't have the corresponding device nodes. pci_dn
will be used in replacement with device node for SRIOV VFs. So we have to
use pci_dn in PCI config accessors.

The patch refactors pci_dn in following aspects to make it ready to be used
in PCI config accessors as we do in subsequent patch:

   * pci_dn is organized as a hierarchy tree.  PCI device's pci_dn is
     put to the child list of pci_dn of its upstream bridge or PHB. VF's
     pci_dn will be put to the child list of pci_dn of PF's bridge.

   * For one particular PCI device (VF or not), its pci_dn can be
     found from pdev->dev.archdata.pci_data, PCI_DN(devnode), or
     parent's list.  The fast path (fetching pci_dn through PCI device
     instance) is populated during early fixup time.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:49 +11:00
Hongtao Jia 9be53cf76f powerpc: Enable power monitor feature in defconfig for supported platforms
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:22 -05:00
Hongtao Jia 76486930f8 powerpc: Enable thermal monitor feature in defconfig for supported platforms
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:21 -05:00
Shruti Kanetkar 2e6e99666d powerpc/corenet: Enable muxing MDIO buses via FPGA
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Shruti Kanetkar <Kanetkar.Shruti@gmail.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:20 -05:00
Andy Fleming a189243cb7 powerpc/corenet: Enable muxing MDIO buses via GPIO
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Shruti Kanetkar <Kanetkar.Shruti@gmail.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:20 -05:00
Kumar Gala 1e8ed06d34 powerpc/mpc85xx: Add FSL QorIQ DPAA BMan support to device tree(s)
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Geoff Thorpe <Geoff.Thorpe@freescale.com>
Signed-off-by: Hai-Ying Wang <Haiying.Wang@freescale.com>
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
[Emil Medve: Sync with the upstream binding]
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:19 -05:00
Benjamin Herrenschmidt cb5915e71f powerpc: Make corenet64_defconfig a bit more useful
CONFIG_BLK_DEV_SD, SR, ... are needed for pretty much any SATA or USB
storage device (corenet32_defconfig has them) and modern any with
systemd needs the CGROUPS gunk.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:18 -05:00
Hongtao Jia ff015659b6 powerpc/85xx: workaround for chips with MSI hardware errata
The MPIC version 2.0 has a MSI errata (errata PIC1 of mpc8544), It causes
that neither MSI nor MSI-X can work fine. This is a workaround to allow
MSI-X to function properly.

Signed-off-by: Liu Shuo <soniccat.liu@gmail.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:18 -05:00
Hongtao Jia 807d38b73b powerpc/mpic: Add get_version API both for internal and external use
MPIC version is useful information for both mpic_alloc() and mpic_init().
The patch provide an API to get MPIC version for reusing the code.
Also, some other IP block may need MPIC version for their own use.
The API for external use is also provided.

This function had been previously added but was removed by commit
5e86bfde9c ("powerpc/mpic: remove unused functions") due to the
lack of a user.  This function will be used by "powerpc/mpic: Add
get_version API both for internal and external use".

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
[scottwood@freescale.com: changelog update]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:17 -05:00
Igal Liberman 7dea9ec5a0 powerpc/mpc85xx: Add FMan platform support
Get the FMan devices/sub-nodes (MAC, MDIO, etc.) auto-probed

Signed-off-by: Igal Liberman <Igal.Liberman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:16 -05:00
Emil Medve eedea67bc9 powerpc/dts: Remove B4860 emulator support
Probably we should have not upstreamed this in the first place

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-03-23 19:51:16 -05:00
Tyrel Datwyler 288a298c05 powerpc/pseries: Introduce api_version to migration sysfs interface
The /sys/kernel/mobility/migration interface was added all the way back
in 2.6.37. However, the drmgr userspace tool was never augmented to use
this interface to perfrom migrations. Instead it has continued using a
faux rtas call coupled with performing the device tree update processing
in userspace and communicating it back to the kernel via the ugly
/proc/ppc64/ofdt interface.

Up until 3.12 the device tree update code in the kernel was badly broken
and bit rotting. This code was fixed in 3.12 and is now utilized by the
kernel suspend code as of 3.15. The kernel is now better suited to
handle the post-mobility fixup of the device tree and drmgr should be
transitioned to using the sysfs migration interface.

This patch introduces the api_version sysfs file to /sys/kernel/mobility
as a means for drmgr to query the current implementation level of the
kernel migration code. This initial versioning indicates it is capable
of perfroming all current PAPR requirements for migration including the
post-mobility firmware activation and device tree update.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-24 11:23:11 +11:00
Mahesh Salgaonkar 44d5f6f590 powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.

This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.

Fixes: 2ba9f0d887 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org  # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 17:10:47 +11:00
Scott Wood cc83458d3a powerpc/32: %pF is only for function pointers
Use %pS for actual addresses, otherwise you'll get bad output
on arches like ppc64 where %pF expects a function descriptor.  Even on
other architectures, refrain from setting a bad example that people
copy.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 15:14:48 +11:00
Fabian Frederick 4d9fb711e4 powerpc: use kbuild generic-y where possible
Replace one line asm-generic include files declared in
arch/powerpc/include/asm/ by generic-y declaration
which creates arch/powerpc/include/generated/asm equivalent.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 15:09:44 +11:00
Paul Mackerras f57333a767 powerpc/powernv: Fix return value from power7_nap() et al.
The power7_nap(), power7_sleep() and power7_winkle() functions are
called from pnv_smp_cpu_kill_self(), which expects them to return the
SRR1 value set by the hardware on wakeup, or 0 if no nap/sleep/winkle
occurred.  However, in the case where an interrupt needs to be
replayed, the logic in power7_powersave_common (the common code for
power7_nap et al.) doesn't set r3 to 0 in this case.  Instead what we
get as the return value is the selector for the type of power-saving
mode requested (1, 2 or 3).  In fact this should not affect the
operation of pnv_smp_cpu_kill_self(), but it is better to get this
correct, so this adds an instruction to set r3 to 0 in this case.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 15:06:50 +11:00
Geert Uytterhoeven b140e5b20e powerpc: Spelling s/embeeded/embedded/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:47:48 +11:00
Stephen Rothwell a71aa05e14 powerpc: Convert relocs_check to a shell script using grep
This runs a bit faster and removes another use of perl from
the kernel build.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-By: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:47:39 +11:00
David Gibson f571872671 powerpc: Move Power Macintosh drivers to generic byteswappers
ppc has special instruction forms to efficiently load and store values
in non-native endianness.  These can be accessed via the arch-specific
{ld,st}_le{16,32}() inlines in arch/powerpc/include/asm/swab.h.

However, gcc is perfectly capable of generating the byte-reversing
load/store instructions when using the normal, generic cpu_to_le*() and
le*_to_cpu() functions eaning the arch-specific functions don't have much
point.

Worse the "le" in the names of the arch specific functions is now
misleading, because they always generate byte-reversing forms, but some
ppc machines can now run a little-endian kernel.

To start getting rid of the arch-specific forms, this patch removes them
from all the old Power Macintosh drivers, replacing them with the
generic byteswappers.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-23 14:29:40 +11:00
Hari Bathini e4a9616c54 powerpc/rtas: Make timestamp related code y2038-safe
While we are here, let us make timestamp related code y2038-safe.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:06:11 +11:00
Hari Bathini f7618299b4 powerpc/powernv: Add pstore support on powernv
This patch extends pstore, a generic interface to platform dependent
persistent storage, support for powernv  platform to capture certain
useful information, during dying moments. Such support is already in
place for  pseries platform. This patch re-uses most of that code.

It is a common practice to compile kernels with both CONFIG_PPC_PSERIES=y
and CONFIG_PPC_POWERNV=y. The code in nvram_init_oops_partition() routine
still works as intended, as the caller is platform specific code which
passes the appropriate value for "rtas_partition_exists" parameter.
In all other places, where CONFIG_PPC_PSERIES or CONFIG_PPC_POWERNV
flag is used in this patchset, it is to reduce the kernel size in cases
where this flag is not set and doesn't have any impact logic wise.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:06:10 +11:00
Hari Bathini 78989f0a55 powerpc/nvram: Move generic code for nvram and pstore
With minor checks, we can move most of the code for nvram
under pseries to a common place to be re-used by other
powerpc platforms like powernv. This patch moves such
common code to arch/powerpc/kernel/nvram_64.c file.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Move select of ZLIB_DEFLATE to PPC64 to fix the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:05:49 +11:00
Nishanth Aravamudan 3af229f207 powerpc/numa: Reset node_possible_map to only node_online_map
Raghu noticed an issue with excessive memory allocation on power with a
simple cgroup test, specifically, in mem_cgroup_css_alloc ->
for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing
up the kmalloc-2048 slab (to the order of 200MB for 400 cgroup
directories).

The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes
possible), which defines node_possible_map, which in turn defines the
value of nr_node_ids in setup_nr_node_ids and the iteration of
for_each_node.

In practice, we never see a system with 256 NUMA nodes, and in fact, we
do not support node hotplug on power in the first place, so the nodes
that are online when we come up are the nodes that will be present for
the lifetime of this kernel. So let's, at least, drop the NUMA possible
map down to the online map at runtime. This is similar to what x86 does
in its initialization routines.

mem_cgroup_css_alloc should also be fixed to only iterate over
memory-populated nodes and handle hotplug, but that is a separate
change.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 13:25:53 +11:00
Paul Mackerras 2bf27601c7 KVM: PPC: Book3S HV: Fix instruction emulation
Commit 4a157d61b4 ("KVM: PPC: Book3S HV: Fix endianness of
instruction obtained from HEIR register") had the side effect that
we no longer reset vcpu->arch.last_inst to -1 on guest exit in
the cases where the instruction is not fetched from the guest.
This means that if instruction emulation turns out to be required
in those cases, the host will emulate the wrong instruction, since
vcpu->arch.last_inst will contain the last instruction that was
emulated.

This fixes it by making sure that vcpu->arch.last_inst is reset
to -1 in those cases.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-20 11:42:33 +01:00
Paul Mackerras ecb6d6185b KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
The VPA (virtual processor area) is defined by PAPR and is therefore
big-endian, so we need a be32_to_cpu when reading it in
kvmppc_get_yield_count().  Without this, H_CONFER always fails on a
little-endian host, causing SMP guests to waste time spinning on
spinlocks.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-20 11:42:08 +01:00
Paul Mackerras 8f902b005e KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
Currently, kvmppc_set_lpcr() has a spinlock around the whole function,
and inside that does mutex_lock(&kvm->lock).  It is not permitted to
take a mutex while holding a spinlock, because the mutex_lock might
call schedule().  In addition, this causes lockdep to warn about a
lock ordering issue:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted
-------------------------------------------------------
qemu-system-ppc/8179 is trying to acquire lock:
 (&kvm->lock){+.+.+.}, at: [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]

but task is already holding lock:
 (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&vcore->lock)->rlock){+.+...}:
       [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
       [<d00000000ecc7a14>] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv]
       [<d00000000eb9f5cc>] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
       [<d00000000eb9cb24>] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
       [<d00000000eb94478>] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm]
       [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
       [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
       [<c000000000009264>] syscall_exit+0x0/0x98

-> #0 (&kvm->lock){+.+.+.}:
       [<c0000000000ff28c>] .lock_acquire+0xcc/0x1a0
       [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
       [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
       [<d00000000ecc510c>] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
       [<d00000000eb9f234>] .kvmppc_set_one_reg+0x44/0x330 [kvm]
       [<d00000000eb9c9dc>] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
       [<d00000000eb9ced4>] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
       [<d00000000eb940b0>] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
       [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
       [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
       [<c000000000009264>] syscall_exit+0x0/0x98

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&vcore->lock)->rlock);
                               lock(&kvm->lock);
                               lock(&(&vcore->lock)->rlock);
  lock(&kvm->lock);

 *** DEADLOCK ***

2 locks held by qemu-system-ppc/8179:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f18>] .vcpu_load+0x28/0x90 [kvm]
 #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

stack backtrace:
CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131
Call Trace:
[c000001a66c0f310] [c000000000b486ac] .dump_stack+0x88/0xb4 (unreliable)
[c000001a66c0f390] [c0000000000f8bec] .print_circular_bug+0x27c/0x3d0
[c000001a66c0f440] [c0000000000fe9e8] .__lock_acquire+0x2028/0x2190
[c000001a66c0f5d0] [c0000000000ff28c] .lock_acquire+0xcc/0x1a0
[c000001a66c0f6a0] [c000000000b3c120] .mutex_lock_nested+0x80/0x570
[c000001a66c0f7c0] [d00000000ecc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
[c000001a66c0f860] [d00000000ecc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
[c000001a66c0f8d0] [d00000000eb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
[c000001a66c0f960] [d00000000eb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
[c000001a66c0f9f0] [d00000000eb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
[c000001a66c0faf0] [d00000000eb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
[c000001a66c0fcb0] [c00000000026cbb4] .do_vfs_ioctl+0x444/0x770
[c000001a66c0fd90] [c00000000026cfa4] .SyS_ioctl+0xc4/0xe0
[c000001a66c0fe30] [c000000000009264] syscall_exit+0x0/0x98

This fixes it by moving the mutex_lock()/mutex_unlock() pair outside
the spin-locked region.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-20 11:41:14 +01:00
Tyrel Datwyler f6ff041496 powerpc/pseries: Little endian fixes for post mobility device tree update
We currently use the device tree update code in the kernel after resuming
from a suspend operation to re-sync the kernels view of the device tree with
that of the hypervisor. The code as it stands is not endian safe as it relies
on parsing buffers returned by RTAS calls that thusly contains data in big
endian format.

This patch annotates variables and structure members with __be types as well
as performing necessary byte swaps to cpu endian for data that needs to be
parsed.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Cyril Bur <cyrilbur@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:53:01 +11:00
Benjamin Herrenschmidt ddee09c099 powerpc: Add PVR for POWER8NVL processor
There's a new variant of POWER8 coming called "POWER8 with NVLink". The
core is identical to POWER8 but unfortunately they strapped it with a
different PVR, so we need to add an explicit entry for it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:52:27 +11:00
Paul Mackerras 755563bc79 powerpc/powernv: Fixes for hypervisor doorbell handling
Since we can now use hypervisor doorbells for host IPIs, this makes
sure we clear the host IPI flag when taking a doorbell interrupt, and
clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
already do for IPIs sent via the XICS interrupt controller).  Otherwise
if there did happen to be a leftover pending doorbell interrupt for
an offline CPU thread for any reason, it would prevent that thread from
going into a power-saving mode; it would instead keep waking up because
of the interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 14:51:53 +11:00
Alex Dowad 6eca8933d3 powerpc/kernel: Rename copy_thread() 'arg' argument to 'kthread_arg'
The 'arg' argument to copy_thread() is only ever used when forking a new
kernel thread. Hence, rename it to 'kthread_arg' for clarity.

Signed-off-by: Alex Dowad <alexinbeijing@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-20 12:41:15 +11:00
Greg Kurz 3338a65bad powerpc/vphn: parsing code rewrite
The current VPHN parsing logic has some flaws that this patch aims to fix:

1) when the value 0xffff is read, the value 0xffffffff gets added to the
   the output list and its element count isn't incremented. This is wrong.
   According to PAPR+ the domain identifiers are packed into a sequence
   terminated by the "reserved value of all ones". This means that 0xffff
   is a stream terminator.

2) the combination of byteswaps and casts make the code hardly readable.
   Let's parse the stream one 16-bit field at a time instead.

3) it is assumed that the hypercall returns 12 32-bit values packed into
   6 64-bit registers. According to PAPR+, the domain identifiers may be
   streamed as 16-bit values. Let's increase the number of expected numbers
   to 24.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Greg Kurz 4b6cfb2a8c powerpc/vphn: move VPHN parsing logic to a separate file
The goal behind this patch is to be able to write userland tests for the
VPHN parsing code.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Greg Kurz b1fc9484aa powerpc/vphn: move endianness fixing to vphn_unpack_associativity()
The first argument to vphn_unpack_associativity() is a const long *, but the
parsing code expects __be64 values actually. Let's move the endian fixing
down for consistency.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Greg Kurz 9d0e6d9db5 powerpc/vphn: clarify the H_HOME_NODE_ASSOCIATIVITY API
The number of values returned by the H_HOME_NODE_ASSOCIATIVITY h_call deserves
to be explicitly defined, for a better understanding of the code.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Kevin Hao 52d9962700 powerpc: kill PPC_OF
We have set CONFIG_PPC_OF to always 'y' in commit 0a498d96a3
("powerpc: set CONFIG_PPC_OF=y always for ARCH=powerpc") nine years
ago. And the arch/ppc also has gone away for many years. The OF
functionality was also moved to a common place and be used by many
archs. So it does make no sense to keep such a option in the current
kernel. Just kill it.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-17 20:04:32 +11:00
Rickard Strandqvist a02c0af2f0 powerpc/powermac: Cleaning up missing null-terminate in conjunction with strncpy
Replacing strncpy with strlcpy to avoid strings that lacks null terminate.
And removed unnecessary magic numbers.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 15:55:52 +11:00
Kyle Moffett b05ae4ee60 powerpc: Remove duplicate cacheable_memcpy/memzero functions
These functions are only used from one place each.  If the cacheable_*
versions really are more efficient, then those changes should be
migrated into the common code instead.

NOTE: The old routines are just flat buggy on kernels that support
      hardware with different cacheline sizes.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:25:50 +11:00
Nathan Fontenot 51925fb3c5 powerpc/pseries: Implement memory hotplug remove in the kernel
This patch adds the ability to do memory hotplug remove in the kernel.

Currently the operation to hotplug remove memory is handled by the drmgr
command which performs the operation by performing some work in user-space
and making requests to the kernel to handle other pieces. By moving all
of the work to the kernel we can do the remove faster, and provide a common
code path to do memory hotplug for both the PowerVM and PowerKVM environments.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:03:03 +11:00
Nathan Fontenot 5f97b2a0d1 powerpc/pseries: Implement memory hotplug add in the kernel
This patch adds the ability to do memory hotplug add in the kernel.

Currently the operation to hotplug add memory is handled by the drmgr
command which performs the operation by performing some work in user-space
and making requests to the kernel to handle other pieces. By moving all
of the work to the kernel we can do the add faster, and provide a common
code path to do memory hotplug for both the PowerVM and PowerKVM environments.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:03:01 +11:00
Nathan Fontenot 999e2dadb6 powerpc/pseries: Create new device hotplug entry point
The current hotplug (or dlpar) of devices (the process is generally the
same for memory, cpu, and pci) on PowerVM systems is initiated
from the HMC, which communicates the request to the partitions through
the RSCT framework. The RSCT framework then invokes the drmgr command.
The drmgr command performs the hotplug operation by doing some pieces,
such as most of the rtas calls and device tree parsing, in userspace
and make requests to the kernel to online/offline the device, update the
device tree and add/remove the device.

For PowerKVM the approach for device hotplug is to follow what is currently
being done for pci hotplug. A hotplug request is initiated from the host.
QEMU then generates an EPOW interrupt to the guest which causes the guest
to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
returns a rtas hotplug event to the guest.

Please note that the current pci hotplug path for PowerKVM involves the
kernel receiving the rtas hotplug event, passing it to rtas_errd in
userspace, and having rtas_errd invoke drmgr. The drmgr command then
handles the request as described above for PowerVM systems.

There is no need for this circuitous route, we should just handle the entire
hotplug of devices in the kernel. What I am planning is to enable this
by moving the code to handle hotplug from drmgr into the kernel to
provide a single path for handling device hotplug for both PowerVM and
PowerKVM systems. This patch provides the common iframework and entry point.
For PowerKVM a future update to the kernel rtas code will recognize rtas
hotplug events returned from rtas,check-exception calls and use the common
entry point to handle hotplug of the device.

For PowerVM systems, This patch creates /sys/kernel/dlpar that can be
used by the drmgr command to initiate hotplug requests. In order to do
this a string of the format "<resource> <action> <id_type> <id>" is
written to this file. The string consists of a resource (cpu, memory, pci,
phb), an action (add or remove), an id_type (count, drc index, drc name),
and the corresponding id. The kernel will parse the string and create a
rtas hotplug section that can be passed to the common entry point for
handling hotplug requests.

It should be noted that there is no chance of updating how we receive
hotplug (dlpar) requests from the HMC on PowerVM systems.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:02:58 +11:00
Nathan Fontenot 5e51d3c2a4 powerpc/pseries: Declare the acquire/release drc index routines
Add declarations for dlpar_{acquire,release}_drc(...)

They are already marked non-static but were missing a prototype/

[BenH: Added extern to be consistent with the rest of the file]

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:01:48 +11:00
Nathan Fontenot 366d395c8d powerpc/pseries: Define rtas hotplug event sections
In order to handle device hotplug in the kernel on pseries the hotplug
request will be communicated in the kernel in the form of a
rtas hotplug event. This patch adds the definition of rtas hotplug event
sections.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:57:44 +11:00
Gavin Shan 2f6cf79448 powerpc/powernv: Remove unused file
The patch removes unused file eeh-ioda.c and updates makefile
accordingly. Besides, the definition of "struct pnv_eeh_ops" and
the instances are all removed. Until now, the chip layer of EEH
implementation for PowerNV platform is removed completely.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:20 +11:00
Gavin Shan cadf364d14 powerpc/powernv: Drop PHB operation reset()
The patch drops PHB EEH operation reset() and merges its logic to
eeh_ops::reset().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan 2a485ad7c8 powerpc/powernv: Drop PHB operation next_error()
The patch drops PHB EEH operation next_error() and merges its
logic to eeh_ops::next_error().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan 40ae5f693f powerpc/powernv: Drop PHB operation get_state()
The patch drops PHB EEH operation get_state() and merges its logic
to eeh_ops::get_state().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan 7e3e4f8d5e powerpc/powernv: Drop PHB operation set_option()
The patch drops PHB EEH operation set_option() and merges its
logic to eeh_ops::set_option().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan bbe170ede1 powerpc/powernv: Drop PHB operation configure_bridge()
The patch drops PHB EEH operation configure_bridge() and merges
its logic to eeh_ops::configure_bridge().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan 95edcdeadf powerpc/powernv: Drop PHB operation get_log()
The patch drops PHB operation get_log() and merges its logic to
eeh_ops::get_log().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan 4cf1744558 powerpc/powernv: Drop PHB operation post_init()
The patch drops PHB EEH operation post_init() and merge its logic
to eeh_ops::post_init().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan fa646c3cab powerpc/powernv: Drop PHB operation err_inject()
The patch drops PHB EEH operation err_inject() and merge its logic
to eeh_ops::err_inject().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan 01f3bfb780 powerpc/powernv: Shorten EEH function names
The patch shortens names of EEH functions in powernv-eeh.c and no
logic change introduced by this patch.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan 6ec7334304 powerpc/pci: Fix comments about ppc_md.pcibios_fixup
The patch fixes the comments about ppc_md.pcibios_fixup(), which
should be called after allocating resources.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan 28158cd1b7 powerpc/eeh: Enhance pcibios_set_pcie_reset_state()
Function pcibios_set_pcie_reset_state() is possibly called by
pci_reset_function(), on which VFIO infrastructure depends to
issue reset. pcibios_set_pcie_reset_state() is issuing reset
on the parent PE of the indicated PCI device. The reset causes
state lost on all PCI devices except the indicated one as the
argument to pcibios_set_pcie_reset_state(). Also, sideband
MMIO access from guest when issuing reset would cause unexpected
EEH error.

For above two issues, the patch applies following enhancements
to pcibios_set_pcie_reset_state():

   * For all PCI devices except the indicated one, save their
     state prior to reset and restore state after that.
   * Explicitly freeze PE prior to reset and unfreeze it after
     that, in order to avoid unexpected EEH error.

Tested-by: Priya M. A <priyama2@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Mahesh Salgaonkar 45706bb53d powerpc/book3s: Fix flush_tlb cpu_spec hook to take a generic argument.
The flush_tlb hook in cpu_spec was introduced as a generic function hook
to invalidate TLBs. But the current implementation of flush_tlb hook
takes IS (invalidation selector) as an argument which is architecture
dependent. Hence, It is not right to have a generic routine where caller
has to pass non-generic argument.

This patch fixes this and makes flush_tlb hook as high level API.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-17 07:52:48 +11:00
Jeremy Kerr 7f664cf9e4 powerpc/boot: don't clobber r6 and r7 in epapr boot
We use r6 and r7 for epapr boot, but the current pre-C init will clobber
both of these.

This change does a simple replacement, of r6 -> r12 and r7 -> r13, so
that we hit platform init with these registers intact.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:58:35 +11:00
Jeremy Kerr 8c06f0d910 powerpc/boot: Fix stack corruption in epapr entry point
Currently, a 64-bit little-endian zImage.epapr won't boot in epapr mode,
as we never return from platform_init.

Before entering C, we initialise our stack by setting r1 16 bytes below
the end of the _bss_stack:

  stwu	r0,-16(r1)	/* establish a stack frame */

However, the called function will save the caller's lr in the caller's
frame's lr save area, at -16(r1) to -32(r1).

This means that writes to the fdt variable will corrupt the saved link
register:

 0000000020c06018 l     O .bss   0000000000001000 _bss_stack
 0000000020c07018 l     O .bss   0000000000000008 fdt

We'll need at least 32 bytes in the initial stack frame, to handle the
LR save area. We bump this to 112 bytes, as that'll be the max required
by ABIv1.

Thanks to Alistair Popple for debugging help.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:58:34 +11:00
Jeremy Kerr 90d1d44e0d powerpc/boot/wrapper: use the pseries wrapper for zImage.epapr
We'll likely be entering the zImage.epapr as BE, so include the pseries
implementation of _zimage_start, which adds the endian fixup magic.

Although the endian fixup won't work on Book III-E machines starting LE,
the current entry point doesn't support LE anyway, so we shouldn't be
breaking anything.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:58:32 +11:00
Jeremy Kerr 6c87b2202f powerpc/boot/fdt: Add little-endian support to libfdt wrappers
For epapr-style boot, we may be little-endian. This change implements
the proper conversion for fdt*_to_cpu and cpu_to_fdt*. We also need the
full cpu_to_* and *_to_cpu macros for this.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:58:31 +11:00
Jeremy Kerr 1680e4ba3d powerpc/boot/fdt: Use unsigned long for pointer casts
Now that the wrapper supports 64-bit builds, we see warnings when
attempting to cast pointers to int. Use unsigned long instead.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:58:30 +11:00
Arseny Solokha 5e86bfde9c powerpc/mpic: remove unused functions
Drop unused fsl_mpic_primary_get_version(), mpic_set_clk_ratio(),
mpic_set_serial_int().

  + fsl_mpic_primary_get_version() is just a safe wrapper around
fsl_mpic_get_version() for SMP configurations. While the latter is
called explicitly for handling PIC initialization and setting up error
interrupt vector depending on PIC hardware version, the former isn't
used for anything.

  + As for mpic_set_clk_ratio() and mpic_set_serial_int(), they both are
almost nine years old[1] but still have no chance to be called even from
out-of-tree modules because they both are __init and of course aren't
exported.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2006-June/023867.html

Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Cc: hongtao.jia@freescale.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:18 +11:00
Arseny Solokha f98e7f2fe9 powerpc/qe: drop unused ucc_slow_poll_transmitter_now
Drop ucc_slow_poll_transmitter_now() which has no users since its
inception in 2007 in commit 9865853851 ("[POWERPC] Add QUICC
Engine (QE) infrastructure").

Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:17 +11:00
Arseny Solokha 170acae4c9 powerpc/boot: drop planetcore_set_serial_speed
Drop planetcore_set_serial_speed() which had no users since its
inception in commit fec6047047 ("[POWERPC] bootwrapper: Add PlanetCore
firmware support") in 2007.

Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:17 +11:00
Michael Ellerman b887f9e324 powerpc/powernv: Remove unused definitions in opal-api.h
This removes definitions in opal-api.h that are completely unused in
Linux.

For each of these I see three possibilities, 1) we *should* be using
them in Linux and patches will arrive to do that, 2) they are not used
but should stay in the header to document the API for some important
reason, 3) they are not used and needn't be part of the API.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2015-03-16 18:50:16 +11:00
Michael Ellerman d7cf83fcaf powerpc/powernv: Move opal-api.h closer to the Skiboot version
This commit gets opal-api.h to mostly match the version in Skiboot as of
commit ea7d806ab0ba.

The exceptions are things which are not (currently) used in Linux.

Most of this is just whitespace and a few things moving around. I think
the diff is readable.

Also OpalMessageType became opal_msg_type, requiring a change in the
Linux code.

Finally Skiboot and Linux disagree on CAPI vs CXL, because CAPI means
something else in Linux. To handle that we just point the Linux wrapper,
which is named "cxl" to the OPAL token OPAL_PCI_SET_PHB_CAPI_MODE.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2015-03-16 18:50:16 +11:00
Michael Ellerman d800ba1218 powerpc/powernv: Move OPAL API definitions to opal-api.h
We'd like to get to the stage where the OPAL API is defined in a header
that is identical between Linux and Skiboot.

As step one, split the bits that actually define the API into
opal-api.h. The Linux specific parts stay in opal.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2015-03-16 18:50:15 +11:00
Michal Marek 755457f992 powerpc/boot: Makefile cleanup
The $(image-n) variable will never exist, because unset Kconfig options
are '' and not 'n'.

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:15 +11:00
Markus Elfring 7f4eec3953 powerpc: Delete unnecessary checks before kfree()
The kfree() function tests whether its argument is NULL and then returns
immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:14 +11:00
Stewart Smith 7e73a3b7f3 powerpc/powernv: only call OPAL_RESEND_DUMP if firmware supports it
Not all OPAL platforms support resending system dumps, so check
that current firmware supports it first. Otherwise we get firmware
complaining:
"OPAL: Called with bad token 91 !"

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:14 +11:00
Stewart Smith fc81de6310 powerpc/powernv: only call OPAL_ELOG_RESEND if firmware supports it
Otherwise firmware complains: "OPAL: Called with bad token 74 !"
as not all OPAL systems have the ability to resend error logs.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:14 +11:00
Stewart Smith b962f5a446 powerpc/powernv: only register log if OPAL supports doing so
Correct use of REGISTER/UNREGISTER is to check if the token exists
before calling. If we don't we get a "OPAL: Called with bad token 101 !"
error, which is harmless but may be alarming to some.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:50:13 +11:00
Anton Blanchard df99e6eb3f powerpc: Change vsrX register defines to vsX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VSX register definitions - the kernel uses vsrX whereas gcc uses
vsX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
Anton Blanchard c2ce6f9f3d powerpc: Change vrX register defines to vX to match gcc and glibc
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-16 18:32:11 +11:00
David S. Miller 3cef5c5b0b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09 23:38:02 -04:00
Rusty Russell 87313df7b4 powerpc: fix deprecated CPU_MASK_CPU0 usage.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-03-10 13:48:35 +10:30
Frederic Weisbecker c5ae732a44 ppc: Remove unused cpp symbols in kvm headers
These don't seem to be used anywhere.

Acked-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will deacon <will.deacon@arm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-03-09 16:40:48 +02:00
Kim Phillips 3265c4babe crypto: powerpc - move files to fix build error
The current cryptodev-2.6 tree commits:

d9850fc529 ("crypto: powerpc/sha1 - kernel config")
50ba29aaa7 ("crypto: powerpc/sha1 - glue")

failed to properly place files under arch/powerpc/crypto, which
leads to build errors:

make[1]: *** No rule to make target 'arch/powerpc/crypto/sha1-spe-asm.o', needed by 'arch/powerpc/crypto/sha1-ppc-spe.o'.  Stop.
make[1]: *** No rule to make target 'arch/powerpc/crypto/sha1_spe_glue.o', needed by 'arch/powerpc/crypto/sha1-ppc-spe.o'.  Stop.
Makefile:947: recipe for target 'arch/powerpc/crypto' failed

Move the two sha1 spe files under crypto/, and whilst there, rename
other powerpc crypto files with underscores to use dashes for
consistency.

Cc: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-09 21:06:19 +11:00
Yannick Guerrini c1f4775ab5 powerpc: Fix comment in smu.h
Change 'Kenrel' to 'Kernel'

Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-06 23:21:07 +01:00
Masanari Iida d939be3add treewide: Fix typo in printk messages
This patch fix spelling typo in printk messages.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-06 23:05:39 +01:00
Masanari Iida f42cf8d6a3 treewide: Fix typo in printk messages
This patch fix spelling typo in printk messages.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-03-06 23:04:40 +01:00
Markus Stockhausen e8e5995372 crypto: powerpc/md5 - kernel config
Integrate the module into the kernel config tree.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-04 22:13:26 +13:00
Markus Stockhausen e90508d3b0 crypto: powerpc/md5 - glue
Glue code for crypto infrastructure. Call the assembler
code where required. Take a little care about small input
data. Kick out early for input chunks < 64 bytes and replace
memset for context cleanup with simple loop.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-04 22:12:41 +13:00
Markus Stockhausen 209232d025 crypto: powerpc/md5 - assembler
This is the assembler code for the MD5 implementation.
Handling of algorithm constants has been slightly
changed to reduce register usage and make better use
of cores with multiple ALUs. Thus they are stored as
delta values.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-04 22:12:40 +13:00
Nishanth Aravamudan 4ad04e5987 powerpc/iommu: Remove IOMMU device references via bus notifier
After d905c5df9a ("PPC: POWERNV: move iommu_add_device earlier"), the
refcnt on the kobject backing the IOMMU group for a PCI device is
elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
set_iommu_table_base_and_group). When we go to dlpar a multi-function
PCI device out:

        iommu_reconfig_notifier ->
                iommu_free_table ->
                        iommu_group_put
                        BUG_ON(tbl->it_group)

We trip this BUG_ON, because there are still references on the table, so
it is not freed. Fix this by moving the powernv bus notifier to common
code and calling it for both powernv and pseries.

Fixes: d905c5df9a ("PPC: POWERNV: move iommu_add_device earlier")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-04 13:19:33 +11:00
Michael Ellerman 875ebe940d powerpc/smp: Wait until secondaries are active & online
Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

  BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

  CPU: 0
  Comm: watchdog/130

The problem is that we aren't ensuring the CPU active bit is set for the
secondary before allowing the master to continue on. The master unparks
the secondary CPU's kthreads and the scheduler looks for a CPU to run
on. It calls select_task_rq() and realises the suggested CPU is not in
the cpus_allowed mask. It then ends up in select_fallback_rq(), and
since the active bit isnt't set we choose some other CPU to run on.

This seems to have been introduced by 6acbfb9697 "sched: Fix hotplug
vs. set_cpus_allowed_ptr()", which changed from setting active before
online to setting active after online. However that was in turn fixing a
bug where other code assumed an active CPU was also online, so we can't
just revert that fix.

The simplest fix is just to spin waiting for both active & online to be
set. We already have a barrier prior to set_cpu_online() (which also
sets active), to ensure all other setup is completed before online &
active are set.

Fixes: 6acbfb9697 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-04 13:19:33 +11:00
David S. Miller 71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Markus Stockhausen d9850fc529 crypto: powerpc/sha1 - kernel config
Integrate the module into the kernel config tree.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-02 23:22:20 +13:00
Markus Stockhausen 50ba29aaa7 crypto: powerpc/sha1 - glue
Glue code for crypto infrastructure. Call the assembler
code where required. Disable preemption during calculation
and enable SPE instructions in the kernel prior to the
call. Avoid to disable preemption for too long.

Take a little care about small input data. Kick out early
for input chunks < 64 bytes and replace memset for context
cleanup with simple loop.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-02 23:22:20 +13:00
Markus Stockhausen 20f1b1f1f3 crypto: powerpc/sha1 - assembler
This is the assembler code for SHA1 implementation with
the SIMD SPE instruction set. With the enhanced instruction
set we can operate on 2 32 bit words in parallel. That helps
reducing the time to calculate W16-W79. For increasing
performance even more the assembler function can compute
hashes for more than one 64 byte input block.

The state of the used SPE registers is preserved via the
stack so we can run from interrupt context. There might
be the case that we interrupt ourselves and push sensitive
data from another context onto our stack. Clear this area
in the stack afterwards to avoid information leakage.

The code is endian independant.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-02 23:22:19 +13:00
Markus Stockhausen 504c6143c5 crypto: powerpc/aes - kernel config
Integrate the module into the kernel configuration

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:29 +13:00
Markus Stockhausen 8a28a1a894 cyprot: powerpc/aes - glue code
Integrate the assembler modules into the kernel crypto
framework. Take care to avoid long intervals of disabled
preemption.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:29 +13:00
Markus Stockhausen f2e2ad2e1b crypto: powerpc/aes - ECB/CBC/CTR/XTS modes
The assembler block cipher module that controls the core
AES functions.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:28 +13:00
Markus Stockhausen f98992af41 crypto: powerpc/aes - key handling
Key generation for big endian core routines.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:28 +13:00
Markus Stockhausen 1c201e6420 crypto: powerpc/aes - assembler core
The assembler AES encryption and decryption core routines.
Implemented & optimized for big endian. Nevertheless they
work on little endian too.

For most efficient reuse in (higher level) block cipher
routines they are implemented as "fast" call modules without
any stack handling or register saving. The caller must
take care of that part.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:28 +13:00
Markus Stockhausen 0c5f9aea2e crypto: powerpc/aes - aes tables
4K AES tables for big endian. To reduce the possiblity of
timing attacks, the size has been cut to 8KB + 256 bytes
in contrast to 16KB in the generic implementation. That
is not perfect but at least a good tradeoff for CPU limited
router devices.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:27 +13:00
Markus Stockhausen 74f2dc2041 crypto: powerpc/aes - register defines
Define some register aliases for better readability.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-03-01 23:02:27 +13:00
Markus Stockhausen 2ecc1e95ec crypto: ppc/sha256 - kernel config
Integrate the module into the kernel config tree.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-02-27 22:48:47 +13:00
Markus Stockhausen c147028ccc crypto: ppc/sha256 - glue
Glue code for crypto infrastructure. Call the assembler
code where required. Disable preemption during calculation
and enable SPE instructions in the kernel prior to the
call. Avoid to disable preemption for too long.

Take a little care about small input data. Kick out early
for input chunks < 64 bytes and replace memset for context
cleanup with simple loop.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-02-27 22:48:46 +13:00
Markus Stockhausen 6bb71004aa crypto: ppc/sha256 - assembler
This is the assembler code for SHA256 implementation with
the SIMD SPE instruction set. Although being only a 32 bit
architecture GPRs are extended to 64 bit presenting two
32 bit values. With the enhanced instruction set we can
operate on them in parallel. That helps reducing the time
to calculate W16-W64. For increasing performance even more
the assembler function can compute hashes for more than
one 64 byte input block. That saves a lot of register
saving/restoring

The state of the used SPE registers is preserved via the
stack so we can run from interrupt context. There might
be the case that we interrupt ourselves and push sensitive
data from another context onto our stack. Clear this area
in the stack afterwards to avoid information leakage.

The code is endian independant.

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-02-27 22:48:45 +13:00
Ingo Molnar e9e4e44309 Linux 34.0-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW
 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L
 G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU
 do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN
 KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg
 r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss=
 =vXB6
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc1' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:24:50 +01:00
Paul Clarke fea559f303 powerpc: Re-enable dynticks
Implement arch_irq_work_has_interrupt() for powerpc

Commit 9b01f5bf3 introduced a dependency on "IRQ work self-IPIs" for
full dynamic ticks to be enabled, by expecting architectures to
implement a suitable arch_irq_work_has_interrupt() routine.

Several arches have implemented this routine, including x86 (3010279f)
and arm (09f6edd4), but powerpc was omitted.

This patch implements this routine for powerpc.

The symptom, at boot (on powerpc systems) with "nohz_full=<CPU list>"
is displayed:

     NO_HZ: Can't run full dynticks because arch doesn't support irq work self-IPIs

after this patch:

     NO_HZ: Full dynticks CPUs: <CPU list>.

Tested against 3.19.

powerpc implements "IRQ work self-IPIs" by setting the decrementer to 1 in
arch_irq_work_raise(), which causes a decrementer exception on the next
timebase tick. We then handle the work in __timer_interrupt().

CC: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[mpe: Flesh out change log, fix ws & include guards, remove include of processor.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-23 14:52:04 +11:00
Linus Torvalds 18a8d49973 The clock framework changes for 3.20 contain the usual driver additions,
enhancements and fixes mostly for ARM32, ARM64, MIPS and Power-based
 devices. Additionaly the framework core underwent a bit of surgery with
 two major changes. The boundary between the clock core and clock
 providers (e.g clock drivers) is now more well defined with dedicated
 provider helper functions. struct clk no longer maps 1:1 with the
 hardware clock but is a true per-user cookie which helps us tracker
 users of hardware clocks and debug bad behavior. The second major change
 is the addition of rate constraints for clocks. Rate ranges are now
 supported which are analogous to the voltage ranges in the regulator
 framework. Unfortunately these changes to the core created some
 breakeage. We think we fixed it all up but for this reason there are
 lots of last minute commits trying to undo the damage.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU54D5AAoJEDqPOy9afJhJs6AQAK5YuUwjDchdpNZx9p7OnT1q
 +poehuUwE/gYjmdACqYFyaPrI/9f43iNCfFAgKGLQqmB5ZK4sm4ktzfBEhjWINR2
 iiCx9QYMQVGiKwC8KU0ddeBciglE2b/DwxB45m9TsJEjowucUeBzwLEIj5DsGxf7
 teXRoOWgXdz1MkQJ4pnA09Q3qEPQgmu8prhMfka/v75/yn7nb9VWiJ6seR2GqTKY
 sIKL9WbKjN4AzctggdqHnMSIqZoq6vew850bv2C1fPn7GiYFQfWW+jvMlVY40dp8
 nNa2ixSQSIXVw4fCtZhTIZcIvZ8puc7WVLcl8fz3mUe3VJn1VaGs0E+Yd3GexpIV
 7bwkTOIdS8gSRlsUaIPiMnUob5TUMmMqjF4KIh/AhP4dYrmVbU7Ie8ccvSxe31Ku
 lK7ww6BFv3KweTnW/58856ZXDlXLC6x3KT+Fw58L23VhPToFgYOdTxn8AVtE/LKP
 YR3UnY9BqFx6WHXVoNvg3Piyej7RH8fYmE9om8tyWc/Ab8Eo501SHs9l3b2J8snf
 w/5STd2CYxyKf1/9JLGnBvGo754O9NvdzBttRlygB14gCCtS/SDk/ELG2Ae+/a9P
 YgRk2+257h8PMD3qlp94dLidEZN4kYxP/J6oj0t1/TIkERWfZjzkg5tKn3/hEcU9
 qM97ZBTplTm6FM+Dt/Vk
 =zCVK
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus-3.20' of git://git.linaro.org/people/mike.turquette/linux

Pull clock framework updates from Mike Turquette:
 "The clock framework changes contain the usual driver additions,
  enhancements and fixes mostly for ARM32, ARM64, MIPS and Power-based
  devices.

  Additionally the framework core underwent a bit of surgery with two
  major changes:

   - The boundary between the clock core and clock providers (e.g clock
     drivers) is now more well defined with dedicated provider helper
     functions.  struct clk no longer maps 1:1 with the hardware clock
     but is a true per-user cookie which helps us tracker users of
     hardware clocks and debug bad behavior.

   - The addition of rate constraints for clocks.  Rate ranges are now
     supported which are analogous to the voltage ranges in the
     regulator framework.

  Unfortunately these changes to the core created some breakeage.  We
  think we fixed it all up but for this reason there are lots of last
  minute commits trying to undo the damage"

* tag 'clk-for-linus-3.20' of git://git.linaro.org/people/mike.turquette/linux: (113 commits)
  clk: Only recalculate the rate if needed
  Revert "clk: mxs: Fix invalid 32-bit access to frac registers"
  clk: qoriq: Add support for the platform PLL
  powerpc/corenet: Enable CLK_QORIQ
  clk: Replace explicit clk assignment with __clk_hw_set_clk
  clk: Add __clk_hw_set_clk helper function
  clk: Don't dereference parent clock if is NULL
  MIPS: Alchemy: Remove bogus args from alchemy_clk_fgcs_detr
  clkdev: Always allocate a struct clk and call __clk_get() w/ CCF
  clk: shmobile: div6: Avoid division by zero in .round_rate()
  clk: mxs: Fix invalid 32-bit access to frac registers
  clk: omap: compile legacy omap3 clocks conditionally
  clkdev: Export clk_register_clkdev
  clk: Add rate constraints to clocks
  clk: remove clk-private.h
  pci: xgene: do not use clk-private.h
  arm: omap2+ remove dead clock code
  clk: Make clk API return per-user struct clk instances
  clk: tegra: Define PLLD_DSI and remove dsia(b)_mux
  clk: tegra: Add support for the Tegra132 CAR IP block
  ...
2015-02-21 12:30:30 -08:00
Denis Kirjanov eb84bab0fb ppc: Kconfig: Enable BPF JIT on ppc32
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 15:19:43 -05:00
Denis Kirjanov 022909482d ppc: bpf: Add SKF_AD_CPU for ppc32
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 15:19:43 -05:00
Denis Kirjanov 2ddadeab07 ppc: bpf: rename bpf_jit_64.S to bpf_jit_asm.S
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 15:19:43 -05:00
Denis Kirjanov 09ca5ab23e ppc: bpf: update jit to use compatibility macros
Use helpers from the asm-compat.h to wrap up assembly mnemonics

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 15:19:43 -05:00
Denis Kirjanov 693930d69c ppc: bpf: add reqired opcodes for ppc32
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 15:19:43 -05:00
Denis Kirjanov fb7fc08e57 ppc: bpf: add required compatibility macros for jit
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-20 15:19:43 -05:00
Linus Torvalds 27a22ee4c7 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - several cleanups in kbuild

 - serialize multiple *config targets so that 'make defconfig kvmconfig'
   works

 - The cc-ifversion macro got support for an else-branch

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild,gcov: simplify kernel/gcov/Makefile more
  kbuild: allow cc-ifversion to have the argument for false condition
  kbuild,gcov: simplify kernel/gcov/Makefile
  kbuild,gcov: remove unnecessary workaround
  kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
  kbuild: fix cc-ifversion macro
  kbuild: drop $(version_h) from MRPROPER_FILES
  kbuild: use mixed-targets when two or more config targets are given
  kbuild: remove redundant line from bounds.h/asm-offsets.h
  kbuild: merge bounds.h and asm-offsets.h rules
  kbuild: Drop support for clean-rule
2015-02-19 10:07:08 -08:00
Emil Medve 8f0ab1e141 powerpc/corenet: Enable CLK_QORIQ
Change-Id: I1a80ad7b9f6854791bd270b746f93a91439155a6
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Acked-by: Tang Yuantian <Yuantian.Tang@freescale.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
2015-02-18 09:56:10 -08:00
Peter Zijlstra acba3c7e46 perf, powerpc: Fix up flush_branch_stack() users
The recent LBR rework for x86 left a stray flush_branch_stack() user in
the PowerPC code, fix that up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:24:57 +01:00
Geoff Levand b28c2ee868 kexec: add IND_FLAGS macro
Add a new kexec preprocessor macro IND_FLAGS, which is the bitwise OR of
all the possible kexec IND_ kimage_entry indirection flags.  Having this
macro allows for simplified code in the prosessing of the kexec
kimage_entry items.  Also, remove the local powerpc definition and use the
generic one.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17 14:34:51 -08:00
Kirill A. Shutemov 780fc5642f powerpc: drop _PAGE_FILE and pte_file()-related helpers
We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00
Linus Torvalds 4ba63072b9 Char / Misc patches for 3.20-rc1
Here's the big char/misc driver update for 3.20-rc1.
 
 Lots of little things in here, all described in the changelog.  Nothing
 major or unusual, except maybe the binder selinux stuff, which was all
 acked by the proper selinux people and they thought it best to come
 through this tree.
 
 All of this has been in linux-next with no reported issues for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlTgs80ACgkQMUfUDdst+yn86gCeMLbxANGExVLd+PR46GNsAUQb
 SJ4AmgIqrkIz+5LCwZWM02ldbYhPeBVf
 =lfmM
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc patches from Greg KH:
 "Here's the big char/misc driver update for 3.20-rc1.

  Lots of little things in here, all described in the changelog.
  Nothing major or unusual, except maybe the binder selinux stuff, which
  was all acked by the proper selinux people and they thought it best to
  come through this tree.

  All of this has been in linux-next with no reported issues for a while"

* tag 'char-misc-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
  coresight: fix function etm_writel_cp14() parameter order
  coresight-etm: remove check for unknown Kconfig macro
  coresight: fixing CPU hwid lookup in device tree
  coresight: remove the unnecessary function coresight_is_bit_set()
  coresight: fix the debug AMBA bus name
  coresight: remove the extra spaces
  coresight: fix the link between orphan connection and newly added device
  coresight: remove the unnecessary replicator property
  coresight: fix the replicator subtype value
  pdfdocs: Fix 'make pdfdocs' failure for 'uio-howto.tmpl'
  mcb: Fix error path of mcb_pci_probe
  virtio/console: verify device has config space
  ti-st: clean up data types (fix harmless memory corruption)
  mei: me: release hw from reset only during the reset flow
  mei: mask interrupt set bit on clean reset bit
  extcon: max77693: Constify struct regmap_config
  extcon: adc-jack: Release IIO channel on driver remove
  extcon: Remove duplicated include from extcon-class.c
  Drivers: hv: vmbus: hv_process_timer_expiration() can be static
  Drivers: hv: vmbus: serialize Offer and Rescind offer
  ...
2015-02-15 10:48:44 -08:00
Linus Torvalds c833e17e27 Tighten rules for ACCESS_ONCE
This series tightens the rules for ACCESS_ONCE to only work
 on scalar types. It also contains the necessary fixups as
 indicated by build bots of linux-next.
 Now everything is in place to prevent new non-scalar users
 of ACCESS_ONCE and we can continue to convert code to
 READ_ONCE/WRITE_ONCE.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJU2H5MAAoJEBF7vIC1phx8Jm4QALPqKOMDSUBCrqJFWJeujtv2
 ILxJKsnjrAlt3dxnlVI3q6e5wi896hSce75PcvZ/vs/K3GdgMxOjrakBJGTJ2Qjg
 5njW9aGJDDr/SYFX33MLWfqy222TLtpxgSz379UgXjEzB0ymMWbJJ3FnGjVqQJdp
 RXDutpncRySc/rGHh9UPREIRR5GvimONsWE2zxgXjUzB8vIr2fCGvHTXfIb6RKbQ
 yaFoihzn0m+eisc5Gy4tQ1qhhnaYyWEGrINjHTjMFTQOWTlH80BZAyQeLdbyj2K5
 qloBPS/VhBTr/5TxV5onM+nVhu0LiblVNrdMHVeb7jyST4LeFOCaWK98lB3axSB5
 v/2D1YKNb3g1U1x3In/oNGQvs36zGiO1uEdMF1l8ZFXgCvHmATSFSTWBtqUhb5Ew
 JA3YyqMTG6dpRTMSnmu3/frr4wDqnxlB/ktQC1pf3tDp87mr1ZYEy/dQld+tltjh
 9Z5GSdrw0nf91wNI3DJf+26ZDdz5B+EpDnPnOKG8anI1lc/mQneI21/K/xUteFXw
 UZ1XGPLV2vbv9/a13u44SdjenHvQs1egsGeebMxVPoj6WmDLVmcIqinyS6NawYzn
 IlDGy/b3bSnXWMBP0ZVBX94KWLxqDDc4a/ayxsmxsP1tPZ+jDXjVDa7E3zskcHxG
 Uj5ULCPyU087t8Sl76mv
 =Dj70
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux

Pull ACCESS_ONCE() rule tightening from Christian Borntraeger:
 "Tighten rules for ACCESS_ONCE

  This series tightens the rules for ACCESS_ONCE to only work on scalar
  types.  It also contains the necessary fixups as indicated by build
  bots of linux-next.  Now everything is in place to prevent new
  non-scalar users of ACCESS_ONCE and we can continue to convert code to
  READ_ONCE/WRITE_ONCE"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
  kernel: Fix sparse warning for ACCESS_ONCE
  next: sh: Fix compile error
  kernel: tighten rules for ACCESS ONCE
  mm/gup: Replace ACCESS_ONCE with READ_ONCE
  x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE
  x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCE
  ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE
  ppc/kvm: Replace ACCESS_ONCE with READ_ONCE
2015-02-14 10:54:28 -08:00
Tejun Heo 0c118b7bd0 powerpc: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

* Spurious if (len > 1) test dropped from shared_cpu_map_show().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:36 -08:00
Linus Torvalds b9085bcbf5 Fairly small update, but there are some interesting new features.
Common: Optional support for adding a small amount of polling on each HLT
 instruction executed in the guest (or equivalent for other architectures).
 This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
 or TCP_RR netperf tests).  This also has to be enabled manually for now,
 but the plan is to auto-tune this in the future.
 
 ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
 tracking
 
 s390: several optimizations and bugfixes.  Also a first: a feature
 exposed by KVM (UUID and long guest name in /proc/sysinfo) before
 it is available in IBM's hypervisor! :)
 
 MIPS: Bugfixes.
 
 x86: Support for PML (page modification logging, a new feature in
 Broadwell Xeons that speeds up dirty page tracking), nested virtualization
 improvements (nested APICv---a nice optimization), usual round of emulation
 fixes.  There is also a new option to reduce latency of the TSC deadline
 timer in the guest; this needs to be tuned manually.
 
 Some commits are common between this pull and Catalin's; I see you
 have already included his tree.
 
 ARM has other conflicts where functions are added in the same place
 by 3.19-rc and 3.20 patches.  These are not large though, and entirely
 within KVM.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi
 cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5
 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg
 NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9
 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn
 JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak=
 =7gdx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "Fairly small update, but there are some interesting new features.

  Common:
     Optional support for adding a small amount of polling on each HLT
     instruction executed in the guest (or equivalent for other
     architectures).  This can improve latency up to 50% on some
     scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests).  This
     also has to be enabled manually for now, but the plan is to
     auto-tune this in the future.

  ARM/ARM64:
     The highlights are support for GICv3 emulation and dirty page
     tracking

  s390:
     Several optimizations and bugfixes.  Also a first: a feature
     exposed by KVM (UUID and long guest name in /proc/sysinfo) before
     it is available in IBM's hypervisor! :)

  MIPS:
     Bugfixes.

  x86:
     Support for PML (page modification logging, a new feature in
     Broadwell Xeons that speeds up dirty page tracking), nested
     virtualization improvements (nested APICv---a nice optimization),
     usual round of emulation fixes.

     There is also a new option to reduce latency of the TSC deadline
     timer in the guest; this needs to be tuned manually.

     Some commits are common between this pull and Catalin's; I see you
     have already included his tree.

  Powerpc:
     Nothing yet.

     The KVM/PPC changes will come in through the PPC maintainers,
     because I haven't received them yet and I might end up being
     offline for some part of next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
  KVM: ia64: drop kvm.h from installed user headers
  KVM: x86: fix build with !CONFIG_SMP
  KVM: x86: emulate: correct page fault error code for NoWrite instructions
  KVM: Disable compat ioctl for s390
  KVM: s390: add cpu model support
  KVM: s390: use facilities and cpu_id per KVM
  KVM: s390/CPACF: Choose crypto control block format
  s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
  KVM: s390: reenable LPP facility
  KVM: s390: floating irqs: fix user triggerable endless loop
  kvm: add halt_poll_ns module parameter
  kvm: remove KVM_MMIO_SIZE
  KVM: MIPS: Don't leak FPU/DSP to guest
  KVM: MIPS: Disable HTW while in guest
  KVM: nVMX: Enable nested posted interrupt processing
  KVM: nVMX: Enable nested virtual interrupt delivery
  KVM: nVMX: Enable nested apic register virtualization
  KVM: nVMX: Make nested control MSRs per-cpu
  KVM: nVMX: Enable nested virtualize x2apic mode
  KVM: nVMX: Prepare for using hardware MSR bitmap
  ...
2015-02-13 09:55:09 -08:00
Linus Torvalds 818099574b Merge branch 'akpm' (patches from Andrew)
Merge third set of updates from Andrew Morton:

 - the rest of MM

   [ This includes getting rid of the numa hinting bits, in favor of
     just generic protnone logic.  Yay.     - Linus ]

 - core kernel

 - procfs

 - some of lib/ (lots of lib/ material this time)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits)
  lib/lcm.c: replace include
  lib/percpu_ida.c: remove redundant includes
  lib/strncpy_from_user.c: replace module.h include
  lib/stmp_device.c: replace module.h include
  lib/sort.c: move include inside #if 0
  lib/show_mem.c: remove redundant include
  lib/radix-tree.c: change to simpler include
  lib/plist.c: remove redundant include
  lib/nlattr.c: remove redundant include
  lib/kobject_uevent.c: remove redundant include
  lib/llist.c: remove redundant include
  lib/md5.c: simplify include
  lib/list_sort.c: rearrange includes
  lib/genalloc.c: remove redundant include
  lib/idr.c: remove redundant include
  lib/halfmd4.c: simplify includes
  lib/dynamic_queue_limits.c: simplify includes
  lib/sort.c: use simpler includes
  lib/interval_tree.c: simplify includes
  hexdump: make it return number of bytes placed in buffer
  ...
2015-02-12 18:54:28 -08:00
Cyril Bur 4be1b29795 powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
On POWER8 virtualised kernels the VTB register can be read to have a view
of time that only increases while the guest is running.  This will prevent
guests from seeing time jump if a guest is paused for significant amounts
of time.

On POWER7 and below virtualised kernels stolen time is subtracted from
local_clock as a best effort approximation.  This will not eliminate
spurious warnings in the case of a suspended guest but may reduce the
occurance in the case of softlockups due to host over commit.

Bare metal kernels should avoid reading the VTB as KVM does not restore
sane values when not executing, the approxmation is fine as host kernels
won't observe any stolen time.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Jones <drjones@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: chai wen <chaiw.fnst@cn.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ben Zhang <benzh@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Andy Lutomirski f56141e3e2 all arches, signal: move restart_block to struct task_struct
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target.  This is because the
restart_block is held in the same memory allocation as the kernel stack.

Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less
identical on all architectures.

[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:12 -08:00
Mel Gorman 21d9ee3eda mm: remove remaining references to NUMA hinting bits and helpers
This patch removes the NUMA PTE bits and associated helpers.  As a
side-effect it increases the maximum possible swap space on x86-64.

One potential source of problems is races between the marking of PTEs
PROT_NONE, NUMA hinting faults and migration.  It must be guaranteed that
a PTE being protected is not faulted in parallel, seen as a pte_none and
corrupting memory.  The base case is safe but transhuge has problems in
the past due to an different migration mechanism and a dependance on page
lock to serialise migrations and warrants a closer look.

task_work hinting update			parallel fault
------------------------			--------------
change_pmd_range
  change_huge_pmd
    __pmd_trans_huge_lock
      pmdp_get_and_clear
						__handle_mm_fault
						pmd_none
						  do_huge_pmd_anonymous_page
						  read? pmd_lock blocks until hinting complete, fail !pmd_none test
						  write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none
      pmd_modify
      set_pmd_at

task_work hinting update			parallel migration
------------------------			------------------
change_pmd_range
  change_huge_pmd
    __pmd_trans_huge_lock
      pmdp_get_and_clear
						__handle_mm_fault
						  do_huge_pmd_numa_page
						    migrate_misplaced_transhuge_page
						    pmd_lock waits for updates to complete, recheck pmd_same
      pmd_modify
      set_pmd_at

Both of those are safe and the case where a transhuge page is inserted
during a protection update is unchanged.  The case where two processes try
migrating at the same time is unchanged by this series so should still be
ok.  I could not find a case where we are accidentally depending on the
PTE not being cleared and flushed.  If one is missed, it'll manifest as
corruption problems that start triggering shortly after this series is
merged and only happen when NUMA balancing is enabled.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Mel Gorman 842915f566 ppc64: add paranoid warnings for unexpected DSISR_PROTFAULT
ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if
they are triggered.  This patch adds warnings just in case they are being
accidentally depended upon.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Mel Gorman 8a0516ed8b mm: convert p[te|md]_numa users to p[te|md]_protnone_numa
Convert existing users of pte_numa and friends to the new helper.  Note
that the kernel is broken after this patch is applied until the other page
table modifiers are also altered.  This patch layout is to make review
easier.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Mel Gorman e7bb4b6d16 mm: add p[te|md] protnone helpers for use by NUMA balancing
This is a preparatory patch that introduces protnone helpers for automatic
NUMA balancing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Linus Torvalds 8494bcf5b7 Merge branch 'for-3.20/drivers' of git://git.kernel.dk/linux-block
Pull block driver changes from Jens Axboe:
 "This contains:

   - The 4k/partition fixes for brd from Boaz/Matthew.

   - A few xen front/back block fixes from David Vrabel and Roger Pau
     Monne.

   - Floppy changes from Takashi, cleaning the device file creation.

   - Switching libata to use the new blk-mq tagging policy, removing
     code (and a suboptimal implementation) from libata.  This will
     throw you a merge conflict, since a bug in the original libata
     tagging code was fixed since this code was branched.  Trivial.
     From Shaohua.

   - Conversion of loop to blk-mq, from Ming Lei.

   - Cleanup of the io_schedule() handling in bsg from Peter Zijlstra.
     He claims it improves on unreadable code, which will cost him a
     beer.

   - Maintainer update or NDB, now handled by Markus Pargmann.

   - NVMe:
        - Optimization from me that avoids a kmalloc/kfree per IO for
          smaller (<= 8KB) IO. This cuts about 1% of high IOPS CPU
          overhead.
        - Removal of (now) dead RCU code, a relic from before NVMe was
          converted to blk-mq"

* 'for-3.20/drivers' of git://git.kernel.dk/linux-block:
  xen-blkback: default to X86_32 ABI on x86
  xen-blkfront: fix accounting of reqs when migrating
  xen-blkback,xen-blkfront: add myself as maintainer
  block: Simplify bsg complete all
  floppy: Avoid manual call of device_create_file()
  NVMe: avoid kmalloc/kfree for smaller IO
  MAINTAINERS: Update NBD maintainer
  libata: make sata_sil24 use fifo tag allocator
  libata: move sas ata tag allocation to libata-scsi.c
  libata: use blk taging
  NVMe: within nvme_free_queues(), delete RCU sychro/deferred free
  null_blk: suppress invalid partition info
  brd: Request from fdisk 4k alignment
  brd: Fix all partitions BUGs
  axonram: Fix bug in direct_access
  loop: add blk-mq.h include
  block: loop: don't handle REQ_FUA explicitly
  block: loop: introduce lo_discard() and lo_req_flush()
  block: loop: say goodby to bio
  block: loop: improve performance via blk-mq
2015-02-12 14:30:53 -08:00
Linus Torvalds 3e12cefbe1 Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "This contains:

   - A series from Christoph that cleans up and refactors various parts
     of the REQ_BLOCK_PC handling.  Contributions in that series from
     Dongsu Park and Kent Overstreet as well.

   - CFQ:
        - A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
        - A stable patch fixing a potential crash in CFQ in OOM
          situations.  From Konstantin Khlebnikov.

   - blk-mq:
        - Add support for tag allocation policies, from Shaohua. This is
          a prep patch enabling libata (and other SCSI parts) to use the
          blk-mq tagging, instead of rolling their own.
        - Various little tweaks from Keith and Mike, in preparation for
          DM blk-mq support.
        - Minor little fixes or tweaks from me.
        - A double free error fix from Tony Battersby.

   - The partition 4k issue fixes from Matthew and Boaz.

   - Add support for zero+unprovision for blkdev_issue_zeroout() from
     Martin"

* 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
  block: remove unused function blk_bio_map_sg
  block: handle the null_mapped flag correctly in blk_rq_map_user_iov
  blk-mq: fix double-free in error path
  block: prevent request-to-request merging with gaps if not allowed
  blk-mq: make blk_mq_run_queues() static
  dm: fix multipath regression due to initializing wrong request
  cfq-iosched: handle failure of cfq group allocation
  block: Quiesce zeroout wrapper
  block: rewrite and split __bio_copy_iov()
  block: merge __bio_map_user_iov into bio_map_user_iov
  block: merge __bio_map_kern into bio_map_kern
  block: pass iov_iter to the BLOCK_PC mapping functions
  block: add a helper to free bio bounce buffer pages
  block: use blk_rq_map_user_iov to implement blk_rq_map_user
  block: simplify bio_map_kern
  block: mark blk-mq devices as stackable
  block: keep established cmd_flags when cloning into a blk-mq request
  block: add blk-mq support to blk_insert_cloned_request()
  block: require blk_rq_prep_clone() be given an initialized clone request
  blk-mq: add tag allocation policy
  ...
2015-02-12 14:13:23 -08:00
Linus Torvalds a26be149fa IOMMU Updates for Linux v3.20
This time with:
 
 	* Generic page-table framework for ARM IOMMUs using the LPAE page-table
 	  format, ARM-SMMU and Renesas IPMMU make use of it already.
 
 	* Break out of the IO virtual address allocator from the Intel IOMMU so
 	  that it can be used by other DMA-API implementations too. The first
 	  user will be the ARM64 common DMA-API implementation for IOMMUs
 
 	* Device tree support for Renesas IPMMU
 
 	* Various fixes and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJU3MJOAAoJECvwRC2XARrjopUP+wachFx8vb00M4hlnlwL6FCn
 DyIFkA1n4wL0muPhjcBI+LViEXrSxjr2TYoJEaBg+fiByWWQ1Hefg+KPz331Lo1D
 +uo7WiOa1AB3pfkQiUN9IN6xx+o6ivhb3UQPiL4FHjggB/qz+KVxMM9nx0j8o0fQ
 D9q6HLFiOIsFkra3xZaSuDGvYUBpcwyfn8FP1HVfvLlg1uxIGDcUJX3qU5UBpj9q
 al/lPZ4A7rp+JLApV6WyouPiyVOZKikb5x920KeRNBem7a9fNBdgf+x7QbKpNXa1
 5MaT5MarwGe8lJE4wtjOqRtsllhia+A1rg/6JbROPrlGetRFiuIh2sCKLvwOCko/
 IjBHSutpaRT1lFoAG0TAnXQlvHRG/58XxOlP3eF613X/p8/cezuUaTyTIwZam9X3
 j2GWwbUcBiHTxlu7bQDPz6a7cTf4w6wEALzYl18QrAFv+2LqlCfOo/LSlpStmjrF
 kRN8DYaohlTULvmFneSr8rfGsnp5yPgIPvdmqiSwTz/Ih7kYPgfLy6+v6IAHUqZj
 0n9oGs8eMqVvSzM2qqmyA9WGuQZRyhNjj4iDwn/he5YMw2kqxUQYGMpLnSu0Oi48
 n4PqodtVol64jKLwaHZwyU8u71iyjUC5K9TDot/I2wlSRcTELJhxGh6c1sfDLyrO
 u/htIszgKCgFvVrQoEZB
 =dwrA
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This time with:

   - Generic page-table framework for ARM IOMMUs using the LPAE
     page-table format, ARM-SMMU and Renesas IPMMU make use of it
     already.

   - Break out the IO virtual address allocator from the Intel IOMMU so
     that it can be used by other DMA-API implementations too.  The
     first user will be the ARM64 common DMA-API implementation for
     IOMMUs

   - Device tree support for Renesas IPMMU

   - Various fixes and cleanups all over the place"

* tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits)
  iommu/amd: Convert non-returned local variable to boolean when relevant
  iommu: Update my email address
  iommu/amd: Use wait_event in put_pasid_state_wait
  iommu/amd: Fix amd_iommu_free_device()
  iommu/arm-smmu: Avoid build warning
  iommu/fsl: Various cleanups
  iommu/fsl: Use %pa to print phys_addr_t
  iommu/omap: Print phys_addr_t using %pa
  iommu: Make more drivers depend on COMPILE_TEST
  iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered
  iommu: Disable on !MMU builds
  iommu/fsl: Remove unused fsl_of_pamu_ids[]
  iommu/fsl: Fix section mismatch
  iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator
  iommu: Fix trace_map() to report original iova and original size
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  iopoll: Introduce memory-mapped IO polling macros
  iommu/arm-smmu: don't touch the secure STLBIALL register
  iommu/arm-smmu: make use of generic LPAE allocator
  iommu: io-pgtable-arm: add non-secure quirk
  ...
2015-02-12 09:16:56 -08:00
Linus Torvalds 59d53737a8 Merge branch 'akpm' (patches from Andrew)
Merge second set of updates from Andrew Morton:
 "More of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  vmstat: Reduce time interval to stat update on idle cpu
  mm/page_owner.c: remove unnecessary stack_trace field
  Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
  mm: incorporate read-only pages into transparent huge pages
  vmstat: do not use deferrable delayed work for vmstat_update
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations
  mm: when stealing freepages, also take pages created by splitting buddy page
  mincore: apply page table walker on do_mincore()
  mm: /proc/pid/clear_refs: avoid split_huge_page()
  mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
  mempolicy: apply page table walker on queue_pages_range()
  arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
  memcg: cleanup preparation for page table walk
  numa_maps: remove numa_maps->vma
  numa_maps: fix typo in gather_hugetbl_stats
  pagemap: use walk->vma instead of calling find_vma()
  clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
  ...
2015-02-11 18:23:28 -08:00
Linus Torvalds d3f180ea1a powerpc updates for 3.20
Including:
 
 - Update of all defconfigs
 - Addition of a bunch of config options to modernise our defconfigs
 - Some PS3 updates from Geoff
 - Optimised memcmp for 64 bit from Anton
 - Fix for kprobes that allows 'perf probe' to work from Naveen
 - Several cxl updates from Ian & Ryan
 - Expanded support for the '24x7' PMU from Cody & Sukadev
 - Freescale updates from Scott:
   "Highlights include 8xx optimizations, some more work on datapath device
    tree content, e300 machine check support, t1040 corenet error reporting,
    and various cleanups and fixes."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU2/LSAAoJEFHr6jzI4aWATDAQAKPU6v2Mq0sLnGst69waHU/Q
 vvpIq9hqVeSr6znHhrnazc3iQTLk0acqIdxUl/dT+5ADhi9+FxGD5Ckk+BH1DDve
 g6mQelSMlVZF9hKonHsbr4iUuTUyZyx2vj2qjdgOaRiv9Xubq6vUFNeolq3AeHxv
 J33vqRTmowj3VJ52u+V1dmzXQGfUye7DG2jHpjXoBieZsroTvyuYm5GoIPblWFO6
 zbYRh6IitALnQRtXfwIManPyWMkJti9JX8PwDkmvacr+V+MXbrksHpIOITMhNlo1
 WsVnFMpxuk80XuUfhaKZgISgBSfCqBckvKDn2QwztF2/kBnV6Su5xiOKVgouzM6B
 myy+maiMZlNJlNjqdMK5v2bqMXICP048zgfMbDN2e1K25jSSlRawt0RngoCQO2EP
 7aWmEDAlL3shgzkl68pj1fevQokxC/40C1yExIgAa9C31+bjtMz4Xb1SfN1SSveW
 7uWEY/eG9eLsrSE1CeBDvh6B8BRdyuIHgPhux4Tgc/bUtBGFQ29NuXwKh3QCeEy9
 9wWrRGx3U69eP06Ey7P5js3jPTQs80bjJewyGaiPQF5XHB89To8Dg8VfXjEV49Dx
 Pa3OLL5QsQloKfEBiEhQeGfKYImC00pVYAxc0qpmnr9T+25Ri1TLdF1EBAwriSYE
 5p9kSW+ZIht0lvzsdPNm
 =xDU3
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Update of all defconfigs

 - Addition of a bunch of config options to modernise our defconfigs

 - Some PS3 updates from Geoff

 - Optimised memcmp for 64 bit from Anton

 - Fix for kprobes that allows 'perf probe' to work from Naveen

 - Several cxl updates from Ian & Ryan

 - Expanded support for the '24x7' PMU from Cody & Sukadev

 - Freescale updates from Scott:
    "Highlights include 8xx optimizations, some more work on datapath
     device tree content, e300 machine check support, t1040 corenet
     error reporting, and various cleanups and fixes"

* tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
  cxl: Add missing return statement after handling AFU errror
  cxl: Fail AFU initialisation if an invalid configuration record is found
  cxl: Export optional AFU configuration record in sysfs
  powerpc/mm: Warn on flushing tlb page in kernel context
  powerpc/powernv: Add OPAL soft-poweroff routine
  powerpc/perf/hv-24x7: Document sysfs event description entries
  powerpc/perf/hv-gpci: add the remaining gpci requests
  powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
  powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
  perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
  perf: add PMU_EVENT_ATTR_STRING() helper
  perf: provide sysfs_show for struct perf_pmu_events_attr
  powerpc/kernel: Avoid initializing device-tree pointer twice
  powerpc: Remove old compile time disabled syscall tracing code
  powerpc/kernel: Make syscall_exit a local label
  cxl: Fix device_node reference counting
  powerpc/mm: bail out early when flushing TLB page
  powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
  perf/powerpc: reset event hw state when adding it to the PMU
  powerpc/qe: Use strlcpy()
  ...
2015-02-11 18:15:38 -08:00
Naoya Horiguchi 1757bbd9c5 arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
We don't have to use mm_walk->private to pass vma to the callback function
because of mm_walk->vma.  And walk_page_vma() is useful if we walk over a
single vma.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:06 -08:00
Kirill A. Shutemov d016bf7ece mm: make FIRST_USER_ADDRESS unsigned long on all archs
LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
 >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

The code:

 > 2857                WARN_ON(mm_nr_pmds(mm) >
   2858                                round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

In this, on tile, we have FIRST_USER_ADDRESS defined as 0.  round_up() has
the same type -- int.  PUD_SHIFT.

I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long.  On every arch for consistency.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
Naoya Horiguchi 61f77eda9b mm/hugetlb: reduce arch dependent code around follow_huge_*
Currently we have many duplicates in definitions around
follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
patch tries to remove the m.  The basic idea is to put the default
implementation for these functions in mm/hugetlb.c as weak symbols
(regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement
arch-specific code only when the arch needs it.

For follow_huge_addr(), only powerpc and ia64 have their own
implementation, and in all other architectures this function just returns
ERR_PTR(-EINVAL).  So this patch sets returning ERR_PTR(-EINVAL) as
default.

As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
always return 0 in your architecture (like in ia64 or sparc,) it's never
called (the callsite is optimized away) no matter how implemented it is.
So in such architectures, we don't need arch-specific implementation.

In some architecture (like mips, s390 and tile,) their current
arch-specific follow_huge_(pmd|pud)() are effectively identical with the
common code, so this patch lets these architecture use the common code.

One exception is metag, where pmd_huge() could return non-zero but it
expects follow_huge_pmd() to always return NULL.  This means that we need
arch-specific implementation which returns NULL.  This behavior looks
strange to me (because non-zero pmd_huge() implies that the architecture
supports PMD-based hugepage, so follow_huge_pmd() can/should return some
relevant value,) but that's beyond this cleanup patch, so let's keep it.

Justification of non-trivial changes:
- in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
  patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
  is true when follow_huge_pmd() can be called (note that pmd_huge() has
  the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
- in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
  code. This patch forces these archs use PMD_MASK, but it's OK because
  they are identical in both archs.
  In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
  In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
  PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
  PTE_ORDER is always 0, so these are identical.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:01 -08:00
Linus Torvalds c08f846793 PCI changes for the v3.20 merge window:
Enumeration
     - Move domain assignment from arm64 to generic code (Lorenzo Pieralisi)
     - ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi)
     - ARM: Move to generic PCI domains (Lorenzo Pieralisi)
     - Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado)
     - Add and use generic config accessors on ARM, PowerPC (Rob Herring)
 
   Resource management
     - Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi)
     - Fix infinite loop with ROM image of size 0 (Michel Dänzer)
 
   PCI device hotplug
     - Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas)
 
   Virtualization
     - Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson)
     - Add DMA alias quirk for Adaptec 3405 (Alex Williamson)
     - Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson)
     - Add ACS quirk for Emulex NICs (Vasundhara Volam)
 
   MSI
     - Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang)
 
   Freescale Layerscape host bridge driver
     - Fix platform_no_drv_owner.cocci warnings (Julia Lawall)
 
   NVIDIA Tegra host bridge driver
     - Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach)
 
   Renesas R-Car host bridge driver
     - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
 
   TI Keystone host bridge driver
     - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
     - Fix misspelling of current function in debug output (Julia Lawall)
 
   Xilinx AXI host bridge driver
     - Fix harmless format string warning (Arnd Bergmann)
 
   Miscellaneous
     - Use standard parsing functions for ASPM sysfs setters (Chris J Arges)
     - Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao)
     - Delete unnecessary NULL pointer checks (Markus Elfring)
     - Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki)
     - Include clk.h instead of clk-private.h (Stephen Boyd)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU1idjAAoJEFmIoMA60/r8C+gP/3p1Ya/cW+liX7e0Pz6wsrkB
 pAk9Af9Iz7RHYb0ODAs1XnlvuJJtJ6nXb9iXTFefhWKfVK0dF1i0w2VqsLUa+iLS
 V65XzkdtrEa+uJj5plvzehrHQhOPh5U2WtZvsAgeC6yu9F/LhnJywOIZjaYdCYwE
 /lXBgLPJiqXfDyEpKT6TqObwPpY7Ly+7+yNZ4LcO84AuBwb6lBq88Eyl7Ft+K57m
 dIJVh8ZTQMzVy5EcGbLoIYF4Mg8zdQbxju73bfeBNerxwd7QD7l0mfiQ3yIexRrQ
 FvzgIerDYdabKgYcbC3cQzMR4V0TEcWs0E7VqiokU4onor4VnK4A0PtbMEWcK8YN
 OZnQ8d4imHhJN4HdJeMhiKIIk+Cr52A1fC/AKmL0Ddw8yKusgjPz2Ux0pHpXMR1a
 NodymVV4XWcDBKWPX0DLESe8wJC4fN+v8bwMVWg20BC709BaK61yA7lGqJ70HmJ+
 u5mWokjgiycTQmJBiJmkEM9b5YVHLjrQ0PwDiYEtkxhMmd/ti+o12fhyNU8Epkr7
 fgDEI/OTURbhbrX0qiQ8RZiSSt5WXuy+ZWbw76rfo3SUH3Lt1WrcdbZrUd9em2hy
 dqOA1vV97JUtQMD/Dsg9apTObR9XJk2B/lyZs1YyCcb77KGSEsSw2x+xz3RRVHM4
 uQDwVupB/QsRVAu4tubr
 =N6mH
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Move domain assignment from arm64 to generic code (Lorenzo Pieralisi)
    - ARM: Remove artificial dependency on pci_sys_data domain (Lorenzo Pieralisi)
    - ARM: Move to generic PCI domains (Lorenzo Pieralisi)
    - Generate uppercase hex for modalias var in uevent (Ricardo Ribalda Delgado)
    - Add and use generic config accessors on ARM, PowerPC (Rob Herring)

  Resource management
    - Free resources on failure in of_pci_get_host_bridge_resources() (Lorenzo Pieralisi)
    - Fix infinite loop with ROM image of size 0 (Michel Dänzer)

  PCI device hotplug
    - Handle surprise add even if surprise removal isn't supported (Bjorn Helgaas)

  Virtualization
    - Mark AMD/ATI VGA devices that don't reset on D3hot->D0 transition (Alex Williamson)
    - Add DMA alias quirk for Adaptec 3405 (Alex Williamson)
    - Add Wellsburg (X99) to Intel PCH root port ACS quirk (Alex Williamson)
    - Add ACS quirk for Emulex NICs (Vasundhara Volam)

  MSI
    - Fail MSI-X mappings if there's no space assigned to MSI-X BAR (Yijing Wang)

  Freescale Layerscape host bridge driver
    - Fix platform_no_drv_owner.cocci warnings (Julia Lawall)

  NVIDIA Tegra host bridge driver
    - Remove unnecessary tegra_pcie_fixup_bridge() (Lucas Stach)

  Renesas R-Car host bridge driver
    - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)

  TI Keystone host bridge driver
    - Fix error handling of irq_of_parse_and_map() (Dmitry Torokhov)
    - Fix misspelling of current function in debug output (Julia Lawall)

  Xilinx AXI host bridge driver
    - Fix harmless format string warning (Arnd Bergmann)

  Miscellaneous
    - Use standard parsing functions for ASPM sysfs setters (Chris J Arges)
    - Add pci_device_to_OF_node() stub for !CONFIG_OF (Kevin Hao)
    - Delete unnecessary NULL pointer checks (Markus Elfring)
    - Add and use defines for PCIe Max_Read_Request_Size (Rafał Miłecki)
    - Include clk.h instead of clk-private.h (Stephen Boyd)"

* tag 'pci-v3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
  PCI: Add pci_device_to_OF_node() stub for !CONFIG_OF
  PCI: xilinx: Convert to use generic config accessors
  PCI: xgene: Convert to use generic config accessors
  PCI: tegra: Convert to use generic config accessors
  PCI: rcar: Convert to use generic config accessors
  PCI: generic: Convert to use generic config accessors
  powerpc/powermac: Convert PCI to use generic config accessors
  powerpc/fsl_pci: Convert PCI to use generic config accessors
  ARM: ks8695: Convert PCI to use generic config accessors
  ARM: sa1100: Convert PCI to use generic config accessors
  ARM: integrator: Convert PCI to use generic config accessors
  PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver
  ARM: dts: versatile: add PCI controller binding
  of/pci: Free resources on failure in of_pci_get_host_bridge_resources()
  PCI: versatile: Add DT docs for ARM Versatile PB PCIe driver
  PCI: Fail MSI-X mappings if there's no space assigned to MSI-X BAR
  r8169: use PCI define for Max_Read_Request_Size
  [SCSI] esas2r: use PCI define for Max_Read_Request_Size
  tile: use PCI define for Max_Read_Request_Size
  rapidio/tsi721: use PCI define for Max_Read_Request_Size
  ...
2015-02-10 14:31:28 -08:00
Linus Torvalds 23e8fe2e16 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - Documentation updates.

   - Miscellaneous fixes.

   - Preemptible-RCU fixes, including fixing an old bug in the
     interaction of RCU priority boosting and CPU hotplug.

   - SRCU updates.

   - RCU CPU stall-warning updates.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  rcu: Initialize tiny RCU stall-warning timeouts at boot
  rcu: Fix RCU CPU stall detection in tiny implementation
  rcu: Add GP-kthread-starvation checks to CPU stall warnings
  rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
  rcu: Optionally run grace-period kthreads at real-time priority
  ksoftirqd: Use new cond_resched_rcu_qs() function
  ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
  rcutorture: Add more diagnostics in rcu_barrier() test failure case
  torture: Flag console.log file to prevent holdovers from earlier runs
  torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
  rcutorture: Handle different mpstat versions
  rcutorture: Check from beginning to end of grace period
  rcu: Remove redundant rcu_batches_completed() declaration
  rcutorture: Drop rcu_torture_completed() and friends
  rcu: Provide rcu_batches_completed_sched() for TINY_RCU
  rcutorture: Use unsigned for Reader Batch computations
  rcutorture: Make build-output parsing correctly flag RCU's warnings
  rcu: Make _batches_completed() functions return unsigned long
  rcutorture: Issue warnings on close calls due to Reader Batch blows
  documentation: Fix smp typo in memory-barriers.txt
  ...
2015-02-09 14:28:42 -08:00
Paolo Bonzini f781951299 kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest.  KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too.  When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest.  This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host.  The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter.  It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls.  During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark.  The wasted time is thus very low.  Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer.  Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06 13:08:37 +01:00
Joonsoo Kim 7b02190c27 mm/debug_pagealloc: fix build failure on ppc and some other archs
Kim Phillips reported following build failure.

  LD      init/built-in.o
  mm/built-in.o: In function `free_pages_prepare':
  mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
  mm/built-in.o: In function `prep_new_page':
  mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
  mm/built-in.o: In function `map_pages':
  mm/compaction.c:61: undefined reference to `.kernel_map_pages'
  make: *** [vmlinux] Error 1

Reason for this problem is that commit 031bc5743f
("mm/debug-pagealloc: make debug-pagealloc boottime configurable")
forgot to remove the old declaration of kernel_map_pages() for some
architectures.  This patch removes them to fix build failure.

Reported-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-05 13:35:30 -08:00
Joerg Roedel a20cc76b9e Merge branches 'arm/renesas', 'arm/smmu', 'arm/omap', 'ppc/pamu', 'x86/amd' and 'core' into next
Conflicts:
	drivers/iommu/Kconfig
	drivers/iommu/Makefile
2015-02-04 16:53:44 +01:00
Arseny Solokha c2c896bee0 powerpc/mm: Warn on flushing tlb page in kernel context
Function __flush_tlb_page() must only be called for user contexts, so
put in extra hardening to warn on calling it for kernel context.

Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-04 13:19:27 +11:00
Joel Stanley 7f43e71e8c powerpc/powernv: Add OPAL soft-poweroff routine
Register a notifier for a OPAL message indicating that the machine
should prepare itself for a graceful power off.

OPAL will tell us if the power off is a reboot or shutdown, but for now
we perform the same orderly_poweroff action.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-04 13:08:25 +11:00
Michael Ellerman a604c96eb0 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 8xx optimizations, some more work on datapath device
tree content, e300 machine check support, t1040 corenet error reporting,
and various cleanups and fixes."
2015-02-04 12:03:21 +11:00
Emil Medve cd70d4659f iommu/fsl: Various cleanups
Currently a PAMU driver patch is very likely to receive some
checkpatch complaints about the code in the context of the
patch. This patch is an attempt to fix most of that and make
the driver more readable

Also fixed a subset of the sparse and coccinelle reported
issues.

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-02-03 18:47:18 +01:00
Michael Turquette 54eea32f7e Merge branch 'clk-next' into v3.19-rc7 2015-02-02 14:59:38 -08:00
Cody P Schafer 97bf264018 powerpc/perf/hv-gpci: add the remaining gpci requests
Add the remaining gpci requests that contain counters suitable for use
by perf. Omit those that don't contain any counters (but note their
ommision).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:39 +11:00
Cody P Schafer 9e9f601084 powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
This adds (in req-gen/) a framework for defining gpci counter requests.
It uses macro magic similar to ftrace.

Also convert the existing hv-gpci request structures and enum values to
use the new framework (and adjust old users of the structs and enum
values to cope with changes in naming).

In exchange for this macro disaster, we get autogenerated event listing
for GPCI in sysfs, build time field offset checking, and zero
duplication of information about GPCI requests.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:39 +11:00
Cody P Schafer 5c5cd7b502 powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
Retrieves and parses the 24x7 catalog on POWER systems that supply it
(right now, only POWER 8). Events are exposed via sysfs in the standard
fashion, and are all parameterized.

	$ cd /sys/bus/event_source/devices/hv_24x7/events

	$ cat HPM_CS_FROM_L4_LDATA__PHYS_CORE
	domain=0x2,offset=0xd58,core=?,lpar=0x0

	$ cat HPM_TLBIE__VCPU_HOME_CHIP
	domain=0x4,offset=0x358,vcpu=?,lpar=?

where user is required to specify values for the fields with '?' (like
core, vcpu, lpar above), when specifying the event with the perf tool.

Catalog is (at the moment) only parsed on boot. It needs re-parsing
when a some hypervisor events occur. At that point we'll also need to
prevent old events from continuing to function (counter that is passed
in via spare space in the config values?).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:38 +11:00
sukadev@linux.vnet.ibm.com e08e52824e perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
Define a lite version of the EVENT_DEFINE_RANGE_FORMAT() that avoids
defining helper functions for the bit-field ranges.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 17:56:38 +11:00
Gavin Shan fe12545e76 powerpc/kernel: Avoid initializing device-tree pointer twice
As commit 50ba08f3 ("of/fdt: Don't clear initial_boot_params
if fdt_check_header() fails") does, the device-tree pointer
"initial_boot_params" is initialized by early_init_dt_verify(),
which is called by early_init_devtree(). So we needn't explicitly
initialize that again in early_init_devtree().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:32 +11:00
Michael Ellerman a4bcbe6a41 powerpc: Remove old compile time disabled syscall tracing code
We have code to do syscall tracing which is disabled at compile time by
default. It's not been touched since the dawn of time (ie. v2.6.12).

There are now better ways to do syscall tracing, ie. using the
raw_syscall, or syscall tracepoints.

For the specific case of tracing syscalls at boot on a system that
doesn't get to userspace, you can boot with:

  trace_event=syscalls tp_printk=on

Which will trace syscalls from boot, and echo all output to the console.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:32 +11:00
Michael Ellerman 4c3b216861 powerpc/kernel: Make syscall_exit a local label
Currently when we back trace something that is in a syscall we see
something like this:

[c000000000000000] [c000000000000000] SyS_read+0x6c/0x110
[c000000000000000] [c000000000000000] syscall_exit+0x0/0x98

Although it's entirely correct, seeing syscall_exit at the bottom can be
confusing - we were exiting from a syscall and then called SyS_read() ?

If we instead change syscall_exit to be a local label we get something
more intuitive:

[c0000001fa46fde0] [c00000000026719c] SyS_read+0x6c/0x110
[c0000001fa46fe30] [c000000000009264] system_call+0x38/0xd0

ie. we were handling a system call, and it was SyS_read().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:31 +11:00
Ryan Grimm 6f963ec2d6 cxl: Fix device_node reference counting
When unbinding and rebinding the driver on a system with a card in PHB0, this
error condition is reached after a few attempts:

ERROR: Bad of_node_put() on /pciex@3fffe40000000
CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152
Call Trace:
[c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable)
[c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0
[c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0
[c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30
[c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50
[c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0
[c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170
[c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0
[c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160
[c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60
[c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0
[c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0
[c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260
[c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100
[c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98

We are missing a call to of_node_get(). pnv_pci_to_phb_node() should
call of_node_get() otherwise np's reference count isn't incremented and
it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node()
so it's clear it calls of_node_get().

Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-02-02 14:51:31 +11:00
Arseny Solokha 0dc294f717 powerpc/mm: bail out early when flushing TLB page
MMU_NO_CONTEXT is conditionally defined as 0 or (unsigned int)-1. However,
in __flush_tlb_page() a corresponding variable is only tested for open
coded 0, which can cause NULL pointer dereference if `mm' argument was
legitimately passed as such.

Bail out early in case the first argument is NULL, thus eliminate confusion
between different values of MMU_NO_CONTEXT and avoid disabling and then
re-enabling preemption unnecessarily.

Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-30 18:39:00 -06:00
Rob Herring b98fa50847 powerpc/powermac: Convert PCI to use generic config accessors
Convert the powermac PCI driver to use the generic config access functions.

This changes accesses from (in|out)_(8|le16|le32) to readX/writeX variants.
I believe these should be equivalent for PCI config space accesses, but
confirmation would be nice.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
2015-01-30 16:14:43 -06:00
Rob Herring 933d275f1c powerpc/fsl_pci: Convert PCI to use generic config accessors
Convert the fsl_pci driver to use the generic config access functions.

This changes accesses from (in|out)_(8|le16|le32) to readX/writeX variants.
I believe these should be equivalent for PCI config space accesses, but
confirmation would be nice.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
2015-01-30 16:14:43 -06:00
Brian Norris c9111a41dc powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
These defconfigs contain the CONFIG_M25P80 symbol, which is now
dependent on the MTD_SPI_NOR symbol. Add CONFIG_MTD_SPI_NOR to satisfy
the new dependency.

At the same time, drop the now-nonexistent CONFIG_MTD_CHAR symbol.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 23:47:50 -06:00
Alexandru-Cezar Sardan 0d7d9b3a45 perf/powerpc: reset event hw state when adding it to the PMU
When adding an event to the PMU with PERF_EF_START the STOPPED and UPTODATE
flags need to be cleared in the hw.event status variable because they are
preventing the update of the event count on overflow interrupt.

Signed-off-by: Alexandru-Cezar Sardan <alexandru.sardan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 23:44:18 -06:00
Rickard Strandqvist 5db431285d powerpc/qe: Use strlcpy()
Replace strcpy and strncpy with strlcpy to avoid strings that are too
big, or lack null termination.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[scottwood@freescale.com: cleaned up commit message]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 23:32:13 -06:00
Markus Elfring 8ac6e995ac PowerPC-83xx: Deletion of an unnecessary check before the function call "of_node_put"
The of_node_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 23:16:19 -06:00
Alessio Igor Bogani fd5f4914b6 powerpc: dts: pq3/85xx: Fix GPIO address
Fix the GPIO address in the device tree to match the documented location.

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 23:04:32 -06:00
Kumar Gala de58824fc4 powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA QMan
Change-Id: I16e63db731e55a3d60d4e147573c1af8718082d3
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Geoff Thorpe <Geoff.Thorpe@freescale.com>
Signed-off-by: Hai-Ying Wang <Haiying.Wang@freescale.com>
[Emil Medve: Sync with the upstream binding]
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 22:57:46 -06:00
Kumar Gala 39b55b53b9 powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA BMan
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Geoff Thorpe <Geoff.Thorpe@freescale.com>
Signed-off-by: Hai-Ying Wang <Haiying.Wang@freescale.com>
[Emil Medve: Sync with the upstream binding]
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 22:57:45 -06:00
Alessio Igor Bogani 2727ed5471 powerpc/85xx: Add support for Emerson/Artesyn MVME2500.
Add support for the Artesyn MVME2500 Single Board Computer.

The MVME2500 is a 6U form factor VME64 computer with:

	- A single Freescale QorIQ P2010 CPU
	- 1 GB of DDR3 onboard memory
	- Three Gigabit Ethernets
	- Five 16550 compatible UARTS
	- One USB 2.0 port, one SHDC socket and one SATA connector
	- One PCI/PCI eXpress Mezzanine Card (PMC/XMC) Slot
	- MultiProcessor Interrupt Controller (MPIC)
	- A DS1375T Real Time Clock (RTC) and 512 KB of Non-Volatile Memory
	- Two 64 KB EEPROMs
	- U-Boot in 16 SPI Flash

This patch is based on linux-3.18 and has been boot tested.

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 22:57:42 -06:00
Scott Wood bb344ca5b9 powerpc/mpc85xx: Add ranges to etsec2 nodes
Commit 746c9e9f92 "of/base: Fix PowerPC address parsing hack" limited
the applicability of the workaround whereby a missing ranges is treated
as an empty ranges.  This workaround was hiding a bug in the etsec2
device tree nodes, which have children with reg, but did not have
ranges.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Alexander Graf <agraf@suse.de>
2015-01-29 22:57:42 -06:00
Andy Fleming cbe8c43dfd powerpc/config: Enable MDIO support
Also, enable Vitesse PHY and fixed PHY support.

Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 22:57:41 -06:00
Esben Haabendal 974ff4e2d7 powerpc: Add machine_check cpu function for e300c3 cpus
Signed-off-by: Esben Haabendal <eha@deif.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 22:57:40 -06:00
LEROY Christophe 4545ff7ed8 powerpc/8xx: Remove duplicated code in set_context()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe fde5a9057f powerpc/8xx: Optimise access to swapper_pg_dir
All accessed to PGD entries are done via 0(r11).
By using lower part of swapper_pg_dir as load index to r11, we can remove the
ori instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe 17bb312f4c powerpc/8xx: Take benefit of aligned PGDIR
L1 base address is now aligned so we can insert L1 index into r11 directly and
then preserve r10

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe ce67f5d0a0 powerpc32: Use kmem_cache memory for PGDIR
When pages are not 4K, PGDIR table is allocated with kmalloc(). In order to
optimise TLB handlers, aligned memory is needed. kmalloc() doesn't provide
aligned memory blocks, so lets use a kmem_cache pool instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:59:02 -06:00
LEROY Christophe 5ddb75cee5 powerpc/8xx: remove tests on PGDIR entry validity
Kernel MMU handling code handles validity of entries via _PMD_PRESENT which
corresponds to V bit in MD_TWC and MI_TWC. When the V bit is not set, MPC8xx
triggers TLBError exception. So we don't have to check that and branch ourself
to TLBError. We can set TLB entries with non present entries, remove all those
tests and let the 8xx handle it. This reduce the number of cycle when the
entries are valid which is the case most of the time, and doesn't significantly
increase the time for handling invalid entries.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:58:02 -06:00
LEROY Christophe 2374d0af29 powerpc/8xx: remove remaining unnecessary code in FixupDAR
Since commit 33fb845a6f ("powerpc/8xx: Don't use MD_TWC for walk"), MD_EPN and
MD_TWC are not writen anymore in FixupDAR so saving r3 has become useless.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:58:01 -06:00
LEROY Christophe debddd95ec powerpc/8xx: reduce pressure on TLB due to context switches
For nohash powerpc, when we run out of contexts, contexts are freed by stealing
used contexts in-turn. When a victim has been selected, the associated TLB
entries are freed using _tlbil_pid(). Unfortunatly, on the PPC 8xx, _tlbil_pid()
does a tlbia, hence flushes ALL TLB entries and not only the one linked to the
stolen context. Therefore, as implented today, at each task switch requiring a
new context, all entries are flushed.

This patch modifies the implementation so that when running out of contexts, all
contexts get freed at once, hence dividing the number of calls to tlbia by 16.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 21:51:06 -06:00
LEROY Christophe cadbfd0146 powerpc/8xx: use _PAGE_RO instead of _PAGE_RW
On powerpc 8xx, in TLB entries, 0x400 bit is set to 1 for read-only pages
and is set to 0 for RW pages. So we should use _PAGE_RO instead of _PAGE_RW

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 20:13:10 -06:00
LEROY Christophe a7b9f671f2 powerpc32: adds handling of _PAGE_RO
Some powerpc like the 8xx don't have a RW bit in PTE bits but a RO
(Read Only) bit.  This patch implements the handling of a _PAGE_RO flag
to be used in place of _PAGE_RW

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood@freescale.com: fix whitespace]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 20:11:51 -06:00
Tom Huynh d2caa3cebd powerpc/perf: fix fsl_emb_pmu_start to write correct pmc value
PMCs on PowerPC increases towards 0x80000000 and triggers an overflow
interrupt when the msb is set to collect a sample. Therefore, to setup
for the next sample collection, pmu_start should set the pmc value to
0x80000000 - left instead of left which incorrectly delays the next
overflow interrupt. Same as commit 9a45a9407c ("powerpc/perf:
power_pmu_start restores incorrect values, breaking frequency events")
for book3s.

Signed-off-by: Tom Huynh <tom.huynh@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 20:05:56 -06:00
Emil Medve 238cac16c0 powerpc: Remove duplicate tlbcam_index declarations
They seem to be leftovers from '14cf11a powerpc: Merge enough to start
building in arch/powerpc'

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 19:59:03 -06:00
Emil Medve e8d2081bb4 powerpc/dts: Remove T4240 emulator support
Probably we should have not upstreamed this in the first place

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 19:58:39 -06:00
Kim Phillips 6d5f6a0eba powerpc/fsl_pci: Fix pci stack build bug with FRAME_WARN
Fix this:

  CC      arch/powerpc/sysdev/fsl_pci.o
arch/powerpc/sysdev/fsl_pci.c: In function 'fsl_pcie_check_link':
arch/powerpc/sysdev/fsl_pci.c:91:1: error: the frame size of 1360 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

when configuring FRAME_WARN, by refactoring indirect_read_config()
to take hose and bus number instead of the 1344-byte struct pci_bus.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29 19:56:15 -06:00
Linus Torvalds 33692f2759 vm: add VM_FAULT_SIGSEGV handling support
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.

That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works.  However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV.  And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.

However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space.  And user space really
expected SIGSEGV, not SIGBUS.

To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it.  They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.

This is the mindless minimal patch to do this.  A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.

Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.

Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29 10:51:32 -08:00
Gavin Shan 31494cf353 powerpc/powernv: Don't alloc IRQ map if necessary
On PowerNV platform, the OPAL interrupts are exported by firmware
through device-node property (/ibm,opal::opal-interrupts). Under
some extreme circumstances (e.g. simulator), we don't have this
property found from the device tree. For that case, we shouldn't
allocate the interrupt map. Otherwise, slab complains allocating
zero sized memory chunk.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:28:10 +11:00
Gavin Shan c1c3a526bb powerpc/powernv: Separate function for OPAL IRQ setup
The patch put the OPAL interrupt setup logic in opal_init() into
seperate function opal_irq_init() for easier code maintaining. The
patch doesn't introduce logic changes except:

   * Rename variable names.
   * Release virtual IRQ upon error from request_irq().
   * Don't cache the virtual IRQ to opal_irqs[] upon error from
     request_irq().

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:28:04 +11:00
Michael Ellerman 0813513943 powerpc/powernv: Remove "opal" prefix from pr_xxx()s
In commit c8742f8512 "powerpc/powernv: Expose OPAL firmware symbol
map" I added pr_fmt() to opal.c. This left some existing pr_xxx()s with
duplicate "opal" prefixes, eg:

    opal: opal: Found 0 interrupts reserved for OPAL

Fix them all up. Also make the "Not not found" message a bit more
verbose.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:10:33 +11:00
Michael Ellerman 8aa989b8fb powerpc: Remove some unused functions
Remove slice_set_psize() which is not used.

It was added in 3a8247cc2c "powerpc: Only demote individual slices
rather than whole process" but was never used.

Remove vsx_assist_exception() which is not used.

It was added in ce48b21007 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.

Remove generic_mach_cpu_die() which is not used.

Its last caller was removed in 375f561a41 "powerpc/powernv: Always go
into nap mode when CPU is offline".

Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.

These were introduced in c5d56332fd "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Michael Ellerman 1dcee55fea powerpc/lib: Makefile, use obj64-y to consolidate 64-bit rules
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Michael Ellerman 564ec2f2a0 powerpc/lib: Makefile, consolidate obj-y sections
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 15:00:24 +11:00
Kim Phillips ce614c3c6c powerpc/mm: fix undefined reference to `.__kernel_map_pages' on FSL PPC64
arch/powerpc has __kernel_map_pages implementations in mm/pgtable_32.c, and
mm/hash_utils_64.c, of which the former is built for PPC32, and the latter
for PPC64 machines with PPC_STD_MMU.  Fix arch/powerpc/Kconfig to not select
ARCH_SUPPORTS_DEBUG_PAGEALLOC when CONFIG_PPC_STD_MMU_64 isn't defined,
i.e., for 64-bit book3e builds to use the generic __kernel_map_pages()
in mm/debug-pagealloc.c.

  LD      init/built-in.o
mm/built-in.o: In function `kernel_map_pages':
include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
Makefile:925: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-28 14:22:22 +11:00
Linus Torvalds 7da323bb45 Two powerpc fixes.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUxx7MAAoJEFHr6jzI4aWAqe0QAIGGb2mtU1gON4XtWIlYPzSu
 TZDGAK6pCeCr92tH5H3HXy7hOEbZOPQGJCqvOHMK/kcJtlkXSrW1ERDPryqEv9Xq
 V+PJMY5Tre1ga1xttyk3ny4zKmAy4UcLm6Wepi5054kB0Svo7/s2IKECwsInY/D3
 yYxSW4XUZXvCYqdZUqdQbCSgocWOfFJbPqt4kf2IiqjCPlWU1GGjpMpmqXFhTh4Y
 wuU0FtcC7db+wCvjaXs/dbCHMZqWqsQ+WCo9kz2+fLtRbthXSk0A9V96jLzah32/
 yQHqyH2+/6ZklxApjgvuQ4ZhZogU2U2PlFLfpJqiBhy+5WnkBpZtJ1lIj9hGlyC4
 pvKjrHTp0Q8+gZ8ZxuEuiKV1L+01kWbtQwg4Z5HIuulz7DfiSGatSez6CuwIeV49
 A5dQmse0bK6L4XWlB+RKyLdnj+HSPVFIcLlTPMSV/UlDp7IySphR+zp28ykyWv0C
 yGa9JUeD7hvM3Oezo88QoS4P5rk/KmQjPMhYz+ab8/eX3RCtBA1jb/KrdLm2b4tM
 sVJyuiyalL35IoOZlAh8rrKHfu1ZgXyYkJgMdccAy2yQZghmK4H8o8gdf9YWGNNq
 vEZHk+2gKI1T1o5uqvHMmkw0IWV4pE9jm3Jyo2W/pZozxR3oswOtUebCXbK+mLqy
 81GwzBWwALxbRM5HGtIr
 =cDwY
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc fixes from Michael Ellerman:
 "Two powerpc fixes"

* tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared
  powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
2015-01-27 10:04:38 -08:00
Pranith Kumar 6501ab5e38 powerpc/powernv: Skip registering log region when CONFIG_PRINTK=n
When CONFIG_PRINTK=n, log_buf_addr_get() returns NULL and log_buf_len_get()
return 0. Check for these return values and skip registering the dump buffer.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-27 14:04:00 +11:00
Cyril Bur 3df76a9dcc powerpc/pseries: Fix endian problems with LE migration
RTAS events require arguments be passed in big endian while hypercalls
have their arguments passed in registers and the values should therefore
be in CPU endian.

The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to setup
one true RTAS call. This means that "ibm,suspend_me" is handled
specially in the ppc_rtas() syscall.

The ppc_rtas() syscall has its arguments in big endian and can therefore
pass these arguments directly to the RTAS call. "ibm,suspend_me" is
handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me())
which has left an endian bug on little endian systems due to the
requirement of hypercalls. The return value from rtas_ibm_suspend_me()
gets returned in cpu endian, and is left unconverted, also a bug on
little endian systems.

rtas_ibm_suspend_me() does not actually make use of the rtas_args that
it is passed. This patch removes the convoluted use of the rtas_args
struct to pass params to rtas_ibm_suspend_me() in favour of passing what
it needs as actual arguments. This patch also ensures the two callers of
rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
case of ppc_rtas(), converts the return value.

migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
sysfs file which deals with everything in cpu endian so this function
only underwent cleanup.

This patch has been tested with KVM both LE and BE and on PowerVM both
LE and BE. Under QEMU/KVM the migration happens without touching these
code pathes.

For PowerVM there is no obvious regression on BE and the LE code path
now provides the correct parameters to the hypervisor.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-27 14:03:53 +11:00
Linus Torvalds 550695925d PCI updates for v3.19:
Resource management
     - Clip bridge windows to fit in upstream windows (Yinghai Lu)
 
   Virtualization
     - Mark Atheros AR93xx to avoid using bus reset (Alex Williamson)
 
   Miscellaneous
     - Update Richard Zhu's email address (Lucas Stach)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUwqpDAAoJEFmIoMA60/r8ykIQAINgkP/iPaFMTPkTSfzTJCMY
 oQVNGha4FDt6Ic1UWGyS/sYUpywSnALxlYWxVZTm5r+sGQ2yJBo6veuxvCI09YFw
 lWqf6lfvkFSthWCo7pHLoNaIjKJUNCy4a2han31aAIScMCNX4YF60YorMSjQBST8
 smLMG75U3U9VWaXYsV1e5gTvLa5IQh4lgaTgMAOXqd+6WcAR4WwOgD2sR06o2X43
 63JF2U+ieuA789Xu2IS92TmMMESD5haEZATqdGPtxpnqyHxmBNu0Y4JkkBWD2S92
 HvveOoLBT2TBfICkftvCJscBLHh7PZMIx9nLx58SnijVzX+hzVr4Zfc96MZU50MK
 DuNbbZn3sO902ukOEpfih7Mg0tDxCxNytleEdAnXmZuqf+odbd/Y4AA0Hg6w7GEY
 OsVGbQAT/knlTfsSZsivtmUl7l1SXzrozv+q4f4szY95v34S9pm0sWzz0IBn7oKj
 h7N9Vslr3lyEudOUo1OrFq+0arDw53kwOOkIavMUH0nvTqKs4cmXBcGMfo1EfMa+
 3YhjwbgpvtZ3AXi2NSBk4gIGZEmQslvgRStLhgXVDl+9DieK+sw1Vx4cKe8gu9mD
 c7zPStEsJBJgd3v+8s8avwo8R0oPZb6MsCKFjjaYojTvpfFmfX0YyWE/TzYoUm6Z
 +BTyA8t0+3jTArTs/Zid
 =HRy7
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "These are fixes for:

   - a resource management problem that causes a Radeon "Fatal error
     during GPU init" on machines where the BIOS programmed an invalid
     Root Port window.  This was a regression in v3.16.

   - an Atheros AR93xx device that doesn't handle PCI bus resets
     correctly.  This was a regression in v3.14.

   - an out-of-date email address"

* tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Update Richard Zhu's email address
  sparc/PCI: Clip bridge windows to fit in upstream windows
  powerpc/PCI: Clip bridge windows to fit in upstream windows
  parisc/PCI: Clip bridge windows to fit in upstream windows
  mn10300/PCI: Clip bridge windows to fit in upstream windows
  microblaze/PCI: Clip bridge windows to fit in upstream windows
  ia64/PCI: Clip bridge windows to fit in upstream windows
  frv/PCI: Clip bridge windows to fit in upstream windows
  alpha/PCI: Clip bridge windows to fit in upstream windows
  x86/PCI: Clip bridge windows to fit in upstream windows
  PCI: Add pci_claim_bridge_resource() to clip window if necessary
  PCI: Add pci_bus_clip_resource() to clip to fit upstream window
  PCI: Pass bridge device, not bus, when updating bridge windows
  PCI: Mark Atheros AR93xx to avoid bus reset
  PCI: Add flag for devices where we can't use bus reset
2015-01-24 10:58:47 +12:00
Dominik Dingel 31928aa586 KVM: remove unneeded return value of vcpu_postcreate
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller.  This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed.  But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:24:52 +01:00
Anton Blanchard 15c2d45d17 powerpc: Add 64bit optimised memcmp
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:55 +11:00
Gavin Shan a113de373b powerpc/powernv: Remove pnv_pci_probe_mode()
The callback (ppc_md.pci_probe_mode()) is used to determine if the
child PCI devices of the indicated PCI bus should be probed from
device-tree or hardware. On PowerNV platform, we always expect
probing PCI devices from hardware, which is PowerPC PCI core's
default behaviour. Also, the callback had some delay implemented
based on PHB's device node property "reset-clear-timestamp", which
wasn't exported from skiboot. So we don't need this function and
it's safe to remove it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:54 +11:00
Gavin Shan 1b28f170d9 powerpc/eeh: Allow to set maximal frozen times
When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.

The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).

Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:54 +11:00
Gavin Shan 432227e907 powerpc/eeh: Introduce flag EEH_PE_REMOVED
The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.

Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.

Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:53 +11:00
Gavin Shan 2aa5cf9e48 powerpc/eeh: Fix missed PE#0 on P7IOC
PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, we possibly see frozen PE#0 state is
cleared without EEH recovery taken on P7IOC as following kernel logs
indicate:

[root@ltcfbl8eb ~]# dmesg
       :
pci 0000:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20     : [PE# 002] Secondary bus 32..63 associated with PE#2
       :
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:52 +11:00
Gavin Shan 6f20e7f2e9 powerpc/kernel: Avoid memory corruption at early stage
When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.

This hasn't been observed to cause any trouble in practice, but is
obviously fishy.

Fixes: 6f4441ef70 ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:52 +11:00
Emil Medve 53a448c3d5 powerpc: Replace cpumask_weight(cpu_possible_mask) with num_possible_cpus()
num_possible_cpus() is just a shorthand for it.

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:51 +11:00
Naveen N. Rao bf794bf52a powerpc/kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2
Currently, all non-dot symbols are being treated as function descriptors
in ABIv1. This is incorrect and is resulting in perf probe not working:

  # perf probe do_fork
  Added new event:
  Failed to write event: Invalid argument
    Error: Failed to add events.
  # dmesg | tail -1
  [192268.073063] Could not insert probe at _text+768432: -22

perf probe bases all kernel probes on _text and writes,
for example, "p:probe/do_fork _text+768432" to
/sys/kernel/debug/tracing/kprobe_events. In-kernel, _text is being
considered to be a function descriptor and is resulting in the above
error.

Fix this by changing how we lookup symbol addresses on ppc64. We first
check for the dot variant of a symbol and look at the non-dot variant
only if that fails. In this manner, we avoid having to look at the
function descriptor.

While at it, also separate out how this works on ABIv2 where
we don't have dot symbols, but need to use the local entry point.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:51 +11:00
Michael Ellerman 10ea834364 powerpc: Rename _TIF_SYSCALL_T_OR_A to _TIF_SYSCALL_DOTRACE
Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A
meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or
SECCOMP or TRACEPOINT or NOHZ.

All of those are implemented via syscall_dotrace() so rename the flag to
that to try and clarify things.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:51 +11:00
Michael Ellerman ed77d4182b powerpc: Remove unused CPU_FTR_IABR
We removed the last usage of CPU_FTR_IABR in commit 1ad7d70562
"powerpc/xmon: Enable HW instruction breakpoint on POWER8".

Mark it as free.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:50 +11:00
Wei Yang 145a2d0427 powerpc/pci: remove the multi-init for pci_dn->phb
pci_dn->phb is set to phb in update_dn_pci_info(), if succeed.

This patch removes the duplication of pci_dn->phb initialization.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:48 +11:00
Thadeu Lima de Souza Cascardo 4e28784024 powernv/iommu: disable IOMMU bypass with param iommu=nobypass
When IOMMU bypass is enabled, a PCI device can read and write memory
that was not mapped by the driver without causing an EEH. That might
cause memory corruption, for example.

When we disable bypass, DMA reads and writes to addresses not mapped by
the IOMMU will cause an EEH, allowing us to debug such issues.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:46 +11:00
Anshuman Khandual 79872e3546 powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown
The current handling of EPOW_SHUTDOWN_ON_UPS event does not shutdown the
system after logging the message. All the events of EPOW_SYSTEM_SHUTDOWN
action code (EPOW_SHUTDOWN_ON_UPS is a part of it) must initiate system
shutdown as per the SPAPR spec. If the LPAR does not shutdown after
receiving this rtas based event, it will expose itself to a forced
abrupt shutdown initiated by the platform firmware. This patch fixes the
situation.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 11:17:30 +11:00
Wei Yang e9863e687d powerpc/powernv: Print the M64 range information in bootup log
The M64 range information is missed in dmesg, which would be helpful in debug.

This patch prints the M64 range information in the same format as M32.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 11:17:30 +11:00
Rob Herring 453c02c284 powerpc/PCI: Add struct pci_ops member names to initialization
Some instances of pci_ops initialization rely on the read/write members'
location in the struct.  This is fragile and may break when adding new
members to the beginning of the struct.

No functional change.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
CC: cbe-oss-dev@lists.ozlabs.org
2015-01-22 13:58:43 -06:00
Linus Torvalds 193934123c Surprising number of fixes this merge window :(
First two are minor fallout from the param rework which went in this merge
 window.
 
 Next three are a series which fixes a longstanding (but never previously
 reported and unlikely , so no CC stable) race between kallsyms and freeing
 the init section.
 
 Finally, a minor cleanup as our module refcount will now be -1 during
 unload.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUwEmwAAoJENkgDmzRrbjx77kP/1cNQR2eG2sBwokg3q0tvHnQ
 IKqEXErW7NvxRa+RAMEmy2uQoGt6+uNklAbtyJEYM9oR1NieFbPi2yrt9Xn5SAXS
 Brp1S8WYBMilA3W3o6I0trFDRWHdpdtkKIQwLWgJNSEWjbTXh8bSwp/2X1rlOPyI
 ZmphCMOQMU2/uFEyJhTz1WMEV8eVXiRLN8OxSkPxToxdZoGln2U8IBCCCJC9OG+f
 Cf3eMgEcNdEXNcPKqr11NIcHkAx6M6qI/eMDOqk151PslHa8lbis6di9Z87aE0ps
 i8PyrkJGTmgM9cCjXwE8deNseeCmuKYlbPIF+NoxcqtvZstfaMrISwTIEuzV4JHi
 p13YhDxy4XiC3H6pKHub/jo7UCl+wWtFh9SqpqGgduFX/p6FtUHQJm0S0X/DFFZt
 C+2MFVSe6HRHE8B7bFz86+619Qd/rU7+806CLCE+NbYlYAKIBYKzWt/bml6VH3RJ
 OjwXhQqmznWhJjsfD3BUUUpZpHijmylI9gAe2F1oErb8YjRU6gIm7P8hlkOzD7AS
 TfGHPFq2raQcfAiGdVmvkbvvhvYZXnB3WVsAexrYoqrT9I8eEfRI+7SkL75MLR2E
 ikzhJS3SHkAUAd7fUVMt7xMwh0jmhsPjWCCqc13m6UUFoXhTaDgKgPGftltN0bI2
 g85+enZ3/eca6xh/KxvW
 =Kf9b
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module and param fixes from Rusty Russell:
 "Surprising number of fixes this merge window :(

  The first two are minor fallout from the param rework which went in
  this merge window.

  The next three are a series which fixes a longstanding (but never
  previously reported and unlikely , so no CC stable) race between
  kallsyms and freeing the init section.

  Finally, a minor cleanup as our module refcount will now be -1 during
  unload"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: make module_refcount() a signed integer.
  module: fix race in kallsyms resolution during module load success.
  module: remove mod arg from module_free, rename module_memfree().
  module_arch_freeing_init(): new hook for archs before module->module_init freed.
  param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC
  param: initialize store function to NULL if not available.
2015-01-23 06:40:36 +12:00
Ryan Grimm 1212aa1c8c cxl: Enable CAPP recovery
Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
have reinitialized the PHB and done the previous recovery steps.

Add mode argument to opal call to do this. Driver can turn snoops off although
it does not currently.

Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:31:52 +11:00
Michael Ellerman d83fb87b2e powerpc/ps3: Enable CONFIG_PS3_REPOSITORY_WRITE in ps3_defconfig
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:31:22 +11:00
Geoff Levand 5ae74630ee powerpc/ps3: Write highmem info to repository
Add calls to the ps3_mm_set_repository_highmem() routine when the ps3
r1 highmem region is either created or destroyed.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:31:21 +11:00
Geoff Levand d4b18bd675 powerpc/ps3: Add ps3_mm_set_repository_highmem
Add the new routine ps3_mm_set_repository_highmem() that saves highmem info to
the LV1 hypervisor registry so that the info will be available to second stage
OS's loaded by petitboot/kexec. FreeBSD and some Linux derivatives use
this feature.

Also, move the existing ps3_mm_get_repository_highmem() routine up in
the source file.

This implementation of ps3_mm_set_repository_highmem() assumes the repository
will have a single highmem region entry (at index 0).

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:31:21 +11:00
Geoff Levand c02d350673 powerpc/ps3: Add empty repository highmem routines
To avoid the need for preprocessor conditionals in C source files add a set of
empty inline repository highmem write routines to platform.h that are used when
CONFIG_PS3_REPOSITORY_WRITE is not defined.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:31:21 +11:00
Shreyas B. Prabhu 0eb13208aa powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
cause exit from power-saving mode. While waking up from winkle, restoring
LPCR with LPCR_PECE1 set (i.e Decrementer interrupts allowed) can cause
issue in the following scenario:

- All the threads in a core are offlined. The core enters deep winkle.
- Spurious interrupt wakes up a thread in the core. Here LPCR is restored
  with LPCR_PECE1 bit set.
- Since it was a spurious interrupt on a offline thread, the thread clears
  the interrupt and goes back to winkle.
- Here before the thread executes winkle and puts the core into deep winkle,
  if a decrementer interrupt occurs on any of the sibling threads in the core
  that thread wakes up.
- Since in offline loop we are flushing interrupt only in case of external
  interrupt, the decrementer interrupt does not get flushed. So at this stage
  the thread is stuck in this is loop of waking up at 0x100 due to decrementer
  interrupt, not flushing the interrupt as only external interrupts get flushed,
  entering winkle, waking up at 0x100 again.

Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit
cleared when waking up from winkle.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 17:22:57 +11:00
Michael Ellerman 601fad35d2 powerpc: Update G5 defconfig
Add CGROUPS and DEVTMPFS, which allows booting with newer userspaces.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-22 16:49:47 +11:00
Ingo Molnar f49028292c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

  - Documentation updates.

  - Miscellaneous fixes.

  - Preemptible-RCU fixes, including fixing an old bug in the
    interaction of RCU priority boosting and CPU hotplug.

  - SRCU updates.

  - RCU CPU stall-warning updates.

  - RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-21 06:12:21 +01:00
Michael Neuling 9654f95a08 powerpc: Enable NUMA balancing in pseries[_le]_defconfig
Distros are enabling NUMA balancing (eg Ubuntu), so it would be good to
get some more test coverage with it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-21 15:10:25 +11:00
Anton Blanchard 7c62cf2416 powerpc: Enable various container features on pseries defconfigs.
Enable config options required by lxc and docker.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-21 15:10:25 +11:00
Anton Blanchard eaf31601a1 powerpc: Enable overlayfs on pseries and ppc64 defconfigs
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-21 15:10:24 +11:00
Anton Blanchard 36c7bfc4d0 powerpc: Enable KSM on pseries and ppc64 defconfigs
KSM will only be used on areas marked for merging via madvise, and it
is showing nice improvements on KVM workloads, so enable it by
default.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-21 15:10:24 +11:00
Anton Blanchard afda939f0a powerpc: Enable CONFIG_SATA_AHCI on pseries and ppc64 defconfigs
We are starting to see ppc64 boxes with SATA AHCI adapters in it,
so enable it in our defconfigs.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-21 15:10:23 +11:00
Anton Blanchard e91e992588 powerpc: Enable on demand governor on ppc64_defconfig
This was enabled on the pseries defconfigs recently, but missed
the ppc64 one.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-21 15:10:22 +11:00
Kevin Hao f0d3730092 powerpc: call of_clk_init() from time_init()
So the boards which has COMMON_CLK enabled don't have to
invoke this in its board specific file.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Acked-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
2015-01-20 10:09:02 -08:00
Michael Ellerman a85cade676 powerpc: Update all configs using savedefconfig
It looks like it's ~4 years since we updated some of these, so do a bulk
update.

Verified that the before and after generated configs are exactly the
same.

Which begs the question why update them? The answer is that it can be
confusing when the stored defconfig drifts too far from the generated
result.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-20 18:06:58 +11:00
Linus Torvalds 2262889091 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes a regression that arose from the change to add a crypto
  prefix to module names which was done to prevent the loading of
  arbitrary modules through the Crypto API.

  In particular, a number of modules were missing the crypto prefix
  which meant that they could no longer be autoloaded"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: add missing crypto module aliases
2015-01-20 18:17:34 +12:00
Rusty Russell be1f221c04 module: remove mod arg from module_free, rename module_memfree().
Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.

This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org
2015-01-20 11:38:33 +10:30
Christian Borntraeger da1a288d85 ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCE
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the ppc/hugetlbfs code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-19 14:14:19 +01:00
Christian Borntraeger 5ee07612e9 ppc/kvm: Replace ACCESS_ONCE with READ_ONCE
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the ppc/kvm code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
2015-01-19 14:14:07 +01:00
Laurent Dufour e6eb2eba49 powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
The commit 3b8a3c0109 ("powerpc/pseries: Fix endiannes issue in RTAS
call from xmon") was fixing an endianness issue in the call made from
xmon to RTAS.

However, as Michael Ellerman noticed, this fix was not complete, the
token value was not byte swapped. This lead to call an unexpected and
most of the time unexisting RTAS function, which is silently ignored by
RTAS.

This fix addresses this hole.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-19 09:58:16 +11:00
Greg Kroah-Hartman 3542f6b183 Merge 3.19-rc5 into char-misc-next
We want the 3.19-rc5 fixes in here for our testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-19 06:56:57 +08:00
Yinghai Lu 3ebfe46ac7 powerpc/PCI: Clip bridge windows to fit in upstream windows
Every PCI-PCI bridge window should fit inside an upstream bridge window
because orphaned address space is unreachable from the primary side of the
upstream bridge.  If we inherit invalid bridge windows that overlap an
upstream window from firmware, clip them to fit and update the bridge
accordingly.

[bhelgaas: changelog]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Gavin Shan <gwshan@linux.vnet.ibm.com>
CC: Anton Blanchard <anton@samba.org>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: Andrew Murray <amurray@embedded-bits.co.uk>
CC: linuxppc-dev@lists.ozlabs.org
2015-01-16 10:04:43 -06:00
Jens Axboe d4119ee0e1 Merge branch 'for-3.20/core' into for-3.20/drivers 2015-01-13 21:58:45 -07:00
Matthew Wilcox dd22f551ac block: Change direct_access calling convention
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.

Add a new helper function, bdev_direct_access(), to handle common
functionality including partition handling, checking the length requested
is positive, checking for the sector being page-aligned, and checking
the length of the request does not pass the end of the partition.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-13 21:58:11 -07:00
Matthew Wilcox 91117a2024 axonram: Fix bug in direct_access
The 'pfn' returned by axonram was completely bogus, and has been since
2008.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-13 21:57:56 -07:00
Mathias Krause 3e14dcf7cb crypto: add missing crypto module aliases
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.

Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.

Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.

Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-01-13 22:29:11 +11:00
John Ogness fbc4a8a857 uio: uio_fsl_elbc_gpcm: new driver
This driver provides UIO access to memory of a peripheral connected
to the Freescale enhanced local bus controller (eLBC) interface
using the general purpose chip-select mode (GPCM).

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-12 05:04:13 -08:00
Michael Ellerman a87e810f61 powerpc: Work around gcc bug in current_thread_info()
In commit a3e5b356b3 "powerpc: Don't use local named register variable
in current_thread_info" Anton changed the way we did current_thread_info()
to accommodate LLVM, and it was not meant to have any effect elsewhere.

Unfortunately it has exposed a gcc bug, where r1 gets copied into
another register and then gcc uses that register to restore the toc
after a function call, even when that register is volatile and has been
clobbered by the function call.

We could revert Anton's patch, but it's not clear the original code is
safe either, we may just have been lucky.

The cleanest solution is to just use the existing CURRENT_THREAD_INFO()
asm macro, and call it using inline asm.

Segher points out we don't need volatile on the asm, if the result of
the shift is unused it's fine for the compiler to elide it.

Fixes: a3e5b356b3 ("powerpc: Don't use local named register variable in current_thread_info")
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-12 16:40:02 +11:00
Anton Blanchard bfe5fda8e7 powernv: Fix OPAL tracepoint code
Patch c49f63530b ("powernv: Add OPAL tracepoints") has a spurious
store to the stack:

	ld      r12,opal_tracepoint_refcount@toc(r2);           \
	std     r12,32(r1);                                     \

The store was originally used to save the current tracepoint status
so the entry and the exit tracepoints were always balanced. In the
end I just created a separate path when tracepoints are enabled.

The offset on the stack used for this store is not valid for ABIv2
and it causes strange issues. I noticed it because OPAL console input
was broken.

Fixes: c49f63530b ("powernv: Add OPAL tracepoints")
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-12 16:40:02 +11:00
Masahiro Yamada 665d92e38f kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
The macros cc-version, cc-fullversion and ld-version take no argument.
It is not necessary to add $(call ...) to invoke them.

Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Signed-off-by: Michal Marek <mmarek@suse.cz>
2015-01-09 17:25:44 +01:00
Pranith Kumar 83fe27ea53 rcu: Make SRCU optional by using CONFIG_SRCU
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.

The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.

If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

   text    data     bss     dec     hex filename
   2007       0       0    2007     7d7 kernel/rcu/srcu.o

Size of arch/powerpc/boot/zImage changes from

   text    data     bss     dec     hex filename
 831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
 829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after

so the savings are about ~2000 bytes.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
2015-01-06 11:04:29 -08:00
Michael Ellerman 1be6f10f6f Revert "powerpc: Secondary CPUs must set cpu_callin_map after setting active and online"
This reverts commit 7c5c92ed56.

Although this did fix the bug it was aimed at, it also broke secondary
startup on platforms that use give/take_timebase(). Unfortunately we
didn't detect that while it was in next.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:51:51 +11:00
Michael Ellerman 9a4fc4eaf1 powerpc/kvm: Create proper names for the kvm_host_state PMU fields
We have two arrays in kvm_host_state that contain register values for
the PMU. Currently we only create an asm-offsets symbol for the base of
the arrays, and do the array offset in the assembly code.

Creating an asm-offsets symbol for each field individually makes the
code much nicer to read, particularly for the MMCRx/SIxR/SDAR fields, and
might have helped us notice the recent double restore bug we had in this
code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alexander Graf <agraf@suse.de>
2014-12-29 15:45:55 +11:00
Andreas Ruprecht 803d57de2b powerpc/lib: Do not include string.o in obj-y twice
In the Makefile, string.o (which is generated from string.S) is
included into the list of objects being built unconditionally
(obj-y) in line 12.

Additionally, if CONFIG_PPC64 is set, it is included again in
line 17.

This patch removes the latter unnecessary inclusion.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:55 +11:00
Paul Bolle 01b14505c3 powerpc/44x/Akebono: Remove select of IBM_EMAC_RGMII_WOL
Commit 2a2c74b2ef ("IBM Akebono: Add the Akebono platform") added a
select of IBM_EMAC_RGMII_WOL. But that Kconfig symbol isn't (yet) part
of the tree. So this select has been a nop since that commit was
included in v3.16-rc1.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:44 +11:00
Wei Yongjun e144d4edbc powerpc/4xx: Fix return value check in hsta_msi_probe()
In case of error, the function ioremap() returns NULL
not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:43 +11:00
Vincent Bernat 05b981f493 powerpc/xmon: use isspace/isxdigit/isalnum from linux/ctype.h
isxdigit() macro definition is the same.

isalnum() from linux/ctype.h will accept additional latin non-ASCII
characters. This is harmless since this macro is used in scanhex() which
parses user input.

isspace() from linux/ctype.h will accept vertical tab and form feed but
not NULL. The use of this macro is modified to accept NULL as
well. Additional characters are harmless since this macro is also only
used in scanhex().

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:43 +11:00
Cody P Schafer e3a8446a6a powerpc/pseries: relocate "config DTL" so kconfig nests properly
Moving config DTL up so it is below config PPC_SPLPAR means that
menuconfig will show config DTL nicely indented right below config
PPC_SPLPAR when PPC_SPLPAR is enabled.

To contrast that, right now if I enable PPC_SPLPAR in menuconfig, all I
can immediately tell is that "something showed up further down the list
where I wasn't looking", and I end up having to toggle the option a few
times to figure out what showed up, or look at the KConfig to find out
that config DTL depends on config PPC_SPLPAR.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:45:42 +11:00
Hari Bathini c1caae3de4 powerpc/kdump: Ignore failure in enabling big endian exception during crash
In LE kernel, we currently have a hack for kexec that resets the exception
endian before starting a new kernel as the kernel that is loaded could be a
big endian or a little endian kernel. In kdump case, resetting exception
endian fails when one or more cpus is disabled. But we can ignore the failure
and still go ahead, as in most cases crashkernel will be of same endianess
as primary kernel and reseting endianess is not even needed in those cases.
This patch adds a new inline function to say if this is kdump path. This
function is used at places where such a check is needed.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Rename to kdump_in_progress(), use bool, and edit comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:44:53 +11:00
Pranith Kumar 1e5d0fdb5b powerpc: Wire up sys_execveat() syscall
Wire up sys_execveat(). This passes the selftests for the system call.

Check success of execveat(3, '../execveat', 0)... [OK]
Check success of execveat(5, 'execveat', 0)... [OK]
Check success of execveat(6, 'execveat', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
Check success of execveat(99, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
Check success of execveat(8, '', 4096)... [OK]
Check success of execveat(17, '', 4096)... [OK]
Check success of execveat(9, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(14, '', 4096)... [OK]
Check success of execveat(15, '', 4096)... [OK]
Check failure of execveat(8, '', 0) with ENOENT... [OK]
Check failure of execveat(8, '(null)', 4096) with EFAULT... [OK]
Check success of execveat(5, 'execveat.symlink', 0)... [OK]
Check success of execveat(6, 'execveat.symlink', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...xec/execveat.symlink', 0)... [OK]
Check success of execveat(10, '', 4096)... [OK]
Check success of execveat(10, '', 4352)... [OK]
Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(6, 'execveat.symlink', 256) with ELOOP... [OK]
Check failure of execveat(-100, '/home/pranith/linux/tools/testing/selftests/exec/execveat.symlink', 256) with ELOOP... [OK]
Check success of execveat(3, '../script', 0)... [OK]
Check success of execveat(5, 'script', 0)... [OK]
Check success of execveat(6, 'script', 0)... [OK]
Check success of execveat(-100, '/home/pranith/linux/...elftests/exec/script', 0)... [OK]
Check success of execveat(13, '', 4096)... [OK]
Check success of execveat(13, '', 4352)... [OK]
Check failure of execveat(18, '', 4096) with ENOENT... [OK]
Check failure of execveat(7, 'script', 0) with ENOENT... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(16, '', 4096)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check success of execveat(4, 'script', 0)... [OK]
Check success of execveat(4, '../script', 0)... [OK]
Check failure of execveat(4, 'script', 0) with ENOENT... [OK]
Check failure of execveat(5, 'execveat', 65535) with EINVAL... [OK]
Check failure of execveat(5, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(6, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(-100, 'no-such-file', 0) with ENOENT... [OK]
Check failure of execveat(5, '', 4096) with EACCES... [OK]
Check failure of execveat(5, 'Makefile', 0) with EACCES... [OK]
Check failure of execveat(11, '', 4096) with EACCES... [OK]
Check failure of execveat(12, '', 4096) with EACCES... [OK]
Check failure of execveat(99, '', 4096) with EBADF... [OK]
Check failure of execveat(99, 'execveat', 0) with EBADF... [OK]
Check failure of execveat(8, 'execveat', 0) with ENOTDIR... [OK]
Invoke copy of 'execveat' via filename of length 4093:
Check success of execveat(19, '', 4096)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
Invoke copy of 'script' via filename of length 4093:
Check success of execveat(20, '', 4096)... [OK]
/bin/sh: 0: Can't open /dev/fd/5/xxxxxxx(... a long line of x's and y's, 0)... [OK]
Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]

Tested on a 32-bit powerpc system.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-29 15:44:53 +11:00
Linus Torvalds 5d6a546886 CONFIG_PM_RUNTIME elimination for 3.19-rc1
This removes the last few uses of CONFIG_PM_RUNTIME introduced
 recently and makes that config option finally go away.
 
 CONFIG_PM will be available directly from the menu now and
 also it will be selected automatically if CONFIG_SUSPEND or
 CONFIG_HIBERNATION is set.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUlYaPAAoJEILEb/54YlRx/SoP/2wYioGzBhOCYfHw6fZF8zrP
 rotQ86sakhvSHre8K9QyFjvsA9wJ0CaTJF46YKZuHFhqU+IJZ7aXvNdEM1hK214J
 Mf3L2AcbcdnXioAN+HpeZhQklp2qHe84YkVXBqsFD6kb/qUNV2LSjy6nKEUdY3jW
 6KL2f3RgF/LDjTdedujJgcCYwMBwfX4B7U42BG4NQQ8z3wCV+imJgzNDrR5nNlqK
 xu8ab8hO1Gi3msOJxS0y4MN6VTUpYOvQKhSyM9ErcB2ibclAdmcivKuFAz6gy5U7
 PyDfYo/P3mXjMRBFb9fLqGtRcfstsnxPPSeKwp236tIQFX19Bj76UVUMJoUlXJP5
 /f55/P7mCascg74ZZC4GiD/BSCRdqwInCsFMzqAfSq2NciKzeS6W7Mhd9VTLKDpl
 5kqE39imUjZyps7/QqkfWskzB7Puhmqk3ZgTq2yAd4uQTpV7xlJYcnvr4oHCmAia
 SsLdYOqMQzWr3qyz2f5cOqPAvOo3/Xk/HHfTOCHW/4L+Ov+C921/f3d5GnxX9Ha+
 ucRaMp9j5FPYVwFaFkczAMNF2Eanq+Fupa3e6XUNNbYdchFqT9obnHZbVKyvswjR
 vdGAYAjP/cLzIH9ETDCCXCRvBRw5pzeelDgvDPjPdmPjndHXG8WViyTIEyLL4+1i
 BENtc/SUw3pZ7iNlGO78
 =QnSO
 -----END PGP SIGNATURE-----

Merge tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull CONFIG_PM_RUNTIME elimination from Rafael Wysocki:
 "This removes the last few uses of CONFIG_PM_RUNTIME introduced
  recently and makes that config option finally go away.

  CONFIG_PM will be available directly from the menu now and also it
  will be selected automatically if CONFIG_SUSPEND or CONFIG_HIBERNATION
  is set"

* tag 'pm-config-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: Eliminate CONFIG_PM_RUNTIME
  tty: 8250_omap: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  sound: sst-haswell-pcm: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
2014-12-20 13:37:44 -08:00
Rafael J. Wysocki 464ed18ebd PM: Eliminate CONFIG_PM_RUNTIME
Having switched over all of the users of CONFIG_PM_RUNTIME to use
CONFIG_PM directly, turn the latter into a user-selectable option
and drop the former entirely from the tree.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
2014-12-19 22:55:06 +01:00
Linus Torvalds 34b85e3574 powerpc updates for 3.19 batch 2
The highlight is the series that reworks the idle management on powernv, which
 allows us to use deeper idle states on those machines.
 
 There's the fix from Anton for the "BUG at kernel/smpboot.c:134!" problem.
 
 An i2c driver for powernv. This is acked by Wolfram Sang, and he asked that we
 take it through the powerpc tree.
 
 A fix for audit from rgb at Red Hat, acked by Paul Moore who is one of the audit
 maintainers.
 
 A patch from Ben to export the symbol map of our OPAL firmware as a sysfs file,
 so that tools can use it.
 
 Also some CXL fixes, a couple of powerpc perf fixes, a fix for smt-enabled, and
 the patch to add __force to get_user() so we can use bitwise types.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUk+oCAAoJEFHr6jzI4aWADBAP/i/CJ+cu6o4mzNDdfs8bnxqn
 RGZCSV+SrkTZPcoLbLiM9iaqq34ORVIn7hwkhkTz2/koluMVfTsqtVulMoFf+hVd
 GTVt81MjMFzA3hM3bXEV58KRT79+64K54dLCe0F7OaD6f4AikKR4LLz/PY0EBMiZ
 2h13uQlfglaMeYTsaD9eeUpIIKs7+PwsNqUknmN9We07WWfxWqnRpiTR4TYTMXx4
 3lQPvCnnHokwDqjuKgwiqDVSaCfCl8laS1i+BPk0G0aRV1AnPDvR3MhgVb2IpNxX
 Joxy2D1HSawwDhqHOsId8dkGZXOM4vzo+Y658qnC1XfThqE0MhA+kCfa5/b6xlOR
 K7nDO5A41B6nXB3mMOQh/szTXSIa8KJRTR3ibbJJrMdF6F0TN0JLLQNUcmM4j/5D
 vvgZEzvFNZhWX98ktlQLde2E4ClWJg6mWESCGSgJeVjIXaxe/6GneIa8vLKm5QMu
 OoykNsASMDGqddYMGoYeX/mSsvjPjK0PDO2q19sPbkP8xpyDLx6J8xo+5hO4l8xc
 0Cdb38ECfeno+w5oKAnjidHnz0KYBsuYFLeS+rV0b8sUSWAzfdEjSn2AVIQ8gLOv
 IOCAqwZ5tL9EcUs+AKru5EHtBEV+2XB54xPRxfdFS/k+vYRE7MpS3ipxveIynN2l
 eRxf9hsSO7ASNDd0b3ID
 =GXdK
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull second batch of powerpc updates from Michael Ellerman:
 "The highlight is the series that reworks the idle management on
  powernv, which allows us to use deeper idle states on those machines.

  There's the fix from Anton for the "BUG at kernel/smpboot.c:134!"
  problem.

  An i2c driver for powernv.  This is acked by Wolfram Sang, and he
  asked that we take it through the powerpc tree.

  A fix for audit from rgb at Red Hat, acked by Paul Moore who is one of
  the audit maintainers.

  A patch from Ben to export the symbol map of our OPAL firmware as a
  sysfs file, so that tools can use it.

  Also some CXL fixes, a couple of powerpc perf fixes, a fix for
  smt-enabled, and the patch to add __force to get_user() so we can use
  bitwise types"

* tag 'powerpc-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/powernv: Ignore smt-enabled on Power8 and later
  powerpc/uaccess: Allow get_user() with bitwise types
  powerpc/powernv: Expose OPAL firmware symbol map
  powernv/powerpc: Add winkle support for offline cpus
  powernv/cpuidle: Redesign idle states management
  powerpc/powernv: Enable Offline CPUs to enter deep idle states
  powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
  i2c: Driver to expose PowerNV platform i2c busses
  powerpc: add little endian flag to syscall_get_arch()
  power/perf/hv-24x7: Use kmem_cache_free() instead of kfree
  powerpc/perf/hv-24x7: Use per-cpu page buffer
  cxl: Unmap MMIO regions when detaching a context
  cxl: Add timeout to process element commands
  cxl: Change contexts_lock to a mutex to fix sleep while atomic bug
  powerpc: Secondary CPUs must set cpu_callin_map after setting active and online
2014-12-19 12:57:45 -08:00
Linus Torvalds 66dcff86ba 3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-assisted
 virtualization on the PPC970
 - ARM, PPC, s390 all had only small fixes
 
 For x86:
 - small performance improvements (though only on weird guests)
 - usual round of hardware-compliancy fixes from Nadav
 - APICv fixes
 - XSAVES support for hosts and guests.  XSAVES hosts were broken because
 the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
 ABI whenever XSAVES was enabled; hence, this part is going to stable.
 Guest support is just a matter of exposing the feature and CPUID leaves
 support.
 
 Right now KVM is broken for PPC BookE in your tree (doesn't compile).
 I'll reply to the pull request with a patch, please apply it either
 before the pull request or in the merge commit, in order to preserve
 bisectability somewhat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
 lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
 yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
 DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
 MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
 =Zwq1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "3.19 changes for KVM:

   - spring cleaning: removed support for IA64, and for hardware-
     assisted virtualization on the PPC970

   - ARM, PPC, s390 all had only small fixes

  For x86:
   - small performance improvements (though only on weird guests)
   - usual round of hardware-compliancy fixes from Nadav
   - APICv fixes
   - XSAVES support for hosts and guests.  XSAVES hosts were broken
     because the (non-KVM) XSAVES patches inadvertently changed the KVM
     userspace ABI whenever XSAVES was enabled; hence, this part is
     going to stable.  Guest support is just a matter of exposing the
     feature and CPUID leaves support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
  KVM: move APIC types to arch/x86/
  KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
  KVM: PPC: Book3S HV: Improve H_CONFER implementation
  KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
  KVM: PPC: Book3S HV: Remove code for PPC970 processors
  KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
  KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
  arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
  arch: powerpc: kvm: book3s_pr.c: Remove unused function
  arch: powerpc: kvm: book3s.c: Remove some unused functions
  arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
  KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
  KVM: PPC: Book3S HV: ptes are big endian
  KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
  KVM: PPC: Book3S HV: Fix KSM memory corruption
  KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
  KVM: PPC: Book3S HV: Fix computation of tlbie operand
  KVM: PPC: Book3S HV: Add missing HPTE unlock
  KVM: PPC: BookE: Improve irq inject tracepoint
  arm/arm64: KVM: Require in-kernel vgic for the arch timers
  ...
2014-12-18 16:05:28 -08:00
Alexander Graf 91ed9e8a32 KVM: PPC: E500: Compile fix in this_cpu_write
Commit 69111bac42 ("powerpc: Replace __get_cpu_var uses") introduced
compile breakage to the e500 target by introducing invalid automatically
created C syntax.

Fix up the breakage and make the code compile again.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 16:02:26 -08:00
Greg Kurz d70a54e2d0 powerpc/powernv: Ignore smt-enabled on Power8 and later
Starting with POWER8, the subcore logic relies on all threads of a core
being booted so that they can participate in split mode switches. So on
those machines we ignore the smt_enabled_at_boot setting (smt-enabled on
the kernel command line).

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
[mpe: Update comment and change log to be more precise]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-18 19:59:21 +11:00
Michael S. Tsirkin 505e428374 powerpc/uaccess: Allow get_user() with bitwise types
At the moment, if p and x are both of the same bitwise type
(eg. __le32), get_user(x, p) produces a sparse warning.

This is because *p is loaded into a long then cast back to typeof(*p).

When typeof(*p) is a bitwise type (which is uncommon), such a cast needs
__force, otherwise sparse produces a warning.

For non-bitwise types __force should have no effect, and should not hide
any legitimate errors.

Note that we are casting to typeof(*p) not typeof(x). Even with the
cast, if x and *p are of different types we should get the warning, so I
think we are not loosing the ability to detect any actual errors.

virtio would like to use bitwise types with get_user() so fix these
spurious warnings by adding __force.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[mpe: Fill in changelog with more details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-18 19:11:09 +11:00
Anton Blanchard 476ce5ef09 KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
The in-kernel XICS emulation is faster than doing it all in QEMU
and it has got a lot of testing, so enable it by default.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 22:23:22 +01:00
Sam Bobroff 90fd09f804 KVM: PPC: Book3S HV: Improve H_CONFER implementation
Currently the H_CONFER hcall is implemented in kernel virtual mode,
meaning that whenever a guest thread does an H_CONFER, all the threads
in that virtual core have to exit the guest.  This is bad for
performance because it interrupts the other threads even if they
are doing useful work.

The H_CONFER hcall is called by a guest VCPU when it is spinning on a
spinlock and it detects that the spinlock is held by a guest VCPU that
is currently not running on a physical CPU.  The idea is to give this
VCPU's time slice to the holder VCPU so that it can make progress
towards releasing the lock.

To avoid having the other threads exit the guest unnecessarily,
we add a real-mode implementation of H_CONFER that checks whether
the other threads are doing anything.  If all the other threads
are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
it returns H_TOO_HARD which causes a guest exit and allows the
H_CONFER to be handled in virtual mode.

Otherwise it spins for a short time (up to 10 microseconds) to give
other threads the chance to observe that this thread is trying to
confer.  The spin loop also terminates when any thread exits the guest
or when all other threads are idle or trying to confer.  If the
timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
guest VCPU will recheck the spinlock word and most likely call
H_CONFER again.

This also improves the implementation of the H_CONFER virtual mode
handler.  If the VCPU is part of a virtual core (vcore) which is
runnable, there will be a 'runner' VCPU which has taken responsibility
for running the vcore.  In this case we yield to the runner VCPU
rather than the target VCPU.

We also introduce a check on the target VCPU's yield count: if it
differs from the yield count passed to H_CONFER, the target VCPU
has run since H_CONFER was called and may have already released
the lock.  This check is required by PAPR.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:53:39 +01:00
Paul Mackerras 4a157d61b4 KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
There are two ways in which a guest instruction can be obtained from
the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
instruction), the offending instruction is in the HEIR register
(Hypervisor Emulation Instruction Register).  If the exit was caused
by a load or store to an emulated MMIO device, we load the instruction
from the guest by turning data relocation on and loading the instruction
with an lwz instruction.

Unfortunately, in the case where the guest has opposite endianness to
the host, these two methods give results of different endianness, but
both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
using guest endianness, whereas the lwz will load the instruction using
host endianness.  The rest of the code that uses vcpu->arch.last_inst
assumes it was loaded using host endianness.

To fix this, we define a new vcpu field to store the HEIR value.  Then,
in kvmppc_handle_exit_hv(), we transfer the value from this new field to
vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
differ.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:50:39 +01:00
Paul Mackerras c17b98cf60 KVM: PPC: Book3S HV: Remove code for PPC970 processors
This removes the code that was added to enable HV KVM to work
on PPC970 processors.  The PPC970 is an old CPU that doesn't
support virtualizing guest memory.  Removing PPC970 support also
lets us remove the code for allocating and managing contiguous
real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
case, the code for pinning pages of guest memory when first
accessed and keeping track of which pages have been pinned, and
the code for handling H_ENTER hypercalls in virtual mode.

Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
The KVM_CAP_PPC_RMA capability now always returns 0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:44:03 +01:00
Suresh E. Warrier 3c78f78af9 KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
This patch adds trace points in the guest entry and exit code and also
for exceptions handled by the host in kernel mode - hypercalls and page
faults. The new events are added to /sys/kernel/debug/tracing/events
under a new subsystem called kvm_hv.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:29:27 +01:00
Paul Mackerras 2711e248a3 KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
Currently the calculations of stolen time for PPC Book3S HV guests
uses fields in both the vcpu struct and the kvmppc_vcore struct.  The
fields in the kvmppc_vcore struct are protected by the
vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
running the virtual core.  This works correctly but confuses lockdep,
because it sees that the code takes the tbacct_lock for a vcpu in
kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
vcore_stolen_time(), and it thinks there is a possibility of deadlock,
causing it to print reports like this:

=============================================
[ INFO: possible recursive locking detected ]
3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
---------------------------------------------
qemu-system-ppc/6188 is trying to acquire lock:
 (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]

but task is already holding lock:
 (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&vcpu->arch.tbacct_lock)->rlock);
  lock(&(&vcpu->arch.tbacct_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by qemu-system-ppc/6188:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
 #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
 #2:  (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]

stack backtrace:
CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
Call Trace:
[c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
[c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
[c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
[c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
[c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
[c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
[c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
[c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
[c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
[c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
[c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
[c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
[c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98

In order to make the locking easier to analyse, we change the code to
use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
preempt_tb fields.  This lock needs to be an irq-safe lock since it is
used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
functions, which are called with the scheduler rq lock held, which is
an irq-safe lock.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:20:09 +01:00
Rickard Strandqvist a0499cf746 arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
Remove the function inst_set_field() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:13:29 +01:00
Rickard Strandqvist 6178839b01 arch: powerpc: kvm: book3s_pr.c: Remove unused function
Remove the function get_fpr_index() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:13:16 +01:00
Rickard Strandqvist 54ca162a0c arch: powerpc: kvm: book3s.c: Remove some unused functions
Removes some functions that are not used anywhere:
kvmppc_core_load_guest_debugstate() kvmppc_core_load_host_debugstate()

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:12:42 +01:00
Rickard Strandqvist 24aaaf22ea arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
Remove the function sr_nx() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 13:12:25 +01:00
Suresh E. Warrier 1bc5d59c35 KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
The kvmppc_vcore_blocked() code does not check for the wait condition
after putting the process on the wait queue. This means that it is
possible for an external interrupt to become pending, but the vcpu to
remain asleep until the next decrementer interrupt.  The fix is to
make one last check for pending exceptions and ceded state before
calling schedule().

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:25 +01:00
Cédric Le Goater ffada016fb KVM: PPC: Book3S HV: ptes are big endian
When being restored from qemu, the kvm_get_htab_header are in native
endian, but the ptes are big endian.

This patch fixes restore on a KVM LE host. Qemu also needs a fix for
this :

     http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg00008.html

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:24 +01:00
Suresh E. Warrier 5b88cda665 KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
This fixes some inaccuracies in the state machine for the virtualized
ICP when implementing the H_IPI hcall (Set_MFFR and related states):

1. The old code wipes out any pending interrupts when the new MFRR is
   more favored than the CPPR but less favored than a pending
   interrupt (by always modifying xisr and the pending_pri). This can
   cause us to lose a pending external interrupt.

   The correct code here is to only modify the pending_pri and xisr in
   the ICP if the MFRR is equal to or more favored than the current
   pending pri (since in this case, it is guaranteed that that there
   cannot be a pending external interrupt). The code changes are
   required in both kvmppc_rm_h_ipi and kvmppc_h_ipi.

2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check
   for whether MFRR is being made less favored AND further if new MFFR
   is also less favored than the current CPPR, we check for any
   resends pending in the ICP. These checks look like they are
   designed to cover the case where if the MFRR is being made less
   favored, we opportunistically trigger a resend of any interrupts
   that had been previously rejected. Although, this is not a state
   described by PAPR, this is an action we actually need to do
   especially if the CPPR is already at 0xFF.  Because in this case,
   the resend bit will stay on until another ICP state change which
   may be a long time coming and the interrupt stays pending until
   then. The current code which checks for MFRR < CPPR is broken when
   CPPR is 0xFF since it will not get triggered in that case.

   Ideally, we would want to do a resend only if

   	prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr

   where pending interrupt is the one that was rejected. But we don't
   have the priority of the pending interrupt state saved, so we
   simply trigger a resend whenever the MFRR is made less favored.

3. In kvmppc_rm_h_ipi, where we save state to pass resends to the
   virtual mode, we also need to save the ICP whose need_resend we
   reset since this does not need to be my ICP (vcpu->arch.icp) as is
   incorrectly assumed by the current code. A new field rm_resend_icp
   is added to the kvmppc_icp structure for this purpose.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:24 +01:00
Paul Mackerras b4a839009a KVM: PPC: Book3S HV: Fix KSM memory corruption
Testing with KSM active in the host showed occasional corruption of
guest memory.  Typically a page that should have contained zeroes
would contain values that look like the contents of a user process
stack (values such as 0x0000_3fff_xxxx_xxx).

Code inspection in kvmppc_h_protect revealed that there was a race
condition with the possibility of granting write access to a page
which is read-only in the host page tables.  The code attempts to keep
the host mapping read-only if the host userspace PTE is read-only, but
if that PTE had been temporarily made invalid for any reason, the
read-only check would not trigger and the host HPTE could end up
read-write.  Examination of the guest HPT in the failure situation
revealed that there were indeed shared pages which should have been
read-only that were mapped read-write.

To close this race, we don't let a page go from being read-only to
being read-write, as far as the real HPTE mapping the page is
concerned (the guest view can go to read-write, but the actual mapping
stays read-only).  When the guest tries to write to the page, we take
an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a
writable HPTE for the page.

This eliminates the occasional corruption of shared pages
that was previously seen with KSM active.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:24 +01:00
Mahesh Salgaonkar dee6f24c33 KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
When we get an HMI (hypervisor maintenance interrupt) while in a
guest, we see that guest enters into paused state.  The reason is, in
kvmppc_handle_exit_hv it falls through default path and returns to
host instead of resuming guest.  This causes guest to enter into
paused state.  HMI is a hypervisor only interrupt and it is safe to
resume the guest since the host has handled it already.  This patch
adds a switch case to resume the guest.

Without this patch we see guest entering into paused state with following
console messages:

[ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
[ 3003.329356]  Error detail: Timer facility experienced an error
[ 3003.329359] 	HMER: 0840000000000000
[ 3003.329360] 	TFMR: 4a12000980a84000
[ 3003.329366] vcpu c0000007c35094c0 (40):
[ 3003.329368] pc  = c0000000000c2ba0  msr = 8000000000009032  trap = e60
[ 3003.329370] r 0 = c00000000021ddc0  r16 = 0000000000000046
[ 3003.329372] r 1 = c00000007a02bbd0  r17 = 00003ffff27d5d98
[ 3003.329375] r 2 = c0000000010980b8  r18 = 00001fffffc9a0b0
[ 3003.329377] r 3 = c00000000142d6b8  r19 = c00000000142d6b8
[ 3003.329379] r 4 = 0000000000000002  r20 = 0000000000000000
[ 3003.329381] r 5 = c00000000524a110  r21 = 0000000000000000
[ 3003.329383] r 6 = 0000000000000001  r22 = 0000000000000000
[ 3003.329386] r 7 = 0000000000000000  r23 = c00000000524a110
[ 3003.329388] r 8 = 0000000000000000  r24 = 0000000000000001
[ 3003.329391] r 9 = 0000000000000001  r25 = c00000007c31da38
[ 3003.329393] r10 = c0000000014280b8  r26 = 0000000000000002
[ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c00000000524a110
[ 3003.329397] r12 = 0000000028004484  r28 = c00000007c31da38
[ 3003.329399] r13 = c00000000fe01400  r29 = 0000000000000002
[ 3003.329401] r14 = 0000000000000046  r30 = c000000003011e00
[ 3003.329403] r15 = ffffffffffffffba  r31 = 0000000000000002
[ 3003.329404] ctr = c00000000041a670  lr  = c000000000272520
[ 3003.329405] srr0 = c00000000007e8d8 srr1 = 9000000000001002
[ 3003.329406] sprg0 = 0000000000000000 sprg1 = c00000000fe01400
[ 3003.329407] sprg2 = c00000000fe01400 sprg3 = 0000000000000005
[ 3003.329408] cr = 48004482  xer = 2000000000000000  dsisr = 42000000
[ 3003.329409] dar = 0000010015020048
[ 3003.329410] fault dar = 0000010015020048 dsisr = 42000000
[ 3003.329411] SLB (8 entries):
[ 3003.329412]   ESID = c000000008000000 VSID = 40016e7779000510
[ 3003.329413]   ESID = d000000008000001 VSID = 400142add1000510
[ 3003.329414]   ESID = f000000008000004 VSID = 4000eb1a81000510
[ 3003.329415]   ESID = 00001f000800000b VSID = 40004fda0a000d90
[ 3003.329416]   ESID = 00003f000800000c VSID = 400039f536000d90
[ 3003.329417]   ESID = 000000001800000d VSID = 0001251b35150d90
[ 3003.329417]   ESID = 000001000800000e VSID = 4001e46090000d90
[ 3003.329418]   ESID = d000080008000019 VSID = 40013d349c000400
[ 3003.329419] lpcr = c048800001847001 sdr1 = 0000001b19000006 last_inst = ffffffff
[ 3003.329421] trap=0xe60 | pc=0xc0000000000c2ba0 | msr=0x8000000000009032
[ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
[ 3003.329526]  Error detail: Timer facility experienced an error
[ 3003.329527] 	HMER: 0840000000000000
[ 3003.329527] 	TFMR: 4a12000980a94000
[ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
[ 3006.359792]  Error detail: Timer facility experienced an error
[ 3006.359795] 	HMER: 0840000000000000
[ 3006.359797] 	TFMR: 4a12000980a84000

 Id    Name                           State
----------------------------------------------------
 2     guest2                         running
 3     guest3                         paused
 4     guest4                         running

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:24 +01:00
Paul Mackerras d506735b1a KVM: PPC: Book3S HV: Fix computation of tlbie operand
The B (segment size) field in the RB operand for the tlbie
instruction is two bits, which we get from the top two bits of
the first doubleword of the HPT entry to be invalidated.  These
bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
bit numbering).

The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
which is not correct as it will bring in the top 10 bits, not
just the top two.  These extra bits could corrupt the AP, AVAL
and L fields in the RB value.  To fix this we shift right 62 bits
and then shift left 8 bits, so we only get the two bits of the
B field.

The first doubleword of the HPT entry is under the control of the
guest kernel.  In fact, Linux guests will always put zeroes in bits
54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
this.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:23 +01:00
Aneesh Kumar K.V f6fb9e848c KVM: PPC: Book3S HV: Add missing HPTE unlock
In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the
next HPTE without unlocking the invalid one.  In fact we should never
find an invalid and unlocked HPTE in the rmap chain, but for robustness
we should unlock it.  This adds the missing unlock.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:23 +01:00
Alexander Graf b6b612571e KVM: PPC: BookE: Improve irq inject tracepoint
When injecting an IRQ, we only document which IRQ priority (which translates
to IRQ type) gets injected. However, when reading traces you don't necessarily
have all the numbers in your head to know which IRQ really is meant.

This patch converts the IRQ number field to a symbolic name that is in sync
with the respective define. That way it's a lot easier for readers to figure
out what interrupt gets injected.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-15 13:27:23 +01:00
Benjamin Herrenschmidt c8742f8512 powerpc/powernv: Expose OPAL firmware symbol map
Newer versions of OPAL will provide this, so let's expose it to user
space so tools like perf can use it to properly decode samples in
firmware space.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 13:21:44 +11:00
Linus Torvalds e6b5be2be4 Driver core patches for 3.19-rc1
Here's the set of driver core patches for 3.19-rc1.
 
 They are dominated by the removal of the .owner field in platform
 drivers.  They touch a lot of files, but they are "simple" changes, just
 removing a line in a structure.
 
 Other than that, a few minor driver core and debugfs changes.  There are
 some ath9k patches coming in through this tree that have been acked by
 the wireless maintainers as they relied on the debugfs changes.
 
 Everything has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
 53kAoLeteByQ3iVwWurwwseRPiWa8+MI
 =OVRS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Shreyas B. Prabhu 77b54e9f21 powernv/powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.

But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.

Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contests of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.

With hypervisor state restore things fall under three categories-
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distingush first thread in core and subcore.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:41 +11:00
Shreyas B. Prabhu 7cba160ad7 powernv/cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
state.

The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only is
suboptimal, but can cause functionality issues when subcores and kvm is
involved.

This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:40 +11:00
Shreyas B. Prabhu 8eb8ac89a3 powerpc/powernv: Enable Offline CPUs to enter deep idle states
The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.

Since the  device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.

Another point is that fastsleep idle state may require workarounds in
the kernel to function properly. This workaround is introduced in the
subsequent patches. However neither the cpuidle driver or the hotplug
path need be bothered about this workaround.

They will be taken care of by the core powernv code.

Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:40 +11:00
Paul Mackerras 8117ac6a6c powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on.  This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context.  If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.

This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.

Cc: stable@vger.kernel.org
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15 10:46:32 +11:00
Neelesh Gupta 470834508f i2c: Driver to expose PowerNV platform i2c busses
The patch exposes the available i2c busses on the PowerNV platform
to the kernel and implements the bus driver to support i2c and
smbus commands.
The driver uses the platform device infrastructure to probe the busses
on the platform and registers them with the i2c driver framework.

Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-14 12:44:46 +11:00
Linus Torvalds e3aa91a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - The crypto API is now documented :)
 - Disallow arbitrary module loading through crypto API.
 - Allow get request with empty driver name through crypto_user.
 - Allow speed testing of arbitrary hash functions.
 - Add caam support for ctr(aes), gcm(aes) and their derivatives.
 - nx now supports concurrent hashing properly.
 - Add sahara support for SHA1/256.
 - Add ARM64 version of CRC32.
 - Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: tcrypt - Allow speed testing of arbitrary hash functions
  crypto: af_alg - add user space interface for AEAD
  crypto: qat - fix problem with coalescing enable logic
  crypto: sahara - add support for SHA1/256
  crypto: sahara - replace tasklets with kthread
  crypto: sahara - add support for i.MX53
  crypto: sahara - fix spinlock initialization
  crypto: arm - replace memset by memzero_explicit
  crypto: powerpc - replace memset by memzero_explicit
  crypto: sha - replace memset by memzero_explicit
  crypto: sparc - replace memset by memzero_explicit
  crypto: algif_skcipher - initialize upon init request
  crypto: algif_skcipher - removed unneeded code
  crypto: algif_skcipher - Fixed blocking recvmsg
  crypto: drbg - use memzero_explicit() for clearing sensitive data
  crypto: drbg - use MODULE_ALIAS_CRYPTO
  crypto: include crypto- module prefix in template
  crypto: user - add MODULE_ALIAS
  crypto: sha-mb - remove a bogus NULL check
  crytpo: qat - Fix 64 bytes requests
  ...
2014-12-13 13:33:26 -08:00
Linus Torvalds 78a45c6f06 Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
2014-12-13 13:00:36 -08:00
Riku Voipio 957e3facd1 gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs
Following the suggestions from Andrew Morton and Stephen Rothwell,
Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
can enable.

set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
previously allowed + ARM64 which I tested.

Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00
Joonsoo Kim 031bc5743f mm/debug-pagealloc: make debug-pagealloc boottime configurable
Now, we have prepared to avoid using debug-pagealloc in boottime.  So
introduce new kernel-parameter to disable debug-pagealloc in boottime, and
makes related functions to be disabled in this case.

Only non-intuitive part is change of guard page functions.  Because guard
page is effective only if debug-pagealloc is enabled, turning off
according to debug-pagealloc is reasonable thing to do.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Linus Torvalds f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Linus Torvalds a7cb7bb664 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree update from Jiri Kosina:
 "Usual stuff: documentation updates, printk() fixes, etc"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  intel_ips: fix a type in error message
  cpufreq: cpufreq-dt: Move newline to end of error message
  ps3rom: fix error return code
  treewide: fix typo in printk and Kconfig
  ARM: dts: bcm63138: change "interupts" to "interrupts"
  Replace mentions of "list_struct" to "list_head"
  kernel: trace: fix printk message
  scsi: mpt2sas: fix ioctl in comment
  zbud, zswap: change module author email
  clocksource: Fix 'clcoksource' typo in comment
  arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help
  gpio: msm-v1: make boolean argument more obvious
  usb: Fix typo in usb-serial-simple.c
  PCI: Fix comment typo 'COMFIG_PM_OPS'
  powerpc: Fix comment typo 'CONIFG_8xx'
  powerpc: Fix comment typos 'CONFiG_ALTIVEC'
  clk: st: Spelling s/stucture/structure/
  isci: Spelling s/stucture/structure/
  usb: gadget: zero: Spelling s/infrastucture/infrastructure/
  treewide: Fix company name in module descriptions
  ...
2014-12-12 10:08:06 -08:00
Richard Guy Briggs 63f13448d8 powerpc: add little endian flag to syscall_get_arch()
Since both ppc and ppc64 have LE variants which are now reported by uname, add
that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE
variant.

Without this,  perf trace and auditctl fail.

Mainline kernel reports ppc64le (per a058801) but there is no matching
AUDIT_ARCH_PPC64LE.

Since 32-bit PPC LE is not supported by audit, don't advertise it in
AUDIT_ARCH_PPC* variants.

See:
	https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html
	https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12 20:14:08 +11:00
Sukadev Bhattiprolu ec2aef5a8d power/perf/hv-24x7: Use kmem_cache_free() instead of kfree
Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12 16:06:13 +11:00
sukadev@linux.vnet.ibm.com f34b6c72c3 powerpc/perf/hv-24x7: Use per-cpu page buffer
The 24x7 counters are continuously running and not updated on an
interrupt. So we record the event counts when stopping the event or
deleting it.

But to "read" a single counter in 24x7, we allocate a page and pass it
into the hypervisor (The HV returns the page full of counters from which
we extract the specific counter for this event).

We allocate a page using GFP_USER and when deleting the event, we end up
with the following warning because we are blocking in interrupt context.

  [  698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000

We could use GFP_ATOMIC but that could result in failures. Pre-allocate
a buffer so we don't have to allocate in interrupt context. Further as
Michael Ellerman suggested, use Per-CPU buffer so we only need to
allocate once per CPU.

Cc: stable@vger.kernel.org
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12 16:05:18 +11:00
Alexander Duyck 1077fa36f2 arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be.  For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb().  In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb().  For example on
ARM the rmb barriers break down as follows:

  Barrier   Call     Explanation
  --------- -------- ----------------------------------
  rmb()     dsb()    Data synchronization barrier - system
  dma_rmb() dmb(osh) data memory barrier - outer sharable
  smp_rmb() dmb(ish) data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no dma_mb().  Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb().  As such there isn't much to
be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:06 -05:00
Alexander Duyck 8a44971841 arch: Cleanup read_barrier_depends() and comments
This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.

With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header.  In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:05 -05:00
Linus Torvalds 140cd7fb04 powerpc updates for 3.19
Some nice cleanups like removing bootmem, and removal of __get_cpu_var().
 
 There is one patch to mm/gup.c. This is the generic GUP implementation, but is
 only used by us and arm(64). We have an ack from Steve Capper, and although we
 didn't get an ack from Andrew he told us to take the patch through the powerpc
 tree.
 
 There's one cxl patch. This is in drivers/misc, but Greg said he was happy for
 us to manage fixes for it.
 
 There is an infrastructure patch to support an IPMI driver for OPAL. That patch
 also appears in Corey Minyard's IPMI tree, you may see a conflict there.
 
 There is also an RTC driver for OPAL. We weren't able to get any response from
 the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver.
 
 The usual batch of Freescale updates from Scott.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUiSTSAAoJEFHr6jzI4aWAirQP/3rIEng0LzLu5kW2zkGylIaM
 SNDum1vze3mHiTFl+CFcSIGpC1UEULoB49HA+2oE/ExKpIceG6lpL2LP+wNh2FW5
 mozjMjS6mZt4w1Fu1D2ZtgQc3O1T1pxkqsnZmPa8gVf5k5d5IQNPY6yB0pgVWwbV
 gwBKxe4VwPAzJjppE9i9MDhNTJwmHZq0lI8XuoTXOOU/f+4G1WxmjrbyveQ7cRP5
 i/sq2cKjxpWA+KDeIXo0GR0DpXR7qMeAvFX5xXY7oKuUJIFDM4kSHfmMYP6qLf5c
 2vlsJqHVqfOgQdve41z1ooaPzNtg7ezVo+VqqguSgtSgwy2JUo/uHpnzz3gD1Olo
 AP5+6xj8LZac0rTPxF4n4Hoyrp7AaaFjEFt1zqT9PWniZW4B41wtia0QORBNUf1S
 UEmKAC9T3WZJ47mH7WMSadtOPF9E3Yd/zuiPD4udtptCNKPbr6/k1MpJPIW2D4Rn
 BJ0QZTRd7V0yRofXxZtHxaMxq8pWd/Tip7J/zr/ghz+ulnH8BuFamuhCCLuJlESU
 +A2PMfuseyTMpH9sMAmmTwSGPDKjaUFWvmFvY/n88NZL7r2LlomNrDWFSSQOIHUP
 FxjYmjUMpZeexsfyRdgFV/INhYC3o3cso2fRGO45YK6nkxNnjNFEBS6WhQLvNLBu
 sknd1WjXkuJtoMC15SrQ
 =jvyT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:
 "Some nice cleanups like removing bootmem, and removal of
  __get_cpu_var().

  There is one patch to mm/gup.c.  This is the generic GUP
  implementation, but is only used by us and arm(64).  We have an ack
  from Steve Capper, and although we didn't get an ack from Andrew he
  told us to take the patch through the powerpc tree.

  There's one cxl patch.  This is in drivers/misc, but Greg said he was
  happy for us to manage fixes for it.

  There is an infrastructure patch to support an IPMI driver for OPAL.

  There is also an RTC driver for OPAL.  We weren't able to get any
  response from the RTC maintainer, Alessandro Zummo, so in the end we
  just merged the driver.

  The usual batch of Freescale updates from Scott"

* tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (101 commits)
  powerpc/powernv: Return to cpu offline loop when finished in KVM guest
  powerpc/book3s: Fix partial invalidation of TLBs in MCE code.
  powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
  powerpc/xmon: Cleanup the breakpoint flags
  powerpc/xmon: Enable HW instruction breakpoint on POWER8
  powerpc/mm/thp: Use tlbiel if possible
  powerpc/mm/thp: Remove code duplication
  powerpc/mm/hugetlb: Sanity check gigantic hugepage count
  powerpc/oprofile: Disable pagefaults during user stack read
  powerpc/mm: Check for matching hpte without taking hpte lock
  powerpc: Drop useless warning in eeh_init()
  powerpc/powernv: Cleanup unused MCE definitions/declarations.
  powerpc/eeh: Dump PHB diag-data early
  powerpc/eeh: Recover EEH error on ownership change for BCM5719
  powerpc/eeh: Set EEH_PE_RESET on PE reset
  powerpc/eeh: Refactor eeh_reset_pe()
  powerpc: Remove more traces of bootmem
  powerpc/pseries: Initialise nvram_pstore_info's buf_lock
  cxl: Name interrupts in /proc/interrupt
  cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning
  ...
2014-12-11 17:48:14 -08:00
Linus Torvalds 70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds 7ef58b32f5 Devicetree changes for v3.19
Lots of activity in the devicetree code for v3.18. Most of it is related
 to getting all of the overlay support code in place, but there are other
 important things in there.
 
 There are a few trivial merge conflicts. They shouldn't give you any
 trouble.
 
 Highlights:
 - OF_RECONFIG notifiers for SPI, I2C and Platform devices. Those
   subsystems can now respond to live changes to the device tree.
 - CONFIG_OF_OVERLAY method for applying live changes to the device tree
 - Removal of the of_allnodes list. This used to be used to iterate over
   all the nodes in the device tree, but it is unnecessary because the
   same thing can be done by iterating over the list of child pointers.
   Getting rid of of_allnodes saves some memory and avoids the
   possibility of of_allnodes being sorted differently from the child
   lists.
 - Support for retrieving original DTB blob via sysfs. Needed by kexec.
 - More unittests
 - Documentation and minor bug fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUiaTJAAoJEMWQL496c2LNdKkP/1rk20JXzJc948Z3VFZPXkzf
 TUKXC+Qn0FmVjQhESkx6LxLDrMDTQlQLlWBmFuWRB87Fk5E32FEf5zzW7I9oQPS4
 msIqJoYf5T7EPlmJ/85156xjK5ezc0OyoKEizn23mcKrJE4bmXQEbVw99UUFhq4R
 Oz1a1ZPQQSSaMteKftOoRBiE3bJut3tJ3dfufNjwOuXi5rALJ0DVxuOeU/Hba13d
 t05qlImwocKXGBDd/B4psBI5fZl4Tf4AmGOD9aU7YHxrLg4jOCbvqies3DQQ0q3D
 o9YZBnuBw7A3tzJJ3F5KajRnFLazJBOV5BKGo7eYuTzT56mpZW/HF6eS9b1DbP9x
 4q71Vd5qhIuU9JsQAStfZ6pdx3FBXRNGpIXXfwzbCSdaePIuOKS17zvA/Iy5bWeA
 2TyqgMuKZwnXOXxQesMZJYIw2IEnIyobzh0A1wAnvReyos/nHF/tha/SA/Jutq1s
 +0gOkMlPW2EdpADmlfLPRSHgSqO8bfCPeNPihn672MS2dAv9H+XRLcoKuSNErhdl
 1gYtnR7IK+Sl0KmMC5YoMvXPchkV5YS2qEp1f3p+ZmgcMSWyHHKMtf8VwjNTaSBU
 e1AshH6HvmYEPt0cnntSMAxbw+N596QjkVp4RbHsLpyj7qeUVVY56/K/aiM7M69P
 BvJkuewrhsAxyM2X2OsD
 =ak0A
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux

Pull devicetree changes from Grant Likely:
 "Lots of activity in the devicetree code for v3.18.  Most of it is
  related to getting all of the overlay support code in place, but there
  are other important things in there.

  Highlights:

   - OF_RECONFIG notifiers for SPI, I2C and Platform devices.  Those
     subsystems can now respond to live changes to the device tree.

   - CONFIG_OF_OVERLAY method for applying live changes to the device
     tree

   - Removal of the of_allnodes list.  This used to be used to iterate
     over all the nodes in the device tree, but it is unnecessary
     because the same thing can be done by iterating over the list of
     child pointers.  Getting rid of of_allnodes saves some memory and
     avoids the possibility of of_allnodes being sorted differently from
     the child lists.

   - Support for retrieving original DTB blob via sysfs.  Needed by
     kexec.

   - More unittests

   - Documentation and minor bug fixes"

* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux: (42 commits)
  of: Delete unnecessary check before calling "of_node_put()"
  of: Drop ->next pointer from struct device_node
  spi: Check for spi_of_notifier when CONFIG_OF_DYNAMIC=y
  of: support passing console options with stdout-path
  of: add optional options parameter to of_find_node_by_path()
  of: Add bindings for chosen node, stdout-path
  of: Remove unneeded and incorrect MODULE_DEVICE_TABLE
  ARM: dt: fix up PL011 device tree bindings
  of: base, fix of_property_read_string_helper kernel-doc
  of: remove select of non-existant OF_DEVICE config symbol
  spi/of: Add OF notifier handler
  spi/of: Create new device registration method and accessors
  i2c/of: Add OF_RECONFIG notifier handler
  i2c/of: Factor out Devicetree registration code
  of/overlay: Add overlay unittests
  of/overlay: Introduce DT overlay support
  of/reconfig: Add OF_DYNAMIC notifier for platform_bus_type
  of/reconfig: Always use the same structure for notifiers
  of/reconfig: Add debug output for OF_RECONFIG notifiers
  of/reconfig: Add empty stubs for the of_reconfig methods
  ...
2014-12-11 13:06:58 -08:00
Linus Torvalds 1dd7dcb6ea There was a lot of clean ups and minor fixes. One of those clean ups was
to the trace_seq code. It also removed the return values to the
 trace_seq_*() functions and use trace_seq_has_overflowed() to see if
 the buffer filled up or not. This is similar to work being done to the
 seq_file code as well in another tree.
 
 Some of the other goodies include:
 
  o Added some "!" (NOT) logic to the tracing filter.
 
  o Fixed the frame pointer logic to the x86_64 mcount trampolines
 
  o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
    That is, the ftrace trampoline can be dynamically allocated
    and be called directly by functions that only have a single hook
    to them.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG
 Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk
 xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL
 WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv
 mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R
 Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY=
 =Z7Cm
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "There was a lot of clean ups and minor fixes.  One of those clean ups
  was to the trace_seq code.  It also removed the return values to the
  trace_seq_*() functions and use trace_seq_has_overflowed() to see if
  the buffer filled up or not.  This is similar to work being done to
  the seq_file code as well in another tree.

  Some of the other goodies include:

   - Added some "!" (NOT) logic to the tracing filter.

   - Fixed the frame pointer logic to the x86_64 mcount trampolines

   - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
     That is, the ftrace trampoline can be dynamically allocated and be
     called directly by functions that only have a single hook to them"

* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
  tracing: Truncated output is better than nothing
  tracing: Add additional marks to signal very large time deltas
  Documentation: describe trace_buf_size parameter more accurately
  tracing: Allow NOT to filter AND and OR clauses
  tracing: Add NOT to filtering logic
  ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
  ftrace/x86: Get rid of ftrace_caller_setup
  ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
  ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
  ftrace/x86: Simplify save_mcount_regs on getting RIP
  ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
  ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
  ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
  ftrace/x86: Have static tracing also use ftrace_caller_setup
  ftrace/x86: Have static function tracing always test for function graph
  kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
  ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
  kprobes/ftrace: Recover original IP if pre_handler doesn't change it
  tracing/trivial: Fix typos and make an int into a bool
  tracing: Deletion of an unnecessary check before iput()
  ...
2014-12-10 19:58:13 -08:00
Linus Torvalds b6da0076ba Merge branch 'akpm' (patchbomb from Andrew)
Merge first patchbomb from Andrew Morton:
 - a few minor cifs fixes
 - dma-debug upadtes
 - ocfs2
 - slab
 - about half of MM
 - procfs
 - kernel/exit.c
 - panic.c tweaks
 - printk upates
 - lib/ updates
 - checkpatch updates
 - fs/binfmt updates
 - the drivers/rtc tree
 - nilfs
 - kmod fixes
 - more kernel/exit.c
 - various other misc tweaks and fixes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  exit: pidns: fix/update the comments in zap_pid_ns_processes()
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  exit: exit_notify: re-use "dead" list to autoreap current
  exit: reparent: call forget_original_parent() under tasklist_lock
  exit: reparent: avoid find_new_reaper() if no children
  exit: reparent: introduce find_alive_thread()
  exit: reparent: introduce find_child_reaper()
  exit: reparent: document the ->has_child_subreaper checks
  exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
  exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
  exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
  exit: proc: don't try to flush /proc/tgid/task/tgid
  exit: release_task: fix the comment about group leader accounting
  exit: wait: drop tasklist_lock before psig->c* accounting
  exit: wait: don't use zombie->real_parent
  exit: wait: cleanup the ptrace_reparented() checks
  usermodehelper: kill the kmod_thread_locker logic
  usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
  fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  ...
2014-12-10 18:34:42 -08:00
Yann Droneaud 6b9cdf39c8 ppc/cell: replace get_unused_fd() with get_unused_fd_flags(0)
This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

In a further patch, get_unused_fd() will be removed so that new code start
using get_unused_fd_flags(), with the hope O_CLOEXEC could be used, either
by default or choosen by userspace.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:10 -08:00
Kirill A. Shutemov c164e038ee mm: fix huge zero page accounting in smaps report
As a small zero page, huge zero page should not be accounted in smaps
report as normal page.

For small pages we rely on vm_normal_page() to filter out zero page, but
vm_normal_page() is not designed to handle pmds.  We only get here due
hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not
necessary compatible on each and every architecture.

Let's add separate codepath to handle pmds.  follow_trans_huge_pmd() will
detect huge zero page for us.

We would need pmd_dirty() helper to do this properly.  The patch adds it
to THP-enabled architectures which don't yet have one.

[akpm@linux-foundation.org: use do_div to fix 32-bit build]
Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengwei Yin <yfw.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:08 -08:00
Linus Torvalds cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
Daniel Borkmann 0cb6c969ed net, lib: kill arch_fast_hash library bits
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:46 -05:00
Linus Torvalds 9e66645d72 Merge branch 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq domain updates from Thomas Gleixner:
 "The real interesting irq updates:

   - Support for hierarchical irq domains:

     For complex interrupt routing scenarios where more than one
     interrupt related chip is involved we had no proper representation
     in the generic interrupt infrastructure so far.  That made people
     implement rather ugly constructs in their nested irq chip
     implementations.  The main offenders are x86 and arm/gic.

     To distangle that mess we have now hierarchical irqdomains which
     seperate the various interrupt chips and connect them via the
     hierarchical domains.  That keeps the domain specific details
     internal to the particular hierarchy level and removes the
     criss/cross referencing of chip internals.  The resulting hierarchy
     for a complex x86 system will look like this:

        vector          mapped: 74
          msi-0         mapped: 2
          dmar-ir-1     mapped: 69
            ioapic-1    mapped: 4
            ioapic-0    mapped: 20
            pci-msi-2   mapped: 45
          dmar-ir-0     mapped: 3
            ioapic-2    mapped: 1
            pci-msi-1   mapped: 2
          htirq         mapped: 0

     Neither ioapic nor pci-msi know about the dmar interrupt remapping
     between themself and the vector domain.  If interrupt remapping is
     disabled ioapic and pci-msi become direct childs of the vector
     domain.

     In hindsight we should have done that years ago, but in hindsight
     we always know better :)

   - Support for generic MSI interrupt domain handling

     We have more and more non PCI related MSI interrupts, so providing
     a generic infrastructure for this is better than having all
     affected architectures implementing their own private hacks.

   - Support for PCI-MSI interrupt domain handling, based on the generic
     MSI support.

     This part carries the pci/msi branch from Bjorn Helgaas pci tree to
     avoid a massive conflict.  The PCI/MSI parts are acked by Bjorn.

  I have two more branches on top of this.  The full conversion of x86
  to hierarchical domains and a partial conversion of arm/gic"

* 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  genirq: Move irq_chip_write_msi_msg() helper to core
  PCI/MSI: Allow an msi_controller to be associated to an irq domain
  PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
  PCI/MSI: Enhance core to support hierarchy irqdomain
  PCI/MSI: Move cached entry functions to irq core
  genirq: Provide default callbacks for msi_domain_ops
  genirq: Introduce msi_domain_alloc/free_irqs()
  asm-generic: Add msi.h
  genirq: Add generic msi irq domain support
  genirq: Introduce callback irq_chip.irq_write_msi_msg
  genirq: Work around __irq_set_handler vs stacked domains ordering issues
  irqdomain: Introduce helper function irq_domain_add_hierarchy()
  irqdomain: Implement a method to automatically call parent domains alloc/free
  genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
  genirq: Split out flow handler typedefs into seperate header file
  genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
  genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
  genirq: Add more helper functions to support stacked irq_chip
  genirq: Introduce helper functions to support stacked irq_chip
  irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
  ...
2014-12-10 09:01:01 -08:00
Linus Torvalds a0e4467726 asm-generic: asm/io.h rewrite
While there normally is no reason to have a pull request for asm-generic
 but have all changes get merged through whichever tree needs them, I do
 have a series for 3.19. There are two sets of patches that change
 significant portions of asm/io.h, and this branch contains both in order
 to resolve the conflicts:
 
 - Will Deacon has done a set of patches to ensure that all architectures
   define {read,write}{b,w,l,q}_relaxed() functions or get them by
   including asm-generic/io.h. These functions are commonly used on ARM
   specific drivers to avoid expensive L2 cache synchronization implied by
   the normal {read,write}{b,w,l,q}, but we need to define them on all
   architectures in order to share the drivers across architectures and
   to enable CONFIG_COMPILE_TEST configurations for them
 
 - Thierry Reding has done an unrelated set of patches that extends
   the asm-generic/io.h file to the degree necessary to make it useful
   on ARM64 and potentially other architectures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj
 xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI
 ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX
 ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98
 uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb
 71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8
 +l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr
 erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2
 6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR
 HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ
 vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA
 6tROM77bwIQ=
 =xUv6
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
 "While there normally is no reason to have a pull request for
  asm-generic but have all changes get merged through whichever tree
  needs them, I do have a series for 3.19.

  There are two sets of patches that change significant portions of
  asm/io.h, and this branch contains both in order to resolve the
  conflicts:

   - Will Deacon has done a set of patches to ensure that all
     architectures define {read,write}{b,w,l,q}_relaxed() functions or
     get them by including asm-generic/io.h.

     These functions are commonly used on ARM specific drivers to avoid
     expensive L2 cache synchronization implied by the normal
     {read,write}{b,w,l,q}, but we need to define them on all
     architectures in order to share the drivers across architectures
     and to enable CONFIG_COMPILE_TEST configurations for them

   - Thierry Reding has done an unrelated set of patches that extends
     the asm-generic/io.h file to the degree necessary to make it useful
     on ARM64 and potentially other architectures"

* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
  ARM64: use GENERIC_PCI_IOMAP
  sparc: io: remove duplicate relaxed accessors on sparc32
  ARM: sa11x0: Use void __iomem * in MMIO accessors
  arm64: Use include/asm-generic/io.h
  ARM: Use include/asm-generic/io.h
  asm-generic/io.h: Implement generic {read,write}s*()
  asm-generic/io.h: Reconcile I/O accessor overrides
  /dev/mem: Use more consistent data types
  Change xlate_dev_{kmem,mem}_ptr() prototypes
  ARM: ixp4xx: Properly override I/O accessors
  ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
  ARM: ebsa110: Properly override I/O accessors
  ARC: Remove redundant PCI_IOBASE declaration
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  ...
2014-12-09 17:25:00 -08:00
Linus Torvalds 3a647c1d7a ARM: SoC driver updates for 3.19
These are changes for drivers that are intimately tied to some SoC
 and for some reason could not get merged through the respective
 subsystem maintainer tree.
 
 The largest single change here this time around is the Tegra
 iommu/memory controller driver, which gets updated to the new
 iommu DT binding. More drivers like this are likely to follow
 for the following merge window, but we should be able to do
 those through the iommu maintainer.
 
 Other notable changes are:
 * reset controller drivers from the reset maintainer (socfpga, sti, berlin)
 * fixes for the keystone navigator driver merged last time
 * at91 rtc driver changes related to the at91 cleanups
 * ARM perf driver changes from Will Deacon
 * updates for the brcmstb_gisb driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAVIcj4mCrR//JCVInAQIvWg//WD72+2q0RmEvu8r/YN4SDfg5iY7OMzgy
 Jyt6rN1IhXBY5GJL5Hil1q2JP/7o8vypekllohmBYWzXO3ZJ2VK6NPIXEMuzaiCz
 D9gmb+N6FdR2L2iYPv7B/3uOf55pHjBu525+vLspCTOgcWBrLgCnA9e9Yg462AEf
 VP3x+kV0AH25lovEi3mPrc2e46jnl0Mzp3f3PCkPqRSEMn7sxu9ipii+elxvArYp
 jYYCB03ZEBFa7T0e4HD38gnVLbC6dTj47AcSCWYP9WhxJ2RmCQKRBEnJre02hgar
 NPg8z+OrUACIAkvJHzg3WccmXdi0aqQ2JDsl46Tkl7pA6NdyMLfizT3OiZnMRmgc
 34H0ZSxclW+j25aI8OmDpv2ypZev+UAzkbRobcvF+aV/zJeAX88tPgcshfCUVZll
 ZIqO7oJB73nCl1XBLv2ZrLV2tcOox6jL/5LQt0WYA5Szg5upo7D1fZl8v5jXX7eJ
 C62ychuABs6hsmH5jEy+73kdpHbYft7dZfGZxdgq1AIOkdWoynCze/R7Vj24xoXR
 118cTNN9ZTPHmN5yxUvuGoqA3FWOqkJXaTS4W0hRD6OxOGTsTV4FIlRnD+K7feOW
 ng1yfIcvKR1Dx7tsySTHQK+bZGNnovA/ENPK6VDuhbwE62Lx7N5hcbsSIKKwRI9C
 D1m1fC+AIcQ=
 =MwMG
 -----END PGP SIGNATURE-----

Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "These are changes for drivers that are intimately tied to some SoC and
  for some reason could not get merged through the respective subsystem
  maintainer tree.

  The largest single change here this time around is the Tegra
  iommu/memory controller driver, which gets updated to the new iommu DT
  binding.  More drivers like this are likely to follow for the
  following merge window, but we should be able to do those through the
  iommu maintainer.

  Other notable changes are:
   - reset controller drivers from the reset maintainer (socfpga, sti,
     berlin)
   - fixes for the keystone navigator driver merged last time
   - at91 rtc driver changes related to the at91 cleanups
   - ARM perf driver changes from Will Deacon
   - updates for the brcmstb_gisb driver"

* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (53 commits)
  clocksource: arch_timer: Allow the device tree to specify uninitialized timer registers
  clocksource: arch_timer: Fix code to use physical timers when requested
  memory: Add NVIDIA Tegra memory controller support
  bus: brcmstb_gisb: Add register offset tables for older chips
  bus: brcmstb_gisb: Look up register offsets in a table
  bus: brcmstb_gisb: Introduce wrapper functions for MMIO accesses
  bus: brcmstb_gisb: Make the driver buildable on MIPS
  of: Add NVIDIA Tegra memory controller binding
  ARM: tegra: Move AHB Kconfig to drivers/amba
  amba: Add Kconfig file
  clk: tegra: Implement memory-controller clock
  serial: samsung: Fix serial config dependencies for exynos7
  bus: brcmstb_gisb: resolve section mismatch
  ARM: common: edma: edma_pm_resume may be unused
  ARM: common: edma: add suspend resume hook
  powerpc/iommu: Rename iommu_[un]map_sg functions
  rtc: at91sam9: add DT bindings documentation
  rtc: at91sam9: use clk API instead of relying on AT91_SLOW_CLOCK
  ARM: at91: add clk_lookup entry for RTT devices
  rtc: at91sam9: rework the Kconfig description
  ...
2014-12-09 14:48:22 -08:00
Linus Torvalds b64bb1d758 arm64 updates for 3.19
Changes include:
  - Support for alternative instruction patching from Andre
  - seccomp from Akashi
  - Some AArch32 instruction emulation, required by the Android folks
  - Optimisations for exception entry/exit code, cmpxchg, pcpu atomics
  - mmu_gather range calculations moved into core code
  - EFI updates from Ard, including long-awaited SMBIOS support
  - /proc/cpuinfo fixes to align with the format used by arch/arm/
  - A few non-critical fixes across the architecture
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJUhbSAAAoJELescNyEwWM07PQH/AolxqOJTTg8TKe2wvRC+DwY
 R98bcECMwhXvwep1KhTBew7z7NRzXJvVVs+EePSpXWX2+KK2aWN4L50rAb9ow4ty
 PZ5EFw564g3rUpc7cbqIrM/lasiYWuIWw/BL+wccOm3mWbZfokBB2t0tn/2rVv0K
 5tf2VCLLxgiFJPLuYk61uH7Nshvv5uJ6ODwdXjbrH+Mfl6xsaiKv17ZrfP4D/M4o
 hrLoXxVTuuWj3sy/lBJv8vbTbKbQ6BGl9JQhBZGZHeKOdvX7UnbKH4N5vWLUFZya
 QYO92AK1xGolu8a9bEfzrmxn0zXeAHgFTnRwtDCekOvy0kTR9MRIqXASXKO3ZEU=
 =rnFX
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Here's the usual mixed bag of arm64 updates, also including some
  related EFI changes (Acked by Matt) and the MMU gather range cleanup
  (Acked by you).

  Changes include:
   - support for alternative instruction patching from Andre
   - seccomp from Akashi
   - some AArch32 instruction emulation, required by the Android folks
   - optimisations for exception entry/exit code, cmpxchg, pcpu atomics
   - mmu_gather range calculations moved into core code
   - EFI updates from Ard, including long-awaited SMBIOS support
   - /proc/cpuinfo fixes to align with the format used by arch/arm/
   - a few non-critical fixes across the architecture"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
  arm64: remove the unnecessary arm64_swiotlb_init()
  arm64: add module support for alternatives fixups
  arm64: perf: Prevent wraparound during overflow
  arm64/include/asm: Fixed a warning about 'struct pt_regs'
  arm64: Provide a namespace to NCAPS
  arm64: bpf: lift restriction on last instruction
  arm64: Implement support for read-mostly sections
  arm64: compat: align cacheflush syscall with arch/arm
  arm64: add seccomp support
  arm64: add SIGSYS siginfo for compat task
  arm64: add seccomp syscall for compat task
  asm-generic: add generic seccomp.h for secure computing mode 1
  arm64: ptrace: allow tracer to skip a system call
  arm64: ptrace: add NT_ARM_SYSTEM_CALL regset
  arm64: Move some head.text functions to executable section
  arm64: jump labels: NOP out NOP -> NOP replacement
  arm64: add support to dump the kernel page tables
  arm64: Add FIX_HOLE to permanent fixed addresses
  arm64: alternatives: fix pr_fmt string for consistency
  arm64: vmlinux.lds.S: don't discard .exit.* sections at link-time
  ...
2014-12-09 13:12:47 -08:00
Anton Blanchard 7c5c92ed56 powerpc: Secondary CPUs must set cpu_callin_map after setting active and online
I have a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

  BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

  CPU: 0
  Comm: watchdog/130

The problem is that we aren't ensuring the CPU active and online bits are set
before allowing the master to continue on. The master unparks the secondary
CPUs kthreads and the scheduler looks for a CPU to run on. It calls
select_task_rq and realises the suggested CPU is not in the cpus_allowed
mask. It then ends up in select_fallback_rq, and since the active and
online bits aren't set we choose some other CPU to run on.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-09 16:36:11 +11:00
Al Viro ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
Paul Mackerras 56548fc0e8 powerpc/powernv: Return to cpu offline loop when finished in KVM guest
When a secondary hardware thread has finished running a KVM guest, we
currently put that thread into nap mode using a nap instruction in
the KVM code.  This changes the code so that instead of doing a nap
instruction directly, we instead cause the call to power7_nap() that
put the thread into nap mode to return.  The reason for doing this is
to avoid having the KVM code having to know what low-power mode to
put the thread into.

In the case of a secondary thread used to run a KVM guest, the thread
will be offline from the point of view of the host kernel, and the
relevant power7_nap() call is the one in pnv_smp_cpu_disable().
In this case we don't want to clear pending IPIs in the offline loop
in that function, since that might cause us to miss the wakeup for
the next time the thread needs to run a guest.  To tell whether or
not to clear the interrupt, we use the SRR1 value returned from
power7_nap(), and check if it indicates an external interrupt.  We
arrange that the return from power7_nap() when we have finished running
a guest returns 0, so pending interrupts don't get flushed in that
case.

Note that it is important a secondary thread that has finished
executing in the guest, or that didn't have a guest to run, should
not return to power7_nap's caller while the kvm_hstate.hwthread_req
flag in the PACA is non-zero, because the return from power7_nap
will reenable the MMU, and the MMU might still be in guest context.
In this situation we spin at low priority in real mode waiting for
hwthread_req to become zero.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-08 13:16:31 +11:00
Alexei Starovoitov 89aa075832 net: sock: allow eBPF programs to be attached to sockets
introduce new setsockopt() command:

setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.

The same eBPF program can be attached to multiple sockets.

User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 21:47:32 -08:00
Mahesh Salgaonkar 682e77c861 powerpc/book3s: Fix partial invalidation of TLBs in MCE code.
The existing MCE code calls flush_tlb hook with IS=0 (single page) resulting
in partial invalidation of TLBs which is not right. This patch fixes
that by passing IS=0xc00 to invalidate whole TLB for successful recovery
from TLB and ERAT errors.

Cc: stable@vger.kernel.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-05 16:26:21 +11:00
Aneesh Kumar K.V aefa5688c0 powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
upatepp can get called for a nohpte fault when we find from the linux
page table that the translation was hashed before. In that case
we are sure that there is no existing translation, hence we could
avoid doing tlbie.

We could possibly race with a parallel fault filling the TLB. But
that should be ok because updatepp is only ever relaxing permissions.
We also look at linux pte permission bits when filling hash pte
permission bits. We also hold the linux pte busy bits while
inserting/updating a hashpte entry, hence a paralle update of
linux pte is not possible. On the other hand mprotect involves
ptep_modify_prot_start which cause a hpte invalidate and not updatepp.

Performance number:
We use randbox_access_bench written by Anton.

Kernel with THP disabled and smaller hash page table size.

    86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
     2.10%  random_access_b  random_access_bench              [.] doit
     1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
     1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
     1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
     1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
     0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
     0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
     0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
     0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
     0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm

With Fix:

    27.54%  random_access_b  random_access_bench              [.] doit
    22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
     5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
     5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
     5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
     4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
     3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
     1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-05 16:26:15 +11:00
Julia Lawall d83480b061 crypto: powerpc - replace memset by memzero_explicit
Memset on a local variable may be removed when it is called just before the
variable goes out of scope.  Using memzero_explicit defeats this
optimization.  A simplified version of the semantic patch that makes this
change is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
type T;
@@

{
... when any
T x[...];
... when any
    when exists
- memset
+ memzero_explicit
  (x,
-0,
  ...)
... when != x
    when strict
}
// </smpl>

This change was suggested by Daniel Borkmann <dborkman@redhat.com>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-12-02 22:55:50 +08:00
Michael Ellerman abb90ee7bc powerpc/xmon: Cleanup the breakpoint flags
Drop BP_IABR_TE, which though used, does not do anything useful. Rename
BP_IABR to BP_CIABR. Renumber the flags.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02 14:23:04 +11:00
Anshuman Khandual 1ad7d70562 powerpc/xmon: Enable HW instruction breakpoint on POWER8
This patch enables support for hardware instruction breakpoint in xmon
on POWER8 platform with the help of a new register called the CIABR
(Completed Instruction Address Breakpoint Register). With this patch, a
single hardware instruction breakpoint can be added and cleared during
any active xmon debug session. The hardware based instruction breakpoint
mechanism works correctly with the existing TRAP based instruction
breakpoint available on xmon.

There are no powerpc CPU with CPU_FTR_IABR feature any more. This patch
has re-purposed all the existing IABR related code to work with CIABR
register based HW instruction breakpoint.

This has one odd feature, which is that when we hit a breakpoint xmon
doesn't tell us we have hit the breakpoint. This is because xmon is
expecting bp->address == regs->nip. Because CIABR fires on completition
regs->nip points to the instruction after the breakpoint. We could fix
that, but it would then confuse other parts of the xmon code which think
we need to emulate the instruction. [mpe]

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2014-12-02 14:23:04 +11:00
Michael Ellerman b5be75d008 Merge remote-tracking branch 'benh/next' into next
Merge updates collected & acked by Ben. A few EEH patches from Gavin,
some mm updates from Aneesh and a few odds and ends.
2014-12-02 14:19:20 +11:00
Aneesh Kumar K.V d557b09800 powerpc/mm/thp: Use tlbiel if possible
If we know that user address space has never executed on other cpus
we could use tlbiel.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 14:10:11 +11:00
Aneesh Kumar K.V f1581bf14b powerpc/mm/thp: Remove code duplication
Rename invalidate_old_hpte to flush_hash_hugepage and use that in
other places.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 14:10:10 +11:00
James Yang c4f3eb5fc5 powerpc/mm/hugetlb: Sanity check gigantic hugepage count
Limit the number of gigantic hugepages specified by the
hugepages= parameter to MAX_NUMBER_GPAGES.

Signed-off-by: James Yang <James.Yang@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 14:10:09 +11:00
Jiang Lu 0de3b56b13 powerpc/oprofile: Disable pagefaults during user stack read
A page fault occurred during reading user stack in oprofile backtrace
would lead following calltrace:

WARNING: at linux/kernel/smp.c:210
Modules linked in:
CPU: 5 PID: 736 Comm: sh Tainted: G W 3.14.23-WR7.0.0.0_standard #1
task: c0000000f6208bc0 ti: c00000007c72c000 task.ti: c00000007c72c000
NIP: c0000000000ed6e4 LR: c0000000000ed5b8 CTR: 0000000000000000
REGS: c00000007c72f050 TRAP: 0700 Tainted: G W (3.14.23-WR7.0.0
tandard)
MSR: 0000000080021000 <CE,ME> CR: 48222482 XER: 00000000
SOFTE: 0
GPR00: c0000000000ed5b8 c00000007c72f2d0 c0000000010aa048 0000000000000005
GPR04: c000000000fdb820 c00000007c72f410 0000000000000001 0000000000000005
GPR08: c0000000010b5768 c000000000f8a048 0000000000000001 0000000000000000
GPR12: 0000000048222482 c00000000fffe580 0000000022222222 0000000010129664
GPR16: 0000000010143cc0 0000000000000000 0000000044444444 0000000000000000
GPR20: c00000007c7221d8 c0000000f638e3c8 000003f15a20120d 0000000000000001
GPR24: 000000005a20120d c00000007c722000 c00000007cdedda8 00003fffef23b160
GPR28: 0000000000000001 c00000007c72f410 c000000000fdb820 0000000000000006
NIP [c0000000000ed6e4] .smp_call_function_single+0x18c/0x248
LR [c0000000000ed5b8] .smp_call_function_single+0x60/0x248
Call Trace:
[c00000007c72f2d0] [c0000000000ed5b8] .smp_call_function_single+0x60/0x248 (unreliable)
[c00000007c72f3a0] [c000000000030810] .__flush_tlb_page+0x164/0x1b0
[c00000007c72f460] [c00000000002e054] .ptep_set_access_flags+0xb8/0x168
[c00000007c72f500] [c0000000001ad3d8] .handle_mm_fault+0x4a8/0xbac
[c00000007c72f5e0] [c000000000bb3238] .do_page_fault+0x3b8/0x868
[c00000007c72f810] [c00000000001e1d0] storage_fault_common+0x20/0x44
 Exception: 301 at .__copy_tofrom_user_base+0x54/0x5b0
    LR = .op_powerpc_backtrace+0x190/0x20c
[c00000007c72fb00] [c000000000a2ec34] .op_powerpc_backtrace+0x204/0x20c (unreliable)
[c00000007c72fbc0] [c000000000a2b5fc] .oprofile_add_ext_sample+0xe8/0x118
[c00000007c72fc70] [c000000000a2eee0] .fsl_emb_handle_interrupt+0x20c/0x27c
[c00000007c72fd30] [c000000000a2e440] .op_handle_interrupt+0x44/0x58
[c00000007c72fdb0] [c000000000016d68] .performance_monitor_exception+0x74/0x90
[c00000007c72fe30] [c00000000001d8b4] exc_0x260_common+0xfc/0x100

performance_monitor_exception() is executed in a context with interrupt
disabled and preemption enabled. When there is a user space page fault
happened, do_page_fault() invoke in_atomic() to decide whether kernel
should handle such page fault. in_atomic() only check preempt_count.
So need call pagefault_disable() to disable preemption before reading
user stack.

Signed-off-by: Jiang Lu <lu.jiang@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 14:10:08 +11:00
Aneesh Kumar K.V 0ec2698fe1 powerpc/mm: Check for matching hpte without taking hpte lock
With smaller hash page table config, we would end up in situation
where we would be replacing hash page table slot frequently. In
such config, we will find the hpte to be not matching, and we
can do that check without holding the hpte lock. We need to
recheck the hpte again after holding lock.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:45 +11:00
Greg Kurz 221195fb80 powerpc: Drop useless warning in eeh_init()
This is what we get in dmesg when booting a pseries guest and
the hypervisor doesn't provide EEH support.

[    0.166655] EEH functionality not supported
[    0.166778] eeh_init: Failed to call platform init function (-22)

Since both powernv_eeh_init() and pseries_eeh_init() already complain when
hitting an error, it is not needed to print more (especially such an
uninformative message).

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:45 +11:00
Mahesh Salgaonkar 6d626c5ea3 powerpc/powernv: Cleanup unused MCE definitions/declarations.
Cleanup OpalMCE_* definitions/declarations and other related code which
is not used anymore.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:45 +11:00
Gavin Shan a450e8f55a powerpc/eeh: Dump PHB diag-data early
On PowerNV platform, PHB diag-data is dumped after stopping device
drivers. In case of recursive EEH errors, the kernel is usually
crashed before dumping PHB diag-data for the second EEH error. It's
hard to locate the root cause of the second EEH error without PHB
diag-data.

The patch adds one more EEH option "eeh=early_log", which helps
dumping PHB diag-data immediately once frozen PE is detected, in
order to get the PHB diag-data for the second EEH error.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:26 +11:00
Gavin Shan b1d76a7d57 powerpc/eeh: Recover EEH error on ownership change for BCM5719
In PCI passthrou scenario, we need simulate EEH recovery for Emulex
adapters when their ownership changes, as we did in commit 5cfb20b96
("powerpc/eeh: Emulate EEH recovery for VFIO devices"). Broadcom
BCM5719 adpaters are facing same problem and needs same cure.

Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:26 +11:00
Gavin Shan 28bf36f92a powerpc/eeh: Set EEH_PE_RESET on PE reset
The patch introduces additional flag EEH_PE_RESET to indicate the
corresponding PE is under reset. In turn, the PE retrieval bakcend
on PowerNV platform can return unfrozen state for the EEH core to
moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for
the purpose.

In PCI passthrou case, the problem is more worse: Guest doesn't
recover 6th EEH error. The PE is left in isolated (frozen) and
config blocked state on Broadcom adapters. We can't retrieve the
PE's state correctly any more, even from the host side via sysfs
/sys/bus/pci/devices/xxx/eeh_pe_state.

Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:26 +11:00
Gavin Shan b85743ee95 powerpc/eeh: Refactor eeh_reset_pe()
The patch refactors eeh_reset_pe() in order for:

   * Varied return values for different failure cases.
   * Replace pr_err() with pr_warn() and print function name.
   * Coding style cleanup.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02 11:03:26 +11:00
Neelesh Gupta 8de303bae4 hwmon: (ibmpowernv) Use platform 'id_table' to probe the device
The current driver probe() function assumes the sensor device to be
always present and gets executed every time if the driver is loaded,
but the appropriate hardware could not be present.

So, move the platform device creation as part of platform init code
and use the 'id_table' to check if the device is present or not.

Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-11-30 20:13:13 -08:00
Paul Gortmaker cd4c910e00 netpoll: delete defconfig references to obsolete NETPOLL_TRAP
In commit 9c62a68d13 ("netpoll:
Remove dead packet receive code (CONFIG_NETPOLL_TRAP)") this
Kconfig option was removed.  So remove references to it from
all defconfigs as well.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29 21:13:48 -08:00
David S. Miller 60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Linus Torvalds 21f122f472 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc fixes from Michael Ellerman:
 "Here are five fixes for you to pull please.

  They're all CC'ed to stable except the "Fix PE state format" one which
  went in this release"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
  powerpc/powernv: Replace OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE
  powerpc/eeh: Fix PE state format
  powerpc/pseries: Fix endiannes issue in RTAS call from xmon
  powerpc/powernv: Fix the hmi event version check.
2014-11-27 18:23:41 -08:00
Anton Blanchard 152d44a853 powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
I used some 64 bit instructions when adding the 32 bit getcpu VDSO
function. Fix it.

Fixes: 18ad51dd34 ("powerpc: Add VDSO version of getcpu")
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-27 09:42:12 +11:00
Gavin Shan 360d88a9e3 powerpc/powernv: Replace OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE
The flag passed to ioda_eeh_phb_reset() should be EEH_RESET_DEACTIVATE,
which is translated to OPAL_DEASSERT_RESET or something else by the
EEH backend accordingly.

The patch replaces OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE for
ioda_eeh_phb_reset().

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-27 09:40:32 +11:00
Gavin Shan 7531473c30 powerpc/eeh: Fix PE state format
Obviously I had wrong format given to the PE state output from
/sys/bus/pci/devices/xxxx/eeh_pe_state with some typoes, which
was introduced by commit 2013add4ce. The patch fixes it up.

Fixes: 2013add4ce ("powerpc/eeh: Show hex prefix for PE state sysfs")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-27 09:32:58 +11:00
Laurent Dufour 3b8a3c0109 powerpc/pseries: Fix endiannes issue in RTAS call from xmon
On pseries system (LPAR) xmon failed to enter when running in LE mode,
system is hunging. Inititating xmon will lead to such an output on the
console:

SysRq : Entering xmon
cpu 0x15: Vector: 0  at [c0000003f39ffb10]
    pc: c00000000007ed7c: sysrq_handle_xmon+0x5c/0x70
    lr: c00000000007ed7c: sysrq_handle_xmon+0x5c/0x70
    sp: c0000003f39ffc70
   msr: 8000000000009033
  current = 0xc0000003fafa7180
  paca    = 0xc000000007d75e80	 softe: 0	 irq_happened: 0x01
    pid   = 14617, comm = bash
Bad kernel stack pointer fafb4b0 at eca7cc4
cpu 0x15: Vector: 300 (Data Access) at [c000000007f07d40]
    pc: 000000000eca7cc4
    lr: 000000000eca7c44
    sp: fafb4b0
   msr: 8000000000001000
   dar: 10000000
 dsisr: 42000000
  current = 0xc0000003fafa7180
  paca    = 0xc000000007d75e80	 softe: 0	 irq_happened: 0x01
    pid   = 14617, comm = bash
cpu 0x15: Exception 300 (Data Access) in xmon, returning to main loop
xmon: WARNING: bad recursive fault on cpu 0x15

The root cause is that xmon is calling RTAS to turn off the surveillance
when entering xmon, and RTAS is requiring big endian parameters.

This patch is byte swapping the RTAS arguments when running in LE mode.

Cc: stable@vger.kernel.org
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-27 09:32:57 +11:00
Mahesh Salgaonkar 6acbc5a1da powerpc/powernv: Fix the hmi event version check.
The current HMI event structure is an ABI and carries a version field to
accommodate future changes without affecting/rearranging current structure
members that are valid for previous versions.

The current version check "if (hmi_evt->version != OpalHMIEvt_V1)"
doesn't accomodate the fact that the version number may change in
future.

If firmware starts returning an HMI event with version > 1, this check
will fail and no HMI information will be printed on older kernels.

This patch fixes this issue.

Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Reword changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-27 09:32:52 +11:00
Grant Likely f5242e5a88 of/reconfig: Always use the same structure for notifiers
The OF_RECONFIG notifier callback uses a different structure depending
on whether it is a node change or a property change. This is silly, and
not very safe. Rework the code to use the same data structure regardless
of the type of notifier.

Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
2014-11-24 22:25:03 +00:00
Kees Cook 5d26a105b5 crypto: prefix module autoloading with "crypto-"
This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:

https://lkml.org/lkml/2013/3/4/70

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-11-24 22:43:57 +08:00
Benjamin Herrenschmidt 31345e1a07 powerpc/pci: Remove unused force_32bit_msi quirk
This is now fully replaced with the generic "no_64bit_msi" one
that is set by the respective drivers directly.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-24 14:36:02 +11:00
Benjamin Herrenschmidt 415072a041 powerpc/pseries: Honor the generic "no_64bit_msi" flag
Instead of the arch specific quirk which we are deprecating

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
2014-11-24 14:36:02 +11:00
Benjamin Herrenschmidt 360743814c powerpc/powernv: Honor the generic "no_64bit_msi" flag
Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
2014-11-24 14:36:01 +11:00
Thomas Gleixner 280510f106 PCI/MSI: Rename mask/unmask_msi_irq treewide
The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage
sites. The conversion helper functions are kept around to avoid
conflicts in next and will be removed after merging into mainline.

Coccinelle assisted conversion. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: x86@kernel.org
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Mohit Kumar <mohit.kumar@st.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yijing Wang <wangyijing@huawei.com>
2014-11-23 13:01:45 +01:00
Jiang Liu 83a18912b0 PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg()
Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
specific.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-23 13:01:45 +01:00
Jiang Liu 891d4a48f7 PCI/MSI: Rename __read_msi_msg() to __pci_read_msi_msg()
Rename __read_msi_msg() to __pci_read_msi_msg() and kill unused
read_msi_msg(). It's a preparation to separate generic MSI code from
PCI core.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-23 13:01:45 +01:00
David S. Miller 1459143386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 22:28:24 -05:00
Masanari Iida 6774def642 treewide: fix typo in printk and Kconfig
This patch fix spelling typo in printk and Kconfig within
various part of kernel sources.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-11-20 14:56:11 +01:00
Jiri Kosina a02001086b Merge Linus' tree to be be to apply submitted patches to newer code than
current trivial.git base
2014-11-20 14:42:02 +01:00
Al Viro a455589f18 assorted conversions to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:20 -05:00
Michael Ellerman e39f223fc9 powerpc: Remove more traces of bootmem
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.

Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:

  p = (struct foo *)alloc_bootmem(x);

Becomes:

  p = memblock_virt_alloc(x, 0);

We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.

We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.

The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().

Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-19 21:41:51 +11:00
Li Zhong a49ab6eeeb powerpc/pseries: Initialise nvram_pstore_info's buf_lock
nvram_pstore_info's buf_lock is not initialized before registering,
which is clearly incorrect.

It causes some strange behavior when trying to obtain the lock during
kdump process.

On a UP configuration, the console stopped for a couple of seconds, then
"lockup suspected" warning printed out, but then it continued to run.

So try lock fails, and lockup reported, but then arch_spin_lock()
passes.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
[mpe: Edited changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-19 16:09:13 +11:00
Denis Kirjanov cadaecd218 PPC: bpf_jit_comp: Unify BPF_MOD | BPF_X and BPF_DIV | BPF_X
Reduce duplicated code by unifying
BPF_ALU | BPF_MOD | BPF_X and BPF_ALU | BPF_DIV | BPF_X

CC: Alexei Starovoitov<alexei.starovoitov@gmail.com>
CC: Daniel Borkmann<dborkman@redhat.com>
CC: Philippe Bergheaud<felix@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-18 13:20:09 -05:00
Joerg Roedel 0690cbd2e5 powerpc/iommu: Rename iommu_[un]map_sg functions
The IOMMU-API gained support for a new iommu_map_sg
function. This causes compile failures on powerpc because
the function name is already globally used there.
This patch renames adds a ppc_ prefix to these functions to
solve the compile problem.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2014-11-18 11:30:01 +01:00
Michael Ellerman 35891d40bf Merge remote-tracking branch 'scottwood/next' into next
Scott says:

"Highlights include a bunch of 8xx optimizations, device tree bindings
for Freescale BMan, QMan, and FMan datapath components, misc device tree
updates, and inbound rio window support."
2014-11-18 17:00:38 +11:00
Kevin Hao d7ce437749 powerpc/fsl_msi: mark the msi cascade handler IRQF_NO_THREAD
The commit 543c043cba ("powerpc/fsl_msi: change the irq handler from
chained to normal") changes the msi cascade handler from chained to
normal. Since cascade handler must run in hard interrupt context, this
will cause kernel panic if we force threading of all the interrupt
handler via kernel command parameter 'threadirqs'. So mark the irq
handler IRQF_NO_THREAD explicitly.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-11-17 22:00:30 -06:00
Prabhakar Kushwaha 76f3e2929b powerpc/config: Enable memory driver
As Freescale IFC controller has been moved to driver to driver/memory.

So enable memory driver in powerpc config

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-11-17 19:36:42 -06:00
Will Deacon fb7332a9fe mmu_gather: move minimal range calculations into generic code
On architectures with hardware broadcasting of TLB invalidation messages
, it makes sense to reduce the range of the mmu_gather structure when
unmapping page ranges based on the dirty address information passed to
tlb_remove_tlb_entry.

arm64 already does this by directly manipulating the start/end fields
of the gather structure, but this confuses the generic code which
does not expect these fields to change and can end up calculating
invalid, negative ranges when forcing a flush in zap_pte_range.

This patch moves the minimal range calculation out of the arm64 code
and into the generic implementation, simplifying zap_pte_range in the
process (which no longer needs to care about start/end, since they will
point to the appropriate ranges already). With the range being tracked
by core code, the need_flush flag is dropped in favour of checking that
the end of the range has actually been set.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-11-17 10:12:42 +00:00
Neelesh Gupta 16b1d26e77 rtc/tpo: Driver to support rtc and wakeup on PowerNV platform
The patch implements the OPAL rtc driver that binds with the rtc
driver subsystem. The driver uses the platform device infrastructure
to probe the rtc device and register it to rtc class framework. The
'wakeup' is supported depending upon the property 'has-tpo' present
in the OF node. It provides a way to load the generic rtc driver in
in the absence of an OPAL driver.

The patch also moves the existing OPAL rtc get/set time interfaces to the
new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.

Test results:
-------------
Host:
[root@tul169p1 ~]# ls -l /sys/class/rtc/
total 0
lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
[root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
08:10:07
[root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
[root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
1413274345
[root@tul169p1 ~]#

FSP:
$ smgr mfgState
standby
$ rtim timeofday

System time is valid: 2014/10/14 08:12:04.225115

$ smgr mfgState
ipling
$

CC: devicetree@vger.kernel.org
CC: tglx@linutronix.de
CC: rtc-linux@googlegroups.com
CC: a.zummo@towertech.it
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-17 18:04:01 +11:00
Vineeth Vijayan 59994fb01a powerpc: Use generic PIE randomization
Back in 2009 we merged 501cb16d3c "Randomise PIEs", which added support for
randomizing PIE (Position Independent Executable) binaries.

That commit added randomize_et_dyn(), which correctly randomized the addresses,
but failed to honor PF_RANDOMIZE. That means it was not possible to disable PIE
randomization via the personality flag, or /proc/sys/kernel/randomize_va_space.

Since then there has been generic support for PIE randomization added to
binfmt_elf.c, selectable via ARCH_BINFMT_ELF_RANDOMIZE_PIE.

Enabling that allows us to drop randomize_et_dyn(), which means we start
honoring PF_RANDOMIZE correctly.

It also causes a fairly major change to how we layout PIE binaries.

Currently we will place the binary at 512MB-520MB for 32 bit binaries, or
512MB-1.5GB for 64 bit binaries, eg:

    $ cat /proc/$$/maps
    4e550000-4e580000 r-xp 00000000 08:02 129813       /bin/dash
    4e580000-4e590000 rw-p 00020000 08:02 129813       /bin/dash
    10014110000-10014140000 rw-p 00000000 00:00 0      [heap]
    3fffaa3f0000-3fffaa5a0000 r-xp 00000000 08:02 921  /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fffaa5a0000-3fffaa5b0000 rw-p 001a0000 08:02 921  /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fffaa5c0000-3fffaa5d0000 rw-p 00000000 00:00 0
    3fffaa5d0000-3fffaa5f0000 r-xp 00000000 00:00 0    [vdso]
    3fffaa5f0000-3fffaa620000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
    3fffaa620000-3fffaa630000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so
    3ffffc340000-3ffffc370000 rw-p 00000000 00:00 0    [stack]

With this commit applied we don't do any special randomisation for the binary,
and instead rely on mmap randomisation. This means the binary ends up at high
addresses, eg:

    $ cat /proc/$$/maps
    3fff99820000-3fff999d0000 r-xp 00000000 08:02 921    /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fff999d0000-3fff999e0000 rw-p 001a0000 08:02 921    /lib/powerpc64le-linux-gnu/libc-2.19.so
    3fff999f0000-3fff99a00000 rw-p 00000000 00:00 0
    3fff99a00000-3fff99a20000 r-xp 00000000 00:00 0      [vdso]
    3fff99a20000-3fff99a50000 r-xp 00000000 08:02 1246   /lib/powerpc64le-linux-gnu/ld-2.19.so
    3fff99a50000-3fff99a60000 rw-p 00020000 08:02 1246   /lib/powerpc64le-linux-gnu/ld-2.19.so
    3fff99a60000-3fff99a90000 r-xp 00000000 08:02 129813 /bin/dash
    3fff99a90000-3fff99aa0000 rw-p 00020000 08:02 129813 /bin/dash
    3fffc3de0000-3fffc3e10000 rw-p 00000000 00:00 0      [stack]
    3fffc55e0000-3fffc5610000 rw-p 00000000 00:00 0      [heap]

Although this should be OK, it's possible it might break badly written
binaries that make assumptions about the address space layout.

Signed-off-by: Vineeth Vijayan <vvijayan@mvista.com>
[mpe: Rewrite changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-17 17:55:11 +11:00
Gavin Shan cf2b1e0eb0 powerpc/powernv: Fix potential zero devisor
If there're no PHBs under P5IOC2 HUB device tree node, we should
bail early to avoid zero devisor and allocating TCE tables.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:25 +11:00
Gavin Shan ec8e4e9d3d powerpc/powernv: Bail upon invalid master PE
When freezing compound PEs in pnv_ioda_freeze_pe(), we should bail
upon illegal master PE. We needn't freeze slave PE because it should
have been put into frozen state by hardware.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:25 +11:00
Gavin Shan 4773f76b61 powerpc/powernv: Simplify pnv_ioda_configure_pe()
Nested if statements are always bad and the patch avoids one by
checking PHB type and bail in advance if necessary.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:24 +11:00
Gavin Shan b131a8425c powerpc/powernv: Set PELTV for compound PEs
Commit 262af55 ("powerpc/powernv: Enable M64 aperatus for PHB3")
introduced compound PEs in order to support M64 aperatus on PHB3.
However, we never configured PELTV for compound PEs. The patch
fixes that by: parent PE can freeze all child compound PEs. Any
compound PE affects the group.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:24 +11:00
Gavin Shan 4b82ab1803 powerpc/powernv: Initialize M64 PE in time
The patch initializes PE instance when reserving PE number to
keep consistent things as we did before. Also, it replaces the
iteration on bridge's windows with the prefered way.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:23 +11:00
Gavin Shan 5ef7356781 powerpc/powernv: Rename alloc_m64_pe() to reserve_m64_pe()
The patch renames alloc_m64_pe() to reserve_m64_pe() to reflect
its real usage: We reserve PE numbers for M64 segments in advance
and then pick up the reserved PE numbers when building the mapping
between PE numbers and M64 segments.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14 17:24:23 +11:00