Commit Graph

58476 Commits

Author SHA1 Message Date
Linus Torvalds 5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Alex Elder 5522ae0b68 libceph: use slab cache for osd client requests
Create a slab cache to manage allocation of ceph_osdc_request
structures.

This resolves:
    http://tracker.ceph.com/issues/3926

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-02 11:58:41 -05:00
Linus Torvalds 99c6bcf46d ARM: arm-soc multiplatform updates for 3.10
More multiplatform enablement for ARM platforms. The ones converted in
 this branch are:
 - bcm2835
 - cns3xxx
 - sirf
 - nomadik
 - msx
 - spear
 - tegra
 - ux500
 
 We're getting close to having most of them converted!
 
 One of the larger platforms remaining is Samsung Exynos, and there are
 a bunch of supporting patches in this merge window for it. There was a
 patch in this branch to a early version of multiplatform conversion,
 but it ended up being reverted due to need of more bake time. The
 revert commit is part of the branch since it would have required
 rebasing multiple dependent branches and they were stable by then.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRgg99AAoJEIwa5zzehBx3n78P/j0w/8v+F4dM29ba5M/tqbFI
 e3wpeFykZ/HJH+FFIEYfIablpfHsLB0LEMh0dZmwHESFC6eR0RfGL2jOkpfcH9Ne
 7B/JIFN4l1iwqqKCXf+QbYL6e8YFxlJkg6BIB4KhNgliQoO/ASP/8EbcgROYuxmN
 KPVdw9laUCCvb5Ogh2NWVAkBHhVGAEiqK20r4TQz8alI8RUmMleWM3o+wLBWVhOO
 d3gtYSfuFSbrJfbpKSdycLizoV/NekdOC1A9Ov9YuOdw8DzNbrThCRQtu0tIUgxN
 JjfnGlEJLsJS9SESfr8SYWxTuhe/lB2dGqjQPvRtl2HGBhbtTlnWfQ0k2ZHdeJuD
 J50SLrGA2gN9E5PlHJXjYk8uhhGIq8bNTJ//CtDkfKTq1D7PuHVEpEctsaz3BBbM
 U+x9zP2v4FB+yrZu8w+gkQY/wDgHsxj08mT6BK0+l8ePdyQV22CvwmM5XlJFI03x
 5J0nLYiYfef+ZN9rGgVrQbn+yv+IEkE4DmeiscjeVJE5LVdVrDpYGfx7UA7V0UA7
 i3KRVpNKuy1v7GJDnKlEBPkmB+vgXTRXUPDVCuC4n0Hi5PYj4es1gY6AoXGF90wm
 vtKxGr/2XDLP7Ro+m0OXMttSgQShnmbrbOngfkWcFwUmG7cB3SSUUOGKM+2LNnXM
 MJTqVhPjkZ2GYBi/J6S/
 =4hSo
 -----END PGP SIGNATURE-----

Merge tag 'multiplatform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC multiplatform updates from Olof Johansson:
 "More multiplatform enablement for ARM platforms.  The ones converted
  in this branch are:

   - bcm2835
   - cns3xxx
   - sirf
   - nomadik
   - msx
   - spear
   - tegra
   - ux500

  We're getting close to having most of them converted!

  One of the larger platforms remaining is Samsung Exynos, and there are
  a bunch of supporting patches in this merge window for it.  There was
  a patch in this branch to a early version of multiplatform conversion,
  but it ended up being reverted due to need of more bake time.  The
  revert commit is part of the branch since it would have required
  rebasing multiple dependent branches and they were stable by then"

* tag 'multiplatform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (70 commits)
  mmc: sdhci-s3c: Fix operation on non-single image Samsung platforms
  clocksource: nomadik-mtu: fix up clocksource/timer
  Revert "ARM: exynos: enable multiplatform support"
  ARM: SPEAr13xx: Fix typo "ARCH_HAVE_CPUFREQ"
  ARM: exynos: enable multiplatform support
  rtc: s3c: make header file local
  mtd: onenand/samsung: make regs-onenand.h file local
  thermal/exynos: remove unnecessary header inclusions
  mmc: sdhci-s3c: remove platform dependencies
  ARM: samsung: move mfc device definition to s5p-dev-mfc.c
  ARM: exynos: move debug-macro.S to include/debug/
  ARM: exynos: prepare for sparse IRQ
  ARM: exynos: introduce EXYNOS_ATAGS symbol
  ARM: tegra: build assembly files with -march=armv7-a
  ARM: Push selects for TWD/SCU into machine entries
  ARM: ux500: build hotplug.o for ARMv7-a
  ARM: ux500: move to multiplatform
  ARM: ux500: make remaining headers local
  ARM: ux500: make irqs.h local to platform
  ARM: ux500: get rid of <mach/[hardware|db8500-regs].h>
  ...
2013-05-02 09:38:16 -07:00
Linus Torvalds 97b1007a29 ARM: arm-soc platform updates for 3.10, part 1
This branch contains platform updates for 3.10. Among the highlights:
 
 - Support for the new Atmel Cortex-A5 based platforms (SAMA5D3)
 - New support for CSR SiRFatlas6 SoCs
 - A handful of updates for NVidia T114 (a.k.a. Tegra 4)
 - A bunch of updates for the shmobile platforms
 - A handful of updates for davinci
 - A few updates for Qualcomm MSM
 - Plus a handful of other patches, defconfig updates, etc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRgg+LAAoJEIwa5zzehBx3ePcP/3NUsSOTRQ2SZIVpyjnWOhkf
 RMZiRaVsxrY0BPfDB9E2Vcb6lannKmACTujs/Ux7kJC22BreuFM1PnZoDfhkRuSE
 n/nVB1981XJS82z2uONRSZGlUPSGWYzhTTUDJ0nHiBGmIGf5ctnC0iYWp3As3lv9
 kNY14H7NkwQ4zBVNEMu7WfW8d2IJgqZJgR9xhZPv5fOZ+LlQmK6VaHWTmQtjyea1
 bG1qoJ0dPbfJB4Vnr3a49rBkSJxZUiv8xQucw9+vo+ADRi64M4sZ1Jj2vVyDpqZp
 F4fxBNMVvg7xM0TcBbItFFYJBXlUjeT4z+UI5iYjkbnE7EV9ndFeZXHCWX1qzOSy
 X/nrJKuoe7ISQanBE9SHS9DpDGlkPDO0Mn0vb1f2VUQOY513pt/D1iFYEucZ6WCN
 fWUYtvt5GayidUr55D1U8ssbE0oGt2rizd9x7GUk4KbRVAnUUNopIQAhXrefTrZm
 jfdZNDckJ2F3aq8IPjsKuyJTpe61xD4Wvb3P/pEE3Q8fowPF5WIxXV+qjqHQ9vtt
 Tz4LkP/YdynVFGmhOwz3QZmPaQItaabaYyCcZ5cVCvt5mdxx5VuHYppafhCPJz+V
 KCQpKi1azuIv+sDR+nlGOl6+Ideea3s7TsRudfbmQFp5GsqkqOdJzR9gbbKmJauQ
 4JPpRd+4W8wC8zXQnhVY
 =HXX3
 -----END PGP SIGNATURE-----

Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC platform updates from Olof Johansson:
 "This branch contains part 1 of the platform updates for 3.10.  Among
  the highlights:

   - Support for the new Atmel Cortex-A5 based platforms (SAMA5D3)
   - New support for CSR SiRFatlas6 SoCs
   - A handful of updates for NVidia T114 (a.k.a. Tegra 4)
   - A bunch of updates for the shmobile platforms
   - A handful of updates for davinci
   - A few updates for Qualcomm MSM
   - Plus a handful of other patches, defconfig updates, etc."

* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (135 commits)
  ARM: tegra: pm: fix build error w/o PM_SLEEP
  ARM: davinci: ensure global variables are declared
  ARM: davinci: sram.c: fix incorrect type in assignment
  ARM: davinci: da8xx dt: make file local symbols static
  ARM: davinci: da8xx: add remoteproc support
  ARM: socfpga: Upgrade clk driver for socfpga to make use of dts clock entries
  ARM: socfpga: Add clock entries into device tree
  ARM: socfpga: Enable soft reset
  ARM: EXYNOS: replace cpumask by the corresponding macro
  ARM: EXYNOS: handle properly the return values
  ARM: EXYNOS: factor out the idle states
  ARM: OMAP4: Enable fix for Cortex-A9 erratas
  ARM: OMAP2+: Export SoC information to userspace
  ARM: OMAP2+: SoC name and revision unification
  ARM: OMAP2+: Move common part of late init into common function
  ARM: tegra: pm: remove duplicated include from pm.c
  ARM: davinci: da850: override mmc DT node device name
  ARM: davinci: da850: add mmc DT entries
  mmc: davinci_mmc: add DT support
  ARM: SAMSUNG: check processor type before cache restoration in resume
  ...
2013-05-02 09:31:45 -07:00
Linus Torvalds dfab34aa61 ARM: arm-soc device-tree updates for 3.10, part 1
Device-tree updates for 3.10. The bulk of the churn in this branch is due
 to i.MX moving from C-defined pin control over to device tree, which is
 a one-time conversion that will allow greater flexibility down the road.
 
 Besides that, there's PCI-e bindings for Marvell mvebu platforms and a
 handful of cleanups to tegra due to the new include file functionality
 of the device tree compiler.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRgg+aAAoJEIwa5zzehBx3/q0P/RumfsMePxhmSU4HM16a3w0B
 9jg7wd9BxVrJUzTY9F7z+Q72x0u5USUtVnyoY5s68DQMkFyhBQUuKCCiwCqtpCBN
 2Uf0JQjYHdqEFKgN6DiPxSVRPXC8jmMzYGRk5RTI5kVWxaBEMdw9rTo0x4vol/Cv
 7Z+W+gixXZbgydH/ogqly1MQc9vWliRTfU2zv2WOZ7TLyyEd2lOjMMBIX/n3vI4l
 T32JOUDgIYK841s9n2eNQGEjqB/OghMMrQsdjUAd++je6QtqgZk9+uHfPFC1C0wQ
 3F93te9HleluYcOcxGmedK3B9QO2Y8y1XHe+uxLZVKXBR+6/5AtSwZFRQm10uMCI
 JUz3j6tRAWDAOin2vXZcf2CVPn5HZbh3D67WuUdfxMngH0XHvSZRC9eRd70jWvDe
 9FY4NRTjRSLu/VtgCzF8tSA3cEylhyKYdK6Cf0nbwQ26JTO2VNNCnjuCbRfWp+E1
 y0jIQwsaiNLEBwbesNbnFrj+YTTAZBI4+Y5HrSV7Og5/5X9BWs11KAkRppNOj0Uc
 WnqG26SssuBNBVHPOO2RrOwq3n2VphQ/BB8j9yrpWtcAlQxdjmVqFj/GIIiHr2Wm
 GuKWgM5fn+xF0oeCriq4Ti5eCJQ7Ev6Er46WrGQDBniZWVi05aP51ks1bfwbfHqn
 z1o5QfLpr4PkJPk0mnim
 =8X1b
 -----END PGP SIGNATURE-----

Merge tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC device-tree updates from Olof Johansson:
 "Part 1 of device-tree updates for 3.10.  The bulk of the churn in this
  branch is due to i.MX moving from C-defined pin control over to device
  tree, which is a one-time conversion that will allow greater
  flexibility down the road.

  Besides that, there's PCI-e bindings for Marvell mvebu platforms and a
  handful of cleanups to tegra due to the new include file functionality
  of the device tree compiler"

* tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (113 commits)
  arm: mvebu: PCIe Device Tree informations for Armada XP GP
  arm: mvebu: PCIe Device Tree informations for Armada 370 DB
  arm: mvebu: PCIe Device Tree informations for Armada 370 Mirabox
  arm: mvebu: PCIe Device Tree informations for Armada XP DB
  arm: mvebu: PCIe Device Tree informations for OpenBlocks AX3-4
  arm: mvebu: add PCIe Device Tree informations for Armada XP
  arm: mvebu: add PCIe Device Tree informations for Armada 370
  ARM: sunxi: unify osc24M_fixed and osc24M
  arm: vt8500: Add SDHC support to WM8505 DT
  ARM: dts: Add a 64 bits version of the skeleton device tree
  ARM: mvebu: Add Device Bus and CFI flash memory support to defconfig
  ARM: mvebu: Add support for NOR flash device on Openblocks AX3 board
  ARM: mvebu: Add support for NOR flash device on Armada XP-GP board
  ARM: mvebu: Add Device Bus support for Armada 370/XP SoC
  ARM: dts: imx6dl-wandboard: Add USB Host support
  ARM: dts: imx51 cpu node
  ARM: dts: Add missing imx27-phytec-phycore dtb target
  ARM: dts: Add NFC support for i.MX27 Phytec PCM038 module
  ARM: i.MX51: Add PATA support
  ARM: dts: Add initial support for Wandboard Dual-Lite
  ...
2013-05-02 09:28:03 -07:00
Linus Torvalds a7726350e0 ARM: arm-soc cleanup for 3.10
Here is a collection of cleanup patches. Among the pieces that stand out are:
 
 - The deletion of h720x platforms
 - Split of at91 non-dt platforms to their own Kconfig file to keep them separate
 - General cleanups and refactoring of i.MX and MXS platforms
 - Some restructuring of clock tables for OMAP
 - Convertion of PMC driver for Tegra to dt-only
 - Some renames of sunxi -> sun4i (Allwinner A10)
 - ... plus a bunch of other stuff that I haven't mentioned
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRggUqAAoJEIwa5zzehBx3HjEQAJwp7heRs/HwTDzmzcyHkRMV
 usbaa9dHBuAZ0DzsWjLK99xEn8VWD9TvbeP6hN5gNhxko06UVza3o8PI2iV1ztMB
 9K3u2+LS5on/5cOxnsU1va16h5hBZ0ZIgNx5NY+PZ5mBY6v1U3qTjljPP62iXp63
 w+sdXeZDe/c5JvuoDRbY0OBR++3Jp8cQg7KbU78jWz3r5D2rC1zwhkf2audcRY6b
 jIWTj9M8CHynh/D6OzKqDcOYorBHNSRj0YbiWS2nnMfm+0V8nya00EPRpCPRiBUb
 sobSy1CI9Qxiih3bOf6QCfzCRzJ5hbtE0zlI8g3bqtEZ1yOsE949HrKapWHJJdIU
 JNTXrxXORAnaRhbzvSPNpp/iJBSDQRsfEETgv5BuHg/4lzTQfzElySbcgb4EeoHr
 7Zt8ZR2/Du+u76qIPqs19ES3Wx+nOEOfSDAgZmlfPvlwmlGDYvqAXoeJ006VXnhG
 JacLuD/cFnJ1w00Bcl48ZXMIsVkoRqjvsCG5q688HGXMM1lU8DfgUpQY6OCWAbdu
 kFnBinJZk+HbE8FGS8O0BoQ+oiC0YIr2XhATL66PGHq7bLHb5ycwvZ7mrfC0AN9j
 M9hqTFednwfo9wF8vSj5nMsxXwP8/mky4ECGoFvLsMYDosunrNVnAHtTgDSE+ZgO
 6kQJ1P8jBBXn2LyjF88W
 =xCAx
 -----END PGP SIGNATURE-----

Merge tag 'cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC cleanup from Olof Johansson:
 "Here is a collection of cleanup patches.  Among the pieces that stand
  out are:

   - The deletion of h720x platforms
   - Split of at91 non-dt platforms to their own Kconfig file to keep
     them separate
   - General cleanups and refactoring of i.MX and MXS platforms
   - Some restructuring of clock tables for OMAP
   - Convertion of PMC driver for Tegra to dt-only
   - Some renames of sunxi -> sun4i (Allwinner A10)
   - ... plus a bunch of other stuff that I haven't mentioned"

* tag 'cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (119 commits)
  ARM: i.MX: remove unused ARCH_* configs
  ARM i.MX53: remove platform ahci support
  ARM: sunxi: Rework the restart code
  irqchip: sunxi: Rename sunxi to sun4i
  irqchip: sunxi: Make use of the IRQCHIP_DECLARE macro
  clocksource: sunxi: Rename sunxi to sun4i
  clocksource: sunxi: make use of CLKSRC_OF
  clocksource: sunxi: Cleanup the timer code
  ARM: at91: remove trailing semicolon from macros
  ARM: at91/setup: fix trivial typos
  ARM: EXYNOS: remove "config EXYNOS_DEV_DRM"
  ARM: EXYNOS: change the name of USB ohci header
  ARM: SAMSUNG: Remove unnecessary code for dma
  ARM: S3C24XX: Remove unused GPIO drive strength register definitions
  ARM: OMAP4+: PM: Restore CPU power state to ON with clockdomain force wakeup method
  ARM: S3C24XX: Removed unneeded dependency on CPU_S3C2412
  ARM: S3C24XX: Removed unneeded dependency on CPU_S3C2410
  ARM: S3C24XX: Removed unneeded dependency on ARCH_S3C24XX for boards
  ARM: SAMSUNG: Fix typo "CONFIG_SAMSUNG_DEV_RTC"
  ARM: S5P64X0: Fix typo "CONFIG_S5P64X0_SETUP_SDHCI"
  ...
2013-05-02 09:03:55 -07:00
Frederic Weisbecker c032862fba Merge commit '8700c95adb03' into timers/nohz
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.

Merge a common upstream merge point that has these
updates.

Conflicts:
	include/linux/perf_event.h
	kernel/rcutree.h
	kernel/rcutree_plugin.h

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-05-02 17:54:19 +02:00
David Miller 4ada8db38a net: Restore NETIF_F_* bit ordering.
Commit 8ad227ff89 ("net: vlan: add 802.1ad support") added some new
NETIF_F_* features bits, but it added them in the middle of existing
values.

Userland depends upon the flag bits via the per-netdevice 'flags' sysfs
file.

So restore the previous ordering by adding the new flags at the end.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-02 07:34:58 -07:00
Alex Deucher 62d1f92e06 drm/radeon: add new richland pci ids
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-05-02 10:01:49 -04:00
Alex Deucher 18932a2841 drm/radeon: add some new SI PCI ids
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-05-02 10:01:48 -04:00
Paul Mackerras 5975a2e095 KVM: PPC: Book3S: Add API for in-kernel XICS emulation
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it.  The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.

The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source.  The attribute number is the same as the interrupt
source number.

This does not support irq routing or irqfd yet.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:36 +02:00
Michael S. Tsirkin 5012a3a384 tcm_vhost: header split up
move uapi parts to vhost.h
move .c private parts to .c itself

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-02 13:40:15 +03:00
Joerg Roedel 0c4513be3d Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and 'arm/tegra' into next 2013-05-02 12:10:19 +02:00
Alex Elder 4f0dcb10cf libceph: create source file "net/ceph/snapshot.c"
This creates a new source file "net/ceph/snapshot.c" to contain
utility routines related to ceph snapshot contexts.  The main
motivation was to define ceph_create_snap_context() as a common way
to create these structures, but I've moved the definitions of
ceph_get_snap_context() and ceph_put_snap_context() there too.
(The benefit of inlining those is very small, and I'd rather
keep this collection of functions together.)

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:20:08 -07:00
Alex Elder c3f56102f2 libceph: validate timespec conversions
A ceph timespec contains 32-bit unsigned values for its seconds and
nanoseconds components.  For a standard timespec, both fields are
signed, and the seconds field is almost surely 64 bits.

Add some explicit casts so the fact that this conversion is taking
place is obvious.  Also trip a bug if we ever try to put out of
range (negative or too big) values into a ceph timespec.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:17 -07:00
Alex Elder b587398a4f libceph: add signed type limits
Flesh out the limits defined in <linux/ceph/decode.h> to include the
maximum and minimum values for signed type S8, S16, S32, and S64.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:16 -07:00
Alex Elder 6c57b5545d libceph: support pages for class request data
Add the ability to provide an array of pages as outbound request
data for object class method calls.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:06 -07:00
Alex Elder 49719778bf libceph: support raw data requests
Allow osd request ops that aren't otherwise structured (not class,
extent, or watch ops) to specify "raw" data to be used to hold
incoming data for the op.  Make use of this capability for the osd
STAT op.

Prefix the name of the private function osd_req_op_init() with "_",
and expose a new function by that (earlier) name whose purpose is to
initialize osd ops with (only) implied data.

For now we'll just support the use of a page array for an osd op
with incoming raw data.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:00 -07:00
Alex Elder 406e2c9f92 libceph: kill off osd data write_request parameters
In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data.  Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:58 -07:00
Alex Elder 26be88087a libceph: change how "safe" callback is used
An osd request currently has two callbacks.  They inform the
initiator of the request when we've received confirmation for the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.

The only time the second callback is used is in the ceph file system
for a synchronous write.  There's a race that makes some handling of
this case unsafe.  This patch addresses this problem.  The error
handling for this callback is also kind of gross, and this patch
changes that as well.

In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list.  Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request added *after* the call to that function returns.  The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.

To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*.  So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe).  The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time.  That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.

We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe.  This
will avoid the race because this call will be made under protection
of the osd client's request mutex.  It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.

The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose.  It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.

This resolves the original problem reportedin:
    http://tracker.ceph.com/issues/4706

Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:18:52 -07:00
Alex Elder 04017e29bb libceph: make method call data be a separate data item
Right now the data for a method call is specified via a pointer and
length, and it's copied--along with the class and method name--into
a pagelist data item to be sent to the osd.  Instead, encode the
data in a data item separate from the class and method names.

This will allow large amounts of data to be supplied to methods
without copying.  Only rbd uses the class functionality right now,
and when it really needs this it will probably need to use a page
array rather than a page list.  But this simple implementation
demonstrates the functionality on the osd client, and that's enough
for now.

This resolves:
    http://tracker.ceph.com/issues/4104

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:35 -07:00
Alex Elder 90af36022a libceph: add, don't set data for a message
Change the names of the functions that put data on a pagelist to
reflect that we're adding to whatever's already there rather than
just setting it to the one thing.  Currently only one data item is
ever added to a message, but that's about to change.

This resolves:
    http://tracker.ceph.com/issues/2770

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:34 -07:00
Alex Elder ca8b3a6917 libceph: implement multiple data items in a message
This patch adds support to the messenger for more than one data item
in its data list.

A message data cursor has two more fields to support this:
    - a count of the number of bytes left to be consumed across
      all data items in the list, "total_resid"
    - a pointer to the head of the list (for validation only)

The cursor initialization routine has been split into two parts: the
outer one, which initializes the cursor for traversing the entire
list of data items; and the inner one, which initializes the cursor
to start processing a single data item.

When a message cursor is first initialized, the outer initialization
routine sets total_resid to the length provided.  The data pointer
is initialized to the first data item on the list.  From there, the
inner initialization routine finishes by setting up to process the
data item the cursor points to.

Advancing the cursor consumes bytes in total_resid.  If the resid
field reaches zero, it means the current data item is fully
consumed.  If total_resid indicates there is more data, the cursor
is advanced to point to the next data item, and then the inner
initialization routine prepares for using that.  (A check is made at
this point to make sure we don't wrap around the front of the list.)

The type-specific init routines are modified so they can be given a
length that's larger than what the data item can support.  The resid
field is initialized to the smaller of the provided length and the
length of the entire data item.

When total_resid reaches zero, we're done.

This resolves:
    http://tracker.ceph.com/issues/3761

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:33 -07:00
Alex Elder 5240d9f95d libceph: replace message data pointer with list
In place of the message data pointer, use a list head which links
through message data items.  For now we only support a single entry
on that list.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:32 -07:00
Alex Elder 8ae4f4f5c0 libceph: have cursor point to data
Rather than having a ceph message data item point to the cursor it's
associated with, have the cursor point to a data item.  This will
allow a message cursor to be used for more than one data item.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:30 -07:00
Alex Elder 36153ec9dd libceph: move cursor into message
A message will only be processing a single data item at a time, so
there's no need for each data item to have its own cursor.

Move the cursor embedded in the message data structure into the
message itself.  To minimize the impact, keep the data->cursor
field, but make it be a pointer to the cursor in the message.

Move the definition of ceph_msg_data above ceph_msg_data_cursor so
the cursor can point to the data without a forward definition rather
than vice-versa.

This and the upcoming patches are part of:
    http://tracker.ceph.com/issues/3761

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:29 -07:00
Alex Elder c851c49591 libceph: record bio length
The bio is the only data item type that doesn't record its full
length.  Fix that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:28 -07:00
Alex Elder ea96571f7b libceph: fix possible CONFIG_BLOCK build problem
This patch:
    15a0d7b libceph: record message data length
did not enclose some bio-specific code inside CONFIG_BLOCK as
it should have.  Fix that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:26 -07:00
Alex Elder 5476492fba libceph: kill off osd request r_data_in and r_data_out
Finally!  Convert the osd op data pointers into real structures, and
make the switch over to using them instead of having all ops share
the in and/or out data structures in the osd request.

Set up a new function to traverse the set of ops and release any
data associated with them (pages).

This and the patches leading up to it resolve:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:25 -07:00
Alex Elder ec9123c567 libceph: set the data pointers when encoding ops
Still using the osd request r_data_in and r_data_out pointer, but
we're basically only referring to it via the data pointers in the
osd ops.  And we're transferring that information to the request
or reply message only when the op indicates it's needed, in
osd_req_encode_op().

To avoid a forward reference, ceph_osdc_msg_data_set() was moved up
in the file.

Don't bother calling ceph_osd_data_init(), in ceph_osd_alloc(),
because the ops array will already be zeroed anyway.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:24 -07:00
Alex Elder a4ce40a9a7 libceph: combine initializing and setting osd data
This ends up being a rather large patch but what it's doing is
somewhat straightforward.

Basically, this is replacing two calls with one.  The first of the
two calls is initializing a struct ceph_osd_data with data (either a
page array, a page list, or a bio list); the second is setting an
osd request op so it associates that data with one of the op's
parameters.  In place of those two will be a single function that
initializes the op directly.

That means we sort of fan out a set of the needed functions:
    - extent ops with pages data
    - extent ops with pagelist data
    - extent ops with bio list data
and
    - class ops with page data for receiving a response

We also have define another one, but it's only used internally:
    - class ops with pagelist data for request parameters

Note that we *still* haven't gotten rid of the osd request's
r_data_in and r_data_out fields.  All the osd ops refer to them for
their data.  For now, these data fields are pointers assigned to the
appropriate r_data_* field when these new functions are called.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:23 -07:00
Alex Elder 5f562df5f5 libceph: format class info at init time
An object class method is formatted using a pagelist which contains
the class name, the method name, and the data concatenated into an
osd request's outbound data.

Currently when a class op is initialized in osd_req_op_cls_init(),
the lengths of and pointers to these three items are recorded.
Later, when the op is getting formatted into the request message, a
new pagelist is created and that is when these items get copied into
the pagelist.

This patch makes it so the pagelist to hold these items is created
when the op is initialized instead.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:19 -07:00
Alex Elder c99d2d4abb libceph: specify osd op by index in request
An osd request now holds all of its source op structures, and every
place that initializes one of these is in fact initializing one
of the entries in the the osd request's array.

So rather than supplying the address of the op to initialize, have
caller specify the osd request and an indication of which op it
would like to initialize.  This better hides the details the
op structure (and faciltates moving the data pointers they use).

Since osd_req_op_init() is a common routine, and it's not used
outside the osd client code, give it static scope.  Also make
it return the address of the specified op (so all the other
init routines don't have to repeat that code).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:15 -07:00
Alex Elder 8c042b0df9 libceph: add data pointers in osd op structures
An extent type osd operation currently implies that there will
be corresponding data supplied in the data portion of the request
(for write) or response (for read) message.  Similarly, an osd class
method operation implies a data item will be supplied to receive
the response data from the operation.

Add a ceph_osd_data pointer to each of those structures, and assign
it to point to eithre the incoming or the outgoing data structure in
the osd message.  The data is not always available when an op is
initially set up, so add two new functions to allow setting them
after the op has been initialized.

Begin to make use of the data item pointer available in the osd
operation rather than the request data in or out structure in
places where it's convenient.  Add some assertions to verify
pointers are always set the way they're expected to be.

This is a sort of stepping stone toward really moving the data
into the osd request ops, to allow for some validation before
making that jump.

This is the first in a series of patches that resolve:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:14 -07:00
Alex Elder 54d5064912 libceph: rename data out field in osd request op
There are fields "indata" and "indata_len" defined the ceph osd
request op structure.  The "in" part is with from the point of view
of the osd server, but is a little confusing here on the client
side.  Change their names to use "request" instead of "in" to
indicate that it defines data provided with the request (as opposed
the data returned in the response).

Rename the local variable in osd_req_encode_op() to match.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:13 -07:00
Alex Elder 79528734f3 libceph: keep source rather than message osd op array
An osd request keeps a pointer to the osd operations (ops) array
that it builds in its request message.

In order to allow each op in the array to have its own distinct
data, we will need to keep track of each op's data, and that
information does not go over the wire.

As long as we're tracking the data we might as well just track the
entire (source) op definition for each of the ops.  And if we're
doing that, we'll have no more need to keep a pointer to the
wire-encoded version.

This patch makes the array of source ops be kept with the osd
request structure, and uses that instead of the version encoded in
the message in places where that was previously used.  The array
will be embedded in the request structure, and the maximum number of
ops we ever actually use is currently 2.  So reduce CEPH_OSD_MAX_OP
to 2 to reduce the size of the structure.

The result of doing this sort of ripples back up, and as a result
various function parameters and local variables become unnecessary.

Make r_num_ops be unsigned, and move the definition of struct
ceph_osd_req_op earlier to ensure it's defined where needed.

It does not yet add per-op data, that's coming soon.

This resolves:
    http://tracker.ceph.com/issues/4656

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:12 -07:00
Alex Elder 43bfe5de9f libceph: define osd data initialization helpers
Define and use functions that encapsulate the initializion of a
ceph_osd_data structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:06 -07:00
Alex Elder e5975c7c8e ceph: build osd request message later for writepages
Hold off building the osd request message in ceph_writepages_start()
until just before it will be submitted to the osd client for
execution.

We'll still create the request and allocate the page pointer array
after we learn we have at least one page to write.  A local variable
will be used to keep track of the allocated array of pages.  Wait
until just before submitting the request for assigning that page
array pointer to the request message.

Create ands use a new function osd_req_op_extent_update() whose
purpose is to serve this one spot where the length value supplied
when an osd request's op was initially formatted might need to get
changed (reduced, never increased) before submitting the request.

Previously, ceph_writepages_start() assigned the message header's
data length because of this update.  That's no longer necessary,
because ceph_osdc_build_request() will recalculate the right
value to use based on the content of the ops in the request.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:02 -07:00
Alex Elder acead002b2 libceph: don't build request in ceph_osdc_new_request()
This patch moves the call to ceph_osdc_build_request() out of
ceph_osdc_new_request() and into its caller.

This is in order to defer formatting osd operation information into
the request message until just before request is started.

The only unusual (ab)user of ceph_osdc_build_request() is
ceph_writepages_start(), where the final length of write request may
change (downward) based on the current inode size or the oldest
snapshot context with dirty data for the inode.

The remaining callers don't change anything in the request after has
been built.

This means the ops array is now supplied by the caller.  It also
means there is no need to pass the mtime to ceph_osdc_new_request()
(it gets provided to ceph_osdc_build_request()).  And rather than
passing a do_sync flag, have the number of ops in the ops array
supplied imply adding a second STARTSYNC operation after the READ or
WRITE requested.

This and some of the patches that follow are related to having the
messenger (only) be responsible for filling the content of the
message header, as described here:
    http://tracker.ceph.com/issues/4589

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:58 -07:00
Alex Elder a193080481 libceph: record message data length
Keep track of the length of the data portion for a message in a
separate field in the ceph_msg structure.  This information has
been maintained in wire byte order in the message header, but
that's going to change soon.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:57 -07:00
Alex Elder fdce58ccb5 libceph: record length of bio list with bio
When assigning a bio pointer to an osd request, we don't have an
efficient way of knowing the total length bytes in the bio list.
That information is available at the point it's set up by the rbd
code, so record it with the osd data when it's set.

This and the next patch are related to maintaining the length of a
message's data independent of the message header, as described here:
    http://tracker.ceph.com/issues/4589

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:56 -07:00
Alex Elder ace6d3a96f libceph: drop ceph_osd_request->r_con_filling_msg
A field in an osd request keeps track of whether a connection is
currently filling the request's reply message.  This patch gets rid
of that field.

An osd request includes two messages--a request and a reply--and
they're both associated with the connection that existed to its
the target osd at the time the request was created.

An osd request can be dropped early, even when it's in flight.
And at that time both messages are released.  It's possible the
reply message has been supplied to its connection to receive
an incoming response message at the time the osd request gets
dropped.  So ceph_osdc_release_request() revokes that message
from the connection before releasing it so things get cleaned up
properly.

Previously this may have caused a problem, because the connection
that a message was associated with might have gone away before the
revoke request.  And to avoid any problems using that connection,
the osd client held a reference to it when it supplies its response
message.

However since this commit:
    38941f80 libceph: have messages point to their connection
all messages hold a reference to the connection they are associated
with whenever the connection is actively operating on the message
(i.e. while the message is queued to send or sending, and when it
data is being received into it).  And if a message has no connection
associated with it, ceph_msg_revoke_incoming() won't do anything
when asked to revoke it.

As a result, there is no need to keep an additional reference to the
connection associated with a message when we hand the message to the
messenger when it calls our alloc_msg() method to receive something.
If the connection *were* operating on it, it would have its own
reference, and if not, there's no work to be done when we need to
revoke it.

So get rid of the osd request's r_con_filling_msg field.

This resolves:
    http://tracker.ceph.com/issues/4647

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:54 -07:00
Alex Elder ef4859d647 libceph: define ceph_decode_pgid() only once
There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c".  Get rid of both, and instead define
a single inline version in "include/linux/ceph/osdmap.h".

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:52 -07:00
Alex Elder 33803f3300 libceph: define source request op functions
The rbd code has a function that allocates and populates a
ceph_osd_req_op structure (the in-core version of an osd request
operation).  When reviewed, Josh suggested two things: that the
big varargs function might be better split into type-specific
functions; and that this functionality really belongs in the osd
client rather than rbd.

This patch implements both of Josh's suggestions.  It breaks
up the rbd function into separate functions and defines them
in the osd client module as exported interfaces.  Unlike the
rbd version, however, the functions don't allocate an osd_req_op
structure; they are provided the address of one and that is
initialized instead.

The rbd function has been eliminated and calls to it have been
replaced by calls to the new routines.  The rbd code now now use a
stack (struct) variable to hold the op rather than allocating and
freeing it each time.

For now only the capabilities used by rbd are implemented.
Implementing all the other osd op types, and making the rest of the
code use it will be done separately, in the next few patches.

Note that only the extent, cls, and watch portions of the
ceph_osd_req_op structure are currently used.  Delete the others
(xattr, pgls, and snap) from its definition so nobody thinks it's
actually implemented or needed.  We can add it back again later
if needed, when we know it's been tested.

This (and a few follow-on patches) resolves:
    http://tracker.ceph.com/issues/3861

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:45 -07:00
Alex Elder adfe695a25 ceph: move max constant definitions
Move some definitions for max integer values out of the rbd code and
into the more central "decode.h" header file.  These really belong
in a Linux (or libc) header somewhere, but I haven't gotten around
to proposing that yet.

This is in preparation for moving some code out of rbd.c and into
the osd client.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:42 -07:00
Alex Elder 6644ed7b7e libceph: make message data be a pointer
Begin the transition from a single message data item to a list of
them by replacing the "data" structure in a message with a pointer
to a ceph_msg_data structure.

A null pointer will indicate the message has no data; replace the
use of ceph_msg_has_data() with a simple check for a null pointer.

Create functions ceph_msg_data_create() and ceph_msg_data_destroy()
to dynamically allocate and free a data item structure of a given type.

When a message has its data item "set," allocate one of these to
hold the data description, and free it when the last reference to
the message is dropped.

This partially resolves:
    http://tracker.ceph.com/issues/4429

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:37 -07:00
Alex Elder f5db90bcf2 libceph: kill last of ceph_msg_pos
The only remaining field in the ceph_msg_pos structure is
did_page_crc.  In the new cursor model of things that flag (or
something like it) belongs in the cursor.

Define a new field "need_crc" in the cursor (which applies to all
types of data) and initialize it to true whenever a cursor is
initialized.

In write_partial_message_data(), the data CRC still will be computed
as before, but it will check the cursor->need_crc field to determine
whether it's needed.  Any time the cursor is advanced to a new piece
of a data item, need_crc will be set, and this will cause the crc
for that entire piece to be accumulated into the data crc.

In write_partial_message_data() the intermediate crc value is now
held in a local variable so it doesn't have to be byte-swapped so
many times.  In read_partial_msg_data() we do something similar
(but mainly for consistency there).

With that, the ceph_msg_pos structure can go away,  and it no longer
needs to be passed as an argument to prepare_message_data().

This cleanup is related to:
    http://tracker.ceph.com/issues/4428

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:34 -07:00
Alex Elder 859a35d552 libceph: kill most of ceph_msg_pos
All but one of the fields in the ceph_msg_pos structure are now
never used (only assigned), so get rid of them.  This allows
several small blocks of code to go away.

This is cleanup of old code related to:
    http://tracker.ceph.com/issues/4428

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:33 -07:00
Alex Elder 4c59b4a278 libceph: collapse all data items into one
It turns out that only one of the data item types is ever used at
any one time in a single message (currently).
    - A page array is used by the osd client (on behalf of the file
      system) and by rbd.  Only one osd op (and therefore at most
      one data item) is ever used at a time by rbd.  And the only
      time the file system sends two, the second op contains no
      data.
    - A bio is only used by the rbd client (and again, only one
      data item per message)
    - A page list is used by the file system and by rbd for outgoing
      data, but only one op (and one data item) at a time.

We can therefore collapse all three of our data item fields into a
single field "data", and depend on the messenger code to properly
handle it based on its type.

This allows us to eliminate quite a bit of duplicated code.

This is related to:
    http://tracker.ceph.com/issues/4429

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:30 -07:00
Alex Elder 6518be47f9 libceph: kill ceph message bio_iter, bio_seg
The bio_iter and bio_seg fields in a message are no longer used, we
use the cursor instead.  So get rid of them and the functions that
operate on them them.

This is related to:
    http://tracker.ceph.com/issues/4428

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:26 -07:00
Alex Elder 25aff7c559 libceph: record residual bytes for all message data types
All of the data types can use this, not just the page array.  Until
now, only the bio type doesn't have it available, and only the
initiator of the request (the rbd client) is able to supply the
length of the full request without re-scanning the bio list.  Change
the cursor init routines so the length is supplied based on the
message header "data_len" field, and use that length to intiialize
the "resid" field of the cursor.

In addition, change the way "last_piece" is defined so it is based
on the residual number of bytes in the original request.  This is
necessary (at least for bio messages) because it is possible for
a read request to succeed without consuming all of the space
available in the data buffer.

This resolves:
    http://tracker.ceph.com/issues/4427

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:24 -07:00
Sage Weil e9966076cd libceph: wrap auth methods in a mutex
The auth code is called from a variety of contexts, include the mon_client
(protected by the monc's mutex) and the messenger callbacks (currently
protected by nothing).  Avoid chaos by protecting all auth state with a
mutex.  Nothing is blocking, so this should be simple and lightweight.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-05-01 21:17:15 -07:00
Sage Weil 27859f9773 libceph: wrap auth ops in wrapper functions
Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-05-01 21:17:14 -07:00
Sage Weil 0bed9b5c52 libceph: add update_authorizer auth method
Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-05-01 21:17:13 -07:00
Sage Weil 3a23083bda libceph: implement RECONNECT_SEQ feature
This is an old protocol extension that allows the client and server to
avoid resending old messages after a reconnect (following a socket error).
Instead, the exchange their sequence numbers during the handshake.  This
avoids sending a bunch of useless data over the socket.

It has been supported in the server code since v0.22 (Sep 2010).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-05-01 21:17:09 -07:00
Alex Elder 9d2a06c275 libceph: kill message trail
The wart that is the ceph message trail can now be removed, because
its only user was the osd client, and the previous patch made that
no longer the case.

The result allows write_partial_msg_pages() to be simplified
considerably.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:05 -07:00
Alex Elder 95e072eb38 libceph: kill osd request r_trail
The osd trail is a pagelist, used only for a CALL osd operation
to hold the class and method names, along with any input data for
the call.

It is only currently used by the rbd client, and when it's used it
is the only bit of outbound data in the osd request.  Since we
already support (non-trail) pagelist data in a message, we can
just save this outbound CALL data in the "normal" pagelist rather
than the trail, and get rid of the trail entirely.

The existing pagelist support depends on the pagelist being
dynamically allocated, and ownership of it is passed to the
messenger once it's been attached to a message.  (That is to say,
the messenger releases and frees the pagelist when it's done with
it).  That means we need to dynamically allocate the pagelist also.

Note that we simply assert that the allocation of a pagelist
structure succeeds.  Appending to a pagelist might require a dynamic
allocation, so we're already assuming we won't run into trouble
doing so (we're just ignore any failures--and that should be fixed
at some point).

This resolves:
    http://tracker.ceph.com/issues/4407

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:04 -07:00
Alex Elder 9a5e6d09dd libceph: have osd requests support pagelist data
Add support for recording a ceph pagelist as data associated with an
osd request.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:03 -07:00
Alex Elder 175face2ba libceph: let osd ops determine request data length
The length of outgoing data in an osd request is dependent on the
osd ops that are embedded in that request.  Each op is encoded into
a request message using osd_req_encode_op(), so that should be used
to determine the amount of outgoing data implied by the op as it
is encoded.

Have osd_req_encode_op() return the number of bytes of outgoing data
implied by the op being encoded, and accumulate and use that in
ceph_osdc_build_request().

As a result, ceph_osdc_build_request() no longer requires its "len"
parameter, so get rid of it.

Using the sum of the op lengths rather than the length provided is
a valid change because:
    - The only callers of osd ceph_osdc_build_request() are
      rbd and the osd client (in ceph_osdc_new_request() on
      behalf of the file system).
    - When rbd calls it, the length provided is only non-zero for
      write requests, and in that case the single op has the
      same length value as what was passed here.
    - When called from ceph_osdc_new_request(), (it's not all that
      easy to see, but) the length passed is also always the same
      as the extent length encoded in its (single) write op if
      present.

This resolves:
    http://tracker.ceph.com/issues/4406

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:02 -07:00
Alex Elder e766d7b55e libceph: implement pages array cursor
Implement and use cursor routines for page array message data items
for outbound message data.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:01 -07:00
Alex Elder 6aaa4511de libceph: implement bio message data item cursor
Implement and use cursor routines for bio message data items for
outbound message data.

(See the previous commit for reasoning in support of the changes
in out_msg_pos_next().)

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:59 -07:00
Alex Elder dd236fcb65 libceph: prepare for other message data item types
This just inserts some infrastructure in preparation for handling
other types of ceph message data items.  No functional changes,
just trying to simplify review by separating out some noise.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:57 -07:00
Alex Elder fe38a2b67b libceph: start defining message data cursor
This patch lays out the foundation for using generic routines to
manage processing items of message data.

For simplicity, we'll start with just the trail portion of a
message, because it stands alone and is only present for outgoing
data.

First some basic concepts.  We'll use the term "data item" to
represent one of the ceph_msg_data structures associated with a
message.  There are currently four of those, with single-letter
field names p, l, b, and t.  A data item is further broken into
"pieces" which always lie in a single page.  A data item will
include a "cursor" that will track state as the memory defined by
the item is consumed by sending data from or receiving data into it.

We define three routines to manipulate a data item's cursor: the
"init" routine; the "next" routine; and the "advance" routine.  The
"init" routine initializes the cursor so it points at the beginning
of the first piece in the item.  The "next" routine returns the
page, page offset, and length (limited by both the page and item
size) of the next unconsumed piece in the item.  It also indicates
to the caller whether the piece being returned is the last one in
the data item.

The "advance" routine consumes the requested number of bytes in the
item (advancing the cursor).  This is used to record the number of
bytes from the current piece that were actually sent or received by
the network code.  It returns an indication of whether the result
means the current piece has been fully consumed.  This is used by
the message send code to determine whether it should calculate the
CRC for the next piece processed.

The trail of a message is implemented as a ceph pagelist.  The
routines defined for it will be usable for non-trail pagelist data
as well.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:56 -07:00
Alex Elder 437945094f libceph: abstract message data
Group the types of message data into an abstract structure with a
type indicator and a union containing fields appropriate to the
type of data it represents.  Use this to represent the pages,
pagelist, bio, and trail in a ceph message.

Verify message data is of type NONE in ceph_msg_data_set_*()
routines.  Since information about message data of type NONE really
should not be interpreted, get rid of the other assertions in those
functions.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:55 -07:00
Alex Elder f9e15777af libceph: be explicit about message data representation
A ceph message has a data payload portion.  The memory for that data
(either the source of data to send or the location to place data
that is received) is specified in several ways.  The ceph_msg
structure includes fields for all of those ways, but this
mispresents the fact that not all of them are used at a time.

Specifically, the data in a message can be in:
    - an array of pages
    - a list of pages
    - a list of Linux bios
    - a second list of pages (the "trail")
(The two page lists are currently only ever used for outgoing data.)

Impose more structure on the ceph message, making the grouping of
some of these fields explicit.  Shorten the name of the
"page_alignment" field.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:54 -07:00
Alex Elder 97fb1c7f66 libceph: define ceph_msg_has_*() data macros
Define and use macros ceph_msg_has_*() to determine whether to
operate on the pages, pagelist, bio, and trail fields of a message.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:53 -07:00
Alex Elder 4a73ef27ad libceph: record message data byte length
Record the number of bytes of data in a page array rather than the
number of pages in the array.  It can be assumed that the page array
is of sufficient size to hold the number of bytes indicated (and
offset by the indicated alignment).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:42 -07:00
Alex Elder 27fa83852b libceph: isolate other message data fields
Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and
ceph_msg_data_set_trail() to clearly abstract the assignment of the
remaining data-related fields in a ceph message structure.  Use the
new functions in the osd client and mds client.

This partially resolves:
    http://tracker.ceph.com/issues/4263

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:40 -07:00
Alex Elder f1baeb2b9f libceph: set page info with byte length
When setting page array information for message data, provide the
byte length rather than the page count ceph_msg_data_set_pages().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:39 -07:00
Alex Elder 02afca6ca0 libceph: isolate message page field manipulation
Define a function ceph_msg_data_set_pages(), which more clearly
abstracts the assignment page-related fields for data in a ceph
message structure.  Use this new function in the osd client and mds
client.

Ideally, these fields would never be set more than once (with
BUG_ON() calls to guarantee that).  At the moment though the osd
client sets these every time it receives a message, and in the event
of a communication problem this can happen more than once.  (This
will be resolved shortly, but setting up these helpers first makes
it all a bit easier to work with.)

Rearrange the field order in a ceph_msg structure to group those
that are used to define the possible data payloads.

This partially resolves:
    http://tracker.ceph.com/issues/4263

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:38 -07:00
Alex Elder e0c594878e libceph: record byte count not page count
Record the byte count for an osd request rather than the page count.
The number of pages can always be derived from the byte count (and
alignment/offset) but the reverse is not true.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:36 -07:00
Alex Elder 7b11ba3758 libceph: define CEPH_MSG_MAX_MIDDLE_LEN
This is probably unnecessary but the code read as if it were wrong
in read_partial_message().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:29 -07:00
Alex Elder 0fff87ec79 libceph: separate read and write data
An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.

Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.

This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data.  It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.

This resolves:
    http://tracker.ceph.com/issues/4127

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:27 -07:00
Alex Elder 2ac2b7a6d4 libceph: distinguish page and bio requests
An osd request uses either pages or a bio list for its data.  Use a
union to record information about the two, and add a data type
tag to select between them.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:25 -07:00
Alex Elder 2794a82a11 libceph: separate osd request data info
Pull the fields in an osd request structure that define the data for
the request out into a separate structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:24 -07:00
Alex Elder 153e5167e0 libceph: don't assign page info in ceph_osdc_new_request()
Currently ceph_osdc_new_request() assigns an osd request's
r_num_pages and r_alignment fields.  The only thing it does
after that is call ceph_osdc_build_request(), and that doesn't
need those fields to be assigned.

Move the assignment of those fields out of ceph_osdc_new_request()
and into its caller.  As a result, the page_align parameter is no
longer used, so get rid of it.

Note that in ceph_sync_write(), the value for req->r_num_pages had
already been calculated earlier (as num_pages, and fortunately
it was computed the same way).  So don't bother recomputing it,
but because it's not needed earlier, move that calculation after the
call to ceph_osdc_new_request().  Hold off making the assignment to
r_alignment, doing it instead r_pages and r_num_pages are
getting set.

Similarly, in start_read(), nr_pages already holds the number of
pages in the array (and is calculated the same way), so there's no
need to recompute it.  Move the assignment of the page alignment
down with the others there as well.

This and the next few patches are preparation work for:
    http://tracker.ceph.com/issues/4127

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:23 -07:00
Alex Elder 41766f87f5 libceph: rename ceph_calc_object_layout()
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure.  Only update the ceph_pg if the pool is found
in the osd map.  Get rid of few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:17 -07:00
Alex Elder ec02a2f2ff libceph: kill ceph_msg->pagelist_count
The pagelist_count field is never actually used, so get rid of it.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:16 -07:00
Alex Elder 2a24d1f4bd libceph: use (void *) for untyped data in osd ops
Two of the fields defining osd operations are defined using (char *)
while the data they represent are really untyped, not character
strings.  Change them to have type (void *).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:15 -07:00
Alex Elder 0d5af16435 libceph: complete lingering requests only once
An osd request marked to linger will be re-submitted in the event
a connection to the target osd gets dropped.  Currently, if there
is a callback function associated with a request it will be called
each time a request is submitted--which for lingering requests can
be more than once.

Change it so a request--including lingering ones--will get completed
(from the perspective of the user of the osd client) exactly once.

This resolves:
    http://tracker.ceph.com/issues/3967

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:12 -07:00
Alex Elder d4b515fa10 libceph: distinguish page array and pagelist count
Use distinct fields for tracking the number of pages in a message's
page array and in a message's page list.  Currently only one or the
other is used at a time, but that will be changing soon.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:14:28 -07:00
Alex Elder 07c09b7255 libceph: make ceph_msg->bio_seg be unsigned
The bio_seg field is used by the ceph messenger in iterating through
a bio.  It should never have a negative value, so make it an
unsigned.  (I contemplated making it unsigned short to match the
struct bio definition, but it offered no benefit.)

Change variables used to hold bio_seg values to all be unsigned as
well.  Change two variable names in init_bio_iter() to match the
convention used everywhere else.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:14:23 -07:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Linus Torvalds b9394d8a65 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/efi changes from Peter Anvin:
 "The bulk of these changes are cleaning up the efivars handling and
  breaking it up into a tree of files.  There are a number of fixes as
  well.

  The entire changeset is pretty big, but most of it is code movement.

  Several of these commits are quite new; the history got very messed up
  due to a mismerge with the urgent changes for rc8 which completely
  broke IA64, and so Ingo requested that we rebase it to straighten it
  out."

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: remove "kfree(NULL)"
  efi: locking fix in efivar_entry_set_safe()
  efi, pstore: Read data from variable store before memcpy()
  efi, pstore: Remove entry from list when erasing
  efi, pstore: Initialise 'entry' before iterating
  efi: split efisubsystem from efivars
  efivarfs: Move to fs/efivarfs
  efivars: Move pstore code into the new EFI directory
  efivars: efivar_entry API
  efivars: Keep a private global pointer to efivars
  efi: move utf16 string functions to efi.h
  x86, efi: Make efi_memblock_x86_reserve_range more readable
  efivarfs: convert to use simple_open()
2013-05-01 15:51:46 -07:00
James Hogan 126de6b20b linkage.h: fix build breakage due to symbol prefix handling
Al's commit e1b5bb6d12 ("consolidate cond_syscall and SYSCALL_ALIAS
declarations") broke the build on blackfin and metag due to the
following code:

  #ifndef SYMBOL_NAME
  #ifdef CONFIG_SYMBOL_PREFIX
  #define SYMBOL_NAME(x) CONFIG_SYMBOL_PREFIX ## x
  #else
  #define SYMBOL_NAME(x) x
  #endif
  #endif
  #define __SYMBOL_NAME(x) __stringify(SYMBOL_NAME(x))

__stringify literally stringifies CONFIG_SYMBOL_PREFIX ##x, so you get
lines like this in kernel/sys_ni.s:

  .weak CONFIG_SYMBOL_PREFIXsys_quotactl
  .set CONFIG_SYMBOL_PREFIXsys_quotactl,CONFIG_SYMBOL_PREFIXsys_ni_syscall

The patches in Rusty's modules-next tree such as "CONFIG_SYMBOL_PREFIX:
cleanup." cleans up the whole mess around symbol prefixes, so this patch
just attempts to fix the build in the meantime.

The intermediate definition of SYMBOL_NAME above isn't used and is
incorrect when CONFIG_SYMBOL_PREFIX is defined as CONFIG_SYMBOL_PREFIX
is a quoted string literal, so define __SYMBOL_NAME directly depending
on CONFIG_SYMBOL_PREFIX.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Mea-culpa-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 15:51:29 -07:00
Al Viro ac3e3c5b11 don't bother with deferred freeing of fdtables
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:31:42 -04:00
David Howells 59d8053f1e proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
Move non-public declarations and definitions from linux/proc_fs.h to
fs/proc/internal.h.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:47 -04:00
David Howells c30480b92c proc: Make the PROC_I() and PDE() macros internal to procfs
Make the PROC_I() and PDE() macros internal to procfs.  This means making
PDE_DATA() out of line.  This could be made more optimal by storing
PDE()->data into inode->i_private.

Also provide a __PDE_DATA() that is inline and internal to procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:47 -04:00
David Howells a8ca16ea7b proc: Supply a function to remove a proc entry by PDE
Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer.  This allows us to eliminate all remaining pde->name
accesses outside of procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Grant Likely <grant.likely@linaro.or>
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:46 -04:00
Al Viro 8d8b97ba49 take cgroup_open() and cpuset_open() to fs/proc/base.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:46 -04:00
David Howells b63e6aa502 drm: proc: Use minor->index to label things, not PDE->name
Use minor->index to label things, not the name field from the proc_dir_entry
of the /proc/dwm/<minor>/ directory.

Also, use "%u" not "%d" to render the value and use a 12-byte buffer in which
to render the integer, not a 16-byte buffer.  The longest string an unsigned
int can give you is 10 chars (4294967295) plus a NUL, so round up to 12 as the
stack is likely to be 4- or 8-byte aligned.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: dri-devel@lists.freedesktop.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:44 -04:00
David Howells ce089b5472 drm: Constify drm_proc_list[]
Constify drm_proc_list[] and related pointers.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: dri-devel@lists.freedesktop.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:43 -04:00
David Howells 4a520d2769 proc: Supply an accessor for getting the data from a PDE's parent
Supply an accessor function for getting the private data from the parent
proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

ReiserFS, for instance, stores the super_block pointer in the proc directory
it makes for that super_block, and a pointer to the respective seq_file show
function in each of the proc files in that directory.

This allows a reduction in the number of file_operations structs, open
functions and seq_operations structs required.  The problem otherwise is that
each show function requires two pieces of data but only has storage for one
per PDE (and this has no release function).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: Maxim Mikityanskiy <maxtram95@gmail.com>
cc: YAMANE Toshiaki <yamanetoshi@gmail.com>
cc: linux-wireless@vger.kernel.org
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:42 -04:00
David Howells 270b5ac215 proc: Add proc_mkdir_data()
Add proc_mkdir_data() to allow procfs directories to be created that are
annotated at the time of creation with private data rather than doing this
post-creation.  This means no access is then required to the proc_dir_entry
struct to set this.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Neela Syam Kolli <megaraidlinux@lsi.com>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
cc: linux-wireless@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:41 -04:00
David Howells 34db8aaf0f proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h.

Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as
they're internal to procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-arch@vger.kernel.org
cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jri Slaby <jslaby@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:40 -04:00
David Howells 4abfd02989 proc: Move PDE_NET() to fs/proc/proc_net.c
Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:40 -04:00
David Howells 0bb80f2405 proc: Split the namespace stuff out into linux/proc_ns.h
Split the proc namespace stuff out into linux/proc_ns.h.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:39 -04:00
David Howells 271a15eabe proc: Supply PDE attribute setting accessor functions
Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:18 -04:00
Linus Torvalds 73287a43cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights (1721 non-merge commits, this has to be a record of some
  sort):

   1) Add 'random' mode to team driver, from Jiri Pirko and Eric
      Dumazet.

   2) Make it so that any driver that supports configuration of multiple
      MAC addresses can provide the forwarding database add and del
      calls by providing a default implementation and hooking that up if
      the driver doesn't have an explicit set of handlers.  From Vlad
      Yasevich.

   3) Support GSO segmentation over tunnels and other encapsulating
      devices such as VXLAN, from Pravin B Shelar.

   4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

   5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
      Dukkipati.

   6) In the PHY layer, allow supporting wake-on-lan in situations where
      the PHY registers have to be written for it to be configured.

      Use it to support wake-on-lan in mv643xx_eth.

      From Michael Stapelberg.

   7) Significantly improve firewire IPV6 support, from YOSHIFUJI
      Hideaki.

   8) Allow multiple packets to be sent in a single transmission using
      network coding in batman-adv, from Martin Hundebøll.

   9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

  10) Generalize the VXLAN forwarding tables so that there is more
      flexibility in configurating various aspects of the endpoints.
      From David Stevens.

  11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
      from Dmitry Kravkov.

  12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
      Neira Ayuso.

  13) Start adding networking selftests.

  14) In situations of overload on the same AF_PACKET fanout socket, or
      per-cpu packet receive queue, minimize drop by distributing the
      load to other cpus/fanouts.  From Willem de Bruijn and Eric
      Dumazet.

  15) Add support for new payload offset BPF instruction, from Daniel
      Borkmann.

  16) Convert several drivers over to mdoule_platform_driver(), from
      Sachin Kamat.

  17) Provide a minimal BPF JIT image disassembler userspace tool, from
      Daniel Borkmann.

  18) Rewrite F-RTO implementation in TCP to match the final
      specification of it in RFC4138 and RFC5682.  From Yuchung Cheng.

  19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
      you like netlink, so I implemented netlink dumping of netlink
      sockets.") From Andrey Vagin.

  20) Remove ugly passing of rtnetlink attributes into rtnl_doit
      functions, from Thomas Graf.

  21) Allow userspace to be able to see if a configuration change occurs
      in the middle of an address or device list dump, from Nicolas
      Dichtel.

  22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
      Frederic Sowa.

  23) Increase accuracy of packet length used by packet scheduler, from
      Jason Wang.

  24) Beginning set of changes to make ipv4/ipv6 fragment handling more
      scalable and less susceptible to overload and locking contention,
      from Jesper Dangaard Brouer.

  25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
      instead.  From Hong Zhiguo.

  26) Optimize route usage in IPVS by avoiding reference counting where
      possible, from Julian Anastasov.

  27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

  28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
      Eitzenberger.

  29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
      nfnetlink_log, and nfnetlink_queue.  From Gao feng.

  30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

  31) Support several new r8169 chips, from Hayes Wang.

  32) Support tokenized interface identifiers in ipv6, from Daniel
      Borkmann.

  33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

  34) Add 802.1ad vlan offload support, from Patrick McHardy.

  35) Support mmap() based netlink communication, also from Patrick
      McHardy.

  36) Support HW timestamping in mlx4 driver, from Amir Vadai.

  37) Rationalize AF_PACKET packet timestamping when transmitting, from
      Willem de Bruijn and Daniel Borkmann.

  38) Bring parity to what's provided by /proc/net/packet socket dumping
      and the info provided by netlink socket dumping of AF_PACKET
      sockets.  From Nicolas Dichtel.

  39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
      Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  filter: fix va_list build error
  af_unix: fix a fatal race with bit fields
  bnx2x: Prevent memory leak when cnic is absent
  bnx2x: correct reading of speed capabilities
  net: sctp: attribute printl with __printf for gcc fmt checks
  netlink: kconfig: move mmap i/o into netlink kconfig
  netpoll: convert mutex into a semaphore
  netlink: Fix skb ref counting.
  net_sched: act_ipt forward compat with xtables
  mlx4_en: fix a build error on 32bit arches
  Revert "bnx2x: allow nvram test to run when device is down"
  bridge: avoid OOPS if root port not found
  drivers: net: cpsw: fix kernel warn on cpsw irq enable
  sh_eth: use random MAC address if no valid one supplied
  3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
  tg3: fix to append hardware time stamping flags
  unix/stream: fix peeking with an offset larger than data in queue
  unix/dgram: fix peeking with an offset larger than data in queue
  unix/dgram: peek beyond 0-sized skbs
  openvswitch: Remove unneeded ovs_netdev_get_ifindex()
  ...
2013-05-01 14:08:52 -07:00
Xi Wang 20074f357d filter: fix va_list build error
This patch fixes the following build error.

In file included from include/linux/filter.h:52:0,
                 from arch/arm/net/bpf_jit_32.c:14:
include/linux/printk.h:54:2: error: unknown type name ‘va_list’
include/linux/printk.h:105:21: error: unknown type name ‘va_list’
include/linux/printk.h:108:30: error: unknown type name ‘va_list’

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01 16:28:48 -04:00