To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.
This also makes it clear that it is checking the security of the link.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This reverts commit 2e48928d8a.
Those functions are needed and should not be removed, or
there is no way to set the rfkill led trigger name.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some devices like the current iwlwifi implementation
require that the P2P interface address match the P2P
Device address (only one P2P interface is supported.)
Add the HW flag IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
that allows drivers to request that P2P Interfaces
added while a P2P Device is active get the same MAC
address by default.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to support using a different MAC address
for the P2P Device address we must first have a
P2P Device abstraction that can be assigned a MAC
address.
This abstraction will also be useful to support
offloading P2P operations to the device, e.g.
periodic listen for discoverability.
Currently, the driver is responsible for assigning
a MAC address to the P2P Device, but this could be
changed by allowing a MAC address to be given to
the NEW_INTERFACE command.
As it has no associated netdev, a P2P Device can
only be identified by its wdev identifier but the
previous patches allowed using the wdev identifier
in various APIs, e.g. remain-on-channel.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Support getting A-MPDU status information from the
drivers and reporting it to userspace via radiotap
in the standard fields.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define the A-MPDU status field in radiotap, also
update the radiotap parser for it and the MCS field
that was apparently missed last time.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In IBSS it is possible that the supported rates set for a station changes over
time (e.g. it gets first initialised as an empty set because of no available
information about rates and updated later). In this case the driver has to be
notified about the change in order to update its internal table accordingly (if
needed).
This behaviour is needed by all those drivers that handle rc internally but
leave stations management to mac80211
Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[Johannes - add docs, validate IBSS mode only, fix compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We can not assume parallel flash is always present, there are boards
with *serial* flash and probably some without flash at all.
Define some bits by the way.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Reviewed-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We need to check the 'Role' parameter from the LE Connection
Complete Event in order to properly set 'out' and 'link_mode'
fields from hci_conn structure.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
lsof command can tell the type of socket processes are using.
Internal lsof uses inode numbers on socket fs to resolve the type of
sockets. Files under /proc/net/, such as tcp, udp, unix, etc provides
such inode information.
Unfortunately bluetooth related protocols don't provide such inode
information. This patch series introduces /proc/net files for the protocols.
This patch against af_bluetooth.c provides facility to the implementation
of protocols. This patch extends bt_sock_list and introduces two exported
function bt_procfs_init, bt_procfs_cleanup.
The type bt_sock_list is already used in some of implementation of
protocols. bt_procfs_init prepare seq_operations which converts
protocol own bt_sock_list data to protocol own proc entry when the
entry is accessed.
What I, lsof user, need is just inode number of bluetooth
socket. However, people may want more information. The bt_procfs_init
takes a function pointer for customizing the show handler of
seq_operations.
In v4 patch, __acquires and __releases attributes are added to suppress
sparse warning. Suggested by Andrei Emeltchenko.
In v5 patch, linux/proc_fs.h is included to use PDE. Build error is
reported by Fengguang Wu.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Move the l2cap channel list chan->global_l under the refcnt
protection and free it based on the refcnt.
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Refactor the code in order to use the l2cap_chan_destroy()
from l2cap_chan_put() under the refcnt protection.
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Return values are never used because callers hci_proto_connect_cfm
and hci_proto_disconn_cfm return void.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
AMP status codes copied from Bluez patch sent by Peter Krystad
<pkrystad@codeaurora.org>.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Change spaces to tabs in smp code
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Use the same style for refcnt printing through all Bluetooth code
taking the reference the l2cap_chan refcnt printing.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Add check that HCI controller is BR/EDR. AMP controller shall not be
managed by mgmt interface and consequently user space.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This patch removes the struct adv_entry since it is not used anymore.
This struct should have been removed in commit 479453d (Bluetooth:
Remove advertising cache).
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Currently the only way for wireless drivers to tell whether or not OFDM
is allowed on the current channel is to check the regulatory
information. However, this requires hodling cfg80211_mutex, which is not
visible to the drivers.
Other regulatory restrictions are provided as flags in the channel
definition, so let's do similarly with OFDM. This patch adds a new flag,
IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not
allowed. This flag is set on any channels for which regulatory indicates
that OFDM is prohibited.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.
Signed-off-by: Fan Du <fdu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs. This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps). Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once. This
results in a single skb that expands to 861 segments.
In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring. The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset). This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.
Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
limit.
2. In netif_skb_features(), if the number of segments is too high then
mask out GSO features to force fall back to software GSO.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM DMA engine updates from Russell King:
"This looks scary at first glance, but what it is is:
- a rework of the sa11x0 DMA engine driver merged during the previous
cycle, to extract a common set of helper functions for DMA engine
implementations.
- conversion of amba-pl08x.c to use these helper functions.
- addition of OMAP DMA engine driver (using these helper functions),
and conversion of some of the OMAP DMA users to use DMA engine.
Nothing in the helper functions is ARM specific, so I hope that other
implementations can consolidate some of their code by making use of
these helpers.
This has been sitting in linux-next most of the merge cycle, and has
been tested by several OMAP folk. I've tested it on sa11x0 platforms,
and given it my best shot on my broken platforms which have the
amba-pl08x controller.
The last point is the addition to feature-removal-schedule.txt, which
will have a merge conflict. Between myself and TI, we're planning to
remove the old TI DMA implementation next year."
Fix up trivial add/add conflicts in Documentation/feature-removal-schedule.txt
and drivers/dma/{Kconfig,Makefile}
* 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm: (53 commits)
ARM: 7481/1: OMAP2+: omap2plus_defconfig: enable OMAP DMA engine
ARM: 7464/1: mmc: omap_hsmmc: ensure probe returns error if DMA channel request fails
Add feature removal of old OMAP private DMA implementation
mtd: omap2: remove private DMA API implementation
mtd: omap2: add DMA engine support
spi: omap2-mcspi: remove private DMA API implementation
spi: omap2-mcspi: add DMA engine support
ARM: omap: remove mmc platform data dma_mask and initialization
mmc: omap: remove private DMA API implementation
mmc: omap: add DMA engine support
mmc: omap_hsmmc: remove private DMA API implementation
mmc: omap_hsmmc: add DMA engine support
dmaengine: omap: add support for cyclic DMA
dmaengine: omap: add support for setting fi
dmaengine: omap: add support for returning residue in tx_state method
dmaengine: add OMAP DMA engine driver
dmaengine: sa11x0-dma: add cyclic DMA support
dmaengine: sa11x0-dma: fix DMA residue support
dmaengine: PL08x: ensure all descriptors are freed when channel is released
dmaengine: PL08x: get rid of write only pool_ctr and free_txd locking
...
It includes:
- large updates for OMAP
- support for LCD3 overlay manager (omap5)
- omapdss output cleanup
- removal of passive matrix LCD support as there are no drivers for
such panels for DSS or DSS2 and nobody complained (cleanup)
- large updates for SH Mobile
- overlay support
- separating MERAM (cache) from framebuffer driver
- some updates for Exynos and da8xx-fb
- various other small patches
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (GNU/Linux)
iQIcBAABAgAGBQJQGSWqAAoJECSVL5KnPj1PnEQP/RQ5NWKlgkPloLkWrLc8stN/
0AKWgxBJ0BuJBW6CJCKVy76kkCBW2PZ7bHsFLNuX490KnPwt1cz3sLni78UiW1CZ
bNlN5UKshOfC511BxF5GZjtLvvkj5+ocmoybq27MhBoJn/7EVbli7lSeYjEWeWuk
sTq6wJTBJ8Nc4PEWdPhIbiWe7NgnCge27AOHrzUV5cRHxdHtl+mqD99Ky7UdMsHz
qBVTVmtEmLh2g3KMuu5rByuDDlUqhpi0sKorGsNWk92rpUnVsc4E4/v06JJqB3n8
ef3q352GK32LKpWwX78pm5+DJMhpSMFJg6UrvQu03gQSU5Pw3O4Dl8g+hh6FgcMo
niYZ+g07K0D8BSqdTwy9gwRTSWLPHplR8xz9VsW3+jdmFFuQgB6suA2Dk2E/K3Cf
12jurwegypfI5KutRyTz7IuTw/8OhHs4x0PJWcSw3lq1czUM212vqDWKYJFbgznq
6s8sHLWnQg0U47LYTCNV/mA4QRlE4ewE3B6wIVrzPvcTIinKqO1hky10fZ4+sEw6
gH6bnSBvlRgYOYhy/MPInz0Gt6YfzK26M6ZPMq2DU7gZ6OFcL9IrnqWDdnZMwwD9
j15DWeU2XPMc8vAGpdg2giq3VmQ53rviwLgp7Ht6fGrTLk1z4q167zGUSOvo3KQi
Ai90ycCBgAGLorHhdCI1
=22b7
-----END PGP SIGNATURE-----
Merge tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6
Pull fbdev updates from Florian Tobias Schandinat:
- large updates for OMAP
- support for LCD3 overlay manager (omap5)
- omapdss output cleanup
- removal of passive matrix LCD support as there are no drivers for
such panels for DSS or DSS2 and nobody complained (cleanup)
- large updates for SH Mobile
- overlay support
- separating MERAM (cache) from framebuffer driver
- some updates for Exynos and da8xx-fb
- various other small patches
* tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6: (78 commits)
da8xx-fb: fix compile issue due to missing include
fbdev: Make pixel_to_pat() failure mode more friendly
da8xx-fb: do not turn ON LCD backlight unless LCDC is enabled
fbdev: sh_mobile_lcdc: Fix vertical panning step
video: exynos mipi dsi: Fix mipi dsi regulators handling issue
video: da8xx-fb: do clock reset of revision 2 LCDC before enabling
arm: da850: configure LCDC fifo threshold
video: da8xx-fb: configure FIFO threshold to reduce underflow errors
video: da8xx-fb: fix flicker due to 1 frame delay in updated frame
video: da8xx-fb rev2: fix disabling of palette completion interrupt
da8xx-fb: add missing FB_BLANK operations
video: exynos_dp: use usleep_range instead of delay
video: exynos_dp: check the only INTERLANE_ALIGN_DONE bit during Link Training
fb: epson1355fb: Fix section mismatch
video: exynos_dp: fix wrong DPCD address during Link Training
video/smscufx: fix line counting in fb_write
aty128fb: Fix coding style issues
fbdev: sh_mobile_lcdc: Fix pan offset computation in YUV mode
fbdev: sh_mobile_lcdc: Fix overlay registers update during pan operation
fbdev: sh_mobile_lcdc: Support horizontal panning
...
A collection of small fixes that have been found recently.
Most of the commits are regression fixes in HD-audio and some other
random drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQGOkdAAoJEGwxgFQ9KSmk5REP/0OH5srTWkSGDJqWK0m0Z0A6
vkZE9KXm/cKcw59MEBhZrE28G4K8fI28XLj6iEuhzcuv7XsUTo9d24Uvvv1pWaEy
p2GFMRNc5QrXtprnckL+HPA4+asmiyEpXpYC7D4YH1N6ofYuNJfh0QIgQKG0R2Oz
8Ekdwuuzu0gfNYcN7aWDFiDwNID8hRiW4RVf9V5mNOGtO9Z+82o7u2pnr74vu6FG
C07DrpKXauGhGDIgfoNn30HwifSWvPm/rpPWwxUucPLAjiE25/70hTjnZZYWtRbe
g9o9INh3F72aBv23zTQzjkOr9/hhc4/j9zxZ1cMSjTKdvSdoFa5QuQTfCct7z7Fd
GcdXtMMNSF+FLNC4TyOlyMLoEFaHhv9uBMVk0rBe+y1/urzf4aH+PfI1B42meSI5
tHiGVvTdhktA2NGp1kf24b88db5ZoNPk2Kmzzn8xHxZsQTjjaUriMAtM/CgmLoBj
sOjMEkHZpcmAWCOqZDhb9U7QDZNp3h6TBG2/j/PerN/mt5pAVdoxzECDbswm/8My
g/ujPJFe/2NpBRsDqTI2Lb1H5Xy1tLAnwz5NA4+aiEQjaCRNGLYUvnlcrgdwOmaE
bk1OmKWTE2ck6rU+edsyPOSWzFEyU1hL1UDcqIyeBsZbh+pvFh+dxEbQFckhR6o4
fXqmVya1YWUrl2vF99QW
=cSAm
-----END PGP SIGNATURE-----
Merge tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes that have been found recently. Most of
the commits are regression fixes in HD-audio and some other random
drivers."
* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: snd-usb: fix clock source validity index
ALSA: hda - Fix mute-LED GPIO initialization for IDT codecs
ALSA: hda - Add descriptions for missing IDT 92HD83x models
ALSA: hda - Fix polarity of mute LED on HP Mini 210
ALSA: es1688 - freeup resources on init failure
ALSA: hda - Workaround for silent output on VAIO Z with ALC889
ALSA: hda - Fix WARNING from HDMI/DP parser
ALSA: hda - Detach from converter at closing in patch_hdmi.c
ALSA: hda - Fix mute-LED GPIO setup for HP Mini 210
ALSA: mpu401: Fix missing initialization of irq field
ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs
Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.
Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."
Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...
Pull block driver changes from Jens Axboe:
- Making the plugging support for drivers a bit more sane from Neil.
This supersedes the plugging change from Shaohua as well.
- The usual round of drbd updates.
- Using a tail add instead of a head add in the request completion for
ndb, making us find the most completed request more quickly.
- A few floppy changes, getting rid of a duplicated flag and also
running the floppy init async (since it takes forever in boot terms)
from Andi.
* 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
floppy: remove duplicated flag FD_RAW_NEED_DISK
blk: pass from_schedule to non-request unplug functions.
block: stack unplug
blk: centralize non-request unplug handling.
md: remove plug_cnt feature of plugging.
block/nbd: micro-optimization in nbd request completion
drbd: announce FLUSH/FUA capability to upper layers
drbd: fix max_bio_size to be unsigned
drbd: flush drbd work queue before invalidate/invalidate remote
drbd: fix potential access after free
drbd: call local-io-error handler early
drbd: do not reset rs_pending_cnt too early
drbd: reset congestion information before reporting it in /proc/drbd
drbd: report congestion if we are waiting for some userland callback
drbd: differentiate between normal and forced detach
drbd: cleanup, remove two unused global flags
floppy: Run floppy initialization asynchronous
Pull core block IO bits from Jens Axboe:
"The most complicated part if this is the request allocation rework by
Tejun, which has been queued up for a long time and has been in
for-next ditto as well.
There are a few commits from yesterday and today, mostly trivial and
obvious fixes. So I'm pretty confident that it is sound. It's also
smaller than usual."
* 'for-3.6/core' of git://git.kernel.dk/linux-block:
block: remove dead func declaration
block: add partition resize function to blkpg ioctl
block: uninitialized ioc->nr_tasks triggers WARN_ON
block: do not artificially constrain max_sectors for stacking drivers
blkcg: implement per-blkg request allocation
block: prepare for multiple request_lists
block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
blkcg: inline bio_blkcg() and friends
block: allocate io_context upfront
block: refactor get_request[_wait]()
block: drop custom queue draining used by scsi_transport_{iscsi|fc}
mempool: add @gfp_mask to mempool_create_node()
blkcg: make root blkcg allocation use %GFP_KERNEL
blkcg: __blkg_lookup_create() doesn't need radix preload
In commit 3b6e2723f3 ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.
This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.
Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().
v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Round of refactoring and enhancements to irq_domain infrastructure. This
series starts the process of simplifying irqdomain. The ultimate goal is
to merge LEGACY, LINEAR and TREE mappings into a single system, but had
to back off from that after some last minute bugs. Instead it mainly
reorganizes the code and ensures that the reverse map gets populated
when the irq is mapped instead of the first time it is looked up.
Merging of the irq_domain types is deferred to v3.7
In other news, this series adds helpers for creating static mappings on
a linear or tree mapping.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQGKKkAAoJEEFnBt12D9kB59gQAJnTjrihej1tr0OEkffIthGK
RyVI/DMo0jMgLs4K/rIo3Y+PdTSsNYd8x4R7ln8O7rNRQn8W6jE6NQgoMh51EvNc
FAltmTsBldq6hUNuz2FEnbmojBP4QklTzL8bAiXtX5EufWQsgMsP4guOuHXLCjEV
CkWYVk/slXEWJ8yYJc6GKVRvL+CNeiXVCTcOsYA0CI3ofN7O0rd+YAL314CRllIc
e5uARbWM+s9FJ/eXwCZP4+3jCmdI/CHJb284WldMc/mBD8Rbiqpb4kH6AZI+TH2O
CyiNEPWs6FG5eJPTID7HrOarXGzwYq/pvv8iG7Mh8NiKSae1C1HdkHelCjbLQ+pU
POya0fWF1Gvzlmw0gHik86dqaKjwb29btjj7SFg8KnQExWn2ifhsY70mM9wCTo3s
cwcQlssDIsARE83nttTFCoV/iAWh9AvTxafrXu/+9OKTjpsYlC8kgzdVjq5aAxON
JaAUK1OduTWRsd1TabKlh6naRXr9nRcLKikwKri2oYVKkj97wahBuib4ffzAcNqz
VklRBxTH6M+dz/t5NpcVyLXJpqzTN++QNdTAmeQG6LOnHJL4tpFTsx5sMa7ghmzX
LNpmp/AkVfP0MT7Drf0FUUx6iFA7sjANYzcepUVDrPGKHx0E3LyqbG5JKcC5LgM6
+UIoKAktF3vY7pdZJL9z
=ZUF/
-----END PGP SIGNATURE-----
Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
Pull irqdomain changes from Grant Likely:
"Round of refactoring and enhancements to irq_domain infrastructure.
This series starts the process of simplifying irqdomain. The ultimate
goal is to merge LEGACY, LINEAR and TREE mappings into a single
system, but had to back off from that after some last minute bugs.
Instead it mainly reorganizes the code and ensures that the reverse
map gets populated when the irq is mapped instead of the first time it
is looked up.
Merging of the irq_domain types is deferred to v3.7
In other news, this series adds helpers for creating static mappings
on a linear or tree mapping."
* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
irqdomain: Improve diagnostics when a domain mapping fails
irqdomain: eliminate slow-path revmap lookups
irqdomain: Fix irq_create_direct_mapping() to test irq_domain type.
irqdomain: Eliminate dedicated radix lookup functions
irqdomain: Support for static IRQ mapping and association.
irqdomain: Always update revmap when setting up a virq
irqdomain: Split disassociating code into separate function
irq_domain: correct a minor wrong comment for linear revmap
irq_domain: Standardise legacy/linear domain selection
irqdomain: Make ops->map hook optional
irqdomain: Remove unnecessary test for IRQ_DOMAIN_MAP_LEGACY
irqdomain: Simple NUMA awareness.
devicetree: add helper inline for retrieving a node's full name
Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
rtc/rtc-88pm80x: remove unneed devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: gix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQF+koAAoJECObm247sIsiNjQP/Rra8MHYEtUYlIDlx/MU5aPV
dVOQ1rbHqLH6vnqBESuzwWYd1Eg9pI0yjMoQKuOrxg0Ri/cAhf3WE8xTzb/HTZ6h
npBvVmYIrqAFvE8Tqd0L9XZXNES5dy47F7a3Sp7vXAflQTQ7NAgG3idhzcmom5Zp
6Z7ETrtcaLnu0wr67eb96ICpq9QgPd9rWy5Fn9RAo7NzOb3SJp9CXT02vShytQLk
4sCcez4DTXf3IQjvjgVwV1OfvmcvG1lhI8R1Wvj1TqfqJVC0ZmKQYpmR9rlnPkLB
SB51h0v6a64YaKBpD7R7wuKUGWn2CEg4zAPuFxeACZ99pyx+eulPqZOTX2mAl+gQ
aW5Ec7Au76zvqnKgPVt2nzFMCXYZpHlWRQHqQNon5xXVdqJRS4cJAev7o9udyD6q
SQZIKBoZrwQtA3YbukXd91JKnxliH23MxCaL2u5C5ix6BSKsRs7djHboe210GAcQ
bEjQnhHujETJcFzm1O9HbypJhfg3Klw5o/SKYi3pylURDRMmKVhR3vP9cxdloxAC
QGh3klwQncKN9RxSACsfcNAh0V7RqkufcLXPpEcCwAuyUIrYniwWxiroco/bYE24
DK8g6bi+cY9At2iMALTd2J7RzZcrertQVmRtmXaOrLwSZvWrdizc6qNBpwTIozIT
+V/e1WcSau1C4zU4T0Z0
=fx5A
-----END PGP SIGNATURE-----
Merge tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio
Pull VFIO core from Alex Williamson:
"This series includes the VFIO userspace driver interface for the 3.6
kernel merge window. This driver is intended to provide a secure
interface for device access using IOMMU protection for applications
like assignment of physical devices to virtual machines.
Qemu will be the first user of this interface, enabling assignment of
PCI devices to Qemu guests. This interface is intended to eventually
replace the x86-specific assignment mechanism currently available in
KVM.
This interface has the advantage of being more secure, by working with
IOMMU groups to ensure device isolation and providing it's own
filtered resource access mechanism, and also more flexible, in not
being x86 or KVM specific (extensions to enable POWER are already
working).
This driver is originally the work of Tom Lyon, but has since been
handed over to me and gone through a complete overhaul thanks to the
input from David Gibson, Ben Herrenschmidt, Chris Wright, Joerg
Roedel, and others. This driver has been available in linux-next for
the last month."
Paul Mackerras says:
"I would be glad to see it go in since we want to use it with KVM on
PowerPC. If possible we'd like the PowerPC bits for it to go in as
well."
* tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio:
vfio: Add PCI device driver
vfio: Type1 IOMMU implementation
vfio: Add documentation
vfio: VFIO core
from interrupts for /dev/random and /dev/urandom. The goal is to
addresses weaknesses discussed in the paper "Mining your Ps and Qs:
Detection of Widespread Weak Keys in Network Devices", by Nadia
Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will
be published in the Proceedings of the 21st Usenix Security Symposium,
August 2012. (See https://factorable.net for more information and an
extended version of the paper.)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8
lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+
LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T
6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw
03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi
3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67
AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd
QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K
iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0
NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0
TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC
bi5FC1OolqhvzVIdsqgt
=u7KM
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random subsystem patches from Ted Ts'o:
"This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.
The goal is to addresses weaknesses discussed in the paper "Mining
your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
which will be published in the Proceedings of the 21st Usenix Security
Symposium, August 2012. (See https://factorable.net for more
information and an extended version of the paper.)"
Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
random: mix in architectural randomness in extract_buf()
dmi: Feed DMI table to /dev/random driver
random: Add comment to random_initialize()
random: final removal of IRQF_SAMPLE_RANDOM
um: remove IRQF_SAMPLE_RANDOM which is now a no-op
sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
[ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
...
Pull second set of media updates from Mauro Carvalho Chehab:
- radio API: add support to work with radio frequency bands
- new AM/FM radio drivers: radio-shark, radio-shark2
- new Remote Controller USB driver: iguanair
- conversion of several drivers to the v4l2 core control framework
- new board additions at existing drivers
- the remaining (and vast majority of the patches) are due to
drivers/DocBook fixes/cleanups.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (154 commits)
[media] radio-tea5777: use library for 64bits div
[media] tlg2300: Declare MODULE_FIRMWARE usage
[media] lgs8gxx: Declare MODULE_FIRMWARE usage
[media] xc5000: Add MODULE_FIRMWARE statements
[media] s2255drv: Add MODULE_FIRMWARE statement
[media] dib8000: move dereference after check for NULL
[media] Documentation: Update cardlists
[media] bttv: add support for Aposonic W-DVR
[media] cx25821: Remove bad strcpy to read-only char*
[media] pms.c: remove duplicated include
[media] smiapp-core.c: remove duplicated include
[media] via-camera: pass correct format settings to sensor
[media] rtl2832.c: minor cleanup
[media] Add support for the IguanaWorks USB IR Transceiver
[media] Minor cleanups for MCE USB
[media] drivers/media/dvb/siano/smscoreapi.c: use list_for_each_entry
[media] Use a named union in struct v4l2_ioctl_info
[media] mceusb: Add Twisted Melon USB IDs
[media] staging/media/solo6x10: use module_pci_driver macro
[media] staging/media/dt3155v4l: use module_pci_driver macro
...
Conflicts:
Documentation/feature-removal-schedule.txt
Features include:
- Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
separate modules.
- Fix Oopses in the NFSv4 idmapper
- Fix a deadlock whereby rpciod tries to allocate a new socket and
ends up recursing into the NFS code due to memory reclaim.
- Increase the number of permitted callback connections.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb
XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg
scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik
hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk
2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6
KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc
mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt
uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu
zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S
jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe
sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM
cV9Ex8zh2Y4aiUXCy3uA
=KSix
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull second wave of NFS client updates from Trond Myklebust:
- Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
separate modules.
- Fix Oopses in the NFSv4 idmapper
- Fix a deadlock whereby rpciod tries to allocate a new socket and ends
up recursing into the NFS code due to memory reclaim.
- Increase the number of permitted callback connections.
* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: explicitly reject LOCK_MAND flock() requests
nfs: increase number of permitted callback connections.
SUNRPC: return negative value in case rpcbind client creation error
NFS: Convert v4 into a module
NFS: Convert v3 into a module
NFS: Convert v2 into a module
NFS: Keep module parameters in the generic NFS client
NFS: Split out remaining NFS v4 inode functions
NFS: Pass super operations and xattr handlers in the nfs_subversion
NFS: Only initialize the ACL client in the v3 case
NFS: Create a try_mount rpc op
NFS: Remove the NFS v4 xdev mount function
NFS: Add version registering framework
NFS: Fix a number of bugs in the idmapper
nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
sunrpc: clarify comments on rpc_make_runnable
pnfsblock: bail out partial page IO
Pull networking update from David S. Miller:
"I think Eric Dumazet and I have dealt with all of the known routing
cache removal fallout. Some other minor fixes all around.
1) Fix RCU of cached routes, particular of output routes which require
liberation via call_rcu() instead of call_rcu_bh(). From Eric
Dumazet.
2) Make sure we purge net device references in cached routes properly.
3) TG3 driver bug fixes from Michael Chan.
4) Fix reported 'expires' value in ipv6 routes, from Li Wei.
5) TUN driver ioctl leaks kernel bytes to userspace, from Mathias
Krause."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
ipv4: Properly purge netdev references on uncached routes.
ipv4: Cache routes in nexthop exception entries.
ipv4: percpu nh_rth_output cache
ipv4: Restore old dst_free() behavior.
bridge: make port attributes const
ipv4: remove rt_cache_rebuild_count
net: ipv4: fix RCU races on dst refcounts
net: TCP early demux cleanup
tun: Fix formatting.
net/tun: fix ioctl() based info leaks
tg3: Update version to 3.124
tg3: Fix race condition in tg3_get_stats64()
tg3: Add New 5719 Read DMA workaround
tg3: Fix Read DMA workaround for 5719 A0.
tg3: Request APE_LOCK_PHY before PHY access
ipv6: fix incorrect route 'expires' value passed to userspace
mISDN: Bugfix only few bytes are transfered on a connection
seeq: use PTR_RET at init_module of driver
bnx2x: remove cast around the kmalloc in bnx2x_prev_mark_path
ipv4: clean up put_child
...
If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted. Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log. The final teardown will trigger a
BUG_ON in mm/filemap.c.
This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37. It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages
look like this;
[ ..........] Lots of bad pmd messages followed by this
[ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[ 127.186778] ------------[ cut here ]------------
[ 127.186781] kernel BUG at mm/filemap.c:134!
[ 127.186782] invalid opcode: 0000 [#1] SMP
[ 127.186783] CPU 7
[ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[ 127.186801]
[ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[ 127.186804] RIP: 0010:[<ffffffff810ed6ce>] [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002
[ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[ 127.186821] Stack:
[ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[ 127.186827] Call Trace:
[ 127.186829] [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[ 127.186832] [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[ 127.186834] [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[ 127.186837] [<ffffffff811655c7>] evict+0xa7/0x1b0
[ 127.186839] [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[ 127.186840] [<ffffffff811658f9>] iput+0x39/0x50
[ 127.186842] [<ffffffff81162708>] d_kill+0xf8/0x130
[ 127.186843] [<ffffffff81162812>] dput+0xd2/0x1a0
[ 127.186845] [<ffffffff8114e2d0>] __fput+0x170/0x230
[ 127.186848] [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[ 127.186849] [<ffffffff8114e3ad>] fput+0x1d/0x30
[ 127.186851] [<ffffffff81117db7>] remove_vma+0x37/0x80
[ 127.186853] [<ffffffff81119182>] do_munmap+0x2d2/0x360
[ 127.186855] [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[ 127.186857] [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[ 127.186868] RIP [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186870] RSP <ffff8804144b5c08>
[ 127.186871] ---[ end trace 7cbac5d1db69f426 ]---
The bug is a race and not always easy to reproduce. To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.
$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done
On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted. The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON. shutdown will also hang and needs a
hard reset or a sysrq-b.
The basic problem is a race between page table sharing and teardown. For
the most part page table sharing depends on i_mmap_mutex. In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.
Unfortunately it appears to be also insufficient. Consider the following
situation
Process A Process B
--------- ---------
hugetlb_fault shmdt
LockWrite(mmap_sem)
do_munmap
unmap_region
unmap_vmas
unmap_single_vma
unmap_hugepage_range
Lock(i_mmap_mutex)
Lock(mm->page_table_lock)
huge_pmd_unshare/unmap tables <--- (1)
Unlock(mm->page_table_lock)
Unlock(i_mmap_mutex)
huge_pte_alloc ...
Lock(i_mmap_mutex) ...
vma_prio_walk, find svma, spte ...
Lock(mm->page_table_lock) ...
share spte ...
Unlock(mm->page_table_lock) ...
Unlock(i_mmap_mutex) ...
hugetlb_no_page <--- (2)
free_pgtables
unlink_file_vma
hugetlb_free_pgd_range
remove_vma_list
In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down. The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables. At (1) above,
the page tables are not shared yet so it unmaps the PMDs. Process A sets
up page table sharing and at (2) faults a new entry. Process B then trips
up on it in free_pgtables.
This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed. This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex. This makes the VMA
ineligible for sharing and avoids the race. Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).
This should be treated as a -stable candidate if it is merged.
Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.
==== CUT HERE ====
static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;
unsigned int get_random(unsigned int max)
{
struct timeval tv;
gettimeofday(&tv, NULL);
srandom(tv.tv_usec);
return random() % max;
}
static void play(void *addr, size_t size)
{
unsigned char *start = addr,
*end = start + size,
*a;
start += get_random(size/2);
/* we could itterate on huge pages but let's give it more time. */
for (a = start; a < end; a += 4096)
*a = 0;
}
int main(int argc, char **argv)
{
key_t key = IPC_PRIVATE;
size_t sizeA = nr_huge_page_A * huge_page_size;
size_t sizeB = nr_huge_page_B * huge_page_size;
int shmidA, shmidB;
void *addrA = NULL, *addrB = NULL;
int nr_children = 300, n = 0;
if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
fork_child:
switch(fork()) {
case 0:
switch (n%3) {
case 0:
play(addrA, sizeA);
break;
case 1:
play(addrB, sizeB);
break;
case 2:
break;
}
break;
case -1:
perror("fork:");
break;
default:
if (++n < nr_children)
goto fork_child;
play(addrA, sizeA);
break;
}
shmdt(addrA);
shmdt(addrB);
do {
wait(NULL);
} while (--n > 0);
shmctl(shmidA, IPC_RMID, NULL);
shmctl(shmidB, IPC_RMID, NULL);
return 0;
}
[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pg_data_t is zeroed before reaching free_area_init_core(), so remove the
now unnecessary initializations.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>