linux

Commit Graph

Author	SHA1	Message	Date
Paul E. McKenney	54ef6df3f3	rcu: Provide counterpart to rcu_dereference() for non-RCU situations Although rcu_dereference() and friends can be used in situations where object lifetimes are being managed by something other than RCU, the resulting sparse and lockdep-RCU noise can be annoying. This commit therefore supplies a lockless_dereference(), which provides the protection for dereferences without the RCU-related debugging noise. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-28 18:25:17 -04:00
Daniel Thompson	5fecf3a1e1	staging: android: logger: Fix log corruption regression Since commit `cd678fce42` ("switch logger to ->write_iter()"), any attempt to write to the log results in the log data being written over its own metadata, thus rendering the log unreadable. The problem was first detected when I ran an Android userspace on the v3.18-rc1 kernel. However the issue can also be observed with a non-Android userspace by using echo/cat to write to/from /dev/log_main . This patch resolves the problem by using a temporary to track the status of not-yet-committed writes to the log buffer. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-28 18:24:30 -04:00
David S. Miller	068301f2be	Merge branch 'cdc-ether' Olivier Blin says: ==================== cdc-ether: handle promiscuous mode Since kernel 3.16, my Lenovo USB network adapters (RTL8153) using cdc-ether are not working anymore in a bridge. This is due to commit `c472ab68ad`, which resets the packet filter when the device is bound. The default packet filter set by cdc-ether does not include promiscuous, while the adapter seemed to have promiscuous enabled by default. This patch series allows to support promiscuous mode for cdc-ether, by hooking into set_rx_mode. Incidentally, maybe this device should be handled by the r8152 driver, but this patch series is still nice for other adapters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Oliver Neukum <oneukum@suse.de>	2014-10-28 17:26:24 -04:00
Olivier Blin	b77e26d191	cdc-ether: handle promiscuous mode with a set_rx_mode callback Promiscuous mode was not supported anymore with my Lenovo adapters (RTL8153) since commit `c472ab68ad` (cdc-ether: clean packet filter upon probe). It was not possible to use them in a bridge anymore. Signed-off-by: Olivier Blin <olivier.blin@softathome.com> Also-analyzed-by: Loïc Yhuel <loic.yhuel@softathome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:26:20 -04:00
Olivier Blin	d80c679bc1	cdc-ether: extract usbnet_cdc_update_filter function This will be used by the set_rx_mode callback. Also move a comment about multicast filtering in this new function. Signed-off-by: Olivier Blin <olivier.blin@softathome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:26:19 -04:00
Olivier Blin	1efed2d06c	usbnet: add a callback for set_rx_mode To delegate promiscuous mode and multicast filtering to the subdriver. Signed-off-by: Olivier Blin <olivier.blin@softathome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:26:19 -04:00
David S. Miller	9ffa1fcaef	Merge branch 'systemport-net' Florian Fainelli says: ==================== net: systemport: RX path and suspend fixes These two patches fix a race condition where we have our RX interrupts enabled, but not NAPI for the RX path, and the second patch fixes an issue for packets stuck in RX fifo during a suspend/resume cycle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:08:56 -04:00
Florian Fainelli	704d33e700	net: systemport: reset UniMAC coming out of a suspend cycle bcm_sysport_resume() was missing an UniMAC reset which can lead to various receive FIFO corruptions coming out of a suspend cycle. If the RX FIFO is stuck, it will deliver corrupted/duplicate packets towards the host CPU interface. This could be reproduced on crowded network and when Wake-on-LAN is enabled for this particular interface because the switch still forwards packets towards the host CPU interface (SYSTEMPORT), and we had to leave the UniMAC RX enable bit on to allow matching MagicPackets. Once we re-enter the resume function, there is a small window during which the UniMAC receive is still enabled, and we start queueing packets, but the RDMA and RBUF engines are not ready, which leads to having packets stuck in the UniMAC RX FIFO, ultimately delivered towards the host CPU as corrupted. Fixes: `40755a0fce` ("net: systemport: add suspend and resume support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:08:47 -04:00
Florian Fainelli	8edf0047f4	net: systemport: enable RX interrupts after NAPI There is currently a small window during which the SYSTEMPORT adapter enables its RX interrupts without having enabled its NAPI handler, which can result in packets to be discarded during interface bringup. A similar but more serious window exists in bcm_sysport_resume() during which we can have the RDMA engine not fully prepared to receive packets and yet having RX interrupts enabled. Fix this my moving the RX interrupt enable down to bcm_sysport_netif_start() after napi_enable() for the RX path is called, which fixes both call sites: bcm_sysport_open() and bcm_sysport_resume(). Fixes: `b02e6d9ba7` ("net: systemport: add bcm_sysport_netif_{enable,stop}") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:08:47 -04:00
Randy Dunlap	ebcf34f3d4	skbuff.h: fix kernel-doc warning for headers_end Fix kernel-doc warning in <linux/skbuff.h> by making both headers_start and headers_end private fields. Warning(..//include/linux/skbuff.h:654): No description found for parameter 'headers_end[0]' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:02:29 -04:00
Vince Bridgers	99d881f993	net: phy: Add SGMII Configuration for Marvell 88E1145 Initialization Marvell phy 88E1145 configuration & initialization was missing a case for initializing SGMII mode. This patch adds that case. Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 17:00:22 -04:00
Filipe Manana	d05a2b4cd9	Btrfs: fix race that makes btrfs_lookup_extent_info miss skinny extent items We have a race that can lead us to miss skinny extent items in the function btrfs_lookup_extent_info() when the skinny metadata feature is enabled. So basically the sequence of steps is: 1) We search in the extent tree for the skinny extent, which returns > 0 (not found); 2) We check the previous item in the returned leaf for a non-skinny extent, and we don't find it; 3) Because we didn't find the non-skinny extent in step 2), we release our path to search the extent tree again, but this time for a non-skinny extent key; 4) Right after we released our path in step 3), a skinny extent was inserted in the extent tree (delayed refs were run) - our second extent tree search will miss it, because it's not looking for a skinny extent; 5) After the second search returned (with ret > 0), we look for any delayed ref for our extent's bytenr (and we do it while holding a read lock on the leaf), but we won't find any, as such delayed ref had just run and completed after we released out path in step 3) before doing the second search. Fix this by removing completely the path release and re-search logic. This is safe, because if we seach for a metadata item and we don't find it, we have the guarantee that the returned leaf is the one where the item would be inserted, and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the non-skinny extent item is if it exists. The only case where path->slots[0] is zero is when there are no smaller keys in the tree (i.e. no left siblings for our leaf), in which case the re-search logic isn't needed as well. This race has been present since the introduction of skinny metadata (change `3173a18f70`). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-10-28 13:59:54 -07:00
Linus Torvalds	9f76628da2	Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux Pull two nfsd fixes from Bruce Fields: "One regression from the 3.16 xdr rewrite, one an older bug exposed by a separate bug in the client's new SEEK code" * 'for-3.18' of git://linux-nfs.org/~bfields/linux: nfsd4: fix crash on unknown operation number nfsd4: fix response size estimation for OP_SEQUENCE	2014-10-28 13:32:06 -07:00
Linus Torvalds	6234056e13	Adding the new code for 3.19, I discovered a couple of minor bugs with the accounting of the ftrace_ops trampoline logic. One was that the old hash was not updated before calling the modify code for an ftrace_ops. The second bug was what let the first bug go unnoticed, as the update would check the current hash for all ftrace_ops (where it should only check the old hash for modified ones). This let things work when only one ftrace_ops was registered to a function, but could break if more than one was registered depending on the order of the look ups. The worse thing that can happen if this bug triggers is that the ftrace self checks would find an anomaly and shut itself down. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUToYWAAoJEEjnJuOKh9ldfS8H/36CL5E+4itux9tIhf13Untj FSi3EzvEdrTYu7IhdyRB6N7cp07g79jU3v40ZDLxDHzG2i4VLft/Z3uzIC0Z6mhL kJZCCWpUTAKJO/UPFcenEZ7eiL+B+5QVOc1Oxcet0odG5HWkEZG62va/MrhB9k/7 uUNRqXNjg7w2rG0TK2qjcTHiPGJ9h7/wG9RgYktAIs27BUmip5sRS1IMyFL51Gpo UNtIKGtG6/4hizdlHhWBuAa6ErM37GPskx3iP/45xiAu3J8SIbOk1FBe+4Xk+DZQ hZK479hzlk6OU/M2vDJefG1d6zeQ7y00LMkUIAPiUEgayXAXpYX7UjV13CLQeGU= =HrhJ -----END PGP SIGNATURE----- Merge tag 'trace-fixes-v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace trampoline accounting fixes from Steven Rostedt: "Adding the new code for 3.19, I discovered a couple of minor bugs with the accounting of the ftrace_ops trampoline logic. One was that the old hash was not updated before calling the modify code for an ftrace_ops. The second bug was what let the first bug go unnoticed, as the update would check the current hash for all ftrace_ops (where it should only check the old hash for modified ones). This let things work when only one ftrace_ops was registered to a function, but could break if more than one was registered depending on the order of the look ups. The worse thing that can happen if this bug triggers is that the ftrace self checks would find an anomaly and shut itself down" * tag 'trace-fixes-v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix checking of trampoline ftrace_ops in finding trampoline ftrace: Set ops->old_hash on modifying what an ops hooks to	2014-10-28 13:27:19 -07:00
Paul E. McKenney	d7e2993396	rcu: Make rcu_barrier() understand about missing rcuo kthreads Commit `35ce7f29a4` (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit `35ce7f29a4`, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), It is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti <yaneti@declera.com> Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Eric B Munson <emunson@akamai.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Eric B Munson <emunson@akamai.com> Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Yanko Kaneti <yaneti@declera.com> Tested-by: Kevin Fenzi <kevin@scrye.com> Tested-by: Meelis Roos <mroos@linux.ee>	2014-10-28 13:24:13 -07:00
Linus Torvalds	6e2028aaa1	Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm Pull ARM fixes from Russell King: "A couple of ARM fixes. We fix some printk formats for ptrdiff_t quantities which cause GCC 4.9 to complain, and we also blacklist known buggy GCC 4.8.x compilers as their miscompilation is serious enough to cause filesystem corruption, even through many distros have fixed their versions" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: fix some printk formats ARM: Blacklist GCC 4.8.0 to GCC 4.8.2 - PR58854	2014-10-28 13:17:11 -07:00
Will Deacon	ce9ec37bdd	zap_pte_range: update addr when forcing flush after TLB batching faiure When unmapping a range of pages in zap_pte_range, the page being unmapped is added to an mmu_gather_batch structure for asynchronous freeing. If we run out of space in the batch structure before the range has been completely unmapped, then we break out of the loop, force a TLB flush and free the pages that we have batched so far. If there are further pages to unmap, then we resume the loop where we left off. Unfortunately, we forget to update addr when we break out of the loop, which causes us to truncate the range being invalidated as the end address is exclusive. When we re-enter the loop at the same address, the page has already been freed and the pte_present test will fail, meaning that we do not reconsider the address for invalidation. This patch fixes the problem by incrementing addr by the PAGE_SIZE before breaking out of the loop on batch failure. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-28 13:16:28 -07:00
Mugunthan V N	47276fcc2d	drivers: net:cpsw: fix probe_dt when only slave 1 is pinned out when slave 0 has no phy and slave 1 connected to phy, driver probe will fail as there is no phy id present for slave 0 device tree, so continuing even though no phy-id found, also moving mac-id read later to ensure mac-id is read from device tree even when phy-id entry in not found. Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:44:35 -04:00
David S. Miller	25946f20b7	Merge tag 'master-2014-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-10-28 Please pull this batch of fixes intended for the 3.18 stream! For the mac80211 bits, Johannes says: "Here are a few fixes for the wireless stack: one fixes the RTS rate, one for a debugfs file, one to return the correct channel to userspace, a sanity check for a userspace value and the remaining two are just documentation fixes." For the iwlwifi bits, Emmanuel says: "I revert here a patch that caused interoperability issues. dvm gets a fix for a bug that was reported by many users. Two minor fixes for BT Coex and platform power fix that helps reducing latency when the PCIe link goes to low power states." In addition... Felix Fietkau adds a couple of ath code fixes related to regulatory rule enforcement. Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS is not set. Karsten Wiese provides a trio of minor fixes for rtl8192cu. Kees Cook prevents a potential information leak in rtlwifi. Larry Finger also brings a trio of minor fixes for rtlwifi. Rafał Miłecki adds a device ID to the bcma bus driver. Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac to eliminate non-terminated string issues. Sujith Manoharan avoids some ath9k stalls by enabling HW queue control only for MCC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:30:15 -04:00
David S. Miller	3923d68dc0	Merge branch 'dsa-net' Andrew Lunn says: ==================== DSA tagging mismatches The second patch is a fix, which should be applied to -rc. It is possible to get a DSA configuration which does not work. The patch stops this happening. The first patch detects this situation, and errors out the probe of DSA, making it more obvious something is wrong. It is not required to apply it -rc. v2 fixes the use case pointed out by Florian, that a switch driver may use DSA_TAG_PROTO_NONE which the patch did not correctly handle. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:28:30 -04:00
Andrew Lunn	c146b7788e	dsa: mv88e6171: Fix tagging protocol/Kconfig The mv88e6171 can support two different tagging protocols, DSA and EDSA. The switch driver structure only allows one protocol to be enumerated, and DSA was chosen. However the Kconfig entry ensures the EDSA tagging code is built. With a minimal configuration, we then end up with a mismatch. The probe is successful, EDSA tagging is used, but the switch is configured for DSA, resulting in mangled packets. Change the switch driver structure to enumerate EDSA, fixing the mismatch. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: `42f2725394` ("net: DSA: Marvell mv88e6171 switch driver") Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:27:54 -04:00
Andrew Lunn	ae439286a0	net: dsa: Error out on tagging protocol mismatches If there is a mismatch between enabled tagging protocols and the protocol the switch supports, error out, rather than continue with a situation which is unlikely to work. Signed-off-by: Andrew Lunn <andrew@lunn.ch> cc: alexander.h.duyck@intel.com Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 15:27:54 -04:00
Dmitry Torokhov	185af4d666	Input: psmouse - remove unneeded check in psmouse_reconnect() psmouse_reconnect() will not be called if psmouse driver is not bound to the serio port, so there is no point in checking that. Also, as coded, it introduces potential NULL dereference in psmouse_dbg() in case psmouse is indeed NULL. Let's just remove it. Detected by Coverity: CID 146528 Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2014-10-28 11:42:56 -07:00
David S. Miller	c20ce79303	sparc: Hook up bpf system call. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-28 11:30:43 -07:00
Dmitry Torokhov	4db1f47c29	Input: vsxxxaa - fix code dropping bytes from queue I believe the intent of the code was to drop oldest bytes from the queue, not the latest if we drop one byte and both latest and some oldest of we are dropping more than one. Acked-by: Jan-Benedict Glaw <jbglaw@lug-owl.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2014-10-28 10:33:48 -07:00
Dmitry Torokhov	60183a6e93	Input: ims-pcu - fix dead code in ims_pcu_ofn_reg_addr_store() Coverity pointed out that at return point error is always 0 so the conditional is not needed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2014-10-28 10:33:36 -07:00
Dmitry Torokhov	42b63e603e	Input: opencores-kbd - fix error handling When I was adjusting patch in `848d479361` to use devm_ioremap_resource() I messed it up. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>	2014-10-28 10:32:59 -07:00
Tony Battersby	c21e59d8dc	lib/scatterlist: fix memory leak with scsi-mq Fix a memory leak with scsi-mq triggered by commands with large data transfer length. Fixes: `c53c6d6a68` ("scatterlist: allow chaining to preallocated chunks") Cc: <stable@vger.kernel.org> # 3.17.x Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-10-28 10:27:10 -06:00
Alex Deucher	8c3e434769	drm/radeon: remove invalid pci id 0x4c6e is a secondary device id so should not be used by the driver. Noticed-by: Mark Kettenis <mark.kettenis@xs4all.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2014-10-28 10:44:36 -04:00
Alex Deucher	72b3f9183e	drm/radeon: dpm fixes for asrock systems - bapm seems to cause CPU stuck messages so disable it. - nb dpm seems to prevent GPU dpm from getting enabled, so disable it. bug: https://bugs.freedesktop.org/show_bug.cgi?id=85107 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2014-10-28 10:44:35 -04:00
Wilfried Klaebe	c9cb57fe8b	radeon: clean up coding style differences in radeon_get_bios() Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2014-10-28 10:44:34 -04:00
Michel Dänzer	e5a5fd4df2	drm/radeon: Use drm_malloc_ab instead of kmalloc_array Should avoid kmalloc failures due to large number of array entries. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81991 Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2014-10-28 10:44:34 -04:00
Alex Deucher	6fa455935a	drm/radeon/dpm: disable ulv support on SI Causes problems on some boards. bug: https://bugs.freedesktop.org/show_bug.cgi?id=82889 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2014-10-28 10:44:33 -04:00
Dmitry Kasatkin	3b1deef6b1	evm: check xattr value length and type in evm_inode_setxattr() evm_inode_setxattr() can be called with no value. The function does not check the length so that following command can be used to produce the kernel oops: setfattr -n security.evm FOO. This patch fixes it. Changes in v3: * there is no reason to return different error codes for EVM_XATTR_HMAC and non EVM_XATTR_HMAC. Remove unnecessary test then. Changes in v2: * testing for validity of xattr type [ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1106.398192] IP: [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48 [ 1106.399244] PGD 29048067 PUD 290d7067 PMD 0 [ 1106.399953] Oops: 0000 [#1] SMP [ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse [ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936 [ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 1106.400020] task: ffff8800291a0000 ti: ffff88002917c000 task.ti: ffff88002917c000 [ 1106.400020] RIP: 0010:[<ffffffff812af7b8>] [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48 [ 1106.400020] RSP: 0018:ffff88002917fd50 EFLAGS: 00010246 [ 1106.400020] RAX: 0000000000000000 RBX: ffff88002917fdf8 RCX: 0000000000000000 [ 1106.400020] RDX: 0000000000000000 RSI: ffffffff818136d3 RDI: ffff88002917fdf8 [ 1106.400020] RBP: ffff88002917fd68 R08: 0000000000000000 R09: 00000000003ec1df [ 1106.400020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800438a0a00 [ 1106.400020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1106.400020] FS: 00007f7dfa7d7740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000 [ 1106.400020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1106.400020] CR2: 0000000000000000 CR3: 000000003763e000 CR4: 00000000000006f0 [ 1106.400020] Stack: [ 1106.400020] ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98 [ 1106.400020] ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000 [ 1106.400020] 0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8 [ 1106.400020] Call Trace: [ 1106.400020] [<ffffffff812a1030>] security_inode_setxattr+0x5d/0x6a [ 1106.400020] [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f [ 1106.400020] [<ffffffff8116d1e0>] setxattr+0x122/0x16c [ 1106.400020] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45 [ 1106.400020] [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143 [ 1106.400020] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45 [ 1106.400020] [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f [ 1106.400020] [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0 [ 1106.400020] [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b [ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 <41> 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83 [ 1106.400020] RIP [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48 [ 1106.400020] RSP <ffff88002917fd50> [ 1106.400020] CR2: 0000000000000000 [ 1106.428061] ---[ end trace ae08331628ba3050 ]--- Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Cc: stable@vger.kernel.org Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>	2014-10-28 10:06:31 -04:00
Dmitry Kasatkin	a48fda9de9	ima: check xattr value length and type in the ima_inode_setxattr() ima_inode_setxattr() can be called with no value. Function does not check the length so that following command can be used to produce kernel oops: setfattr -n security.ima FOO. This patch fixes it. Changes in v3: * for stable reverted "allow setting hash only in fix or log mode" It will be a separate patch. Changes in v2: * testing validity of xattr type * allow setting hash only in fix or log mode (Mimi) [ 261.562522] BUG: unable to handle kernel NULL pointer dereference at (null) [ 261.564109] IP: [<ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a [ 261.564109] PGD 3112f067 PUD 42965067 PMD 0 [ 261.564109] Oops: 0000 [#1] SMP [ 261.564109] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse [ 261.564109] CPU: 0 PID: 3299 Comm: setxattr Not tainted 3.16.0-kds+ #2924 [ 261.564109] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 261.564109] task: ffff8800428c2430 ti: ffff880042be0000 task.ti: ffff880042be0000 [ 261.564109] RIP: 0010:[<ffffffff812af272>] [<ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a [ 261.564109] RSP: 0018:ffff880042be3d50 EFLAGS: 00010246 [ 261.564109] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000015 [ 261.564109] RDX: 0000001500000000 RSI: 0000000000000000 RDI: ffff8800375cc600 [ 261.564109] RBP: ffff880042be3d68 R08: 0000000000000000 R09: 00000000004d6256 [ 261.564109] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88002149ba00 [ 261.564109] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 261.564109] FS: 00007f6c1e219740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000 [ 261.564109] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.564109] CR2: 0000000000000000 CR3: 000000003b35a000 CR4: 00000000000006f0 [ 261.564109] Stack: [ 261.564109] ffff88002149ba00 ffff880042be3df8 0000000000000000 ffff880042be3d98 [ 261.564109] ffffffff812a101b ffff88002149ba00 ffff880042be3df8 0000000000000000 [ 261.564109] 0000000000000000 ffff880042be3de0 ffffffff8116d08a ffff880042be3dc8 [ 261.564109] Call Trace: [ 261.564109] [<ffffffff812a101b>] security_inode_setxattr+0x48/0x6a [ 261.564109] [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f [ 261.564109] [<ffffffff8116d1e0>] setxattr+0x122/0x16c [ 261.564109] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45 [ 261.564109] [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143 [ 261.564109] [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45 [ 261.564109] [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f [ 261.564109] [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0 [ 261.564109] [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b [ 261.564109] Code: 48 89 f7 48 c7 c6 58 36 81 81 53 31 db e8 73 27 04 00 85 c0 75 28 bf 15 00 00 00 e8 8a a5 d9 ff 84 c0 75 05 83 cb ff eb 15 31 f6 <41> 80 7d 00 03 49 8b 7c 24 68 40 0f 94 c6 e8 e1 f9 ff ff 89 d8 [ 261.564109] RIP [<ffffffff812af272>] ima_inode_setxattr+0x3e/0x5a [ 261.564109] RSP <ffff880042be3d50> [ 261.564109] CR2: 0000000000000000 [ 261.599998] ---[ end trace 39a89a3fc267e652 ]--- Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Cc: stable@vger.kernel.org Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>	2014-10-28 10:03:49 -04:00
Jim Davis	ace80793d1	Documentation: remove outdated references to the linux-next wiki The linux-next wiki at http://linux.f-seidel.de/linux-next/pmwiki has been gone for several months now. Signed-off-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2014-10-28 09:06:11 -04:00
Alexander Graf	371aedbfbb	Documentation: Restrict TSC test code to x86 The prctl test code in Documentation/ tries to show how to use a call that only makes sense on x86. Restrict it there so that other platforms don't try to call asm("rdtsc"). Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Peter Foley <pefoley2@pefoley.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2014-10-28 08:46:27 -04:00
Takashi Iwai	317168d0c7	ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode In compat mode, we copy each field of snd_pcm_status struct but don't touch the reserved fields, and this leaves uninitialized values there. Meanwhile the native ioctl does zero-clear the whole structure, so we should follow the same rule in compat mode, too. Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2014-10-28 12:45:53 +01:00
Maciej W. Rozycki	60e684f0d6	x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems that make use of 8259A Programmable Interrupt Controllers. Specifically convert output like this: CPU0 0: 76573 XT-PIC-XT-PIC timer 1: 11 XT-PIC-XT-PIC i8042 2: 0 XT-PIC-XT-PIC cascade 4: 8 XT-PIC-XT-PIC serial 6: 3 XT-PIC-XT-PIC floppy 7: 0 XT-PIC-XT-PIC parport0 8: 1 XT-PIC-XT-PIC rtc0 10: 448 XT-PIC-XT-PIC fddi0 12: 23 XT-PIC-XT-PIC eth0 14: 2464 XT-PIC-XT-PIC ide0 NMI: 0 Non-maskable interrupts ERR: 0 to one like this: CPU0 0: 122033 XT-PIC timer 1: 11 XT-PIC i8042 2: 0 XT-PIC cascade 4: 8 XT-PIC serial 6: 3 XT-PIC floppy 7: 0 XT-PIC parport0 8: 1 XT-PIC rtc0 10: 145 XT-PIC fddi0 12: 31 XT-PIC eth0 14: 2245 XT-PIC ide0 NMI: 0 Non-maskable interrupts ERR: 0 that is one like we used to have from ~2.2 till it was changed sometime. The rationale is there is no value in this duplicate information, it merely clutters output and looks ugly. We only have one handler for 8259A interrupts so there is no need to give it a name separate from the name already given to irq_chip. We could define meaningful names for handlers based on bits in the ELCR register on systems that have it or the value of the LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with MCA support gone), to tell edge-triggered and level-triggered inputs apart. While that information does not affect 8259A interrupt handlers it could help people determine which lines are shareable and which are not. That is material for a separate change though. Any tools that parse /proc/interrupts are supposed not to be affected since it was many years we used the format this change converts back to. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 12:01:08 +01:00
Steven Noonan	5631b8fba6	compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles The bug referenced by the comment in this commit was not completely fixed in GCC 4.8.2, as I mentioned in a thread back in February: https://lkml.org/lkml/2014/2/12/797 The conclusion at that time was to make the quirk unconditional until the bug could be found and fixed in GCC. Unfortunately, when I submitted the patch (commit `a9f18034`) I left a comment in that claimed the bug was fixed in GCC 4.8.2+. This comment is inaccurate, and should be removed. Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net Cc: Ingo Molnar <mingo@kernel.org>	2014-10-28 11:03:40 +01:00
Peter Zijlstra	7fb0f1de49	perf/x86: Fix compile warnings for intel_uncore The uncore drivers require PCI and generate compile time warnings when !CONFIG_PCI. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:03 +01:00
Andy Lutomirski	b438b1ab35	perf: Fix typos in sample code in the perf_event.h header struct perf_event_mmap_page has members called "index" and "cap_user_rdpmc". Spell them correctly in the examples. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/320ba26391a8123cc16e5f02d24d34bd404332fd.1412313343.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:02 +01:00
Peter Zijlstra	c719f56092	perf: Fix and clean up initialization of pmu::event_idx Andy reported that the current state of event_idx is rather confused. So remove all but the x86_pmu implementation and change the default to return 0 (the safe option). Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Cody P Schafer <dev@codyps.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Himangi Saraogi <himangi774@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com> Cc: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux390@de.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:01 +01:00
Peter Zijlstra (Intel)	65d71fe137	perf: Fix bogus kernel printk Andy spotted the fail in what was intended as a conditional printk level. Reported-by: Andy Lutomirski <luto@amacapital.net> Fixes: `cc6cd47e73` ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:51:01 +01:00
Kirill Tkhai	f3a7e1a9c4	sched/dl: Fix preemption checks 1) switched_to_dl() check is wrong. We reschedule only if rq->curr is deadline task, and we do not reschedule if it's a lower priority task. But we must always preempt a task of other classes. 2) dl_task_timer(): Policy does not change in case of priority inheritance. rt_mutex_setprio() changes prio, while policy remains old. So we lose some balancing logic in dl_task_timer() and switched_to_dl() when we check policy instead of priority. Boosted task may be rq->curr. (I didn't change switched_from_dl() because no check is necessary there at all). I've looked at this place(switched_to_dl) several times and even fixed this function, but found just now... I suppose some performance tests may work better after this. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413909356.19914.128.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:10 +01:00
Chen Hanxiao	fcd964dda5	sched: Update comments for CLONE_NEWNS Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/1412674147-8941-1-git-send-email-chenhanxiao@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:08 +01:00
Oleg Nesterov	009f60e276	sched: stop the unbound recursion in preempt_schedule_context() preempt_schedule_context() does preempt_enable_notrace() at the end and this can call the same function again; exception_exit() is heavy and it is quite possible that need-resched is true again. 1. Change this code to dec preempt_count() and check need_resched() by hand. 2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid the enable/disable dance around __schedule(). But in this case we need to move into sched/core.c. 3. Cosmetic, but x86 forgets to declare this function. This doesn't really matter because it is only called by asm helpers, still it make sense to add the declaration into asm/preempt.h to match preempt_schedule(). Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:05 +01:00
Kirill Tkhai	6419265899	sched/fair: Fix division by zero sysctl_numa_balancing_scan_size File /proc/sys/kernel/numa_balancing_scan_size_mb allows writing of zero. This bash command reproduces problem: $ while :; do echo 0 > /proc/sys/kernel/numa_balancing_scan_size_mb; \ echo 256 > /proc/sys/kernel/numa_balancing_scan_size_mb; done divide error: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 24112 Comm: bash Not tainted 3.17.0+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88013c852600 ti: ffff880037a68000 task.ti: ffff880037a68000 RIP: 0010:[<ffffffff81074191>] [<ffffffff81074191>] task_scan_min+0x21/0x50 RSP: 0000:ffff880037a6bce0 EFLAGS: 00010246 RAX: 0000000000000a00 RBX: 00000000000003e8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013c852600 RBP: ffff880037a6bcf0 R08: 0000000000000001 R09: 0000000000015c90 R10: ffff880239bf6c00 R11: 0000000000000016 R12: 0000000000003fff R13: ffff88013c852600 R14: ffffea0008d1b000 R15: 0000000000000003 FS: 00007f12bb048700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000001505678 CR3: 0000000234770000 CR4: 00000000000006f0 Stack: ffff88013c852600 0000000000003fff ffff880037a6bd18 ffffffff810741d1 ffff88013c852600 0000000000003fff 000000000002bfff ffff880037a6bda8 ffffffff81077ef7 ffffea0008a56d40 0000000000000001 0000000000000001 Call Trace: [<ffffffff810741d1>] task_scan_max+0x11/0x40 [<ffffffff81077ef7>] task_numa_fault+0x1f7/0xae0 [<ffffffff8115a896>] ? migrate_misplaced_page+0x276/0x300 [<ffffffff81134a4d>] handle_mm_fault+0x62d/0xba0 [<ffffffff8103e2f1>] __do_page_fault+0x191/0x510 [<ffffffff81030122>] ? native_smp_send_reschedule+0x42/0x60 [<ffffffff8106dc00>] ? check_preempt_curr+0x80/0xa0 [<ffffffff8107092c>] ? wake_up_new_task+0x11c/0x1a0 [<ffffffff8104887d>] ? do_fork+0x14d/0x340 [<ffffffff811799bb>] ? get_unused_fd_flags+0x2b/0x30 [<ffffffff811799df>] ? __fd_install+0x1f/0x60 [<ffffffff8103e67c>] do_page_fault+0xc/0x10 [<ffffffff8150d322>] page_fault+0x22/0x30 RIP [<ffffffff81074191>] task_scan_min+0x21/0x50 RSP <ffff880037a6bce0> ---[ end trace 9a826d16936c04de ]--- Also fix race in task_scan_min (it depends on compiler behaviour). Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: David Rientjes <rientjes@google.com> Cc: Jens Axboe <axboe@fb.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1413455977.24793.78.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:04 +01:00
Yasuaki Ishimatsu	2847c90e1b	sched/fair: Care divide error in update_task_scan_period() While offling node by hot removing memory, the following divide error occurs: divide error: 0000 [#1] SMP [...] Call Trace: [...] handle_mm_fault [...] ? try_to_wake_up [...] ? wake_up_state [...] __do_page_fault [...] ? do_futex [...] ? put_prev_entity [...] ? __switch_to [...] do_page_fault [...] page_fault [...] RIP [<ffffffff810a7081>] task_numa_fault RSP <ffff88084eb2bcb0> The issue occurs as follows: 1. When page fault occurs and page is allocated from node 1, task_struct->numa_faults_buffer_memory[] of node 1 is incremented and p->numa_faults_locality[] is also incremented as follows: o numa_faults_buffer_memory[] o numa_faults_locality[] NR_NUMA_HINT_FAULT_TYPES \| 0 \| 1 \| ---------------------------------- ---------------------- node 0 \| 0 \| 0 \| remote \| 0 \| node 1 \| 0 \| 1 \| locale \| 1 \| ---------------------------------- ---------------------- 2. node 1 is offlined by hot removing memory. 3. When page fault occurs, fault_types[] is calculated by using p->numa_faults_buffer_memory[] of all online nodes in task_numa_placement(). But node 1 was offline by step 2. So the fault_types[] is calculated by using only p->numa_faults_buffer_memory[] of node 0. So both of fault_types[] are set to 0. 4. The values(0) of fault_types[] pass to update_task_scan_period(). 5. numa_faults_locality[1] is set to 1. So the following division is calculated. static void update_task_scan_period(struct task_struct p, unsigned long shared, unsigned long private){ ... ratio = DIV_ROUND_UP(private NUMA_PERIOD_SLOTS, (private + shared)); } 6. But both of private and shared are set to 0. So divide error occurs here. The divide error is rare case because the trigger is node offline. This patch always increments denominator for avoiding divide error. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/54475703.8000505@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:03 +01:00
Kirill Tkhai	1effd9f193	sched/numa: Fix unsafe get_task_struct() in task_numa_assign() Unlocked access to dst_rq->curr in task_numa_compare() is racy. If curr task is exiting this may be a reason of use-after-free: task_numa_compare() do_exit() ... current->flags \|= PF_EXITING; ... release_task() ... ~~delayed_put_task_struct()~~ ... schedule() rcu_read_lock() ... cur = ACCESS_ONCE(dst_rq->curr) ... ... rq->curr = next; ... context_switch() ... finish_task_switch() ... put_task_struct() ... __put_task_struct() ... free_task_struct() task_numa_assign() ... get_task_struct() ... As noted by Oleg: <<The lockless get_task_struct(tsk) is only safe if tsk == current and didn't pass exit_notify(), or if this tsk was found on a rcu protected list (say, for_each_process() or find_task_by_vpid()). IOW, it is only safe if release_task() was not called before we take rcu_read_lock(), in this case we can rely on the fact that delayed_put_pid() can not drop the (potentially) last reference until rcu_read_unlock(). And as Kirill pointed out task_numa_compare()->task_numa_assign() path does get_task_struct(dst_rq->curr) and this is not safe. The task_struct itself can't go away, but rcu_read_lock() can't save us from the final put_task_struct() in finish_task_switch(); this reference goes away without rcu gp>> The patch provides simple check of PF_EXITING flag. If it's not set, this guarantees that call_rcu() of delayed_put_task_struct() callback hasn't happened yet, so we can safely do get_task_struct() in task_numa_assign(). Locked dst_rq->lock protects from concurrency with the last schedule(). Reusing or unmapping of cur's memory may happen without it. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-10-28 10:46:02 +01:00

... 5 6 7 8 9 ...

481452 Commits All Branches Search

481452 Commits

All Branches