linux

Commit Graph

Author	SHA1	Message	Date
Takashi Iwai	4fe2ca1467	ALSA: hda - More coverage for odd-number channels elimination for HDMI The commit `ad09fc9d21` didn't cover the case for Intel and Nvidia HDMIs, where hdmi_pcm_open() is called. Put the hw_constraint there, too. Signed-off-by: Takashi Iwai <tiwai@suse.de>	2011-01-14 10:33:26 +01:00
Takashi Iwai	639cef0eb6	ALSA: hda - Store PCM parameters properly in HDMI open callback In hdmi_pcm_open(), the evaluated PCM hw parameters are stored in hinfo, but these aren't properly set back to the current runtime record since these have been set beforehand in azx_pcm_open(). This patch fixes the behavior. Signed-off-by: Takashi Iwai <tiwai@suse.de>	2011-01-14 10:30:46 +01:00
Sascha Hauer	b9214b9780	ARM mxs: clkdev related compile fixes Since commit `6d803ba` (ARM: 6483/1: arm & sh: factorised duplicated clkdev.c) platforms need to select CLKDEV_LOOKUP instead of COMMON_CLKDEV and need to include <linux/clkdev.h>. Cc: Shawn Guo <shawn.guo@freescale.com> Cc: Lothar Waßmann <LW@KARO-electronics.de> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2011-01-14 10:19:27 +01:00
Nicolas Pitre	4d2692a736	ARM: 6624/1: fix dependency for CONFIG_SMP_ON_UP This depends on !XIP_KERNEL and not !XIP. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-01-14 09:00:46 +00:00
Dave Martin	874d5d3ccc	ARM: 6623/1: Thumb-2: Fix out-of-range offset for Thumb-2 in proc-v7.S Commit `d30e45e` (ARM: pgtable: switch order of Linux vs hardware page tables) introduced a pre-increment addressing offset which is out of range for Thumb-2. Thumb-2 only permits offsets <256. So split the intruction in two for Thumb-2. Signed-off-by: Dave Martin <dave.martin@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-01-14 09:00:30 +00:00
Takashi Iwai	361fe6e908	ALSA: hda - Rearrange fixup struct in patch_realtek.c Signed-off-by: Takashi Iwai <tiwai@suse.de>	2011-01-14 09:55:32 +01:00
Clemens Ladisch	f8fe80e438	ALSA: oxygen: Xonar DG: fix CS4245 register writes Accidentally exchanging register addresses and register values leads to many strange errors ... Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2011-01-14 09:50:01 +01:00
Sascha Hauer	c97b73939a	ARM i.MX mx31_3ds: Fix MC13783 regulator names Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>	2011-01-14 09:47:48 +01:00
Li Zefan	3ec762ad8b	cgroups: Fix a lockdep warning at cgroup removal Commit `2fd6b7f5` ("fs: dcache scale subdirs") forgot to annotate a dentry lock, which caused a lockdep warning. Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2011-01-14 08:46:29 +00:00
Nick Piggin	7b9337aaf9	fs: namei fix ->put_link on wrong inode in do_filp_open J. R. Okajima noticed that ->put_link is being attempted on the wrong inode, and suggested the way to fix it. I changed it a bit according to Al's suggestion to keep an explicit link path around. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 08:42:43 +00:00
Takashi Iwai	ad09fc9d21	ALSA: hda - Suppress the odd number of channels for HDMI It looks like that HDMI codecs don't support the odd number of channels although HD-audio spec doesn't have the restriction. Add the hw_constraint to limit to only the even number of channels. Signed-off-by: Takashi Iwai <tiwai@suse.de>	2011-01-14 09:42:27 +01:00
Abhilash Kesavan	5f35765d83	spi: Enable SPI driver for S5P6440 and S5P6450 This patch enables the existing S3C64XX series SPI driver for S5P64X0 and removed dependency on EXPERIMENTAL because we don't need it now. v3: Changed dependency of S3C64XX_DMA v2: Removed dependency on EXPERIMENTAL Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com> Signed-off-by: Sangbeom Kim <sbkim73@samsung.com> Acked-by: Jassi Brar <jassi.brar@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-01-14 01:01:51 -07:00
Shaohua Li	c553f8e335	block cfq: compensate preempted queue even if it has no slice assigned If a queue is preempted before it gets slice assigned, the queue doesn't get compensation, which looks unfair. For such queue, we compensate it for a whole slice. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-14 08:41:03 +01:00
Shaohua Li	f8ae6e3eb8	block cfq: make queue preempt work for queues from different workload I got this: fio-874 [007] 2157.724514: 8,32 m N cfq874 preempt fio-874 [007] 2157.724519: 8,32 m N cfq830 slice expired t=1 fio-874 [007] 2157.724520: 8,32 m N cfq830 sl_used=1 disp=0 charge=1 iops=0 sect=0 fio-874 [007] 2157.724521: 8,32 m N cfq830 set_active wl_prio:0 wl_type:0 fio-874 [007] 2157.724522: 8,32 m N cfq830 Not idling. st->count:1 cfq830 is an async queue, and preempted by a sync queue cfq874. But since we have cfqg->saved_workload_slice mechanism, the preempt is a nop. Looks currently our preempt is totally broken if the two queues are not from the same workload type. Below patch fixes it. This will might make async queue starvation, but it's what our old code does before cgroup is added. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-14 08:41:02 +01:00
Rob Herring	c289ef4143	mmc: sdhci-of: fix build on non-powerpc platforms Explicitly include err.h, of_address.h and of_irq.h. Make use of machine_is() conditional on PPC. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-01-14 00:22:44 -07:00
françois romieu	f1e02ed109	r8169: keep firmware in memory. The firmware agent is not available during resume. Loading the firmware during open() (see `eee3a96c63`) is not enough. close() is run during resume through rtl8169_reset_task(), whence the mildly natural release of firmware in the driver removal method instead. It will help with http://bugs.debian.org/609538. It will not avoid the 60 seconds delay when: - there is no firmware - the driver is loaded and the device is not up before a suspend/resume Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Tested-by: Jarek Kamiński <jarek@vilo.eu.org> Cc: Hayes <hayeswang@realtek.com> Cc: Ben Hutchings <benh@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:49:57 -08:00
Tobias Klauser	d0f49157d1	netdev: tilepro: Use is_unicast_ether_addr helper Use is_unicast_ether_addr from linux/etherdevice.h instead of custom macros. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:49:56 -08:00
Tobias Klauser	51e7eed79c	etherdevice.h: Add is_unicast_ether_addr function From a check for !is_multicast_ether_addr it is not always obvious that we're checking for a unicast address. So add this helper function to make those code paths easier to read. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:49:56 -08:00
Ben Hutchings	5cd8a77df3	ks8695net: Use default implementation of ethtool_ops::get_link This is completely untested as I don't have an ARM build environment. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:49:55 -08:00
Ben Hutchings	c3d2a7309c	ks8695net: Disable non-working ethtool operations Some ethtool operations can only be implemented for the WAN port, and not all such operations are allowed to return an error code such as -EOPNOTSUPP. Therefore, define two separate ethtool_ops structures for WAN and non-WAN ports; simplify and rename the WAN-only functions. This is completely untested as I don't have an ARM build environment. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:48:27 -08:00
Jesper Juhl	9e56790ad3	USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable. skb_clone() dynamically allocates memory and may fail. If it does it returns NULL. This means we'll dereference a NULL pointer in drivers/net/usb/cdc_ncm.c::cdc_ncm_rx_fixup(). As far as I can tell, the proper way to deal with this is simply to goto the error label. Furthermore gcc complains that 'skb' may be used uninitialized: drivers/net/usb/cdc_ncm.c: In function ‘cdc_ncm_rx_fixup’: drivers/net/usb/cdc_ncm.c:922:18: warning: ‘skb’ may be used uninitialized in this function and I believe it is right. On the line where we pr_debug("invalid frame detected (ignored)" ... we are using the local variable 'skb' but nothing has ever been assigned to that variable yet. I believe the correct fix for that is to use 'skb_in' instead. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:48:26 -08:00
Jesper Juhl	e84f885ebf	vxge: Remember to release firmware after upgrading firmware Regardless of whether the firmware update being performed by vxge_fw_upgrade() is a success or not we must still remember to always release_firmware() before returning. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Ram Vepa <ram.vepa@exar.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:48:26 -08:00
Joe Perches	f767b6df8a	netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr Remove code that has no effect. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:48:25 -08:00
Nicolas Dichtel	78d0736946	ipsec: update MAX_AH_AUTH_LEN to support sha512 icv_truncbits is set to 256 for sha512, so update MAX_AH_AUTH_LEN to 64. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:48:25 -08:00
Eric Dumazet	1ac9ad1394	net: remove dev_txq_stats_fold() After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert `ab35cd4b8f` (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:44:34 -08:00
Linus Torvalds	52cfd503ad	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits) ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework ACPI: fix resource check message ACPI / Battery: Update information on info notification and resume ACPI: Drop device flag wake_capable ACPI: Always check if _PRW is present before trying to evaluate it ACPI / PM: Check status of power resources under mutexes ACPI / PM: Rename acpi_power_off_device() ACPI / PM: Drop acpi_power_nocheck ACPI / PM: Drop acpi_bus_get_power() Platform / x86: Make fujitsu_laptop use acpi_bus_update_power() ACPI / Fan: Rework the handling of power resources ACPI / PM: Register power resource devices as soon as they are needed ACPI / PM: Register acpi_power_driver early ACPI / PM: Add function for updating device power state consistently ACPI / PM: Add function for device power state initialization ACPI / PM: Introduce __acpi_bus_get_power() ACPI / PM: Introduce function for refcounting device power resources ACPI / PM: Add functions for manipulating lists of power resources ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes ACPICA: Update version to 20101209 ...	2011-01-13 20:15:35 -08:00
Linus Torvalds	dc8e7e3ec6	Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer intel_idle: open broadcast clock event cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW cpuidle: delete NOP CPUIDLE_FLAG_POLL ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL ACPI, intel_idle: Cleanup idle= internal variables cpuidle: Make cpuidle_enable_device() call poll_idle_init() intel_idle: update Sandy Bridge core C-state residency targets	2011-01-13 20:15:18 -08:00
Linus Torvalds	2c79c69adc	Merge branch 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 * 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6: SFI: use ioremap_cache() instead of ioremap()	2011-01-13 20:15:02 -08:00
Linus Torvalds	db9effe99a	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: fs: fix do_last error case when need_reval_dot nfs: add missing rcu-walk check fs: hlist UP debug fixup fs: fix dropping of rcu-walk from force_reval_path fs: force_reval_path drop rcu-walk before d_invalidate fs: small rcu-walk documentation fixes Fixed up trivial conflicts in Documentation/filesystems/porting	2011-01-13 20:14:13 -08:00
J. R. Okajima	f20877d94a	fs: fix do_last error case when need_reval_dot When open(2) without O_DIRECTORY opens an existing dir, it should return EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it is changed by d_revalidate() which returns any positive to represent 'the target dir is valid.' Should we keep and return the initialized 'error' in this case. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 03:56:04 +00:00
Nick Piggin	657e94b673	nfs: add missing rcu-walk check Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:48:39 +00:00
Linus Torvalds	9c4bc1c2be	Merge branch 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m: Fix module linking error. xen p2m: clear the old pte when adding a page to m2p_override xen gntdev: use gnttab_map_refs and gnttab_unmap_refs xen: introduce gnttab_map_refs and gnttab_unmap_refs xen p2m: transparently change the p2m mappings in the m2p override xen/gntdev: Fix circular locking dependency xen/gntdev: stop using "token" argument xen: gntdev: move use of GNTMAP_contains_pte next to the map_op xen: add m2p override mechanism xen: move p2m handling to separate file xen/gntdev: add VM_PFNMAP to vma xen/gntdev: allow usermode to map granted pages xen: define gnttab_set_map_op/unmap_op Fix up trivial conflict in drivers/xen/Kconfig	2011-01-13 18:46:48 -08:00
Linus Torvalds	2c0076d8c7	Merge branch 'stable/platform-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/platform-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen-platform: Fix compile errors if CONFIG_PCI is not enabled. xen: rename platform-pci module to xen-platform-pci. xen-platform: use PCI interfaces to request IO and MEM resources.	2011-01-13 18:44:52 -08:00
Nick Piggin	2c6755988a	fs: hlist UP debug fixup Po-Yu Chuang <ratbert.chuang@gmail.com> noticed that hlist_bl_set_first could crash on a UP system when LIST_BL_LOCKMASK is 0, because LIST_BL_BUG_ON(!((unsigned long)h->first & LIST_BL_LOCKMASK)); always evaulates to true. Fix the expression, and also avoid a dependency between bit spinlock implementation and list bl code (list code shouldn't know anything except that bit 0 is set when adding and removing elements). Eventually if a good use case comes up, we might use this list to store 1 or more arbitrary bits of data, so it really shouldn't be tied to locking either, but for now they are helpful for debugging. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:36:43 +00:00
Nick Piggin	90dbb77ba4	fs: fix dropping of rcu-walk from force_reval_path As J. R. Okajima noted, force_reval_path passes in the same dentry to d_revalidate as the one in the nameidata structure (other callers pass in a child), so the locking breaks. This can oops with a chrooted nfs mount, for example. Similarly there can be other problems with revalidating a dentry which is already in nameidata of the path walk. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:36:19 +00:00
Nick Piggin	bb20c18db6	fs: force_reval_path drop rcu-walk before d_invalidate d_revalidate can return in rcu-walk mode even when it returns 0. We can't just call any old dcache function on rcu-walk dentry (the dentry is unstable, so even through d_lock can safely be taken, the result may no longer be what we expect -- careful re-checks would be required). So just drop rcu in this case. (I missed this conversion when switching to the rcu-walk convention that Linus suggested) Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:35:53 +00:00
Nick Piggin	a82416da83	fs: small rcu-walk documentation fixes Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:26:53 +00:00
J. Bruce Fields	4795bb37ef	nfsd: break lease on unlink, link, and rename Any change to any of the links pointing to an entry should also break delegations. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:09 -05:00
J. Bruce Fields	6a76bebefe	nfsd4: break lease on nfsd setattr Leases (delegations) should really be broken on any metadata change, not just on size change. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:08 -05:00
J. Bruce Fields	9ce137eee4	nfsd: don't support msnfs export option We've long had these pointless #ifdef MSNFS's sprinkled throughout the code--pointless because MSNFS is always defined (and we give no config option to make that easy to change). So we could just remove the ifdef's and compile the resulting code unconditionally. But as long as we're there: why not just rip out this code entirely? The only purpose is to implement the "msnfs" export option which turns on Windows-like behavior in some cases, and: - the export option isn't documented anywhere; - the userland utilities (which would need to be able to parse "msnfs" in an export file) don't support it; - I don't know how to maintain this, as I don't know what the proper behavior is; and - google shows no evidence that anyone has ever used this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:07 -05:00
J. Bruce Fields	9ee1ba5402	nfsd4: initialize cb_per_client Otherwise a callback that is aborted before it runs will result in a list_del on an uninitialized list head. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-13 21:04:06 -05:00
Daisuke Nishimura	50de1dd967	memcg: fix memory migration of shmem swapcache In the current implementation mem_cgroup_end_migration() decides whether the page migration has succeeded or not by checking "oldpage->mapping". But if we are tring to migrate a shmem swapcache, the page->mapping of it is NULL from the begining, so the check would be invalid. As a result, mem_cgroup_end_migration() assumes the migration has succeeded even if it's not, so "newpage" would be freed while it's not uncharged. This patch fixes it by passing mem_cgroup_end_migration() the result of the page migration. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:51 -08:00
Jesper Juhl	17295c88a1	memcg: use [kv]zalloc[_node] rather than [kv]malloc+memset In mem_cgroup_alloc() we currently do either kmalloc() or vmalloc() then followed by memset() to zero the memory. This can be more efficiently achieved by using kzalloc() and vzalloc(). There's also one situation where we can use kzalloc_node() - this is what's new in this version of the patch. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:51 -08:00
Daisuke Nishimura	dfe076b097	memcg: fix deadlock between cpuset and memcg Commit `b1dd693e` ("memcg: avoid deadlock between move charge and try_charge()") can cause another deadlock about mmap_sem on task migration if cpuset and memcg are mounted onto the same mount point. After the commit, cgroup_attach_task() has sequence like: cgroup_attach_task() ss->can_attach() cpuset_can_attach() mem_cgroup_can_attach() down_read(&mmap_sem) (1) ss->attach() cpuset_attach() mpol_rebind_mm() down_write(&mmap_sem) (2) up_write(&mmap_sem) cpuset_migrate_mm() do_migrate_pages() down_read(&mmap_sem) up_read(&mmap_sem) mem_cgroup_move_task() mem_cgroup_clear_mc() up_read(&mmap_sem) We can cause deadlock at (2) because we've already aquire the mmap_sem at (1). But the commit itself is necessary to fix deadlocks which have existed before the commit like: Ex.1) move charge \| try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() \| down_write(&mmap_sem) mc.moving_task = current \| .. mem_cgroup_precharge_mc() \| __mem_cgroup_try_charge() mem_cgroup_count_precharge() \| prepare_to_wait() down_read(&mmap_sem) \| if (mc.moving_task) -> cannot aquire the lock \| -> true \| schedule() \| -> move charge should wake it up Ex.2) move charge \| try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() \| mc.moving_task = current \| mem_cgroup_precharge_mc() \| mem_cgroup_count_precharge() \| down_read(&mmap_sem) \| .. \| up_read(&mmap_sem) \| \| down_write(&mmap_sem) mem_cgroup_move_task() \| .. mem_cgroup_move_charge() \| __mem_cgroup_try_charge() down_read(&mmap_sem) \| prepare_to_wait() -> cannot aquire the lock \| if (mc.moving_task) \| -> true \| schedule() \| -> move charge should wake it up This patch fixes all of these problems by: 1. revert the commit. 2. To fix the Ex.1, we set mc.moving_task after mem_cgroup_count_precharge() has released the mmap_sem. 3. To fix the Ex.2, we use down_read_trylock() instead of down_read() in mem_cgroup_move_charge() and, if it has failed to aquire the lock, cancel all extra charges, wake up all waiters, and retry trylock. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by: Ben Blum <bblum@andrew.cmu.edu> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com> Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:51 -08:00
Minchan Kim	043d18b1e5	memcg: remove unnecessary return from void-returning mem_cgroup_del_lru_list() Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:50 -08:00
Johannes Weiner	f3e8eb70b1	memcg: fix unit mismatch in memcg oom limit calculation Adding the number of swap pages to the byte limit of a memory control group makes no sense. Convert the pages to bytes before adding them. The only user of this code is the OOM killer, and the way it is used means that the error results in a higher OOM badness value. Since the cgroup limit is the same for all tasks in the cgroup, the error should have no practical impact at the moment. But let's not wait for future or changing users to trip over it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Greg Thelen <gthelen@google.com> Cc: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:50 -08:00
KAMEZAWA Hiroyuki	dbd4ea78f0	memcg: add lock to synchronize page accounting and migration Introduce a new bit spin lock, PCG_MOVE_LOCK, to synchronize the page accounting and migration code. This reworks the locking scheme of _update_stat() and _move_account() by adding new lock bit PCG_MOVE_LOCK, which is always taken under IRQ disable. 1. If pages are being migrated from a memcg, then updates to that memcg page statistics are protected by grabbing PCG_MOVE_LOCK using move_lock_page_cgroup(). In an upcoming commit, memcg dirty page accounting will be updating memcg page accounting (specifically: num writeback pages) from IRQ context (softirq). Avoid a deadlocking nested spin lock attempt by disabling irq on the local processor when grabbing the PCG_MOVE_LOCK. 2. lock for update_page_stat is used only for avoiding race with move_account(). So, IRQ awareness of lock_page_cgroup() itself is not a problem. The problem is between mem_cgroup_update_page_stat() and mem_cgroup_move_account_page(). Trade-off: * Changing lock_page_cgroup() to always disable IRQ (or local_bh) has some impacts on performance and I think it's bad to disable IRQ when it's not necessary. * adding a new lock makes move_account() slower. Score is here. Performance Impact: moving a 8G anon process. Before: real 0m0.792s user 0m0.000s sys 0m0.780s After: real 0m0.854s user 0m0.000s sys 0m0.842s This score is bad but planned patches for optimization can reduce this impact. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Andrea Righi <arighi@develer.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:50 -08:00
Greg Thelen	2a7106f2cb	memcg: create extensible page stat update routines Replace usage of the mem_cgroup_update_file_mapped() memcg statistic update routine with two new routines: * mem_cgroup_inc_page_stat() * mem_cgroup_dec_page_stat() As before, only the file_mapped statistic is managed. However, these more general interfaces allow for new statistics to be more easily added. New statistics are added with memcg dirty page accounting. Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Andrea Righi <arighi@develer.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:50 -08:00
Greg Thelen	ece72400c2	memcg: document cgroup dirty memory interfaces Document cgroup dirty memory interfaces and statistics. [akpm@linux-foundation.org: fix use_hierarchy description] Signed-off-by: Andrea Righi <arighi@develer.com> Signed-off-by: Greg Thelen <gthelen@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:50 -08:00
Greg Thelen	db16d5ec1f	memcg: add page_cgroup flags for dirty page tracking This patchset provides the ability for each cgroup to have independent dirty page limits. Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim) page cache used by a cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit. The patches are based on a series proposed by Andrea Righi in Mar 2010. Overview: - Add page_cgroup flags to record when pages are dirty, in writeback, or nfs unstable. - Extend mem_cgroup to record the total number of pages in each of the interesting dirty states (dirty, writeback, unstable_nfs). - Add dirty parameters similar to the system-wide /proc/sys/vm/dirty_* limits to mem_cgroup. The mem_cgroup dirty parameters are accessible via cgroupfs control files. - Consider both system and per-memcg dirty limits in page writeback when deciding to queue background writeback or block for foreground writeback. Known shortcomings: - When a cgroup dirty limit is exceeded, then bdi writeback is employed to writeback dirty inodes. Bdi writeback considers inodes from any cgroup, not just inodes contributing dirty pages to the cgroup exceeding its limit. - When memory.use_hierarchy is set, then dirty limits are disabled. This is a implementation detail. An enhanced implementation is needed to check the chain of parents to ensure that no dirty limit is exceeded. Performance data: - A page fault microbenchmark workload was used to measure performance, which can be called in read or write mode: f = open(foo. $cpu) truncate(f, 4096) alarm(60) while (1) { p = mmap(f, 4096) if (write) p = 1 else x = p munmap(p) } - The workload was called for several points in the patch series in different modes: - s_read is a single threaded reader - s_write is a single threaded writer - p_read is a 16 thread reader, each operating on a different file - p_write is a 16 thread writer, each operating on a different file - Measurements were collected on a 16 core non-numa system using "perf stat --repeat 3". The -a option was used for parallel (p_*) runs. - All numbers are page fault rate (M/sec). Higher is better. - To compare the performance of a kernel without non-memcg compare the first and last rows, neither has memcg configured. The first row does not include any of these memcg patches. - To compare the performance of using memcg dirty limits, compare the baseline (2nd row titled "w/ memcg") with the the code and memcg enabled (2nd to last row titled "all patches"). root_cgroup child_cgroup s_read s_write p_read p_write s_read s_write p_read p_write mmotm w/o memcg 0.428 0.390 0.429 0.388 mmotm w/ memcg 0.411 0.378 0.391 0.362 0.412 0.377 0.385 0.363 all patches 0.384 0.360 0.370 0.348 0.381 0.363 0.368 0.347 all patches 0.431 0.402 0.427 0.395 w/o memcg This patch: Add additional flags to page_cgroup to track dirty pages within a mem_cgroup. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrea Righi <arighi@develer.com> Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:50 -08:00

... 9 10 11 12 13 ...

232023 Commits All Branches Search

232023 Commits

All Branches