linux

Commit Graph

Author	SHA1	Message	Date
Ben Hutchings	53cb13c680	sfc: Replace tso_state::full_packet_space with ip_base_len We only use tso_state::full_packet_space to calculate the IPv4 tot_len or IPv6 payload_len, not to set tso_state::packet_space. Replace it with an ip_base_len field holding the value of tot_len or payload_len before including the TCP payload, which is much more useful when constructing the new headers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>	2012-08-24 20:10:17 +01:00
Ben Hutchings	f7251a9ce9	sfc: Simplify TSO header buffer allocation TSO header buffers contain a control structure immediately followed by the packet headers, and are kept on a free list when not in use. This complicates buffer management and tends to result in cache read misses when we recycle such buffers (particularly if DMA-coherent memory requires caches to be disabled). Replace the free list with a simple mapping by descriptor index. We know that there is always a payload descriptor between any two descriptors with TSO header buffers, so we can allocate only one such buffer for each two descriptors. While we're at it, use a standard error code for allocation failure, not -1. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>	2012-08-24 20:10:11 +01:00
Ben Hutchings	14bf718fb9	sfc: Stop TX queues before they fill up We now have a definite upper bound on the number of descriptors per skb; use that to stop the queue when the next packet might not fit. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>	2012-08-24 19:00:27 +01:00
Ben Hutchings	7668ff9c2a	sfc: Refactor struct efx_tx_buffer to use a flags field Add a flags field to struct efx_tx_buffer, replacing the continuation and map_single booleans. Since a single descriptor cannot be both a TSO header and the last descriptor for an skb, unionise efx_tx_buffer::{skb,tsoh} and add flags for validity of these fields. Clear all flags in free buffers (whereas previously the continuation flag would be set). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>	2012-08-24 19:00:26 +01:00
Ben Hutchings	8f4cccbbd9	net: Set device operstate at registration time The operstate of a device is initially IF_OPER_UNKNOWN and is updated asynchronously by linkwatch after each change of carrier state reported by the driver. The default carrier state of a net device is on, and this will never be changed on drivers that do not support carrier detection, thus the operstate remains IF_OPER_UNKNOWN. For devices that do support carrier detection, the driver must set the carrier state to off initially, then poll the hardware state when the device is opened. However, we must not activate linkwatch for a unregistered device, and commit `b473001` ('net: Do not fire linkwatch events until the device is registered.') ensured that we don't. But this means that the operstate for many devices that support carrier detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN. The same issue exists with the dormant state. The proper initialisation sequence, avoiding a race with opening of the device, is: rtnl_lock(); rc = register_netdevice(dev); if (rc) goto out_unlock; netif_carrier_off(dev); /* or netif_dormant_on(dev) */ rtnl_unlock(); but it seems silly that this should have to be repeated in so many drivers. Further, the operstate seen immediately after opening the device may still be IF_OPER_UNKNOWN due to the asynchronous nature of linkwatch. Commit `22604c8` ('net: Fix for initial link state in 2.6.28') attempted to fix this by setting the operstate synchronously, but it was reverted as it could lead to deadlock. This initialises the operstate synchronously at registration time only. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-24 12:46:13 -04:00
Timur Tabi	9f35a7342c	net/fsl: introduce Freescale 10G MDIO driver Similar to fsl_pq_mdio.c, this driver is for the 10G MDIO controller on Freescale Frame Manager Ethernet controllers. Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-24 12:42:42 -04:00
Neil Horman	3afa6d00fb	cls_cgroup: Allow classifier cgroups to have their classid reset to 0 The network classifier cgroup initializes each cgroups instance classid value to 0. However, the sock_update_classid function only updates classid's in sockets if the tasks cgroup classid is not zero, and if it differs from the current classid. The latter check is to prevent cache line dirtying, but the former is detrimental, as it prevents resetting a classid for a cgroup to 0. While this is not a common action, it has administrative usefulness (if the admin wants to disable classification of a certain group temporarily for instance). Easy fix, just remove the zero check. Tested successfully by myself Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-24 12:41:17 -04:00
David S. Miller	e6e94e392f	Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== Included changes: - a set of codestyle rearrangements/fixes - new feature to early detect new joining (mesh-unaware) clients - a minor fix for the gw-feature - substitution of shift operations with the BIT() macro - reorganization of the main batman-adv structure (struct batadv_priv) - some more (very) minor cleanups and fixes =================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-24 11:30:50 -04:00
Rami Rosen	f63c45e0e6	packet: fix broken build. This patch fixes a broken build due to a missing header: ... CC net/ipv4/proc.o In file included from include/net/net_namespace.h:15, from net/ipv4/proc.c:35: include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type ... The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock, but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well. See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f: packet: Protect packet sk list with mutex (v2) patch, Signed-off-by: Rami Rosen <rosenr@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-23 09:29:45 -07:00
Eric Dumazet	748e2d9396	net: reinstate rtnl in call_netdevice_notifiers() Eric Biederman pointed out that not holding RTNL while calling call_netdevice_notifiers() was racy. This patch is a direct transcription of his feedback against commit `0115e8e30d` (net: remove delay at device dismantle) Thanks Eric ! Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-23 09:24:42 -07:00
Sven Eckelmann	fa4f0afcf4	batman-adv: Start new development cycle Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:23 +02:00
Antonio Quartulli	371351731e	batman-adv: change interface_rx to get orig node In order to understand where a broadcast packet is coming from and use this information to detect not yet announced clients, this patch modifies the interface_rx() function by passing a new argument: the orig node corresponding to the node that originated the received packet (if known). This new argument is not NULL for broadcast packets only (other packets do not have a source field). Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:22 +02:00
Antonio Quartulli	30cfd02b60	batman-adv: detect not yet announced clients With the current TT mechanism a new client joining the network is not immediately able to communicate with other hosts because its MAC address has not been announced yet. This situation holds until the first OGM containing its joining event has been spread over the mesh network. This behaviour can be acceptable in networks where the originator interval is a small value (e.g. 1sec) but if that value is set to a higher value (e.g. 5secs) the client could suffer from several malfunctions like DHCP client timeouts, etc. This patch adds an early detection mechanism that makes nodes in the network able to recognise "not yet announced clients" by means of the broadcast packets they emitted on connection (e.g. ARP or DHCP request). The added client will then be confirmed upon receiving the OGM claiming it or purged if such OGM is not received within a fixed amount of time. Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:22 +02:00
Sven Eckelmann	c67893d17a	batman-adv: Reduce accumulated length of simple statements Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:21 +02:00
Sven Eckelmann	bbb1f90efb	batman-adv: Don't break statements after assignment operator Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:20 +02:00
Sven Eckelmann	8de47de575	batman-adv: Use BIT(x) macro to calculate bit positions Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:19 +02:00
Martin Hundebøll	74ee3634dc	batman-adv: Drop tt queries with foreign dest When enabling promiscuous mode, tt queries for other hosts might be received. Before this patch, "foreign" tt queries were processed like any other query and thus forwarded to its destination again and thereby causing a loop. This patch adds a check to drop foreign tt queries. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:19 +02:00
Martin Hundebøll	ff51fd70ad	batman-adv: Move batadv_check_unicast_packet() batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so move the former to before the latter. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:18 +02:00
Sven Eckelmann	807736f6e0	batman-adv: Split batadv_priv in sub-structures for features The structure batadv_priv grows every time a new feature is introduced. It gets hard to find the parts of the struct that belong to a specific feature. This becomes even harder by the fact that not every feature uses a prefix in the member name. The variables for bridge loop avoidance, gateway handling, translation table and visualization server are moved into separate structs that are included in the bat_priv main struct. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:20:13 +02:00
Simon Wunderlich	624463079e	batman-adv: check batadv_orig_hash_add_if() return code If this call fails, some of the orig_nodes spaces may have been resized for the increased number of interface, and some may not. If we would just continue with the larger number of interfaces, this would lead to access to not allocated memory later. We better check the return code, and don't add the interface if no memory is available. OTOH, keeping some of the orig_nodes with too much memory allocated should hurt no one (except for a few too many bytes allocated). Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:46 +02:00
Antonio Quartulli	a51fb9b2ac	batman-adv: fix typos in comments the word millisecond is misspelled in several comments. This patch fixes it. Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:45 +02:00
Antonio Quartulli	d657e621a0	batman-adv: add reference counting for type batadv_tt_orig_list_entry The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so far. This patch introduces it and makes the structure being usable in much more complex context. Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:44 +02:00
Jonathan Corbet	3a7f291bf6	batman-adv: remove a misleading comment As much as I'm happy to see LWN links sprinkled through the kernel by the dozen, this one in particular reflects a very old state of reality; the associated comment is now incorrect. So just delete it. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:44 +02:00
Marek Lindner	1c9b0550f4	batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:43 +02:00
Simon Wunderlich	3eb8773e3a	batman-adv: rename bridge loop avoidance claim types for consistency reasons within the code and with the documentation, we should always call it "claim" and "unclaim". Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:42 +02:00
Simon Wunderlich	99e966fc96	batman-adv: correct comments in bridge loop avoidance Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:41 +02:00
Simon Wunderlich	536a23f119	batman-adv: Add the backbone gateway list to debugfs This is especially useful if there are no claims yet, but we still want to know which gateways are using bridge loop avoidance in the network. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:41 +02:00
Antonio Quartulli	c70437289c	batman-adv: move function arguments on one line Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:40 +02:00
Pavel Emelyanov	0fa7fa98db	packet: Protect packet sk list with mutex (v2) Change since v1: * Fixed inuse counters access spotted by Eric In patch `eea68e2f` (packet: Report socket mclist info via diag module) I've introduced a "scheduling in atomic" problem in packet diag module -- the socket list is traversed under rcu_read_lock() while performed under it sk mclist access requires rtnl lock (i.e. -- mutex) to be taken. [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002 [152363.820573] 4 locks held by crtools/12517: [152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e [152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a [152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab [152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af Similar thing was then re-introduced by further packet diag patches (fanount mutex and pgvec mutex for rings) :( Apart from being terribly sorry for the above, I propose to change the packet sk list protection from spinlock to mutex. This lock currently protects two modifications: * sklist * prot inuse counters The sklist modifications can be just reprotected with mutex since they already occur in a sleeping context. The inuse counters modifications are trickier -- the __this_cpu_-s are used inside, thus requiring the caller to handle the potential issues with contexts himself. Since packet sockets' counters are modified in two places only (packet_create and packet_release) we only need to protect the context from being preempted. BH disabling is not required in this case. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:58:27 -07:00
Allan, Bruce W	b32607dd47	mdio: translation of MMD EEE registers to/from ethtool settings The helper functions which translate IEEE MDIO Manageable Device (MMD) Energy-Efficient Ethernet (EEE) registers 3.20, 7.60 and 7.61 to and from the comparable ethtool supported/advertised settings will be needed by drivers other than those in PHYLIB (e.g. e1000e in a follow-on patch). In the same fashion as similar translation functions in linux/mii.h, move these functions from the PHYLIB core to the linux/mdio.h header file so the code will not have to be duplicated in each driver needing MMD-to-ethtool (and vice-versa) translations. The function and some variable names have been renamed to be more descriptive. Not tested on the only hardware that currently calls the related functions, stmmac, because I don't have access to any. Has been compile tested and the translations have been tested on a locally modified version of e1000e. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:58:27 -07:00
danborkmann@iogearbox.net	9e67030af3	af_packet: use define instead of constant Instead of using a hard-coded value for the status variable, it would make the code more readable to use its destined define from linux/if_packet.h. Signed-off-by: daniel.borkmann@tik.ee.ethz.ch Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:58:27 -07:00
Ying Xue	bfdc587c5a	rds: Don't disable BH on BH context Since we have already in BH context when _write_space(), _data_ready() as well as *_state_change() are called, it's unnecessary to disable BH. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:52:04 -07:00
John Eaglesham	6b923cb718	bonding: support for IPv6 transmit hashing Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:49:30 -07:00
Eric Dumazet	b87fb39e39	ipv6: gre: fix ip6gre_err() ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8, not 40 as expected), and should take into account 'offset' parameter. Also uses pskb_may_pull() to cope with some fragged skbs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:48:32 -07:00
Eric Dumazet	ef8531b64c	xfrm: fix RCU bugs This patch reverts commit `56892261ed` (xfrm: Use rcu_dereference_bh to deference pointer protected by rcu_read_lock_bh), and fixes bugs introduced in commit `418a99ac6a` ( Replace rwlock on xfrm_policy_afinfo with rcu ) 1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH 2) We must defer some writes after the synchronize_rcu() call or a reader can crash dereferencing NULL pointer. 3) Now we use the xfrm_policy_afinfo_lock spinlock only from process context, we no longer need to block BH in xfrm_policy_register_afinfo() and xfrm_policy_unregister_afinfo() 4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in xfrm_policy_unregister_afinfo() 5) Remove a forward inline declaration (xfrm_policy_put_afinfo()), and also move xfrm_policy_get_afinfo() declaration. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Fan Du <fan.du@windriver.com> Cc: Priyanka Jain <Priyanka.Jain@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:39:46 -07:00
Eric Dumazet	0115e8e30d	net: remove delay at device dismantle I noticed extra one second delay in device dismantle, tracked down to a call to dst_dev_event() while some call_rcu() are still in RCU queues. These call_rcu() were posted by rt_free(struct rtable *rt) calls. We then wait a little (but one second) in netdev_wait_allrefs() before kicking again NETDEV_UNREGISTER. As the call_rcu() are now completed, dst_dev_event() can do the needed device swap on busy dst. To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called after a rcu_barrier(), but outside of RTNL lock. Use NETDEV_UNREGISTER_FINAL with care ! Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after IP cache removal. With help from Gao feng Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 21:50:36 -07:00
David S. Miller	bf277b0cce	Merge git://1984.lsi.us.es/nf-next Pablo Neira Ayuso says: ==================== This is the first batch of Netfilter and IPVS updates for your net-next tree. Mostly cleanups for the Netfilter side. They are: * Remove unnecessary RTNL locking now that we have support for namespace in nf_conntrack, from Patrick McHardy. * Cleanup to eliminate unnecessary goto in the initialization path of several Netfilter tables, from Jean Sacren. * Another cleanup from Wu Fengguang, this time to PTR_RET instead of if IS_ERR then return PTR_ERR. * Use list_for_each_entry_continue_rcu in nf_iterate, from Michael Wang. * Add pmtu_disc sysctl option to disable PMTU in their tunneling transmitter, from Julian Anastasov. * Generalize application protocol registration in IPVS and modify IPVS FTP helper to use it, from Julian Anastasov. * update Kconfig. The IPVS FTP helper depends on the Netfilter FTP helper for NAT support, from Julian Anastasov. * Add logic to update PMTU for IPIP packets in IPVS, again from Julian Anastasov. * A couple of sparse warning fixes for IPVS and Netfilter from Claudiu Ghioc and Patrick McHardy respectively. Patrick's IPv6 NAT changes will follow after this batch, I need to flush this batch first before refreshing my tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 18:48:52 -07:00
David S. Miller	bba6ec7e49	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== This series contains updates to ethtool.h, e1000, e1000e, and igb to implement MDI/MDIx control. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 14:23:43 -07:00
David S. Miller	1304a7343b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-08-22 14:21:38 -07:00
Jean Sacren	90efbed18a	netfilter: remove unnecessary goto statement for error recovery Usually it's a good practice to use goto statement for error recovery when initializing the module. This approach could be an overkill if: 1) there is only one fail case; 2) success and failure use the same return statement. For a cleaner approach, remove the unnecessary goto statement and directly implement error recovery. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-22 19:17:38 +02:00
Michael Wang	6705e86724	netfilter: replace list_for_each_continue_rcu with new interface This patch replaces list_for_each_continue_rcu() with list_for_each_entry_continue_rcu() to allow removing list_for_each_continue_rcu(). Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-22 19:17:20 +02:00
Linus Torvalds	23dcfa61ba	Merge branch 'akpm' (Andrew's patch-bomb) Merge fixes from Andrew Morton. Random drivers and some VM fixes. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (17 commits) mm: compaction: Abort async compaction if locks are contended or taking too long mm: have order > 0 compaction start near a pageblock with free pages rapidio/tsi721: fix unused variable compiler warning rapidio/tsi721: fix inbound doorbell interrupt handling drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode mm: correct page->pfmemalloc to fix deactivate_slab regression drivers/rtc/rtc-pcf2123.c: initialize dynamic sysfs attributes mm/compaction.c: fix deferring compaction mistake drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources string: do not export memweight() to userspace hugetlb: update hugetlbpage.txt checkpatch: add control statement test to SINGLE_STATEMENT_DO_WHILE_MACRO mm: hugetlbfs: correctly populate shared pmd cciss: fix incorrect scsi status reporting Documentation: update mount option in filesystem/vfat.txt mm: change nr_ptes BUG_ON to WARN_ON cs5535-clockevt: typo, it's MFGPT, not MFPGT	2012-08-21 17:22:22 -07:00
Linus Torvalds	a484147a52	Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For bug fixes, at soc_camera, si470x, uvcvideo, iguanaworks IR driver, radio_shark Kbuild fixes, and at the V4L2 core (radio fixes)." * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] media: soc_camera: don't clear pix->sizeimage in JPEG mode [media] media: mx2_camera: Fix clock handling for i.MX27 [media] video: mx2_camera: Use clk_prepare_enable/clk_disable_unprepare [media] video: mx1_camera: Use clk_prepare_enable/clk_disable_unprepare [media] media: mx3_camera: buf_init() add buffer state check [media] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set [media] radio-shark: Only compile led support when CONFIG_LED_CLASS is set [media] radio-shark: Call cancel_work_sync from disconnect rather then release [media] radio-shark: Remove work-around for dangling pointer in usb intfdata [media] Add USB dependency for IguanaWorks USB IR Transceiver [media] Add missing logging for rangelow/high of hwseek [media] VIDIOC_ENUM_FREQ_BANDS fix [media] mem2mem_testdev: fix querycap regression [media] si470x: v4l2-compliance fixes [media] DocBook: Remove a spurious character [media] uvcvideo: Reset the bytesused field when recycling an erroneous buffer	2012-08-21 16:54:38 -07:00
Linus Torvalds	8f8ba75ee2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking update from David Miller: "A couple weeks of bug fixing in there. The largest chunk is all the broken crap Amerigo Wang found in the netpoll layer." 1) netpoll and it's users has several serious bugs: a) uses GFP_KERNEL with locks held b) interfaces requiring interrupts disabled are called with them enabled c) and vice versa d) VLAN tag demuxing, as per all other RX packet input paths, is not applied All from Amerigo Wang. 2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for good, from Neal Cardwell. 3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials when the user doesn't specify one explicitly during sendmsg(). Instead we attach an empty (zero) SCM credential block which is definitely not what we want. Fix from Eric Dumazet. 4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix from Ben Hutchings. 5) inet_csk_route_child_sock() checks wrong inet options pointer, fix from Christoph Paasch. 6) When AF_PACKET is used for transmit, packet loopback doesn't behave properly when a socket fanout is enabled, from Eric Leblond. 7) On bluetooth l2cap channel create failure, we leak the socket, from Jaganath Kanakkassery. 8) Fix all the netprio file handling bugs found by Al Viro, from John Fastabend. 9) Several error return and NULL deref bug fixes in networking drivers from Julia Lawall. 10) A large smattering of struct padding et al. kernel memory leaks to userspace found of Mathias Krause. 11) Conntrack expections in netfilter can access an uninitialized timer, fix from Pablo Neira Ayuso. 12) Several netfilter SIP tracker bug fixes from Patrick McHardy. 13) IPSEC ipv6 routes are not initialized correctly all the time, resulting in an OOPS in inet_putpeer(). Also from Patrick McHardy. 14) Bridging does rcu_dereference() outside of RCU protected area, from Stephen Hemminger. 
15) Fix routing cache removal performance regression when looking up output routes that have a local destination. From Zheng Yan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) af_netlink: force credentials passing [CVE-2012-3520] ipv4: fix ip header ident selection in __ip_make_skb() ipv4: Use newinet->inet_opt in inet_csk_route_child_sock() tcp: fix possible socket refcount problem net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child() net/core/dev.c: fix kernel-doc warning netconsole: remove a redundant netconsole_target_put() net: ipv6: fix oops in inet_putpeer() net/stmmac: fix issue of clk_get for Loongson1B. caif: Do not dereference NULL in chnl_recv_cb() af_packet: don't emit packet on orig fanout group drivers/net/irda: fix error return code drivers/net/wan/dscc4.c: fix error return code drivers/net/wimax/i2400m/fw.c: fix error return code smsc75xx: add missing entry to MAINTAINERS net: qmi_wwan: new devices: UML290 and K5006-Z net: sh_eth: Add eth support for R8A7779 device netdev/phy: skip disabled mdio-mux nodes dt: introduce for_each_available_child_of_node, of_get_next_available_child net: netprio: fix cgrp create and write priomap race ...	2012-08-21 16:46:08 -07:00
Mel Gorman	c67fe3752a	mm: compaction: Abort async compaction if locks are contended or taking too long Jim Schutt reported a problem that pointed at compaction contending heavily on locks. The workload is straight-forward and in his own words; The systems in question have 24 SAS drives spread across 3 HBAs, running 24 Ceph OSD instances, one per drive. FWIW these servers are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160 Ceph Linux clients doing dd simultaneously to a Ceph file system backed by 12 of these servers. Early in the test everything looks fine procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu------- r b swpd free buff cache si so bi bo in cs us sy id wa st 31 15 0 287216 576 38606628 0 0 2 1158 2 14 1 3 95 0 0 27 15 0 225288 576 38583384 0 0 18 2222016 203357 134876 11 56 17 15 0 28 17 0 219256 576 38544736 0 0 11 2305932 203141 146296 11 49 23 17 0 6 18 0 215596 576 38552872 0 0 7 2363207 215264 166502 12 45 22 20 0 22 18 0 226984 576 38596404 0 0 3 2445741 223114 179527 12 43 23 22 0 and then it goes to pot procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu------- r b swpd free buff cache si so bi bo in cs us sy id wa st 163 8 0 464308 576 36791368 0 0 11 22210 866 536 3 13 79 4 0 207 14 0 917752 576 36181928 0 0 712 1345376 134598 47367 7 90 1 2 0 123 12 0 685516 576 36296148 0 0 429 1386615 158494 60077 8 84 5 3 0 123 12 0 598572 576 36333728 0 0 1107 1233281 147542 62351 7 84 5 4 0 622 7 0 660768 576 36118264 0 0 557 1345548 151394 59353 7 85 4 3 0 223 11 0 283960 576 36463868 0 0 46 1107160 121846 33006 6 93 1 1 0 Note that system CPU usage is very high blocks being written out has dropped by 42%. 
He analysed this with perf and found
    perf record -g -a sleep 10
    perf report --sort symbol --call-graph fractal,5
    34.63%  [k] _raw_spin_lock_irqsave
            |
            |--97.30%-- isolate_freepages
            |          compaction_alloc
            |          unmap_and_move
            |          migrate_pages
            |          compact_zone
            |          compact_zone_order
            |          try_to_compact_pages
            |          __alloc_pages_direct_compact
            |          __alloc_pages_slowpath
            |          __alloc_pages_nodemask
            |          alloc_pages_vma
            |          do_huge_pmd_anonymous_page
            |          handle_mm_fault
            |          do_page_fault
            |          page_fault
            |          |
            |          |--87.39%-- skb_copy_datagram_iovec
            |          |          tcp_recvmsg
            |          |          inet_recvmsg
            |          |          sock_recvmsg
            |          |          sys_recvfrom
            |          |          system_call
            |          |          __recv
            |          |          |
            |          |           --100.00%-- (nil)
            |          |
            |           --12.61%-- memcpy
             --2.70%-- [...]
There was other data but primarily it is all showing that compaction is contended heavily on the zone->lock and zone->lru_lock. commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled while isolating pages for migration] noted that it was possible for migration to hold the lru_lock for an excessive amount of time. Very broadly speaking this patch expands the concept. This patch introduces compact_checklock_irqsave() to check if a lock is contended or the process needs to be scheduled. If either condition is true then async compaction is aborted and the caller is informed. The page allocator will fail a THP allocation if compaction failed due to contention. This patch also introduces compact_trylock_irqsave() which will acquire the lock only if it is not contended and the process does not need to schedule. Reported-by: Jim Schutt <jaschut@sandia.gov> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:03 -07:00
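The abort-on-contention idea in the message above can be illustrated with a small user-space sketch. It is not the kernel implementation: the lock is a C11 `atomic_flag`, the "needs to be scheduled" condition is passed in as a plain boolean, and every name (`sketch_lock`, `sketch_control`, `sketch_checklock`, `sketch_unlock`) is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct sketch_lock {
    atomic_flag held;
};

struct sketch_control {
    bool sync;       /* true: the caller may wait for the lock */
    bool contended;  /* set when an async attempt gives up */
};

static void sketch_unlock(struct sketch_lock *lock)
{
    atomic_flag_clear(&lock->held);
}

/*
 * Returns true with the lock held. Returns false, without the lock,
 * when an async caller finds the lock busy or has been asked to yield.
 */
static bool sketch_checklock(struct sketch_lock *lock,
                             struct sketch_control *cc,
                             bool need_resched)
{
    /* Fast path: nobody else wants the CPU and the lock is free. */
    if (!need_resched && !atomic_flag_test_and_set(&lock->held))
        return true;

    /* Async callers abort and record why, instead of waiting. */
    if (!cc->sync) {
        cc->contended = true;
        return false;
    }

    /* Sync callers wait (a busy-wait here, purely for brevity). */
    while (atomic_flag_test_and_set(&lock->held))
        ;
    return true;
}
```

In this model a caller would treat a `false` return as the cue to stop the current async pass and report the contention upward, which corresponds to the message's statement that a THP allocation fails when compaction aborted due to contention.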
Mel Gorman	de74f1cc3b	mm: have order > 0 compaction start near a pageblock with free pages Commit `7db8889ab0` ("mm: have order > 0 compaction start off where it left") introduced a caching mechanism to reduce the amount of work the free page scanner does in compaction. However, it has a problem. Consider two processes simultaneously scanning free pages
                                    C
        Process A           M     S     F
                    |---------------------------------------|
        Process B           M       FS
    C is zone->compact_cached_free_pfn
    S is cc->start_free_pfn
    M is cc->migrate_pfn
    F is cc->free_pfn
In this diagram, Process A has just reached its migrate scanner, wrapped around and updated compact_cached_free_pfn accordingly. Simultaneously, Process B finishes isolating in a block and updates compact_cached_free_pfn again to the location of its free scanner. Process A moves to "end_of_zone - one_pageblock" and runs this check
    if (cc->order > 0 && (!cc->wrapped ||
                          zone->compact_cached_free_pfn >
                          cc->start_free_pfn))
            pfn = min(pfn, zone->compact_cached_free_pfn);
compact_cached_free_pfn is above where it started so the free scanner skips almost the entire space it should have scanned. When there are multiple processes compacting it can end in a situation where the entire zone is not being scanned at all. Further, it is possible for two processes to ping-pong updates to compact_cached_free_pfn, which is just random. Overall, the end result wrecks allocation success rates. There is not an obvious way around this problem without introducing new locking and state so this patch takes a different approach. First, it gets rid of the skip logic because it's not clear that it matters if two free scanners happen to be in the same block, but with racing updates it's too easy for it to skip over blocks it should not. Second, it updates compact_cached_free_pfn in a more limited set of circumstances. If a scanner has wrapped, it updates compact_cached_free_pfn to the end of the zone.
When a wrapped scanner isolates a page, it updates compact_cached_free_pfn to point to the highest pageblock it can isolate pages from. If a scanner has not wrapped when it has finished isolating pages, it checks if compact_cached_free_pfn is pointing to the end of the zone. If so, the value is updated to point to the highest pageblock that pages were isolated from. This value will not be updated again until a free page scanner wraps and resets compact_cached_free_pfn. This is not optimal and it can still race but the compact_cached_free_pfn will be pointing to or very near a pageblock with free pages. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:03 -07:00
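The three update rules described in this message can be modelled in a few lines of stand-alone C. This is only a sketch of the rules as worded above, not the patch itself: the structures, the `end_pfn` field standing in for "the end of the zone", the `isolated_any` guard, and all `sketch_*` names are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the zone state; not the kernel's struct zone. */
struct sketch_zone {
    unsigned long end_pfn;          /* assumed marker for "end of the zone" */
    unsigned long cached_free_pfn;  /* models zone->compact_cached_free_pfn */
};

/* Hypothetical per-scanner state; not the kernel's compact_control. */
struct sketch_scan {
    bool wrapped;            /* models cc->wrapped */
    bool isolated_any;       /* assumption: tracks whether high_pfn is valid */
    unsigned long high_pfn;  /* highest pageblock pages were isolated from */
};

/* Rule 1: a scanner that wraps resets the cached value to the zone end. */
static void sketch_on_wrap(struct sketch_zone *z, struct sketch_scan *s)
{
    s->wrapped = true;
    z->cached_free_pfn = z->end_pfn;
}

/* Rule 2: a wrapped scanner publishes the highest block it isolated from. */
static void sketch_on_isolate(struct sketch_zone *z, struct sketch_scan *s,
                              unsigned long block_pfn)
{
    if (!s->isolated_any || block_pfn > s->high_pfn)
        s->high_pfn = block_pfn;
    s->isolated_any = true;

    if (s->wrapped)
        z->cached_free_pfn = s->high_pfn;
}

/*
 * Rule 3: a scanner that never wrapped only publishes its result if the
 * cached value is still sitting at the zone end, i.e. nobody has set it
 * since the last reset.
 */
static void sketch_on_finish(struct sketch_zone *z, struct sketch_scan *s)
{
    if (!s->wrapped && s->isolated_any &&
        z->cached_free_pfn == z->end_pfn)
        z->cached_free_pfn = s->high_pfn;
}
```

Under these assumptions, a second unwrapped scanner that finishes later leaves the cached value alone, which is the "will not be updated again until a free page scanner wraps" behaviour the message describes.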
Alexandre Bounine	9a9a9a7ada	rapidio/tsi721: fix unused variable compiler warning Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG option off. This patch is applicable to kernel versions starting from v3.2. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:03 -07:00
Alexandre Bounine	3670e7e12e	rapidio/tsi721: fix inbound doorbell interrupt handling Make sure that there are no doorbell messages left behind due to disabled interrupts during inbound doorbell processing. The most common case for this bug is loss of rionet JOIN messages in systems with three or more rionet participants and MSI or MSI-X enabled. As a result, requests for packet transfers may finish with a "destination unreachable" error message. This patch is applicable to kernel versions starting from v3.2. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:03 -07:00
Atsushi Nemoto	7dbfb315b2	drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode Correct the offset by subtracting 20 from tm_hour before taking the modulo 12. [ "Why 20?" I hear you ask. Or at least I did. Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly - included in the RS5C348_HOURS_MASK define. So it's really subtracting out that bit to get "hour+12". But then because it does things modulo 12, it needs to add the 12 in again afterwards anyway. This code is confused. It would be much clearer if RS5C348_HOURS_MASK just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't need to do the silly subtract either. Whatever. It's all just math, the end result is the same. - Linus ] Reported-by: James Nute <newten82@gmail.com> Tested-by: James Nute <newten82@gmail.com> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:03 -07:00
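The arithmetic in the bracketed note can be checked with a small stand-alone sketch. It assumes the hours register is BCD-encoded, that the PM flag is bit 5 (value 32, as stated above), and that the mask is 0x3f so it still includes that bit; the mask value and the names `sketch_bcd2bin` and `sketch_decode_hour_12h` are assumptions for illustration, not the driver's code.

```c
#include <stdint.h>

#define SKETCH_BIT_PM     0x20  /* RS5C348_BIT_PM is 32, per the message */
#define SKETCH_HOURS_MASK 0x3f  /* assumed value; it includes the PM bit */

/* Convert one packed BCD byte (two decimal digits) to binary. */
static int sketch_bcd2bin(uint8_t v)
{
    return (v >> 4) * 10 + (v & 0x0f);
}

/* Decode a 12-hour-mode hours register into a 0..23 hour. */
static int sketch_decode_hour_12h(uint8_t reg)
{
    int hour = sketch_bcd2bin(reg & SKETCH_HOURS_MASK);

    if (reg & SKETCH_BIT_PM) {
        hour -= 20;  /* the PM bit was decoded as a BCD tens digit of 2 */
        hour %= 12;  /* 12 PM becomes 0 before the offset is re-added */
        hour += 12;
    } else {
        hour %= 12;  /* 12 AM becomes 0 */
    }
    return hour;
}
```

Under these assumptions a register value of 0x31 (11 PM) decodes to 23, 0x32 (12 PM) to 12, and 0x12 (12 AM) to 0. Because the PM bit sits in the BCD tens position, it contributes 20 rather than 32 after conversion, which is why 20 is the number subtracted.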
Alex Shi	b121186ab1	mm: correct page->pfmemalloc to fix deactivate_slab regression Commit `cfd19c5a9e` ("mm: only set page->pfmemalloc when ALLOC_NO_WATERMARKS was used") tried to narrow down the page->pfmemalloc setting, but it missed some places where pfmemalloc should be set. So, in __slab_alloc, the mismatch between pfmemalloc and ALLOC_NO_WATERMARKS causes incorrect deactivate_slab() calls on our core2 server:
    64.73%  fio  [kernel.kallsyms]  [k] _raw_spin_lock
            |
            --- _raw_spin_lock
               |
               |---0.34%-- deactivate_slab
               |          __slab_alloc
               |          kmem_cache_alloc
               |          |
That causes our fio sync write performance to have a 40% regression. Move the check into get_page_from_freelist(), which resolves this issue. Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: David Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-08-21 16:45:03 -07:00

322336 Commits
