linux

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	72cdd1d971	net: get rid of rtable->idev It seems idev field in struct rtable has no special purpose, but adding extra atomic ops. We hold refcounts on the device itself (using percpu data, so pretty cheap in current kernel). infiniband case is solved using dst.dev instead of idev->dev Removal of this field means routing without route cache is now using shared data, percpu data, and only potential contention is a pair of atomic ops on struct neighbour per forwarded packet. About 5% speedup on routing test. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-11 10:29:40 -08:00
Al Viro	fc14f2fef6	convert get_sb_single() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:28 -04:00
Linus Torvalds	426e1f5cec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...	2010-10-26 17:58:44 -07:00
Linus Torvalds	9e5fca251f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits) IB/qib: clean up properly if pci_set_consistent_dma_mask() fails IB/qib: Allow driver to load if PCIe AER fails IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set IB/qib: Fix extra log level in qib_early_err() RDMA/cxgb4: Remove unnecessary KERN_<level> use RDMA/cxgb3: Remove unnecessary KERN_<level> use IB/core: Add link layer type information to sysfs IB/mlx4: Add VLAN support for IBoE IB/core: Add VLAN support for IBoE IB/mlx4: Add support for IBoE mlx4_en: Change multicast promiscuous mode to support IBoE mlx4_core: Update data structures and constants for IBoE mlx4_core: Allow protocol drivers to find corresponding interfaces IB/uverbs: Return link layer type to userspace for query port operation IB/srp: Sync buffer before posting send IB/srp: Use list_first_entry() IB/srp: Reduce number of BUSY conditions IB/srp: Eliminate two forward declarations IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144 IB: Replace EXTRA_CFLAGS with ccflags-y ...	2010-10-26 17:54:22 -07:00
Hagen Paul Pfeifer	732eacc054	replace nested max/min macros with {max,min}3 macro Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:12 -07:00
Roland Dreier	116e9535fe	Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next	2010-10-26 16:09:11 -07:00
Jason Gunthorpe	2ca78d23a7	IB/qib: clean up properly if pci_set_consistent_dma_mask() fails Clean up properly if pci_set_consistent_dma_mask() fails. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 16:09:02 -07:00
Ralph Campbell	5d26a1df23	IB/qib: Allow driver to load if PCIe AER fails Some PCIe root complex chip sets don't support advanced error reporting. Allow the driver to load OK if pci_enable_pcie_error_reporting() fails. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 16:09:02 -07:00
Ralph Campbell	9e43e0106d	IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the pointer "dd" is uninitialized. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 16:09:02 -07:00
Jason Gunthorpe	82fdb0ab54	IB/qib: Fix extra log level in qib_early_err() Noticed this odd looking thing in dmesg: ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5 which is due to a bad use of dev_info. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 16:09:02 -07:00
Joe Perches	aa1ad26089	RDMA/cxgb4: Remove unnecessary KERN_<level> use Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 13:45:59 -07:00
Joe Perches	ca7cf94f8b	RDMA/cxgb3: Remove unnecessary KERN_<level> use Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-26 13:45:49 -07:00
Christoph Hellwig	85fe4025c6	fs: do not assign default i_ino in new_inode Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Eli Cohen	8ad330a002	IB/core: Add link layer type information to sysfs Since an IB transport port may use either IB or Ethernet as its link layer, add the file /sys/class/infiniband/<device>/ports/<port_num>/link_layer to show the link layer for the port. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-25 10:20:39 -07:00
Eli Cohen	4c3eb3ca13	IB/mlx4: Add VLAN support for IBoE This patch allows IBoE traffic to be encapsulated in 802.1Q tagged VLAN frames. The VLAN tag is encoded in the GID and derived from it by a simple computation. The netdev notifier callback is modified to catch VLAN device addition/removal and the port's GID table is updated to reflect the change, so that for each netdevice there is an entry in the GID table. When the port's GID table is exhausted, GID entries will not be added. Only children of the main interfaces can add to the GID table; if a VLAN interface is added on another VLAN interface (e.g. "vconfig add eth2.6 8"), then that interfaces will not add an entry to the GID table. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-25 10:20:39 -07:00
Eli Cohen	af7bd46376	IB/core: Add VLAN support for IBoE Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the GID derived from a link local address in the following way: GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN. The 3 bits user priority field of the packets are identical to the 3 bits of the SL. In case of rdma_cm apps, the TOS field is used to generate the SL field by doing a shift right of 5 bits effectively taking to 3 MS bits of the TOS field. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-25 10:20:39 -07:00
Eli Cohen	fa417f7b52	IB/mlx4: Add support for IBoE Add support for IBoE to mlx4_ib. The bulk of the code is handling the new address vector fields; mlx4 needs the MAC address of a remote node to include it in a WQE (for datagrams) or in the QP context (for connected QPs). Address resolution is done by assuming all unicast GIDs are either link-local IPv6 addresses. Multicast group attach/detach needs to update the NIC's multicast filters; but since attaching a QP to a multicast group can be done before the QP is bound to a port, for IBoE we need to keep track of all multicast groups that a QP is attached too before it transitions from INIT to RTR (since it does not have a port in the INIT state). Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Many things cleaned up and otherwise monkeyed with; hope I didn't introduce too many bugs. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-25 10:20:39 -07:00
Eli Cohen	2420b60b1d	IB/uverbs: Return link layer type to userspace for query port operation Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-25 10:20:39 -07:00
David Dillow	19081f31ce	IB/srp: Sync buffer before posting send srp_send_tsk_mgmt() was missing the proper DMA sync calls before posting the buffer to the device. Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-24 22:14:23 -07:00
Bart Van Assche	21c1a90769	IB/srp: Use list_first_entry() Use the list_first_entry() macro in ib_srp instead of open-coding the equivalent, which makes the source code slightly more descriptive. The list_first_entry() macro itself was introduced in kernel 2.6.22. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-24 22:14:19 -07:00
Bart Van Assche	7ade400aba	IB/srp: Reduce number of BUSY conditions As proposed by the SRP (draft) standard, ib_srp reserves one ring element for SRP_TSK_MGMT requests. This patch makes sure that the SCSI mid-layer never tries to queue more than (SRP request limit) - 1 SCSI commands to ib_srp. This improves performance for targets whose request limit is less than or equal to SRP_NORMAL_REQ_SQ_SIZE by reducing the number of BUSY responses reported by ib_srp to the SCSI mid-layer. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-24 22:14:14 -07:00
David Dillow	05a1d7504f	IB/srp: Eliminate two forward declarations Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-24 22:14:08 -07:00
Linus Torvalds	229aebb873	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing\|ed] => interest[ing\|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c	2010-10-24 13:41:39 -07:00
Jack Morgenstein	d0d68b8693	IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144 The Node Description cannot be changed via MADs (it is read-only). Until now, it was changed in the driver via sysfs, and the new Node Description was simply inserted by the driver into MAD responses (replacing the description returned by FW). System startup scripts use the sysfs interface to change the node description at driver startup to show the hostname, etc. However, this has a race condition: the SM could discover the original FW node description rather than the system-specific description if it queried the port before the startup scripts finish running. For mlx4, we fix this with a new FW command (SET_NODE) that allows passing the new node description to FW. When this command is invoked, FW sends a trap 144 to the SM. When it gets this trap, the SM can query the node to obtain the new node description -- thus eliminating the effects of the race. This patch simply calls SET_NODE command when a new node description is entered via sysfs (thus causing trap 144 to be issued by the FW). We ignore all failures of the SET_NODE command (including those caused by using a device FW that predates the SET_NODE command), since in that case things work just as before. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-23 13:53:09 -07:00
matt mooney	7454159d3c	IB: Replace EXTRA_CFLAGS with ccflags-y Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-23 13:45:03 -07:00
Steve Wise	97cb7e40c6	RDMA/ucma: Allow tuning the max listen backlog For iWARP connections, the connect request is carried in a TCP payload on an already established TCP connection. So if the ucma's backlog is full, the connection request is transmitted and acked at the TCP level by the time the connect request gets dropped in the ucma. The end result is the connection gets rejected by the iWARP provider. Further, a 32 node 256NP OpenMPI job will generate > 128 connect requests on some ranks. This patch increases the default max backlog to 1024, and adds a sysctl variable so the backlog can be adjusted at run time. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-23 13:41:40 -07:00
Eli Cohen	c3aa9b186b	IPoIB: Set dev_id field of net_device Use the net device's dev_id field to encode the port number of the pci device. This can be used to to associate a net device with the pci device's port. The encoding is: dev_id = port - 1. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-23 13:35:48 -07:00
Linus Torvalds	5f05647dd8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David	2010-10-23 11:47:02 -07:00
David Dillow	bb12588a38	IB/srp: Implement SRP_CRED_REQ and SRP_AER_REQ This patch adds support for SRP_CRED_REQ to avoid a lockup by targets that use that mechanism to return credits to the initiator. This prevents a lockup observed in the field where we would never add the credits from the SRP_CRED_REQ to our current count, and would therefore never send another command to the target. Minimal support for SRP_AER_REQ is also added, as these messages can also be used to convey additional credits to the initiator. Based upon extensive debugging and code by Bart Van Assche and a bug report by Chris Worley. Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-22 22:19:10 -07:00
Bart Van Assche	dd5e6e38b2	IB/srp: Preparation for transmit ring response allocation The transmit ring in ib_srp (srp_target.tx_ring) is currently only used for allocating requests sent by the initiator to the target. This patch prepares using that ring for allocation of both requests and responses. Also, this patch differentiates the uses of SRP_SQ_SIZE, increases the size of the IB send completion queue by one element and reserves one transmit ring slot for SRP_TSK_MGMT requests. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-22 22:19:10 -07:00
Jason Gunthorpe	5715f5d44b	IB/qib: Process RDMA WRITE ONLY with IMMEDIATE properly See table 35 in IBA - the header order for RDMA_WRITE_ONLY_WITH_IMMEDIATE and SEND_LAST_WITH_IMMEDIATE is different: the RDMA_WRITE_ONLY has a RETH header before the immediate data, so we need a different code path to extract the immediate data. I tested this with a userspace app that does RDMA_WRITE with immediate on a QLE7140. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-22 22:12:15 -07:00
Steve Wise	b955150ea7	RDMA/cxgb3: When a user QP is marked in error, also mark the CQs in error The flushing of work requests for user QPs is implemented entirely in the user mode library. The only kernel interaction is to mark the user QP object indicating it is in error when the QP exits RTS. When the user QP operations are called by the application (eg: post_send, post_recv), the QP in error bit is checked and if set, the library flushes the QP. If, however, the application is not doing IO, but rather just polling the CQ, it will never get flushed work requests. This breaks some classes of applications. This patch adds logic to mark user CQs in error when a QP that is bound to the CQ is marked in error. The library poll code can then notice the CQ is in error and flush all the in error QPs bound to that CQ. Design: - add 1 extra CQE entry to the CQ memory that will be used to indicate in error status. - return the desired CQ memory size that should be mapped by the library - bump the ABI since the create_cq uverbs response changes. - detect older libraries and reduce the mmap size accordingly. (The ABI bump doesn't break old libraries, since they didn't check the ABI field anyway) Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-22 22:00:53 -07:00
Steve Wise	da411ba1da	RDMA/cxgb4: Use cxgb4 service for packet gl to skb Remove the local service t4_pktgl_to_skb() and use cxgb4_pktgl_to_skb() exported by cxgb4. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-22 21:58:50 -07:00
Steve Wise	de5dd81b49	RDMA/cxgb4: Export T4 TCP MIB Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-22 21:57:26 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Justin P. Mattock	631dd1a885	Update broken web addresses in the kernel. The patch below updates broken web addresses in the kernel Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Acked-by: Ben Pfaff <blp@cs.stanford.edu> Acked-by: Hans J. Koch <hjk@linutronix.de> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-10-18 11:03:14 +02:00
Randy Dunlap	e00ce92e0b	infiniband: fix mlx4 kconfig dependency warning Fix kconfig dependency warning to satisfy dependencies: warning: (MLX4_EN && NETDEVICES && NETDEV_10000 && PCI && INET \|\| MLX4_INFINIBAND && INFINIBAND) selects MLX4_CORE which has unmet direct dependencies (NETDEVICES && NETDEV_10000 && PCI) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:25 -07:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Eli Cohen	ff7f5aab35	IB/pack: IBoE UD packet packing support Add support for packing IBoE packet headers. Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Clean up and fix ib_ud_header_init() a bit. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-14 12:41:29 -07:00
Eli Cohen	3c86aa70bf	RDMA/cm: Add RDMA CM support for IBoE devices Add support for IBoE device binding and IP --> GID resolution. Path resolving and multicast joining are implemented within cma.c by filling in the responses and running callbacks in the CMA work queue. IP --> GID resolution always yields IPv6 link local addresses; remote GIDs are derived from the destination MAC address of the remote port. Multicast GIDs are always mapped to multicast MACs as is done in IPv6. (IPv4 multicast is enabled by translating IPv4 multicast addresses to IPv6 multicast as described in <http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.) Some helper functions are added to ib_addr.h. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-13 15:46:43 -07:00
Eli Cohen	fac70d5191	IB/mad: IBoE supports only QP1 (no QP0) Since IBoE is using Ethernet as its link layer, there is no central management entity so there is need for QP0. QP1 is still needed since it handles communications between CM agents. This patch will skip QP0 and create only QP1 for IBoE ports. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-13 09:38:11 -07:00
Eli Cohen	7b4c876961	IPoIB: Skip IBoE ports IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link layer is Ethernet and IP can work directly over Ethernet, so disable IPoIB for non-IB_LINK_LAYER_INFINIBAND ports. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-13 09:38:11 -07:00
Eric Dumazet	29b4433d99	net: percpu net_device refcount We tried very hard to remove all possible dev_hold()/dev_put() pairs in network stack, using RCU conversions. There is still an unavoidable device refcount change for every dst we create/destroy, and this can slow down some workloads (routers or some app servers, mmap af_packet) We can switch to a percpu refcount implementation, now dynamic per_cpu infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes per device. On x86, dev_hold(dev) code : before lock incl 0x280(%ebx) after: movl 0x260(%ebx),%eax incl fs:(%eax) Stress bench : (Sending 160.000.000 UDP frames, IP route cache disabled, dual E5540 @2.53GHz, 32bit kernel, FIB_TRIE) Before: real 1m1.662s user 0m14.373s sys 12m55.960s After: real 0m51.179s user 0m15.329s sys 10m15.942s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-12 12:35:25 -07:00
Animesh K Trivedi	26012f0750	RDMA/iwcm: Fix hang in uninterruptible wait on cm_id destroy A process can get stuck in an uninterruptible wait in the kernel while destroying a cm_id when iw_cm_connect() fails: For example, When creation of a PD fails but the user continues with an attempt to connect to the server without checking the return value, in iw_cm_connect() a NULL qp is found so the call fails. However the IWCM_F_CONNECT_WAIT bit is not cleared. destroy_cm_id() then waits forever for IWCM_F_CONNECT_WAIT to be cleared. The same problem exists on the passive side with the accept call. Fix this by clearing the bit and waking up any waiters in the appropriate spots. Signed-off-by: Animesh Trivedi <atr@zurich.ibm.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-11 20:24:04 -07:00
Steve Wise	3160977a6e	RDMA/cxgb4: Use simple_read_from_buffer() for debugfs handlers We can replace our equivalent open-coded version. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-11 20:15:14 -07:00
Steve Wise	8bbac892fb	RDMA/cxgb4: Add default_llseek to debugfs files Incorporate BKL removal changes. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-11 20:14:00 -07:00
Eli Cohen	5a0fd09428	IB/mlx4: Limit size of fast registration WRs Fix the limit on the size of max fast registration WRs that can be posted to match hardware capabilities. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-11 14:33:17 -07:00
Joe Perches	0f2f930a67	IB/qib: Remove unnecessary casts of private_data Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-06 14:43:51 -07:00
Maciej Sosnowski	52106bd24c	RDMA/nes: Turn carrier off on ifdown This lets the bonding driver to detect when an interface goes down. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Chien Tung <chien.tin.tung@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-06 14:42:32 -07:00
Chien Tung	2932772156	RDMA/nes: Report correct port state if interface is down With commit `cd6860eb` ("RDMA/nes: Fix hangs on ifdown") we no longer remove nes interfaces on ifdown. On nes_query_port(), add an additional check of the netdev queue and report IB_PORT_DOWN if the queue is not running. Signed-off-by: Chien Tung <chien.tin.tung@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-06 12:59:56 -07:00

1 2 3 4 5 ...

2540 Commits