linux

Commit Graph

Author	SHA1	Message	Date
Alan Stern	e041c68341	[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:50 -08:00
Patrick McHardy	4277a083ec	[NETLINK]: Add netlink_has_listeners for avoiding unneccessary event message generation Keep a bitmask of multicast groups with subscribed listeners to let netlink users check for listeners before generating multicast messages. Queries don't perform any locking, which may result in false positives, it is guaranteed however that any new subscriptions are visible before bind() or setsockopt() return. Signed-off-by: Patrick McHardy <kaber@trash.net> ACKed-by: Jamal Hadi Salim<hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:52:01 -08:00
Patrick McHardy	cc9a06cd8d	[NETLINK]: Fix use-after-free in netlink_recvmsg The skb given to netlink_cmsg_recv_pktinfo is already freed, move it up a few lines. Coverity #948 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-12 20:39:38 -08:00
Alexey Kuznetsov	a70ea994a0	[NETLINK]: Fix a severe bug netlink overrun was broken while improvement of netlink. Destination socket is used in the place where it was meant to be source socket, so that now overrun is never sent to user netlink sockets, when it should be, and it even can be set on kernel socket, which results in complete deadlock of rtnetlink. Suggested fix is to restore status quo passing source socket as additional argument to netlink_attachskb(). A little explanation: overrun is set on a socket, when it failed to receive some message and sender of this messages does not or even have no way to handle this error. This happens in two cases: 1. when kernel sends something. Kernel never retransmits and cannot wait for buffer space. 2. when user sends a broadcast and the message was not delivered to some recipients. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-09 16:43:38 -08:00
Randy Dunlap	4fc268d24c	[PATCH] capable/capability.h (net/) net: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-01-11 18:42:14 -08:00
Martin Murray	ad8e4b75c8	[AF_NETLINK]: Fix DoS in netlink_rcv_skb() From: Martin Murray <murrayma@citi.umich.edu> Sanity check nlmsg_len during netlink_rcv_skb. An nlmsg_len == 0 can cause infinite loop in kernel, effectively DoSing machine. Noted by Matin Murray. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-10 13:02:29 -08:00
Kirill Korotaev	14591de147	[PATCH] netlink oops fix due to incorrect error code Fixed oops after failed netlink socket creation. Wrong parathenses in if() statement caused err to be 1, instead of negative value. Trivial fix, not trivial to find though. Signed-Off-By: Dmitry Mishin <dim@sw.ru> Signed-Off-By: Kirill Korotaev <dev@openvz.org> Signed-Off-By: Linus Torvalds <torvalds@osdl.org>	2006-01-09 09:36:52 -08:00
Eric Dumazet	90ddc4f047	[NET]: move struct proto_ops to const I noticed that some of 'struct proto_ops' used in the kernel may share a cache line used by locks or other heavily modified data. (default linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at least) This patch makes sure a 'struct proto_ops' can be declared as const, so that all cpus can share all parts of it without false sharing. This is not mandatory : a driver can still use a read/write structure if it needs to (and eventually a __read_mostly) I made a global stubstitute to change all existing occurences to make them const. This should reduce the possibility of false sharing on SMP, and speedup some socket system calls. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:11:15 -08:00
Herbert Xu	c27bd492fd	[NETLINK]: Use tgid instead of pid for nlmsg_pid Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-22 14:41:50 -08:00
Thomas Graf	82ace47a72	[NETLINK]: Generic netlink receive queue processor Introduces netlink_run_queue() to handle the receive queue of a netlink socket in a generic way. Processes as much as there was in the queue upon entry and invokes a callback function for each netlink message found. The callback function may refuse a message by returning a negative error code but setting the error pointer to 0 in which case netlink_run_queue() will return with a qlen != 0. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 02:26:40 +01:00
Thomas Graf	a8f74b2288	[NETLINK]: Make netlink_callback->done() optional Most netlink families make no use of the done() callback, making it optional gets rid of all unnecessary dummy implementations. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 02:26:40 +01:00
Linus Torvalds	236fa08168	Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.15	2005-10-28 08:50:37 -07:00
Al Viro	7d877f3bda	[PATCH] gfp_t: net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-28 08:16:47 -07:00
Jayachandran C	ea7ce40649	[NETLINK]: Remove dead code in af_netlink.c Remove the variable nlk & call to nlk_sk as it does not have any side effect. Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-26 00:54:46 -02:00
Al Viro	dd0fc66fb3	[PATCH] gfp flags annotations - part 1 - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-08 15:00:57 -07:00
Patrick McHardy	513c250000	[NETLINK]: Don't prevent creating sockets when no kernel socket is registered This broke the pam audit module which includes an incorrect check for -ENOENT instead of -EPROTONOTSUPP. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:43:59 -07:00
Arnaldo Carvalho de Melo	6ed8a48582	[NETLINK]: Fix sparse warnings Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:01:35 -07:00
Patrick McHardy	066286071d	[NETLINK]: Add "groups" argument to netlink_kernel_create Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:01:11 -07:00
Patrick McHardy	9a4595bc7e	[NETLINK]: Add set/getsockopt options to support more than 32 groups NETLINK_ADD_MEMBERSHIP/NETLINK_DROP_MEMBERSHIP are used to join/leave groups, NETLINK_PKTINFO is used to enable nl_pktinfo control messages for received packets to get the extended destination group number. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:01:07 -07:00
Patrick McHardy	f7fa9b10ed	[NETLINK]: Support dynamic number of multicast groups per netlink family Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:01:02 -07:00
Patrick McHardy	ab33a1711c	[NETLINK]: Return -EPROTONOSUPPORT in netlink_create() if no kernel socket is registered This is necessary for dynamic number of netlink groups to make sure we know the number of possible groups before bind() is called. With this change pure userspace communication using unused netlink protocols becomes impossible. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:00:58 -07:00
Patrick McHardy	d629b836d1	[NETLINK]: Use group numbers instead of bitmasks internally Using the group number allows increasing the number of groups without beeing limited by the size of the bitmask. It introduces one limitation for netlink users: messages can't be broadcasted to multiple groups anymore, however this feature was never used inside the kernel. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:00:49 -07:00
Patrick McHardy	77247bbb30	[NETLINK]: Fix module refcounting problems Use-after-free: the struct proto_ops containing the module pointer is freed when a socket with pid=0 is released, which besides for kernel sockets is true for all unbound sockets. Module refcount leak: when the kernel socket is closed before all user sockets have been closed the proto_ops struct for this family is replaced by the generic one and the module refcount can't be dropped. The second problem can't be solved cleanly using module refcounting in the generic socket code, so this patch adds explicit refcounting to netlink_create/netlink_release. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:00:45 -07:00
Patrick McHardy	db08052979	[NETLINK]: Remove unused groups member from struct netlink_skb_parms Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:00:39 -07:00
Harald Welte	4fdb3bb723	[NETLINK]: Add properly module refcounting for kernel netlink sockets. - Remove bogus code for compiling netlink as module - Add module refcounting support for modules implementing a netlink protocol - Add support for autoloading modules that implement a netlink protocol as soon as someone opens a socket for that protocol Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:35:08 -07:00
Victor Fusco	37da647d99	[NETLINK]: Fix "nocast type" warnings From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:35:43 -07:00
David S. Miller	b03efcfb21	[NET]: Transform skb_queue_len() binary tests into skb_queue_empty() This is part of the grand scheme to eliminate the qlen member of skb_queue_head, and subsequently remove the 'list' member of sk_buff. Most users of skb_queue_len() want to know if the queue is empty or not, and that's trivially done with skb_queue_empty() which doesn't use the skb_queue_head->qlen member and instead uses the queue list emptyness as the test. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 14:57:23 -07:00
David S. Miller	d470e3b483	[NETLINK]: Fix two socket hashing bugs. 1) netlink_release() should only decrement the hash entry count if the socket was actually hashed. This was causing hash->entries to underflow, which resulting in all kinds of troubles. On 64-bit systems, this would cause the following conditional to erroneously trigger: err = -ENOMEM; if (BITS_PER_LONG > 32 && unlikely(hash->entries >= UINT_MAX)) goto err; 2) netlink_autobind() needs to propagate the error return from netlink_insert(). Otherwise, callers will not see the error as they should and thus try to operate on a socket with a zero pid, which is very bad. However, it should not propagate -EBUSY. If two threads race to autobind the socket, that is fine. This is consistent with the autobind behavior in other protocols. So bug #1 above, combined with this one, resulted in hangs on netlink_sendmsg() calls to the rtnetlink socket. We'd try to do the user sendmsg() with the socket's pid set to zero, later we do a socket lookup using that pid (via the value we stashed away in NETLINK_CB(skb).pid), but that won't give us the user socket, it will give us the rtnetlink socket. So when we try to wake up the receive queue, we dive back into rtnetlink_rcv() which tries to recursively take the rtnetlink semaphore. Thanks to Jakub Jelink for providing backtraces. Also, thanks to Herbert Xu for supplying debugging patches to help track this down, and also finding a mistake in an earlier version of this fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:31:51 -07:00
Thomas Graf	1797754ea7	[NETLINK]: Introduce NLMSG_NEW macro to better handle netlink flags Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes a flags argument. NLMSG_PUT stays there for compatibility but now calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to NLMSG_NEW_ANSWER which now also takes a flags argument. Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER and fixes the two direct users of __nlmsg_put to either provide the flags or use NLMSG_NEW(_ANSWER). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:53:48 -07:00
Tommy S. Christensen	aa1c6a6f7f	[NETLINK]: Defer socket destruction a bit In netlink_broadcast() we're sending shared skb's to netlink listeners when possible (saves some copying). This is OK, since we hold the only other reference to the skb. However, this implies that we must drop our reference on the skb, before allowing a receiving socket to disappear. Otherwise, the socket buffer accounting is disrupted. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-19 13:07:32 -07:00
Tommy S. Christensen	68acc024ea	[NETLINK]: Move broadcast skb_orphan to the skb_get path. Cloned packets don't need the orphan call. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-19 13:06:35 -07:00
Tommy S. Christensen	db61ecc335	[NETLINK]: Fix race with recvmsg(). This bug causes: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122) What's happening is that: 1) The skb is sent to socket 1. 2) Someone does a recvmsg on socket 1 and drops the ref on the skb. Note that the rmalloc is not returned at this point since the skb is still referenced. 3) The same skb is now sent to socket 2. This version of the fix resurrects the skb_orphan call that was moved out, last time we had 'shared-skb troubles'. It is practically a no-op in the common case, but still prevents the possible race with recvmsg. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-19 12:46:59 -07:00
David Woodhouse	bfd4bda097	Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git	2005-05-05 13:59:37 +01:00
Herbert Xu	96c3602343	[NETLINK]: cb_lock does not needs ref count on sk Here is a little optimisation for the cb_lock used by netlink_dump. While fixing that race earlier, I noticed that the reference count held by cb_lock is completely useless. The reason is that in order to obtain the protection of the reference count, you have to take the cb_lock. But the only way to take the cb_lock is through dereferencing the socket. That is, you must already possess a reference count on the socket before you can take advantage of the reference count held by cb_lock. As a corollary, we can remve the reference count held by the cb_lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 14:43:27 -07:00
Andrew Morton	54e0f520e7	netlink audit warning fix scumbags! net/netlink/af_netlink.c: In function `netlink_sendmsg': net/netlink/af_netlink.c:908: warning: implicit declaration of function `audit_get_loginuid' Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2005-04-30 07:07:04 +01:00
Serge Hallyn	c94c257c88	Add audit uid to netlink credentials Most audit control messages are sent over netlink.In order to properly log the identity of the sender of audit control messages, we would like to add the loginuid to the netlink_creds structure, as per the attached patch. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2005-04-29 16:27:17 +01:00
Al Viro	b453257f05	[PATCH] kill gratitious includes of major.h under net/* A lot of places in there are including major.h for no reason whatsoever. Removed. And yes, it still builds. The history of that stuff is often amusing. E.g. for net/core/sock.c the story looks so, as far as I've been able to reconstruct it: we used to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of net/* had moved a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed... Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-04-25 18:32:13 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

38 Commits