Commit Graph

7053 Commits

Author SHA1 Message Date
Hideo Aoki 95766fff6b [UDP]: Add memory accounting.
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:19 -08:00
Hideo Aoki 3ab224be6d [NET] CORE: Introducing new memory accounting interface.
This patch introduces new memory accounting functions for each network
protocol. Most of them are renamed from memory accounting functions
for stream protocols. At the same time, some stream memory accounting
functions are removed since other functions do same thing.

Renaming:
	sk_stream_free_skb()		->	sk_wmem_free_skb()
	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
	sk_stream_mem_schedule 		->    	__sk_mem_schedule()
	sk_stream_pages()      		->	sk_mem_pages()
	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
	sk_charge_skb()			->	sk_mem_charge()

Removeing
	sk_stream_rfree():	consolidates into sock_rfree()
	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
	sk_stream_mem_schedule()

The following functions are added.
    	sk_has_account(): check if the protocol supports accounting
	sk_mem_uncharge(): do the opposite of sk_mem_charge()

In addition, to achieve consolidation, updating sk_wmem_queued is
removed from sk_mem_charge().

Next, to consolidate memory accounting functions, this patch adds
memory accounting calls to network core functions. Moreover, present
memory accounting call is renamed to new accounting call.

Finally we replace present memory accounting calls with new interface
in TCP and SCTP.

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:18 -08:00
Gui Jianfeng a06b494b61 [IPV6]: Remove useless code from fib6_del_route().
There are useless codes in fib6_del_route(). The following patch has
been tested, every thing looks fine, as usual.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:17 -08:00
Chas Williams fb64c735a5 [ATM]: [br2864] whitespace cleanup
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:14 -08:00
Eric Kinzie 097b19a998 [ATM]: [br2864] routed support
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:13 -08:00
Kay Sievers ef39592f78 [ATM]: Convert struct class_device to struct device
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2008-01-28 15:00:12 -08:00
Robert P. J. Day 6fe5452b3b [ATM]: atm is no longer experimental
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:11 -08:00
Herbert Xu 9dd3245a2a [IPSEC]: Move all calls to xfrm_audit_state_icvfail to xfrm_input
Let's nip the code duplication in the bud :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Herbert Xu 0883ae0e55 [IPSEC]: Fix transport-mode async resume on intput without netfilter
When netfilter is off the transport-mode async resumption doesn't work
because we don't push back the IP header.  This patch fixes that by
moving most of the code outside of ifdef NETFILTER since the only part
that's not common is the short-circuit in the protocol handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:10 -08:00
Herbert Xu fcb8c156c8 [IPSEC]: Fix double free on skb on async output
When the output transform returns EINPROGRESS due to async operation we'll
free the skb the straight away as if it were an error.  This patch fixes
that so that the skb is freed when the async operation completes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:09 -08:00
Ilpo Järvinen c776ee01bd [TCP]: Remove seq_rtt ptr from clean_rtx_queue args
While checking Gavin's patch I noticed that the returned seq_rtt
is not used by the caller.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:07 -08:00
Ilpo Järvinen 0e3a4803aa [TCP]: Force TSO splits to MSS boundaries
If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on
odd boundary by the callers of tcp_window_allows.

We try really hard to avoid unnecessary modulos. Therefore the
old caller side check "if (skb->len < limit)" was too wide as
well because limit is not bound in any way to skb->len and can
cause spurious testing for trimming in the middle of the queue
while we only wanted that to happen at the tail of the queue.
A simple additional caller side check for tcp_write_queue_tail
would likely have resulted 2 x modulos because the limit would
have to be first calculated from window, however, doing that
unnecessary modulo is not mandatory. After a minor change to
the algorithm, simply determine first if the modulo is needed
at all and at that point immediately decide also from which
value it should be calculated from.

This approach also kills some duplicated code.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:06 -08:00
Michael Chan 7ffc49a6ee [ETH]: Combine format_addr() with print_mac().
print_mac() used many most net drivers and format_addr() used by
net-sysfs.c are very similar and they can be intergrated.

format_addr() is also identically redefined in the qla4xxx iscsi
driver.

Export a new function sysfs_format_mac() to be used by net-sysfs,
qla4xxx and others in the future.  Both print_mac() and
sysfs_format_mac() call _format_mac_addr() to do the formatting.

Changed print_mac() to use unsigned char * to be consistent with
net_device struct's dev_addr.  Added buffer length overrun checking
as suggested by Joe Perches.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:05 -08:00
Eric Dumazet 21371f768b [SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim()
sk_forward_alloc being signed, we should take care of divides by
SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and
__sk_stream_mem_reclaim()

This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
shifts instead of plain divides.

This should help compiler to choose right shifts instead of
expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:05 -08:00
Masahide NAKAMURA b15c4bcd15 [XFRM]: Fix outbound statistics.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:04 -08:00
Eric W. Biederman 426b5303eb [NETNS]: Modify the neighbour table code so it handles multiple network namespaces
I'm actually surprised at how much was involved.  At first glance it
appears that the neighbour table data structures are already split by
network device so all that should be needed is to modify the user
interface commands to filter the set of neighbours by the network
namespace of their devices.

However a couple things turned up while I was reading through the
code.  The proxy neighbour table allows entries with no network
device, and the neighbour parms are per network device (except for the
defaults) so they now need a per network namespace default.

So I updated the two structures (which surprised me) with their very
own network namespace parameter.  Updated the relevant lookup and
destroy routines with a network namespace parameter and modified the
code that interacts with users to filter out neighbour table entries
for devices of other namespaces.

I'm a little concerned that we can modify and display the global table
configuration and from all network namespaces.  But this appears good
enough for now.

I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner.  The
hash table is already dynamically sized so there are it is not a
limiter.  The default parameter would be straight forward to take care
of.  However when I look at the how the network table is built and
used I still find some assumptions that there is only a single
neighbour table for each type of table in the kernel.  The netlink
operations, neigh_seq_start, the non-core network users that call
neigh_lookup.  So while it might be doable it would require more
refactoring than my current approach of just doing a little extra
filtering in the code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:03 -08:00
Paul Moore e1af9f270b [XFRM]: Drop packets when replay counter would overflow
According to RFC4303, section 3.3.3 we need to drop outgoing packets which
cause the replay counter to overflow:

   3.3.3.  Sequence Number Generation

   The sender's counter is initialized to 0 when an SA is established.
   The sender increments the sequence number (or ESN) counter for this
   SA and inserts the low-order 32 bits of the value into the Sequence
   Number field.  Thus, the first packet sent using a given SA will
   contain a sequence number of 1.

   If anti-replay is enabled (the default), the sender checks to ensure
   that the counter has not cycled before inserting the new value in the
   Sequence Number field.  In other words, the sender MUST NOT send a
   packet on an SA if doing so would cause the sequence number to cycle.
   An attempt to transmit a packet that would result in sequence number
   overflow is an auditable event.  The audit log entry for this event
   SHOULD include the SPI value, current date/time, Source Address,
   Destination Address, and (in IPv6) the cleartext Flow ID.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:02 -08:00
Paul Moore afeb14b490 [XFRM]: RFC4303 compliant auditing
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
Eric Dumazet dfd4f0ae2e [TCP]: Avoid two divides in __tcp_grow_window()
tcp_win_from_space() being signed, compiler might emit an integer divide
to compute tcp_win_from_space()/2 .

Using right shifts is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:01 -08:00
Eric Dumazet 8beb5c5f12 [TCP]: Avoid a divide in tcp_mtu_probing()
tcp_mtu_to_mss() being signed, compiler might emit an integer divide
to compute tcp_mtu_to_mss()/2 .

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:00:00 -08:00
David S. Miller 829942c187 [TCP]: Move mss variable in tcp_mtu_probing()
Down into the only scope where it is used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:59 -08:00
Eric Dumazet ce55dd3610 [TCP]: tcp_write_timeout.c cleanup
Before submiting a patch to change a divide to a right shift, I felt
necessary to create a helper function tcp_mtu_probing() to reduce length of
lines exceeding 100 chars in tcp_write_timeout().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:58 -08:00
Eric Dumazet b790cedd24 [INET]: Avoid an integer divide in rt_garbage_collect()
Since 'goal' is a signed int, compiler may emit an integer divide
to compute goal/2.

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:57 -08:00
YOSHIFUJI Hideaki 9cb5734e5b [TCP]: Convert several length variable to unsigned.
Several length variables cannot be negative, so convert int to
unsigned int.  This also allows us to do sane shift operations
on those variables.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:56 -08:00
John W. Linville c40896de50 net/mac80211/Kconfig: whitespace corrections
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:55 -08:00
John W. Linville d6084cb61d net/wireless/Kconfig: whitespace corrections
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:55 -08:00
Johannes Berg c27f9830f3 mac80211: don't read ERP information from (re)association response
According to the standard, the field cannot be present, so don't
try to interpret it either.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Drake <dsd@gentoo.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:54 -08:00
Johannes Berg 176e4f8442 mac80211: move tx crypto decision
This patch moves the decision making about whether a frame is encrypted
with a certain algorithm up into the TX handlers rather than having it
in the crypto algorithm implementation.

This fixes a problem with the radiotap injection code where injecting
a non-data packet and requesting encryption could end up asking the
driver to encrypt a packet without giving it a key.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:53 -08:00
Johannes Berg 7bbdd2d987 mac80211: implement station stats retrieval
This implements the required cfg80211 callback in mac80211
to allow userspace to get station statistics.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:53 -08:00
Johannes Berg fd5b74dcb8 cfg80211/nl80211: implement station attribute retrieval
After a station is added to the kernel's structures, userspace
has to be able to retrieve statistics about that station, especially
whether the station was idle and how much bytes were transferred
to and from it. This adds the necessary code to nl80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:52 -08:00
Johannes Berg 5727ef1b2e cfg80211/nl80211: station handling
This patch adds station handling to cfg80211/nl80211.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:51 -08:00
Johannes Berg ed1b6cc7f8 cfg80211/nl80211: add beacon settings
This adds the necessary API to cfg80211/nl80211 to allow
changing beaconing settings.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:50 -08:00
Johannes Berg 62da92fb75 mac80211: support getting key sequence counters via cfg80211
This implements cfg80211's get_key() to allow retrieving the sequence
counter for a TKIP or CCMP key from userspace. It also cleans up and
documents the associated low-level driver interface.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:50 -08:00
Johannes Berg e8cbb4cbeb mac80211: support adding/removing keys via cfg80211
This adds the necessary hooks to mac80211 to allow userspace
to edit keys with cfg80211 (through nl80211.)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:49 -08:00
Johannes Berg 41ade00f21 cfg80211/nl80211: introduce key handling
This introduces key handling to cfg80211/nl80211. Default
and group keys can be added, changed and removed; sequence
counters for each key can be retrieved.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:48 -08:00
Johannes Berg 7d54d0ddd6 mac80211: allow easier multicast/broadcast buffering in hardware
There are various decisions influencing the decision whether to buffer
a frame for after the next DTIM beacon. The "do we have stations in PS
mode" condition cannot be tested by the driver so mac80211 has to do
that. To ease driver writing for hardware that can buffer frames until
after the next DTIM beacon, introduce a new txctl flag telling the
driver to buffer a specific frame.

While at it, restructure and comment the code for multicast buffering
and remove spurious "inline" directives.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:47 -08:00
Johannes Berg 4e20cb293c mac80211: make ieee80211_rx_mgmt_action static
The function is only used locally.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:46 -08:00
Johannes Berg 678f5f7117 mac80211: clean up eapol handling in TX path
The previous patch left only one user of the ieee80211_is_eapol()
function and that user can be eliminated easily by introducing
a new "frame is EAPOL" flag to handle the frame specially (we
already have this information) instead of doing the (expensive)
ieee80211_is_eapol() all the time.

Also, allow unencrypted frames to be sent when they are injected.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:46 -08:00
Johannes Berg ce3edf6d0b mac80211: clean up eapol frame handling/port control
This cleans up the eapol frame handling and some related code in the
receive and transmit paths. After this patch
 * EAPOL frames addressed to us or the EAPOL group address are
   always accepted regardless of whether they are encrypted or not
 * other frames from a station are dropped if PAE is enabled and
   the station is not authorized
 * unencrypted frames (except the EAPOL frames above) are dropped if
   drop_unencrypted is enabled
 * some superfluous code that eth_type_trans handles anyway is gone
 * port control is done for transmitted packets

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:45 -08:00
Mattias Nissler 1946b74ce0 rc80211-pid: export tuning parameters through debugfs
This adds all the tunable parameters used by rc80211_pid to debugfs for easy
testing and tuning.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:44 -08:00
Mattias Nissler 12446c67fe rc80211-pid: add debugging
This adds a new debugfs file from which rate control relevant events can be
read one event per line. The output includes the current time, so graphs can be
created showing the rate control parameters. This helps in evaluating and
tuning rate control parameters. While at it, we split headers and code for
better readability.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:44 -08:00
Stefano Brivio 1dc4d1e6a1 rc80211-pid: add sharpening factor
This patch introduces a PID sharpening factor for faster response after
association and low activity events.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:43 -08:00
Stefano Brivio 90d501d610 rc80211-pid: add rate behaviour learning algorithm
This patch introduces a learning algorithm in order for the PID controller
to learn how to map adjustment values to rates. This is better described in
code comments.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:42 -08:00
Stefano Brivio c21b39aca4 mac80211: make PID rate control algorithm the default
This makes the new PID TX rate control algorithm the default instead of the
rc80211_simple rate control algorithm. The simple algorithm was flawed in
several ways: it wasn't responsive at all and didn't age the information it was
relying on properly. The PID algorithm allows us to tune characteristics such
as responsiveness by adjusting parameters and was found to generally behave
better.

The default algorithm can be overridden to select simple instead. Which
ever algorithm is the default is included as part of the mac80211
module automatically. The other algorithm (simple vs. pid) can
be selected for inclusion as well. If EMBEDDED is selected then
the choice is available to have no default specified and neither
algorithm included in mac80211. The default algorithm can be set
through a modparam.

While at it, mark rc80211-simple as deprecated, and schedule it
for removal.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:41 -08:00
Eric Dumazet b92edbe0b8 [TCP] Avoid two divides in tcp_output.c
Because 'free_space' variable in __tcp_select_window() is signed,
expression (free_space / 2) forces compiler to emit an integer divide.

This can be changed to a plain right shift, less expensive.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:41 -08:00
Paul Moore 68277accb3 [XFRM]: Assorted IPsec fixups
This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

 * Use the 'audit_enabled' variable already in include/linux/audit.h
   Removed the need for extern declarations local to each XFRM audit fuction

 * Convert 'sid' to 'secid' everywhere we can
   The 'sid' name is specific to SELinux, 'secid' is the common naming
   convention used by the kernel when refering to tokenized LSM labels,
   unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx' otherwise
   we risk breaking userspace

 * Convert address display to use standard NIP* macros
   Similar to what was recently done with the SPD audit code, this also also
   includes the removal of some unnecessary memcpy() calls

 * Move common code to xfrm_audit_common_stateinfo()
   Code consolidation from the "less is more" book on software development

 * Proper spacing around commas in function arguments
   Minor style tweak since I was already touching the code

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:40 -08:00
Masahide NAKAMURA 8ea843495d [XFRM]: Add packet processing statistics option.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:39 -08:00
Masahide NAKAMURA 0aa647746e [XFRM]: Support to increment packet dropping statistics.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:39 -08:00
Masahide NAKAMURA 558f82ef6e [XFRM]: Define packet dropping statistics.
This statistics is shown factor dropped by transformation
at /proc/net/xfrm_stat for developer.
It is a counter designed from current transformation source code
and defined as linux private MIB.

See Documentation/networking/xfrm_proc.txt for the detail.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:38 -08:00
Masahide NAKAMURA 9473e1f631 [XFRM] MIPv6: Fix to input RO state correctly.
Disable spin_lock during xfrm_type.input() function.
Follow design as IPsec inbound does.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:37 -08:00
Masahide NAKAMURA a1b051405b [XFRM] IPv6: Fix dst/routing check at transformation.
IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
  care about routing changes. It is required by MIPv6 and
  off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
  MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Ilpo Järvinen bd515c3e48 [TCP]: Fix TSO deferring
I'd say that most of what tcp_tso_should_defer had in between
there was dead code because of this.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:36 -08:00
Pavel Emelyanov a43d8994b9 [NEIGH]: Make neigh_add_timer symmetrical to neigh_del_timer.
The neigh_del_timer() looks sane - it removes the timer and
(conditionally) puts the neighbor. I expected, that the
neigh_add_timer() is symmetrical to the del one - i.e. it
holds the neighbor and arms the timer - but it turned out
that it was not so.

I think, that making them look symmetrical makes the code
more readable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:28 -08:00
Pavel Emelyanov 7054fb9376 [INET]: Uninline the inet_twsk_put function.
This one is not that big, but is widely used: saves 1200 bytes
from net/ipv4/built-in.o

add/remove: 1/0 grow/shrink: 1/12 up/down: 97/-1300 (-1203)
function                                     old     new   delta
inet_twsk_put                                  -      87     +87
__inet_lookup_listener                       274     284     +10
tcp_sacktag_write_queue                     2255    2254      -1
tcp_time_wait                                482     411     -71
__inet_check_established                     796     722     -74
tcp_v4_err                                   973     898     -75
__inet_twsk_kill                             230     154     -76
inet_twsk_deschedule                         180     103     -77
tcp_v4_do_rcv                                462     384     -78
inet_hash_connect                            686     607     -79
inet_twdr_do_twkill_work                     236     150     -86
inet_twdr_twcal_tick                         395     307     -88
tcp_v4_rcv                                  1744    1480    -264
tcp_timewait_state_process                   975     644    -331

Export it for ipv6 module.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:28 -08:00
Pavel Emelyanov 77a5ba55da [INET]: Uninline the __inet_lookup_established function.
This is -700 bytes from the net/ipv4/built-in.o

add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700)
function                                     old     new   delta
__inet_lookup_established                      -     339    +339
tcp_sacktag_write_queue                     2254    2255      +1
tcp_v4_err                                  1304     973    -331
tcp_v4_rcv                                  2089    1744    -345
tcp_v4_do_rcv                                826     462    -364

Exporting is for dccp module (used via e.g. inet_lookup).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:27 -08:00
Pavel Emelyanov 152da81deb [INET]: Uninline the __inet_hash function.
This one is used in quite many places in the networking code and
seems to big to be inline.

After the patch net/ipv4/build-in.o loses ~650 bytes:
add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653)
function                                     old     new   delta
__inet_hash_nolisten                           -     282    +282
__inet_hash                                    -     179    +179
tcp_sacktag_write_queue                     2255    2254      -1
__inet_lookup_listener                       284     274     -10
tcp_v4_syn_recv_sock                         755     493    -262
tcp_v4_hash                                  389      35    -354
inet_hash_connect                           1086     599    -487

This version addresses the issue pointed by Eric, that
while being inline this function was optimized by gcc
in respect to the 'listen_possible' argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:26 -08:00
Vlad Yasevich d670119132 [SCTP]: Follow Add-IP security consideratiosn wrt INIT/INIT-ACK
The Security Considerations section of RFC 5061 has the following
text:

   If an SCTP endpoint that supports this extension receives an INIT
   that indicates that the peer supports the ASCONF extension but does
   NOT support the [RFC4895] extension, the receiver of such an INIT
   MUST send an ABORT in response.  Note that an implementation is
   allowed to silently discard such an INIT as an option as well, but
   under NO circumstance is an implementation allowed to proceed with
   the association setup by sending an INIT-ACK in response.

   An implementation that receives an INIT-ACK that indicates that the
   peer does not support the [RFC4895] extension MUST NOT send the
   COOKIE-ECHO to establish the association.  Instead, the
   implementation MUST discard the INIT-ACK and report to the upper-
   layer user that an association cannot be established destroying the
   Transmission Control Block (TCB).

Follow the recomendations.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:25 -08:00
Vlad Yasevich 75205f4783 [SCTP]: Implement ADD-IP special case processing for ABORT chunk
ADD-IP spec has a special case for processing ABORTs:
    F4) ... One special consideration is that ABORT
        Chunks arriving destined to the IP address being deleted MUST be
        ignored (see Section 5.3.1 for further details).

Check if the address we received on is in the DEL state, and if
so, ignore the ABORT.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:24 -08:00
Vlad Yasevich f57d96b2e9 [SCTP]: Change use_as_src into a full address state
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:24 -08:00
Vlad Yasevich a08de64d07 [SCTP]: Update ASCONF processing to conform to spec.
The processing of the ASCONF chunks has changed a lot in the
spec.  New items are:
    1. A list of ASCONF-ACK chunks is now cached
    2. The source of the packet is used in response.
    3. New handling for unexpect ASCONF chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:23 -08:00
Vlad Yasevich ba8a06daed [SCTP]: ADD-IP updates the states where ASCONFs can be sent
C4)  Both ASCONF and ASCONF-ACK Chunks MUST NOT be sent in any SCTP
        state except ESTABLISHED, SHUTDOWN-PENDING, SHUTDOWN-RECEIVED,
        and SHUTDOWN-SENT.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:22 -08:00
Vlad Yasevich df21857714 [SCTP]: Update association lookup to look at ASCONF chunks as well
ADD-IP draft section 5.2 specifies that if an association can not
be found using the source and destination of the IP packet,
then, if the packet contains ASCONF chunks, the Address Parameter
TLV should be used to lookup an association.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:22 -08:00
Vlad Yasevich d6de309759 [SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT
The ADD-IP "Set Primary IP Address" parameter is allowed in the
INIT/INIT-ACK exchange.  Allow processing of this parameter during
the INIT/INIT-ACK.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:21 -08:00
Vlad Yasevich 42e30bf346 [SCTP]: Handle the wildcard ADD-IP Address parameter
The Address Parameter in the parameter list of the ASCONF chunk
may be a wildcard address.  In this case special processing
is required.  For the 'add' case, the source IP of the packet is
added.  In the 'del' case, all addresses except the source IP
of packet are removed. In the "mark primary" case, the source
address is marked as primary.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:20 -08:00
Vlad Yasevich 6afd2e83cd [SCTP]: Discard unauthenticated ASCONF and ASCONF ACK chunks
Now that we support AUTH, discard unauthenticated ASCONF and ASCONF ACK
chunks as mandated in the ADD-IP spec.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:19 -08:00
Herbert Xu 195ad6a3ac [IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels
It appears that I've managed to create two different functions both
called xfrm6_tunnel_output.  This is because we have the plain tunnel
encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
which lives in the files xfrmX_mode_tunnel.c.

This patch renames functions from the latter to use the xfrmX_mode_tunnel
prefix to avoid name-space conflicts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:18 -08:00
Mattias Nissler ad01837593 mac80211: add PID controller based rate control algorithm
Add a new rate control algorithm based on a PID controller. It samples the
percentage of failed frames over time, feeds the result into the controller and
uses its output to control the TX rate.

Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:18 -08:00
Mattias Nissler 1abbe498e4 mac80211: clean up rate selection
Move some code out of rc80211_simple since it's probably needed for all rate
selection algorithms, and fix iwlwifi accordingly. While at it, clean up the
rate_control_get_rate() interface.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:17 -08:00
Ron Rindjunsky 98f0b0a3a4 mac80211: pass in PS_POLL frames
This patch fixes should_drop_frame function to pass in ps poll control
frames required for power save functioanlity. Interface types that do not
have interest for PS POLL frames now drop it in handler.

Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:16 -08:00
Herbert Xu 910ef70aa3 [IPSEC]: Do xfrm_state_check_space before encapsulation
While merging the IPsec output path I moved the encapsulation output
operation to the top of the loop so that it sits outside of the locked
section.  Unfortunately in doing so it now sits in front of the space
check as well which could be a fatal error.

This patch rearranges the calls so that the space check happens as
the thing on the output path.

This patch also fixes an incorrect goto should the encapsulation output
fail.

Thanks to Kazunori MIYAZAWA for finding this bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:13 -08:00
Patrick McHardy 33b8e77605 [NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option
The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter
options when disabled and provides defaults (M) that should allow to
run a distribution firewall without further thinking.

Defaults to 'y' to avoid breaking current configurations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:12 -08:00
Patrick McHardy 34498825cb [NETFILTER]: non-power-of-two jhash optimizations
Apply Eric Dumazet's jhash optimizations where applicable. Quoting Eric:

Thanks to jhash, hash value uses full 32 bits. Instead of returning
hash % size (implying a divide) we return the high 32 bits of the
(hash * size) that will give results between [0 and size-1] and same
hash distribution.

On most cpus, a multiply is less expensive than a divide, by an order
of magnitude.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:11 -08:00
Eric Dumazet 7b21e09d1c [NETFILTER]: xt_hashlimit: reduce overhead without IPv6
This patch generalizes the (CONFIG_IP6_NF_IPTABLES || CONFIG_IP6_NF_IPTABLES_MODULE)
test done in hashlimit_init_dst() to all the xt_hashlimit module.

This permits a size reduction of "struct dsthash_dst". This saves memory and
cpu for IPV4 only hosts.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:11 -08:00
Eric Dumazet e2f82ac3fc [NETFILTER]: xt_hashlimit: speedup hash_dst()
1) Using jhash2() instead of jhash() is a litle bit faster if applicable.

2) Thanks to jhash, hash value uses full 32 bits.
   Instead of returning hash % size (implying a divide)
   we return the high 32 bits of the (hash * size) that will
   give results between [0 and size-1] and same hash distribution.

  On most cpus, a multiply is less expensive than a divide, by an order
  of magnitude.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:10 -08:00
Jan Engelhardt 22c2d8bca2 [NETFILTER]: xt_connlimit: use the new union nf_inet_addr
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:09 -08:00
Jan Engelhardt e79ec50b95 [NETFILTER]: Parenthesize macro parameters
Parenthesize macro parameters.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:08 -08:00
Jan Engelhardt 643a2c15a4 [NETFILTER]: Introduce nf_inet_address
A few netfilter modules provide their own union of IPv4 and IPv6
address storage. Will unify that in this patch series.

(1/4): Rename union nf_conntrack_address to union nf_inet_addr and
move it to x_tables.h.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:07 -08:00
Jan Engelhardt df54aae022 [NETFILTER]: x_tables: use %u format specifiers
Use %u format specifiers as ->family is unsigned.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:07 -08:00
Patrick McHardy 051578ccbc [NETFILTER]: nf_nat: properly use RCU for ip_nat_decode_session
We need to use rcu_assign_pointer/rcu_dereference to avoid races.
Also remove an obsolete CONFIG_IP_NAT_NEEDED ifdef.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:06 -08:00
Patrick McHardy 1e796fda00 [NETFILTER]: constify nf_afinfo
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:05 -08:00
Patrick McHardy 76aa1ce139 [NETFILTER]: nfnetlink_log: include GID in netlink message
Similar to Maciej Soltysiak's ipt_LOG patch, include GID in addition
to UID in netlink message.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:04 -08:00
Patrick McHardy 0dfedd2874 [NETFILTER]: nfnetlink_log: use endianness-aware attribute functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:03 -08:00
Patrick McHardy baab2ce7d2 [NETFILTER]: nfnetlink_{queue,log}: return proper error codes in instance_create
Currently we return EINVAL for "instance exists", "allocation failed" and
"module unloaded below us", which is completely inapproriate.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:02 -08:00
Patrick McHardy 1792bab4ca [NETFILTER]: nfnetlink_log: remove excessive debugging
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:02 -08:00
Patrick McHardy cd21f0ac43 [NETFILTER]: nfnetlink_{queue,log}: return ENOTSUPP for unknown cfg commands
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:01 -08:00
Patrick McHardy c0506365a9 [NETFILTER]: nfnetlink_log: fix checks in nfulnl_recv_config
Similar to the nfnetlink_queue fixes:

The peer_pid must be checked in all cases when a logging instance exists,
additionally we must check whether an instance exists before attempting
to configure it to avoid NULL ptr dereferences.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:00 -08:00
Patrick McHardy a7c42955e0 [NETFILTER]: nf_log: remove incomprehensible comment
Whatever that comment tries to say, I don't get it and it looks like
a leftover from the time when RCU wasn't used properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:59:00 -08:00
Patrick McHardy 7b2f9631e7 [NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo arg
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:59 -08:00
Patrick McHardy f01ffbd6e7 [NETFILTER]: nf_log: move logging stuff to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:58 -08:00
Patrick McHardy cc01dcbd26 [NETFILTER]: nf_nat: pass manip type instead of hook to nf_nat_setup_info
nf_nat_setup_info gets the hook number and translates that to the
manip type to perform. This is a relict from the time when one
manip per hook could exist, the exact hook number doesn't matter
anymore, its converted to the manip type. Most callers already
know what kind of NAT they want to perform, so pass the maniptype
in directly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:57 -08:00
Patrick McHardy ce4b1cebdc [NETFILTER]: nf_nat: sprinkle a few __read_mostlys
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:57 -08:00
Patrick McHardy 2b628a0866 [NETFILTER]: nf_nat: mark NAT protocols const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:56 -08:00
Patrick McHardy 3ee9e76038 [NETFILTER]: nf_nat_proto_gre: add missing module reference
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:55 -08:00
Patrick McHardy d978e5daec [NETFILTER]: ctnetlink: fix expectation timeout dumping
When the timer is late its timeout might be before the current time,
in which case a very large value is dumped.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:55 -08:00
Patrick McHardy 77236b6e33 [NETFILTER]: ctnetlink: use netlink attribute helpers
Use NLA_PUT_BE32, nla_get_be32() etc.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:54 -08:00
Pablo Neira Ayuso c7212e9d39 [NETFILTER]: nf_conntrack_sctp: add ctnetlink support
This patch adds support for SCTP to ctnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:52 -08:00
Pablo Neira Ayuso 37fccd8577 [NETFILTER]: ctnetlink: add support for secmark
This patch adds support for James Morris' connsecmark.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:52 -08:00
Pablo Neira Ayuso 0f417ce989 [NETFILTER]: ctnetlink: add support for master tuple event notification and dumping
This patch adds support for master tuple event notification and
dumping.  Conntrackd needs this information to recover related
connections appropriately.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:51 -08:00
Pablo Neira Ayuso 13eae15a24 [NETFILTER]: ctnetlink: add support for NAT sequence adjustments
The combination of NAT and helpers may produce TCP sequence adjustments.
In failover setups, this information needs to be replicated in order to
achieve a successful recovery of mangled, related connections. This patch is
particularly useful for conntrackd, see:

http://people.netfilter.org/pablo/conntrack-tools/

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:50 -08:00
Benjamin LaHaise 170080645d [NETFILTER]: xt_TCPMSS: don't allow netfilter --setmss to increase mss
When terminating DSL connections for an assortment of random customers, I've
found it necessary to use iptables to clamp the MSS used for connections to
work around the various ICMP blackholes in the greater net.  Unfortunately,
the current behaviour in Linux is imperfect and actually make things worse,
so I'm proposing the following: increasing the MSS in a packet can never be
a good thing, so make --set-mss only lower the MSS in a packet.

Yes, I am aware of --clamp-mss-to-pmtu, but it doesn't work for outgoing
connections from clients (ie web traffic), as it only looks at the PMTU on
the destination route, not the source of the packet (the DSL interfaces in
question have a 1442 byte MTU while the destination ethernet interface is
1500 -- there are problematic hosts which use a 1300 byte MTU).  Reworking
that is probably a good idea at some point, but it's more work than this is.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:50 -08:00