xt_socket can use connection tracking, and checks whether it is a module.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_slave_info_query() should keep a read lock while accessing slave info,
or risk accessing stale data and corruption.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial: fixing gcc 4.4 compiler warning:
drivers/net/cxgb3/t3_hw.c: In function ‘t3_prep_adapter’:
drivers/net/cxgb3/t3_hw.c:3782: warning: suggest parentheses around operand of ‘!’ or change ‘|’ to ‘||’ or ‘!’ to ‘~’
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb_rx_queue_recorded() is true, we dont want to use jash distribution
as the device driver exactly told us which queue was selected at RX time.
jhash makes a statistical shuffle, but this wont work with 8 static inputs.
Later improvements would be to compute reciprocal value of real_num_tx_queues
to avoid a divide here. But this computation should be done once,
when real_num_tx_queues is set. This needs a separate patch, and a new
field in struct net_device.
Reported-by: Andrew Dickinson <andrew@whydna.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennert Buytenhek wrote:
> Since 4fb6699481 ("net: Optimize memory
> usage when splicing from sockets.") I'm seeing this oops (e.g. in
> 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver
> (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather
> than the frag mode:
My patch incorrectly assumed skb->sk was always valid, but for
"frag_listed" skbs we can only use skb->sk of their parent.
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org>
Tested-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On several mv643xx_eth hardware versions, the two 64bit mib counters
for 'good octets received' and 'good octets sent' are actually 32bit
counters, and reading from the upper half of the register has the same
effect as reading from the lower half of the register: an atomic
read-and-clear of the entire 32bit counter value. This can under heavy
traffic occasionally lead to small numbers being added to the upper
half of the 64bit mib counter even though no 32bit wrap has occured.
Since we poll the mib counters at least every 30 seconds anyway, we
might as well just skip the reads of the upper halves of the hardware
counters without breaking the stats, which this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when OOM occurs during rx ring refill, mv643xx_eth will get
into an infinite loop, due to the refill function setting the OOM bit
but not clearing the 'rx refill needed' bit for this queue, while the
calling function (the NAPI poll handler) will call the refill function
in a loop until the 'rx refill needed' bit goes off, without checking
the OOM bit.
This patch fixes this by checking the OOM bit in the NAPI poll handler
before attempting to do rx refill. This means that once OOM occurs,
we won't try to do any memory allocations again until the next invocation
of the poll handler.
While we're at it, change the OOM flag to be a single bit instead of
one bit per receive queue since OOM is a system state rather than a
per-queue state, and cancel the OOM timer on entry to the NAPI poll
handler if it's running to prevent it from firing when we've already
come out of OOM.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In "mac80211: correct wext transmit power handler"
I fixed the wext handler, but forgot to make the default of the
user_power_level -1 (aka "auto"), so that now the transmit power
is always set to 0, causing associations to time out and similar
problems since we're transmitting with very little power. Correct
this by correcting the default user_power_level to -1.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Bisected-by: Niel Lambrechts <niel.lambrechts@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- ieee80211_wep_init(), which is called with rtnl_lock held, blocks in
request_module() [waiting for modprobe to load a crypto module].
- modprobe blocks in a call to flush_workqueue(), when it closes a TTY
[presumably when it exits].
- The workqueue item linkwatch_event() blocks on rtnl_lock.
There's no reason for wep_init() to be called with rtnl_lock held, so
just move it outside the critical section.
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After experimenting with kexec with the last merges after 2.6.29, I've
had some problems when probing e100. It would not read the eeprom. After
some bisects, I realized this has been like that since forever (at least
2.6.18). The problem is that shutdown is doing the same thing that
suspend does and puts the device in D3 state. I couldn't find a way to
get the device back to a sane state in the probe function. So, based on
some similar patches from Rafael J. Wysocki for e1000, e1000e, and ixgbe,
I wrote this one for e100.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The x_tables are organized with a table structure and a per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU and the counters/rules which solved the performance
problems for do_table but made replacing rules much slower because of
the necessary RCU grace period.
This version uses a per-cpu set of spinlocks and counters to allow to
table processing to proceed without the cache thrashing of a global
reader lock and keeps the same performance for table updates.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
char bname[5] is too small for the string "X GHz" when the null
terminator is taken into account. Thus, turning on rate debugging
can crash unless we have lucky stack alignment.
Cc: stable@kernel.org
Reported-by: Paride Legovini <legovini@spiro.fisica.unipd.it>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Under certain circumstances iwlwifi can get stuck and will no
longer accept scan requests, because the core code (cfg80211)
thinks that it's still processing one. This fixes one of the
points where it can happen, but I've still seen it (although
only with my radio-off-when-idle patch).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rndis_wext_link_change() might be called from rndis_command() at
initialization stage and priv->workqueue/priv->work have not been
initialized yet. This causes invalid opcode at rndis_wext_bind on
some brands of bcm4320.
Fix by initializing workqueue/workers in rndis_wext_bind() before
rndis_command is used.
This bug has existed since 2.6.25, reported at:
http://bugzilla.kernel.org/show_bug.cgi?id=12794
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
drivers/net/wireless/iwlwifi/iwl3945-base.c:1415: error: __ksymtab_iwl3945_rx_queue_reset causes a section type conflict
I am pretty sure that this is a compiler bug, so not to worry. However,
as far as I can see, iwl-3945.o (the only user) and iwl3945-base.o are
always linked into the same module, so the EXPORT_SYMBOL (which causes
the problem) should not be needed. Correct?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The Bluetooth 2.1 specification introduced four different security modes
that can be mapped using Legacy Pairing and Simple Pairing. With the
usage of Simple Pairing it is required that all connections (except
the ones for SDP) are encrypted. So even the low security requirement
mandates an encrypted connection when using Simple Pairing. When using
Legacy Pairing (for Bluetooth 2.0 devices and older) this is not required
since it causes interoperability issues.
To support this properly the low security requirement translates into
different host controller transactions depending if Simple Pairing is
supported or not. However in case of Simple Pairing the command to
switch on encryption after a successful authentication is not triggered
for the low security mode. This patch fixes this and actually makes
the logic to differentiate between Simple Pairing and Legacy Pairing
a lot simpler.
Based on a report by Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth stack uses a reference counting for all established ACL
links and if no user (L2CAP connection) is present, the link will be
terminated to save power. The problem part is the dedicated pairing
when using Legacy Pairing (Bluetooth 2.0 and before). At that point
no user is present and pairing attempts will be disconnected within
10 seconds or less. In previous kernel version this was not a problem
since the disconnect timeout wasn't triggered on incoming connections
for the first time. However this caused issues with broken host stacks
that kept the connections around after dedicated pairing. When the
support for Simple Pairing got added, the link establishment procedure
needed to be changed and now causes issues when using Legacy Pairing
When using Simple Pairing it is possible to do a proper reference
counting of ACL link users. With Legacy Pairing this is not possible
since the specification is unclear in some areas and too many broken
Bluetooth devices have already been deployed. So instead of trying to
deal with all the broken devices, a special pairing timeout will be
introduced that increases the timeout to 60 seconds when pairing is
triggered.
If a broken devices now puts the stack into an unforeseen state, the
worst that happens is the disconnect timeout triggers after 120 seconds
instead of 4 seconds. This allows successful pairings with legacy and
broken devices now.
Based on a report by Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In 2.6.25 we added UDP mem accounting.
This unfortunatly added a penalty when a frame is transmitted, since
we have at TX completion time to call sock_wfree() to perform necessary
memory accounting. This calls sock_def_write_space() and utimately
scheduler if any thread is waiting on the socket.
Thread(s) waiting for an incoming frame was scheduled, then had to sleep
again as event was meaningless.
(All threads waiting on a socket are using same sk_sleep anchor)
This adds lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down softirq handler.
Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2
Fortunatly, Davide Libenzi recently added concept of keyed wakeups
into kernel, and particularly for sockets (see commit
37e5540b3c
epoll keyed wakeups: make sockets use keyed wakeups)
Davide goal was to optimize epoll, but this new wakeup infrastructure
can help non epoll users as well, if they care to setup an appropriate
handler.
This patch introduces new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only relevant event can wakeup a thread
blocked in this function.
Trace of function calls from bnx2 TX completion bnx2_poll_work() is :
__kfree_skb()
skb_release_head_state()
sock_wfree()
sock_def_write_space()
__wake_up_sync_key()
__wake_up_common()
receiver_wake_function() : Stops here since thread is waiting for an INPUT
Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:
IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
It would be nice to cap this for memory consumption reasons, but a massive
hashtable also causes a significant spike when measuring OS jitter.
With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_sched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.
With the patch applied we limit the number of entries to 512k which
can still be overriden by using the rt_entries boot option:
IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
With this patch rt_worker_func now takes 0.460 ms on the same system.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the entry about CAPI 2.0 to the beginning and add a URL.
Incorporate changes suggested by Randy Dunlap, thanks for proofreading.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
isdn: document Kernel CAPI driver interface
Create a file Documentation/isdn/INTERFACE.CAPI describing the
interface between the kernel CAPI subsystem and ISDN device drivers,
analogous to the existing Documentation/isdn/INTERFACE for the old
isdn4linux subsystem. Also add kerneldoc comments to the exported
functions in drivers/isdn/capi/kcapi.c.
Impact: Documentation
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the merging of mISDN, state which files refer only to the
old isdn4linux subsystem. Also add a few missing files.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code writes the PME enabled bit in PCI config space which is
wrong. This was needed for pre-release hardware, and was not removed from
the driver. Also, we need to clear the WUS (wake up status) after we
resume. Otherwise we can't wake for the same event again since it's still
asserted in the hardware. Plus, the multicast lists were being written
improperly, causing multicast WoL to fail.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Stephen Hemminger <shemminger@vyatta.com>
The veth driver will oops if sysfs hooks are open while module is removed.
The net device destructor can not point to code in a module; basically
there are only two possible safe values: NULL - no destructor, or
free_netdev - free on last use
Signed-off-by: David S. Miller <davem@davemloft.net>
When kernel inserts a temporary SA for IKE, it uses the wrong hash
value for dst list. Two hash values were calcultated before: one with
source address and one with a wildcard source address.
Bug hinted by Junwei Zhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the tx_timeout() to properly handle the clean up of the
tx ring. It also sets the tx put pointer back to the correct position to
be in sync with HW.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we failed to allocate new fragments for receive buffer,
the packet should be dropped and packets should be reused.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of mlx4_en_activate_cq() failure, the cleanup
code would go to rx_err and try to disable unactivated rings.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the VLAN event handler does not adjust the VLAN
device's carrier state when the real device or the VLAN device is set
administratively up or down.
The following patch adds a transfer of operating state from the
real device to the VLAN device when the real device is administratively
set up or down, and sets the carrier state up or down during init, open
and close of the VLAN device.
This permits observers above the VLAN device that care about the
carrier state (bonding's link monitor, for example) to receive updates
for administrative changes by more closely mimicing the behavior of real
devices.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Related-to: commit 325fb5b4d2
The compat path suffers from a similar problem. It only uses a __be32
when all of the recent code uses, and expects, an nf_inet_addr
everywhere. As a result, addresses stored by xt_recents were
filled with whatever other stuff was on the stack following the be32.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
With a minor compile fix from Roman.
Reported-and-tested-by: Roman Hoog Antink <rha@open.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds missing role attribute to the DCCP type, otherwise
the creation of entries is not of any use.
The attribute added is CTA_PROTOINFO_DCCP_ROLE which contains the
role of the conntrack original tuple.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit d0dba725 (netfilter: ctnetlink: add callbacks to the per-proto
nlattrs) changed the protocol registration function to abort if the
to-be registered protocol doesn't provide a new callback function.
The DCCP and UDP-Lite IPv6 protocols were missed in this conversion,
add the required callback pointer.
Reported-and-tested-by: Steven Jan Springl <steven@springl.ukfsn.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes a (bogus?) gcc warning during compilation:
net/netfilter/nf_conntrack_netlink.c🔢 warning: 'helpname' may be used uninitialized in this function
net/netfilter/nf_conntrack_netlink.c:991: warning: 'helpname' may be used uninitialized in this function
In fact, helpname is initialized by ctnetlink_parse_help() so
I cannot see a way to use it without being initialized.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch "af_rose/x25: Sanity check the maximum user frame size"
(commit 83e0bbcbe2) from Alan Cox got
locking wrong. If we bail out due to user frame size being too large,
we must unlock the socket beforehand.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NetLabel address selector mechanism has a problem where it can get
mistakenly remove the wrong selector when similar addresses are used. The
problem is caused when multiple addresses are configured that have different
netmasks but the same address, e.g. 127.0.0.0/8 and 127.0.0.0/24. This patch
fixes the problem.
Reported-by: Etienne Basset <etienne.basset@numericable.fr>
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Tested-by: Etienne Basset <etienne.basset@numericable.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
If using the UCC on a MPC8360 in RMII mode, don;t set
UCC_GETH_UPSMR_RPM bit in the upsmr register.
Signed-off-by: Heiko Schocher <hs@denx.de>
Acked-by: Li Yang <leoli@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While ifconfig eth0 up kernel calls open() of 8139 driver(8139too.c).
In rtl8139_hw_start() of rtl8139_open(), 8139 driver enable RX before
setting up the DMA buffer address. In this interval where RX was
enabled and DMA buffer address is not yet set up, any incoming
broadcast packet would be send to a strange physical address:
0x003e8800 which is the default value of DMA buffer address.
Unfortunately, this address is used by Linux kernel. So kernel panics.
This patch fix it by setting up DMA buffer address before RX enabled
and everything is fine even under broadcast packets attack.
Signed-off-by: Jonathan Lin <jon.lin@vatics.com>
Signed-off-by: Amos Kong <jianjun@zeuux.org>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_IUCV runs into a race when queuing incoming iucv messages
and receiving the resulting backlog.
If the Linux system is under pressure (high load or steal time),
the message queue grows up, but messages are not received and queued
onto the backlog queue. In that case, applications do not
receive any data with recvmsg() even if AF_IUCV puts incoming
messages onto the message queue.
The race can be avoided if the message queue spinlock in the
message_pending callback is spreaded across the entire callback
function.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add few more sk states in iucv_sock_shutdown().
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reject incoming iucv messages if the receive direction has been shut down.
It avoids that the queue of outstanding messages increases and exceeds the
message limit of the iucv communication path.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If iucv_sock_recvmsg() is called with MSG_PEEK flag set, the skb is enqueued
twice. If the socket is then closed, the pointer to the skb is freed twice.
Remove the skb_queue_head() call for MSG_PEEK, because the skb_recv_datagram()
function already handles MSG_PEEK (does not dequeue the skb).
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure a second invocation of iucv_sock_close() guarantees proper
freeing of an iucv path.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>