This adds IRQF_SAMPLE_RANDOM to the feature-removal (deprecated) list
since most of the IRQF_SAMPLE_RANDOM users are technically bogus as
entropy sources in the kernel's current entropy model.
This was discussed on the lkml the past few days, which started here:
http://lkml.org/lkml/2009/4/6/283
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Statistics for softirq doesn't exist.
It will be helpful like statistics for interrupts.
This patch introduces counting the number of softirq,
which will be exported in /proc/softirqs.
When softirq handler consumes much CPU time,
/proc/stat is like the following.
$ while :; do cat /proc/stat | head -n1 ; sleep 10 ; done
cpu 88 0 408 739665 583 28 2 0 0
cpu 450 0 1090 740970 594 28 1294 0 0
^^^^
softirq
In such a situation,
/proc/softirqs shows us which softirq handler is invoked.
We can see the increase rate of softirqs.
<before>
$ cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3
HI 0 0 0 0
TIMER 462850 462805 462782 462718
NET_TX 0 0 0 365
NET_RX 2472 2 2 40
BLOCK 0 0 381 1164
TASKLET 0 0 0 224
SCHED 462654 462689 462698 462427
RCU 3046 2423 3367 3173
<after>
$ cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3
HI 0 0 0 0
TIMER 463361 465077 465056 464991
NET_TX 53 0 1 365
NET_RX 3757 2 2 40
BLOCK 0 0 398 1170
TASKLET 0 0 0 224
SCHED 463074 464318 464612 463330
RCU 3505 2948 3947 3673
When CPU TIME of softirq is high,
the rates of increase is the following.
TIMER : 220/sec : CPU1-3
NET_TX : 5/sec : CPU0
NET_RX : 120/sec : CPU0
SCHED : 40-200/sec : all CPU
RCU : 45-58/sec : all CPU
The rates of increase in an idle mode is the following.
TIMER : 250/sec
SCHED : 250/sec
RCU : 2/sec
It seems many softirqs for receiving packets and rcu are invoked. This
gives us help for checking system.
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
GCC 4.5.0 complains about the declaration of variables
__kernel_sigreturn and __kernel_rt_sigreturn because they have type
void. Correctly declare these symbols as functions to fix the
following error,
arch/sh/kernel/signal_32.c: In function 'setup_frame':
arch/sh/kernel/signal_32.c:368:14: error: taking address of expression of type 'void'
arch/sh/kernel/signal_32.c: In function 'setup_rt_frame':
arch/sh/kernel/signal_32.c:452:14: error: taking address of expression of type 'void'
make[1]: *** [arch/sh/kernel/signal_32.o] Error 1
make: *** [arch/sh/kernel] Error 2
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This forces every update of tx ring producer to check for
availability of space for next full TSO command. Earlier
firmware control commands didn't care to pause tx queue.
Stop the tx queue if there's not enough space to transmit one full
LSO command left on the tx ring after current transmit. This avoids
returning NETDEV_TX_BUSY after checking distance between producer
and consumer on every cpu.
Restart the tx queue only if we have cleaned up enough tx
descriptors.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the detection of cut-thru mode of the hardware (direct dma
to host) to mode configured in SRE (ingress block) rather than
onboard memory control.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
forcedeth doesnt use properly dma api in its tx completion path
and in nv_loopback_test()
pci_map_single() should be paired with pci_unmap_single()
pci_map_page() should be paired with pci_unmap_page()
forcedeth xmit path uses pci_map_single() & pci_map_page(),
but tx completion path only uses pci_unmap_single()
nv_loopback_test() uses pci_map_single() & pci_unmap_page()
Add a dma_single field in struct nv_skb_map, and
define a helper function nv_unmap_txskb
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NMI sourcing functionality (Can only be active if nmi_watchdog is
inactive).
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Bugzilla: 9868 & 10195.
There seems to be a bug into the SMM code that handles TCO Timeout SMI.
Andriy Gapon found that the code on his DG33TL system does the following:
> The handler is quite simple - it tests value in TCO1_CNT against 0x800, i.e.
> checks TCO_TMR_HLT. If the bit is set the handler goes into an infinite loop,
> apparently to allow the second timeout and reboot. Otherwise it simply clears
> TIMEOUT bit in TCO1_STS and that's it.
> So the logic seems to be reversed, because it is hard to see how TIMEOUT can
> get set to 1 and SMI generated when TCO_TMR_HLT is set (other than a
> transitional effect).
The only trick we have is to bypass the SMM code by turning of the generation
of the SMI#. The trick can only be enabled by setting the vendorsupport module
parameter to 911. This trick doesn't work well on laptop's.
Note: this is a dirty hack. Please handle with care. The only real fix is that
the bug in the SMM bios code get's fixed.
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
According to 9.1.33 on p.343 of ICH8.pdf RCBA can be disabled by
hardware if bit 0 of RCBA register is not set.
Perform correct check for this to prevent memory corruption under
some virtual machines where this feature is disabled.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vasily Averin <vvs@openvz.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A pointer to probe and remove functions is passed to the core via
platform_driver_register and so the function must not disappear when the
.init sections are discarded. Otherwise (if also having HOTPLUG=y)
unbinding and binding a device to the driver via sysfs will result in an
oops as does a device being registered late.
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
This broke net/atm since this protocol assumed a null
initial value. This patch makes necessary changes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the adapter is not power-manageable using either ACPI, or the
native PCI PM interface, __e100_power_off() returns error code, which
causes every attempt to suspend to fail, although it should return 0
in such a case. Fix this problem by ignoring the return value of
pci_set_power_state() in __e100_power_off().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
clk_disable was called twice in the remove function.
Correct this so that the driver module unloads without error.
Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becames a bit nicer.
2) in the future there will be a possibility to operate with lists independently
on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bnx2.c | 4 +-
drivers/net/e1000/e1000_main.c | 4 +-
drivers/net/ixgbe/ixgbe_main.c | 6 +-
drivers/net/mv643xx_eth.c | 2 +-
drivers/net/niu.c | 4 +-
drivers/net/virtio_net.c | 10 ++--
drivers/s390/net/qeth_l2_main.c | 2 +-
include/linux/netdevice.h | 17 +++--
net/core/dev.c | 130 ++++++++++++++++++--------------------
9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
My previous patch, which explicitly delays freeing of tnodes by adding
them to the list to flush them after the update is finished, isn't
strict enough. It treats exceptionally tnodes without parent, assuming
they are newly created, so "invisible" for the read side yet.
But the top tnode doesn't have parent as well, so we have to exclude
all exceptions (at least until a better way is found). Additionally we
need to move rcu assignment of this node before flushing, so the
return type of the trie_rebalance() function is changed.
Reported-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/sh has a couple of stray markers without any users introduced
in commit 3d58695edb. Remove them in
preparation of removing the markers in favour of the TRACE_EVENT
macro (and also because we don't keep dead code around).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Action police statistics could be misleading because drops are not
shown when expected.
With feedback from: Jamal Hadi Salim <hadi@cyberus.ca>
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements skb recycling. It reclaims transmitted skb's
for use in the receive ring.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the size of the driver transmit ring to reduce latency
and allow qdisc to do better rate control. Also make it
obvious what the minimum transmit ring allowed is and why.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since it is likely that there are multiple packets received per
interrupt, only update the receive counters once after all
packets are processed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic in sky2_down was incorrect. Receiver could report status
after rx_stop was called.
The steps need to be:
* stop new frames from being transmitted
* shut off transmit/receive logic
* synchronize with NAPI to process status info about transmitter
and receiver
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some read's to avoid any PCI posting issues when controlling
irq's.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset more parts of the receive path when device is take offline.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unblocks the chip if it is stuck in pause cycle during
shutdown.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stopping all activity through ChipCmd and blindly acking the irqs
is neither nice nor completely needed: the transition to low-power
mode does enough work and it apparently keeps the device in a sane
state.
Patch suggested by a fix for http://bugzilla.kernel.org/show_bug.cgi?id=9512
The rtl_shutdown path is kept unchanged so far.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Anders Eriksson <aeriksson@fastmail.fm>
Cc: Edward Hsu <edward_hsu@realtek.com.tw>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sis190 driver is trying to get default phy, if it doesn't find home
or lan phy, it falls back to the first phy in the phy list but list_entry()
points to a bogus entry. list_first_entry() should be used instead.
Signed-off-by: Arnaud Patard <apatard@mandriva.com>
Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-- derived from reverted commit 047584ce94
-- reworked by Grant Likely to play nice with commit:
"net: Rework ucc_geth driver to use of_mdio infrastructure"
(0b9da337dc)
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 047584ce94.
This patch meshes badly with "net: Rework ucc_geth driver to use
of_mdio infrastructure" (0b9da337dc).
Since most of the patch needs to be reworked, it is clearer to revert
the patch and then apply the corrected version
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb mac_header field is sometimes NULL (or ~0u) as a sentinel
value. The places where skb is expanded add an offset which would
change this flag into an invalid pointer (or offset).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looking at the crash in log_martians(), one suspect is that the check for
mac header being set is not correct. The value of mac_header defaults to
0 on allocation, therefore skb_mac_header_was_set will always be true on
platforms using NET_SKBUFF_USES_OFFSET.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts 3f31fddf, which is no longer needed because if a
race between freeing buffer and committing transaction functionality
occurs and dio gets error, currently dio falls back to buffered IO due
to the commit 6ccfa806.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Mingming Cao <cmm@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
At the end of reshape_request we update cyrr_resync_completed
if we are about to pause due to reaching resync_max.
However we update it to the wrong value. We need to add the
"reshape_sectors" that have just been reshaped.
Signed-off-by: NeilBrown <neilb@suse.de>
In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
conflict.
Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Current, when we update the 'conf' structure, when adding a
drive to a linear array, we keep the old version around until
the array is finally stopped, as it is not safe to free it
immediately.
Now that we have rcu protection on all accesses to 'conf',
we can use call_rcu to free it more promptly.
Signed-off-by: NeilBrown <neilb@suse.de>
Due to the lack of memory ordering guarantees, we may have races around
mddev->conf.
In particular, the correct contents of the structure we get from
dereferencing ->private might not be visible to this CPU yet, and
they might not be correct w.r.t mddev->raid_disks.
This patch addresses the problem using rcu protection to avoid
such race conditions.
Signed-off-by: SandeepKsinha <sandeepksinha@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If the superblock of a component device indicates the presence of a
bitmap but the corresponding raid personality does not support bitmaps
(raid0, linear, multipath, faulty), then something is seriously wrong
and we'd better refuse to run such an array.
Currently, this check is performed while the superblocks are examined,
i.e. before entering personality code. Therefore the generic md layer
must know which raid levels support bitmaps and which do not.
This patch avoids this layer violation without adding identical code
to various personalities. This is accomplished by introducing a new
public function to md.c, md_check_no_bitmap(), which replaces the
hard-coded checks in the superblock loading functions.
A call to md_check_no_bitmap() is added to the ->run method of each
personality which does not support bitmaps and assembly is aborted
if at least one component device contains a bitmap.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
It is easiest to round sizes to multiples of chunk size in
the personality code for those personalities which care.
Those personalities now do the rounding, so we can
remove that function from common code.
Also remove the upper bound on the size of a chunk, and the lower
bound on the size of a device (1 chunk), neither of which really buy
us anything.
Signed-off-by: NeilBrown <neilb@suse.de>
This is currently ensured by common code, but it is more reliable to
ensure it where it is needed in personality code.
All the other personalities that care already round the size to
the chunk_size. raid0 and linear are the only hold-outs.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently the assignment to utime gets skipped for 'external'
metadata. So move it to the top of the function so that it
always gets effected.
This is of largely cosmetic interest. Nothing actually depends
on ->utime being right for external arrays.
"mdadm --monitor" does use it for 0.90 and 1.x arrays, but with
mdadm-3.0, this is not important for external metadata.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).
Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
The difference between these two methods is artificial.
Both check that a pending reshape is valid, and perform any
aspect of it that can be done immediately.
'reconfig' handles chunk size and layout.
'check_reshape' handles raid_disks.
So make them just one method.
Signed-off-by: NeilBrown <neilb@suse.de>