Impact: Fix debugobjects warning
debugobject enabled kernels spit out a warning in hpet code due to a
workqueue which is initialized on stack.
Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it
in hpet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: remove the old CONFIG_RCU_CPU_STALL_DETECTOR
tree_rcu introduce CONFIG_RCU_CPU_STALL_DETECTOR again.
These two are the same exactly except:
the old one "depends on CLASSIC_RCU"
the new one "depends on CLASSIC_RCU || TREE_RCU"
This patch remove the old one.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Fixes crashes with misconfigured BIOSes on XSAVE hardware
Avuton Olrich reported early boot crashes with v2.6.28 and
bisected it down to dc1e35c6e9
("x86, xsave: enable xsave/xrstor on cpus with xsave support").
If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to
make all CPUID information available. This is required for some
features to work, in particular XSAVE.
Reported-and-bisected-by: Avuton Olrich <avuton@gmail.com>
Tested-by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Create a separate mode_config IDR lock for simplicity. The core DRM
config structures (connector, mode, etc. lists) are still protected by
the mode_config mutex, but the CRTC IDR (used for the various identifier
IDs) is now protected by the mode_config idr_mutex. Simplifies the
locking a bit and removes a warning.
All objects are protected by the config mutex, we may in the future,
split the object further to have reference counts.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[XFS] Long btree pointers are still 64 bit on disk
On 32 bit machines with CONFIG_LBD=n, XFS reduces the
in memory size of xfs_fsblock_t to 32 bits so that it
will fit within 32 bit addressing. However, the disk format
for long btree pointers are still 64 bits in size.
The recent btree rewrite failed to take this into account
when initialising new btree blocks, setting sibling pointers
to NULL and checking if they are NULL. Hence checking whether
a 64 bit NULL was the same as a 32 bit NULL was failingi
resulting in NULL sibling pointers failing to be detected
correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns
in xfs_btree_delrec.
Fix this by making all the comparisons and setting of long
pointer btree NULL blocks to the disk format, not the
in memory format. i.e. use NULLDFSBNO.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Reported-by: Jacek Luczak <difrost.kernel@gmail.com>
Reported-by: Danny ter Haar <dth@dth.net>
Tested-by: Jacek Luczak <difrost.kernel@gmail.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
This greatly simplifies testing to verify I have fixed the problems
with a tun device disappearing when the tun file descriptor is still
held open.
Further it allows removal network namespace operations for the tun
driver. Reducing the network namespace handling in the driver to the
minimum. i.e. When we are creating a tun device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the awkward case between free_netdev and dev_chr_close fixed
there is no longer any need to limit tun and tap devices to the
network namespace they were created in. So remove the
NETIF_F_NETNS_LOCAL flag on the network device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tun code does not cope gracefully if the network device goes away before
the tun file descriptor is closed. It looks like we can trigger this with
rmmod, and moving tun devices between network namespaces will allow this
to be triggered when network namespaces exit.
To fix this I introduce an intermediate data structure tun_file which
holds a count of users and a pointer to the struct tun_struct. tun_get
increments that reference count if it is greater than 0. tun_put decrements
that reference count and detaches from the network device if the count is 0.
While we have a file attached to the network device I hold a reference
to the network device keeping it from going away completely.
When a network device is unregistered I decrement the count of the
attached tun_file and if that was the last user I detach the tun_file,
and all processes on read_wait are woken up to ensure they do not
sleep indefinitely. As some of those sleeps happen with the count on
the tun device elevated waking up the read waiters ensures that
tun_file will be detached in a timely manner.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The poll interface requires that the waitqueue exist while the struct
file is open. In the rare case when a tun device disappears before
the tun file closes we fail to provide this property, so move
read_wait.
This is safe now that tun_net_xmit is atomic with tun_detach.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently this small race allows for a packet to be received when we
detach from an tun device and still be enqueued. Not especially
important but not what the code is trying to do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grabbing namespaces in open, and putting them in close always seems to
be the cleanest approach with the fewest surprises.
So now that we have tun_file so we have somepleace to put the network
namespace, let's grab the network namespace on file open and put on
file close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun code suffers from only having a single word of
data that exists for the entire life of the tun file descriptor.
This results in peculiar holding of references to the network namespace
as well as races between free_netdevice and tun_chr_close.
Fix this by introducing tun_file which will hold the per file state.
For the moment it still holds just a single word so the differences
are all logic changes with no changes in semantics.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EBADF is meaningless in the context of a poll mask so use POLLERR
instead.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for two different tasks with access to the same file
descriptor to call tun_set_iff on it at the same time and race to
attach to a tap device. Prevent this by placing all of the logic to
attach to a file descriptor in one function and testing the file
descriptor to be certain it is not already attached to another tun
device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun driver keeps a private list of tun devices for what
appears to be a small gain in performance when reconnecting a file
descriptor to an existing tun or tap device. So simplify the code by
removing it.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In PPPo[E|L2TP] we could explicitly point which net namespace
we're going to use for channels - make it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Each namespace contains ppp channels and units separately
with appropriate locks
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Each tunnel and appropriate lock are inside own namespace now.
- pppox code allows to create per-namespace sockets for
both PX_PROTO_OE and PX_PROTO_OL2TP protocols. Actually since
now pppox_create support net-namespaces new PPPo... protocols
(if they ever will be) should support net-namespace too otherwise
explicit check for &init_net would be needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- each net-namespace for pppoe module is having own
hash table and appropriate locks wich are allocated
at time of namespace intialization. It requires about
140 bytes of memory for every new namespace but such
approach allow us to escape from hash chains growing
and additional lock contends (especially in SMP environment).
- pppox code allows to create per-namespace sockets for
PX_PROTO_OE protocol only (since at this moment support
for pppol2tp net-namespace is not implemented yet).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: None (new bit definitions currently unused)
Add bit definitions for the MSR_IA32_MISC_ENABLE MSRs to
<asm/msr-index.h>.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reworked receive queue fill policies to make the driver more tolerant
in low memory conditions.
Signed-off-by: Thomas Klein <tklein@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PAGE_SIZE allocations via slab are not guaranteed to be page-aligned. Fixed
all memory allocations where page alignment is required by firmware.
Signed-off-by: Thomas Klein <tklein@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adapt to lately introduced net_device_ops structure.
Signed-off-by: Thomas Klein <tklein@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LLTX is deprecated, don't use it. This completes the removal of LLTX from
the Intel Network drivers.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It appears that a step was missed in the initialization of 82576 fiber nics
that resulted in it not powering on the optics.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igb has flags enabling lltx but this is a holdover from the earlier
e1000 driver which the igb driver was based off of.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the locking is OK as the locks were being taken before the
various phy setup functions, add the annotations as they release and
reacquire the phy_lock.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes cxgb3 invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
I've kept the ioctl controls for per-queue LRO switches. However,
we should not encourage anyone to use these.
Because of that, I've also kept the skb construction code in
cxgb3. Hopefully we can phase out those per-queue switches
and then kill this too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the device on receive path and allow otherwise identical devices
as long as the physical device differs.
This is useful for NBMA tunnels, where you want to use different gre IP
for each public IP available via different physical devices.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
both pdata->mdc and pdata->mdio are unsigned. Notice a negative
return value.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the host to inform us that the link is down by adding
a VIRTIO_NET_F_STATUS which indicates that device status is
available in virtio_net config.
This is currently useful for simulating link down conditions
(e.g. using proposed qemu 'set_link' monitor command) but
would also be needed if we were to support device assignment
via virtio.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added future masking)
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the gelic wireless driver to net_device_ops
Signed-off-by: Masakazu Mokuno <mokuno@sm.sony.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.
So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.
Attached picture shows bind() time depending on number of already bound
sockets.
Green area corresponds to the usual binding to zero port process, which
turns on kernel port selection as described above. Red area is the bind
process, when number of reuse-bound sockets is not limited by 64k (or
sysctl parameters). The same exponential growth (hidden by the green
area) before number of ports reaches sysctl limit.
At this time bind hash table has exactly one reuse-enbaled socket in a
bucket, but it is possible that they have different addresses. Actually
kernel selects the first port to try randomly, so at the beginning bind
will take roughly constant time, but with time number of port to check
after random start will increase. And that will have exponential growth,
but because of above random selection, not every next port selection
will necessary take longer time than previous. So we have to consider
the area below in the graph (if you could zoom it, you could find, that
there are many different times placed there), so area can hide another.
Blue area corresponds to the port selection optimization.
This is rather simple design approach: hashtable now maintains (unprecise
and racely updated) number of currently bound sockets, and when number
of such sockets becomes greater than predefined value (I use maximum
port range defined by sysctls), we stop traversing the whole bind hash
table and just stop at first matching bucket after random start. Above
limit roughly corresponds to the case, when bind hash table is full and
we turned on mechanism of allowing to bind more reuse-enabled sockets,
so it does not change behaviour of other sockets.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes igb invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
Three things of note:
1) I've kept the LRO Kconfig option until we decide to enable
GRO across the board at which point it can also be killed.
2) The poll_controller stuff is broken in igb as it tries to do
the same work as the normal poll routine. Since poll_controller
can be called in the middle of a poll, this can't be good.
I noticed this because poll_controller can invoke the GRO hooks
without flushing held GRO packets.
However, this should be harmless (assuming the poll_controller
bug above doesn't kill you first :) since the next ->poll will
clear the backlog. The only time when we'll have a problem is
if we're already executing the GRO code on the same ring, but
that's no worse than what happens now.
3) I kept the ip_summed check before calling GRO so that we're
on par with previous behaviour.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The base versions handle constant folding just fine, use them
directly.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes sfc invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
Everything should appear identical to the user except that the
offload is now controlled by the GRO ethtool option instead of
LRO. I've kept the lro module parameter as is since that's for
compatibility only.
I have eliminated efx_rx_mk_skb as the GRO layer can take care
of all packets regardless of whether GRO is enabled or not.
So the only case where we don't call GRO is if the packet checksum
is absent. This is to keep the behaviour changes of the patch to
a minimum.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>