Make sure net->ipvs is reset on netns cleanup or failed
initialization. It is needed for IPVS applications to know that
IPVS core is not loaded in netns.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Avoid crash when registering ip_vs_ftp after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
commit 14e405461e (2.6.39)
("Add __ip_vs_control_{init,cleanup}_sysctl()")
introduced regression due to wrong __net_init for
__ip_vs_control_cleanup_sysctl. This leads to crash when
the ip_vs module is unloaded.
Fix it by changing __net_init to __net_exit for
the function that is already renamed to ip_vs_control_net_cleanup_sysctl.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
in function nf_conntrack_init_net,when nf_conntrack_timeout_init falied,
we should call nf_conntrack_ecache_fini to do rollback.
but the current code calls nf_conntrack_timeout_fini.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It was reported that the Linux kernel sometimes logs:
klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392
ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.
The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.
The patch closes netfilter bugzilla id 771.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv6 conntrack marked invalid packets as INVALID and let the user
drop those by an explicit rule, while IPv4 conntrack dropped such
packets itself.
IPv4 conntrack is changed so that it marks INVALID packets and let
the user to drop them.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We may hit this in xt_LOG:
net/built-in.o:xt_LOG.c:function dump_ipv6_packet:
error: undefined reference to 'ip6t_ext_hdr'
happens with these config options:
CONFIG_NETFILTER_XT_TARGET_LOG=y
CONFIG_IP6_NF_IPTABLES=m
ip6t_ext_hdr is fairly small and it is called in the packet path.
Make it static inline.
Reported-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For a picked up connection, the window win is scaled twice: one is by the
initialization code, and the other is by the sender updating code.
I use the temporary variable swin instead of modifying the variable win.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit dc1f8bf68b ('netdev: change
transmit to limited range type') changed the required return type and
9a1654ba0b ('net: Optimize
hard_start_xmit() return checking') changed the valid numerical
return values.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 08baf56108 ('net:
txq_trans_update() helper') made it unnecessary for most drivers to
set net_device::trans_start (or netdev_queue::trans_start).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commits d314774cf2 ('netdev: network
device operations infrastructure') and
008298231a ('netdev: add more functions
to netdevice ops') moved and renamed net device operation pointers.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commits e308a5d806 ('netdev: Add
netdev->addr_list_lock protection.') and
e8a0464cc9 ('netdev: Allocate multiple
queues for TX.') introduced more fine-grained locks.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bea3348eef ('[NET]: Make NAPI
polling independent of struct net_device objects.') removed the
automatic disabling of NAPI polling by dev_close(), and drivers
must now do this themselves.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e52ac3398c ('net: Use device
model to get driver name in skb_gso_segment()') removed the only
in-tree caller of ethtool ops that doesn't hold the RTNL lock.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell has agreed to do maintenance on the sky2 driver.
* Add the developer to the maintainers file
* Remove the old reference to the long gone (sk98lin) driver
* Rearrange to fit current topic organization
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a slave comes up, we're unsetting the current_arp_slave without
removing active flags from it, which can lead to situations where we have
more than one slave with active flags in active-backup mode.
To avoid this situation we must remove the active flags from a slave before
removing it as a current_arp_slave.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2f53384424 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.
We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.
Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.
For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert array index from the loop bound to the loop index.
And remove the void type conversion to ip6_mc_del1_src() return
code, seem it is unnecessary, since ip6_mc_del1_src() does not
use __must_check similar attribute, no compiler will report the
warning when it is removed.
v2: enrich the commit header
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver uses a 2-order allocation, which is too much on architectures
like ppc64, which has a 64KiB page. This particular allocation is used
for large packet fragments that may have a size of 512, 1024, 4096 or
fill the whole allocation. So, a minimum size of 16384 is good enough
and will be the same size that is used in architectures of 4KiB sized
pages.
This will avoid allocation failures that we see when the system is under
stress, but still has plenty of memory, like the one below.
This will also allow us to set the interface MTU to higher values like
9000, which was not possible on ppc64 without this patch.
Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB
83137 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 10420096kB
Total swap = 10420096kB
107776 pages RAM
1184 pages reserved
147343 pages shared
28152 pages non-shared
netstat: page allocation failure. order:2, mode:0x4020
Call Trace:
[c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
[c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930
[c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170
[c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en]
[c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en]
[c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en]
[c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en]
[c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450
[c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290
[c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24
[c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110
[c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0
[c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Tested-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit (bfab27a stmmac: add the experimental PCI support) the
IFF_UNICAST_FLT flag has been removed from the stmmac_mac_device_setup()
function. This patch re-adds the flag.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch clears a warning message of "MDC/MDIO access timeout" which may
appear when interface is loaded due to missing clock setting before resetting
the LED, and starting periodic function too early.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a link problem on the second port of BCM57711 + BCM84823 boards due to
incorrect macro usage.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a link problem on BCM57712 + BCM8727 designs in which the TX
laser is controller by GPIO, after 1.60.xx drivers were previously loaded.
On these designs the TX_LASER is enabled by logic AND between the PHY
(through MDIO), and the GPIO. When an old driver is used, it disables the
MDIO part, hence the GPIO control had no affect de facto.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix no-LED problem when link speed is 1G on BCM57712 + BCM8727 designs, by
removing a logic error checking for a different PHY.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 578x0-SFI pre-emphasis settings per HW recommendations to achieve better
link strength.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BCM57810-KR link may not come up in 1G after running loopback test, so set
the relevant registers to their default values before starting KR autoneg.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 57810-KR flow-control handling link is achieved via CL37 AN.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a problem in which PFC frames are not honored, due to incorrect link
attributes synchronization following PMF migration, and verify PFC XON is not
stuck from previous link change.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modified from original patch from Chris.
The sky2 driver has to have 8 byte alignment of receive buffer
on some chip versions. On architectures which don't support efficient
unaligned access this doesn't work very well. The solution is to
just copy all received packets which is what the driver already
does for small packets.
This allows the driver to be used on the Tilera TILEmpower-Gx, since
the tile architecture doesn't currently handle kernel unaligned accesses,
just userspace.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implemenation was buggy for slaves who use ndo_neigh_setup,
since the networking stack invokes the bonding device ndo entry (from
neigh_params_alloc) before any devices are enslaved, and the bonding
driver can't further delegate the call at that point in time. As a
result when bonding IPoIB devices, the neigh_cleanup hasn't been called.
Fix that by deferring the actual call into the slave ndo_neigh_setup
from the time the bonding neigh_setup is called.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 7d26bb103c "bonding: emit event when bonding changes MAC" didn't
take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding
actually changes the mac address (to all zeroes). As a result the neighbours
aren't deleted by the core networking code (which does so upon getting that
event).
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
an error if the user provides less bytes than the size of struct
sctp_event_subscribe.
Struct sctp_event_subscribe needs to be extended by an u8 for every
new event or notification type that is added.
This obviously makes getsockopt fail for binaries that are compiled
against an older versions of <net/sctp/user.h> which do not contain
all event types.
This patch changes getsockopt behaviour to no longer return an error
if not enough bytes are being provided by the user. Instead, it
returns as much of sctp_event_subscribe as fits into the provided buffer.
This leads to the new behavior that users see what they have been aware
of at compile time.
The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to decrement the conntrack counter if we fail to access the
zone extension.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error path misses putting the timeout object. This patch adds
new function xt_ct_tg_timeout_put() to put the timeout object.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI is disabled during suspend and needs to be enabled on resume. Without
this the driver locks up during resume in rtl_reset_work() trying to disable
NAPI again.
Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 621b4d6 updated the bnx2x driver to a new FW version, but lacked
a commit to a header file with changes to the firmware's interface.
The missing interface change causes iscsi and fcoe to misbehave with the
updated firmware.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes Auto Power Saving configuration in ip101a_config_init
which was broken as there is no phy register write followed after
setting IP101A_APS_ON flag.
This patch also fixes the return value of ip101a_config_init.
Without this patch ip101a_config_init returns 2 which is not an error
accroding to IS_ERR and the mac driver will continue accessing 2 as
valid pointer to phy_dev resulting in memory fault.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rare circumstances, a descriptor writeback flush may not work if it
arrives on a specific clock cycle as a writeback request is going out.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the adapter is closed while it is simultaneously going through a
reset, it can cause a null-pointer dereference when the two different code
paths simultaneously cleanup up the Tx/Rx resources.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix up code so that changes in DCB settings
are detected only when ixgbe_dcbnl_set_all is called.
Previously, a series of 'change' commands followed by
a call to ixgbe_dcbnl_set_all() would always be handled
as a HW change - even if the net change was zero.
This patch checks for this case of no actual change and
skips going through the HW set process.
Without this fix, the link could reset and result in
a link flap.
The core change in this patch is to check for changes
in the ixgbe_copy_dcb_cfg() routine - and return
a bitmask of detected changes. The other
places where changes were detected previously can be removed.
Signed-off-by: Eric Multanen <eric.w.multanen@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now the helper function from filter.c for negative offsets is exported,
it can be used it in the jit to handle negative offsets.
First modify the asm load helper functions to handle:
- know positive offsets
- know negative offsets
- any offset
then the compiler can be modified to explicitly use these helper
when appropriate.
This fixes the case of a negative X register and allows to lift
the restriction that bpf programs with negative offsets can't
be jited.
Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>