2819 Commits

Author SHA1 Message Date
Pablo Neira Ayuso f5553c19ff netfilter: nf_tables: fix leaks in error path of nf_tables_newchain()
Release statistics and module refcount on memory allocation problems.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 18:42:08 +01:00
Julian Anastasov 579eb62ac3 ipvs: rerouting to local clients is not needed anymore
commit f5a41847ac ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000 ("ipv4: Pass explicit daddr arg to
ip_send_reply().") from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-01-30 10:05:55 +09:00
Pablo Neira Ayuso e8781f70a5 netfilter: nf_tables: disable preemption when restoring chain counters
With CONFIG_DEBUG_PREEMPT=y

[22144.496057] BUG: using smp_processor_id() in preemptible [00000000] code: iptables-compat/10406
[22144.496061] caller is debug_smp_processor_id+0x17/0x1b
[22144.496065] CPU: 2 PID: 10406 Comm: iptables-compat Not tainted 3.19.0-rc4+ #
[...]
[22144.496092] Call Trace:
[22144.496098]  [<ffffffff8145b9fa>] dump_stack+0x4f/0x7b
[22144.496104]  [<ffffffff81244f52>] check_preemption_disabled+0xd6/0xe8
[22144.496110]  [<ffffffff81244f90>] debug_smp_processor_id+0x17/0x1b
[22144.496120]  [<ffffffffa07c557e>] nft_stats_alloc+0x94/0xc7 [nf_tables]
[22144.496130]  [<ffffffffa07c73d2>] nf_tables_newchain+0x471/0x6d8 [nf_tables]
[22144.496140]  [<ffffffffa07c5ef6>] ? nft_trans_alloc+0x18/0x34 [nf_tables]
[22144.496154]  [<ffffffffa063c8da>] nfnetlink_rcv_batch+0x2b4/0x457 [nfnetlink]

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-26 11:50:02 +01:00
Pablo Neira Ayuso 75e8d06d43 netfilter: nf_tables: validate hooks in NAT expressions
The user can crash the kernel if it uses any of the existing NAT
expressions from the wrong hook, so add some code to validate this
when loading the rule.

This patch introduces nft_chain_validate_hooks() which is based on
an existing function in the bridge version of the reject expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-19 14:52:39 +01:00
David S. Miller 2bd8221804 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains netfilter/ipvs fixes, they are:

1) Small fix for the FTP helper in IPVS, a diff variable may be left
   unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.

2) Fix nf_tables port NAT in little endian archs, patch from leroy
   christophe.

3) Fix race condition between conntrack confirmation and flush from
   userspace. This is the second reincarnation to resolve this problem.

4) Make sure inner messages in the batch come with the nfnetlink header.

5) Relax strict check from nfnetlink_bind() that may break old userspace
   applications using all 1s group mask.

6) Schedule removal of chains once no sets and rules refer to them in
   the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
   Toennesen.

Note that this batch comes later than usual because of the short
winter holidays.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 00:14:49 -05:00
Pablo Neira Ayuso a2f18db0c6 netfilter: nf_tables: fix flush ruleset chain dependencies
Jumping between chains doesn't mix well with flush ruleset. Rules
from a different chain and set elements may still refer to us.

[  353.373791] ------------[ cut here ]------------
[  353.373845] kernel BUG at net/netfilter/nf_tables_api.c:1159!
[  353.373896] invalid opcode: 0000 [#1] SMP
[  353.373942] Modules linked in: intel_powerclamp uas iwldvm iwlwifi
[  353.374017] CPU: 0 PID: 6445 Comm: 31c3.nft Not tainted 3.18.0 #98
[  353.374069] Hardware name: LENOVO 5129CTO/5129CTO, BIOS 6QET47WW (1.17 ) 07/14/2010
[...]
[  353.375018] Call Trace:
[  353.375046]  [<ffffffff81964c31>] ? nf_tables_commit+0x381/0x540
[  353.375101]  [<ffffffff81949118>] nfnetlink_rcv+0x3d8/0x4b0
[  353.375150]  [<ffffffff81943fc5>] netlink_unicast+0x105/0x1a0
[  353.375200]  [<ffffffff8194438e>] netlink_sendmsg+0x32e/0x790
[  353.375253]  [<ffffffff818f398e>] sock_sendmsg+0x8e/0xc0
[  353.375300]  [<ffffffff818f36b9>] ? move_addr_to_kernel.part.20+0x19/0x70
[  353.375357]  [<ffffffff818f44f9>] ? move_addr_to_kernel+0x19/0x30
[  353.375410]  [<ffffffff819016d2>] ? verify_iovec+0x42/0xd0
[  353.375459]  [<ffffffff818f3e10>] ___sys_sendmsg+0x3f0/0x400
[  353.375510]  [<ffffffff810615fa>] ? native_sched_clock+0x2a/0x90
[  353.375563]  [<ffffffff81176697>] ? acct_account_cputime+0x17/0x20
[  353.375616]  [<ffffffff8110dc78>] ? account_user_time+0x88/0xa0
[  353.375667]  [<ffffffff818f4bbd>] __sys_sendmsg+0x3d/0x80
[  353.375719]  [<ffffffff81b184f4>] ? int_check_syscall_exit_work+0x34/0x3d
[  353.375776]  [<ffffffff818f4c0d>] SyS_sendmsg+0xd/0x20
[  353.375823]  [<ffffffff81b1826d>] system_call_fastpath+0x16/0x1b

Release objects in this order: rules -> sets -> chains -> tables, to
make sure no references to chains are held anymore.

Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.biz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:48 +01:00
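The release order named in the commit above (rules -> sets -> chains -> tables) can be illustrated with a small userspace sketch. It is not the nf_tables code: `struct chain`, its `use` counter and the helper names are hypothetical stand-ins for "something still refers to this chain".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical chain object: `use` counts the rules and set elements
 * that still jump to this chain. */
struct chain {
    unsigned int use;
    bool destroyed;
};

/* Releasing a rule or a set element drops one reference on its target. */
void ref_release(struct chain *target)
{
    assert(target->use > 0);
    target->use--;
}

/* A chain may only be destroyed once nothing refers to it anymore. */
bool chain_release(struct chain *c)
{
    if (c->use != 0)
        return false;
    c->destroyed = true;
    return true;
}

/* Flush in the order rules -> sets -> chains, so that every reference
 * is gone before the chain itself is released. */
bool flush_ruleset(struct chain *c, unsigned int rule_refs,
                   unsigned int set_refs)
{
    unsigned int i;

    for (i = 0; i < rule_refs; i++)
        ref_release(c);
    for (i = 0; i < set_refs; i++)
        ref_release(c);
    return chain_release(c);
}
```

Calling `chain_release()` before the references are dropped fails in this sketch, which is the situation the ordering is meant to rule out.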
Pablo Neira Ayuso 62924af247 netfilter: nfnetlink: relax strict multicast group check from netlink_bind
Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code may request
to listen to all of the existing netlink groups by setting an all-ones
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:47 +01:00
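As a rough illustration of the relaxed check described above, the sketch below walks a 32-bit subscription bitmask and skips unknown groups instead of failing the whole bind. It is a userspace approximation: `GRP_MAX` and `subscribe_mask()` are hypothetical, and only the convention that bit (g - 1) stands for group g is taken from netlink.

```c
#include <assert.h>
#include <stdint.h>

enum { GRP_MAX = 8 };  /* hypothetical highest known group number */

/* Returns how many known groups the mask subscribes to.
 * Bit (g - 1) of the mask stands for group g. */
unsigned int subscribe_mask(uint32_t mask)
{
    unsigned int bound = 0;
    unsigned int bit;

    assert(GRP_MAX <= 32);  /* every known group must fit in the mask */

    for (bit = 0; bit < 32; bit++) {
        unsigned int group = bit + 1;

        if (!(mask & (UINT32_C(1) << bit)))
            continue;
        if (group > GRP_MAX)
            continue;  /* relaxed: ignore unknown groups, do not fail */
        bound++;
    }
    return bound;
}
```

With this behaviour an all-ones mask binds the known groups and silently ignores the rest, which is what old userspace relying on such a mask would need.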
Pablo Neira Ayuso 9ea2aa8b7d netfilter: nfnetlink: validate nfnetlink header from batch
Make sure there is enough room for the nfnetlink header in the
netlink messages that are part of the batch. There is a similar
check in netlink_rcv_skb().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:46 +01:00
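The length check described in the commit above can be sketched in userspace as follows. `struct msg_hdr` and `struct sub_hdr` are hypothetical stand-ins that only mirror the sizes of the netlink and nfnetlink headers; the function name is likewise an assumption, not a kernel symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the outer message header (16 bytes). */
struct msg_hdr {
    uint32_t len;    /* total message length, header included */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

/* Stand-in for the inner subsystem header (4 bytes). */
struct sub_hdr {
    uint8_t  family;
    uint8_t  version;
    uint16_t res_id;
};

/* True if a message inside a batch is long enough to carry both headers
 * and does not claim more bytes than remain in the buffer. */
bool batch_msg_is_valid(const struct msg_hdr *h, size_t remaining)
{
    assert(h != NULL);

    if (remaining < sizeof(*h))
        return false;
    if (h->len < sizeof(*h) || h->len > remaining)
        return false;
    /* The payload must have room for the inner header. */
    if (h->len - sizeof(*h) < sizeof(struct sub_hdr))
        return false;
    return true;
}
```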
Pablo Neira Ayuso 8ca3f5e974 netfilter: conntrack: fix race between confirmation and flush
Commit 5195c14c8b ("netfilter: conntrack: fix race in
__nf_conntrack_confirm against get_next_corpse") aimed to resolve the
race condition between the confirmation (packet path) and the flush
command (from control plane). However, it introduced a crash when
several packets race to add a new conntrack, which seems easier to
reproduce when nf_queue is in place.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case the
race occurred, re-add the CT to the dying list.

This patch also changes the verdict from NF_ACCEPT to NF_DROP when
we lose the race. Basically, the confirmation happens for the first packet
that we see in a flow. If you just invoked conntrack -F once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you already called flush anytime soon in the past).
This should be hard to trigger, but better drop this packet, otherwise
we leave things in inconsistent state since the destination will likely
reply to this packet, but it will find no conntrack, unless the origin
retransmits.

The change of the verdict has been discussed in:
https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-06 22:27:45 +01:00
Johannes Berg 023e2cfa36 netlink/genetlink: pass network namespace to bind/unbind
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.

To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.

Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-27 03:07:50 -05:00
leroy christophe 7b5bca4676 netfilter: nf_tables: fix port natting in little endian archs
Make sure this fetches 16-bits port data from the register.
Remove casting to make sparse happy, not needed anymore.

Signed-off-by: leroy christophe <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-23 15:34:28 +01:00
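A minimal sketch of fetching exactly 16 bits of port data from a 32-bit register slot, as the commit above describes. This is userspace code with a hypothetical `struct reg32`; it assumes the port sits in the first two bytes of the slot in network byte order. A plain integer cast of the 32-bit value would instead keep its low-order 16 bits, and which bytes in memory those are depends on the host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit register slot. */
struct reg32 {
    uint32_t data;
};

/* Store a network-order port in the first two bytes of the register. */
void reg_store_port(struct reg32 *r, uint16_t port_be)
{
    assert(r != NULL);
    r->data = 0;
    memcpy(&r->data, &port_be, sizeof(port_be));
}

/* Fetch exactly 16 bits from the start of the register. Copying bytes
 * gives the same result on little and big endian hosts. */
uint16_t reg_load_port(const struct reg32 *r)
{
    uint16_t port_be;

    assert(r != NULL);
    memcpy(&port_be, &r->data, sizeof(port_be));
    return port_be;
}
```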
Pablo Neira Ayuso 70314fc684 Merge tag 'ipvs2-for-v3.19' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next into ipvs-next
Simon Horman says:

====================
Second round of IPVS Updates for v3.19

please consider these IPVS updates for v3.19 or alternatively v3.20.

The single patch in this series fixes a long standing bug that
has not caused any trouble and thus is not being prioritised as a fix.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-18 20:54:26 +01:00
Linus Torvalds 70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon which a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Linus Torvalds cbfe0de303 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS changes from Al Viro:
 "First pile out of several (there _definitely_ will be more).  Stuff in
  this one:

   - unification of d_splice_alias()/d_materialize_unique()

   - iov_iter rewrite

   - killing a bunch of ->f_path.dentry users (and f_dentry macro).

     Getting that completed will make life much simpler for
     unionmount/overlayfs, since then we'll be able to limit the places
     sensitive to file _dentry_ to reasonably few.  Which allows to have
     file_inode(file) pointing to inode in a covered layer, with dentry
     pointing to (negative) dentry in union one.

     Still not complete, but much closer now.

   - crapectomy in lustre (dead code removal, mostly)

   - "let's make seq_printf return nothing" preparations

   - assorted cleanups and fixes

  There _definitely_ will be more piles"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  copy_from_iter_nocache()
  new helper: iov_iter_kvec()
  csum_and_copy_..._iter()
  iov_iter.c: handle ITER_KVEC directly
  iov_iter.c: convert copy_to_iter() to iterate_and_advance
  iov_iter.c: convert copy_from_iter() to iterate_and_advance
  iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
  iov_iter.c: convert iov_iter_zero() to iterate_and_advance
  iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
  iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
  iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
  iov_iter.c: iterate_and_advance
  iov_iter.c: macros for iterating over iov_iter
  kill f_dentry macro
  dcache: fix kmemcheck warning in switch_names
  new helper: audit_file()
  nfsd_vfs_write(): use file_inode()
  ncpfs: use file_inode()
  kill f_dentry uses
  lockd: get rid of ->f_path.dentry->d_sb
  ...
2014-12-10 16:10:49 -08:00
Dan Carpenter 3b05ac3824 ipvs: uninitialized data with IP_VS_IPV6
The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-12-10 17:36:47 +09:00
Hannes Frederic Sowa dbfc4fb7d5 dst: no need to take reference on DST_NOCACHE dsts
Since commit f886497212 ("ipv4: fix dst race in sk_dst_get()")
DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
reference on them when we are in rcu protected sections.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:08:17 -05:00
Al Viro ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
David S. Miller 244ebd9f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains netfilter updates for net-next. Basically,
enhancements for xt_recent, skip zeroing of timer in conntrack, fix
linking problem with recent redirect support for nf_tables, ipset
updates and a couple of cleanups. More specifically, they are:

1) Raise the maximum number of packets per IP address to be remembered in xt_recent
   while retaining backward compatibility, from Florian Westphal.

2) Skip zeroing timer area in nf_conn objects, also from Florian.

3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using
   meta l4proto and transport layer header, from Alvaro Neira.

4) Fix linking problems in the new redirect support when CONFIG_IPV6=n
   and IP6_NF_IPTABLES=n.

And ipset updates from Jozsef Kadlecsik:

5) Support updating element extensions when the set is full (fixes
   netfilter bugzilla id 880).

6) Fix set match with 32-bits userspace / 64-bits kernel.

7) Indicate explicitly when /0 networks are supported in ipset.

8) Simplify cidr handling for hash:*net* types.

9) Allocate the proper size of memory when /0 networks are supported.

10) Explicitly add padding elements to hash:net,net and hash:net,port,net,
    because the elements must be u32 sized for the used hash function.

Jozsef is also cooking an ipset RCU conversion, which should land soon if
it reaches the merge window in time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05 20:56:46 -08:00
Jozsef Kadlecsik cac3763967 netfilter: ipset: Explicitly add padding elements to hash:net, net and hash:net, port, net
The elements must be u32 sized for the used hash function.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 77b4311d20 netfilter: ipset: Allocate the proper size of memory when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 25a76f3463 netfilter: ipset: Simplify cidr handling for hash:*net* types
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik 59de79cf57 netfilter: ipset: Indicate when /0 networks are supported
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:36 +01:00
Jozsef Kadlecsik a51b9199b1 netfilter: ipset: Alignment problem between 64bit kernel 32bit userspace
Sven-Haegar Koch reported the issue:

sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.

In syslog:
x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32

which was introduced by the counter extension in ipset.

The patch fixes the alignment issue with introducing a new set match
revision with the fixed underlying 'struct ip_set_counter_match'
structure.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:35 +01:00
Jozsef Kadlecsik 86ac79c7be netfilter: ipset: Support updating extensions when the set is full
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation.

Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03 12:43:34 +01:00
David S. Miller 60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Pablo Neira Ayuso b59eaf9e28 netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module
This resolves linking problems with CONFIG_IPV6=n:

net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'

Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 13:08:42 +01:00
Florian Westphal c41884ce05 netfilter: conntrack: avoid zeroing timer
add a __nfct_init_offset annotation member to struct nf_conn to make
it clear which members are covered by the memset when the conntrack
is allocated.

This avoids zeroing timer_list and ct_net; both are already inited
explicitly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:41:06 +01:00
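The idea of zeroing only part of an object on allocation can be sketched in userspace as below. The commit above uses a marker member in struct nf_conn; this sketch swaps in `offsetof()` on the first zeroed field, and `struct conn` with its fields is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct conn {
    /* Initialised explicitly elsewhere, so not covered by the memset. */
    long timer;
    void *net;
    /* Everything from here to the end is zeroed on allocation. */
    unsigned long status;
    unsigned int mark;
};

/* Zero the tail of the object, starting at the first covered member. */
void conn_zero_tail(struct conn *c)
{
    const size_t off = offsetof(struct conn, status);

    assert(c != NULL);
    memset((char *)c + off, 0, sizeof(*c) - off);
}
```

Members placed before `status` keep whatever value they had, so they must be initialised explicitly by the caller.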
Florian Westphal abc86d0f99 netfilter: xt_recent: relax ip_pkt_list_tot restrictions
The maximum value for the hitcount parameter is given by
"ip_pkt_list_tot" parameter (default: 20).

Exceeding this value on the command line will cause the rule to be
rejected.  The parameter is also readonly, i.e. it cannot be changed
without module unload or reboot.

Store size per table, then base nstamps[] size on the hitcount instead.

The module parameter is retained for backwards compatibility.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27 12:40:31 +01:00
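Sizing a timestamp array from a per-table hit count, rather than from a fixed module-wide limit, might look like the following userspace sketch. The struct layout and function names are hypothetical and only borrow the `stamps`/`nstamps` vocabulary from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry: the array length is chosen per table. */
struct recent_entry {
    unsigned int nstamps_max;  /* capacity, taken from the hit count */
    unsigned int index;        /* next slot to overwrite */
    unsigned long stamps[];    /* flexible array member */
};

/* Allocate an entry whose stamps[] holds `hitcount` timestamps. */
struct recent_entry *entry_alloc(unsigned int hitcount)
{
    struct recent_entry *e;

    assert(hitcount > 0);
    e = calloc(1, sizeof(*e) + hitcount * sizeof(e->stamps[0]));
    if (e == NULL)
        return NULL;
    e->nstamps_max = hitcount;
    return e;
}

/* Record a timestamp, overwriting the oldest one when full. */
void entry_add_stamp(struct recent_entry *e, unsigned long now)
{
    assert(e != NULL);
    e->stamps[e->index] = now;
    e->index = (e->index + 1) % e->nstamps_max;
}
```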
Pablo Neira 43612d7c04 Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
This reverts commit 5195c14c8b.

If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus, crashing when dropping the packet and
releasing the conntrack since the golden rule is that conntracks are
always placed in any of the existing lists for traceability reasons.

Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-25 14:14:51 -05:00
David S. Miller 958d03b016 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/ipvs updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:

1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
   patches from Arturo Borrero.

2) Introduce the nf_tables redir expression, from Arturo Borrero.

3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
   Patch from Alex Gartrell.

4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
   from Marcelo Leitner.

5) Add some extra validation when registering logger families, also
   from Marcelo.

6) Some spelling cleanups from stephen hemminger.

7) Fix sparse warning in nf_logger_find_get().

8) Add cgroup support to nf_tables meta, patch from Ana Rey.

9) A Kconfig fix for the new redir expression and fix sparse warnings in
   the new redir expression.

10) Fix several sparse warnings in the netfilter tree, from
    Florian Westphal.

11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
    nothing when this situation occurs.

12) Add conntrack zone support to xt_connlimit, again from Florian.

13) Add netnamespace support to the h323 conntrack helper, contributed
    by Vasily Averin.

14) Remove unnecessary NULL-pointer checks before free_percpu() and
    module_put(), from Markus Elfring.

15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24 16:00:58 -05:00
David S. Miller 1459143386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 22:28:24 -05:00
Marcelo Leitner beacd3e8ef netfilter: nfnetlink_log: Make use of pr_fmt where applicable
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-20 14:09:01 +01:00
Markus Elfring 982f405136 netfilter: Deletion of unnecessary checks before two function calls
The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-20 13:08:43 +01:00
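The same cleanup pattern exists in userspace C, where `free()` is defined to do nothing when passed NULL. The sketch below uses it as an analogue for the kernel helpers named above; `struct chain_ctx` and its release function are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context with optional allocations. */
struct chain_ctx {
    long *stats;
    char *name;
};

/* No "if (ptr)" guard is needed: free(NULL) is a no-op. */
void chain_ctx_release(struct chain_ctx *ctx)
{
    assert(ctx != NULL);
    free(ctx->stats);
    free(ctx->name);
    ctx->stats = NULL;
    ctx->name = NULL;
}
```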
Vasily Averin 2c7b5d5dac netfilter: nf_conntrack_h323: lookup route from proper net namespace
Signed-off-by: Vasily Averin <vvs@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:47:14 +01:00
Florian Westphal e59ea3df3f netfilter: xt_connlimit: honor conntrack zone if available
Currently all the conntrack lookups are done using default zone.
In case the skb has a ct attached (e.g. template) we should use this zone
for lookups instead.  This makes connlimit work with connections assigned
to other zones.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:44:20 +01:00
Pablo Neira Ayuso 97840cb67f netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind
Make sure the netlink group exists, otherwise you can trigger an out
of bound array memory access from the netlink_bind() path. This splat
can only be triggered by the superuser.

[  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[  180.204249] index 9 is out of range for type 'int [9]'
[  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[  180.208639] Call Trace:
[  180.208857] dump_stack (lib/dump_stack.c:52)
[  180.209370] ubsan_epilogue (lib/ubsan.c:174)
[  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[  180.211495] SYSC_bind (net/socket.c:1541)

Moreover, define the missing nf_tables and nf_acct multicast groups too.

Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-17 12:01:13 +01:00
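The out-of-range access in the log above (index 9 into an `int [9]`) is the kind of error a range check before the table lookup prevents. The sketch below is a userspace illustration; the table contents, `GRP_MAX` and the function name are hypothetical, and only the array size of 9 is taken from the log.

```c
#include <assert.h>

enum { GRP_NONE = 0, GRP_MAX = 8 };  /* valid groups: 1..GRP_MAX */

/* Hypothetical group-to-subsystem table with GRP_MAX + 1 = 9 slots. */
static const int group_to_subsys[GRP_MAX + 1] = {
    [1] = 10, [2] = 10, [3] = 10,
    [8] = 20,
};

/* Returns the subsystem id for a group, or -1 if the group is unknown. */
int group_lookup(unsigned int group)
{
    assert(sizeof(group_to_subsys) / sizeof(group_to_subsys[0])
           == GRP_MAX + 1);

    if (group == GRP_NONE || group > GRP_MAX)
        return -1;  /* reject instead of reading past the array */
    return group_to_subsys[group];
}
```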
bill bonaparte 5195c14c8b netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse
After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().

The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with separate locking),
which sets the DYING bit.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit.  In case the
race occurred, re-add the CT to the dying list.

While at this, fix coding style of the comment that has been
updated.

Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-14 17:43:05 +01:00
Thomas Graf 6eba82248e rhashtable: Drop gfp_flags arg in insert/remove functions
Reallocation is only required for shrinking and expanding and both rely
on a mutex for synchronization and callers of rhashtable_init() are in
non atomic context. Therefore, no reason to continue passing allocation
hints through the API.

Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
for silent fall back to vzalloc() without the OOM killer jumping in as
pointed out by Eric Dumazet and Eric W. Biederman.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:18:40 -05:00
Herbert Xu 7b4ce23534 rhashtable: Add parent argument to mutex_is_held
Currently mutex_is_held can only test locks that are global
since it takes no arguments.  This prevents rhashtable from being
used in places where locks are local, e.g., per-namespace locks.

This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
Herbert Xu 1f501d6252 netfilter: Move mutex_is_held under PROVE_LOCKING
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled.  This patch modifies netfilter so that rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:13:05 -05:00
Pablo Neira Ayuso 8225161545 netfilter: nfnetlink_log: remove unnecessary error messages
In case of OOM, there's nothing userspace can do.

If there's no room to put the payload in __build_packet_message(),
jump to nla_put_failure which already performs the corresponding
error reporting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-13 13:13:00 +01:00
Florian Westphal 5676864431 netfilter: fix various sparse warnings
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
  no; add include
net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
  yes
net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
  no; add include
net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
  yes
net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
  no; add include
net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
  add __force, 3 is what we want.
net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
  yes
net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
  no; add include
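
The two xt_DSCP warnings above come from inverting a mask and narrowing
the result to 8 bits. A small userspace sketch of that arithmetic,
assuming a mask value of 0xFC (consistent with the ffffff03 in the
warning; the name DEMO_DSCP_MASK and both helpers are hypothetical).
__force is a sparse annotation with no userspace counterpart, so a plain
cast is used here.

```c
#include <stdint.h>

#define DEMO_DSCP_MASK 0xFC   /* hypothetical name; upper six bits */

/* ~0xFC is evaluated as an int (0xffffff03 with a 32-bit int), so
 * narrowing to 8 bits drops the upper bytes and leaves 0x03. */
static uint8_t demo_inverted_mask(void)
{
    return (uint8_t)~DEMO_DSCP_MASK;
}

/* Replace the DSCP bits of a DS field byte, keeping the low two bits. */
static uint8_t demo_set_dscp(uint8_t dsfield, uint8_t dscp)
{
    return (uint8_t)((dsfield & demo_inverted_mask()) |
                     (uint8_t)(dscp << 2));
}
```

The truncation is intentional: only the low two bits of the original
byte should survive, which is why the warning is silenced rather than
the expression changed.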

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-13 12:14:42 +01:00
Pablo Neira Ayuso b326dd37b9 netfilter: nf_tables: restore synchronous object release from commit/abort
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, i.e. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.

Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.

So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slower, but it's simple and it resolves the
aforementioned problems.

This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso afefb6f928 netfilter: nft_compat: use the match->table to validate dependencies
Instead of the match->name, which is of course not relevant.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso c918687f5e netfilter: nft_compat: relax chain type validation
Check for nat chain dependency only, which is the one that can
actually crash the kernel. Mangle-, filter- and security-specific
matches and targets are harmless when used out of their scope, so they
are no longer rejected.

This restores iptables-compat with mangle-specific matches/targets used
outside of the OUTPUT chain. Those are actually emulated through filter
chains, which broke when strict validation was performed.
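
The relaxed check described above can be sketched in userspace as
follows. This is a simplified model, not the nft_compat code: the enum,
the function name and its signature are hypothetical, and only the rule
"a nat extension needs a nat chain" is taken from the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chain types for this sketch. */
enum demo_chain_type { DEMO_CHAIN_FILTER, DEMO_CHAIN_ROUTE, DEMO_CHAIN_NAT };

/* True when an extension bound to table_name may be used in a chain of
 * the given type. Only the nat dependency is enforced. */
static bool demo_dependency_ok(const char *table_name,
                               enum demo_chain_type type)
{
    if (table_name == NULL)
        return true;    /* extension is not tied to any table */

    if (strcmp(table_name, "nat") == 0)
        return type == DEMO_CHAIN_NAT;

    return true;        /* mangle, filter, security: allowed anywhere */
}
```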

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso 2daf1b4d18 netfilter: nft_compat: use current net namespace
Instead of init_net when using xtables over nftables compat.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:06:24 +01:00
Pablo Neira Ayuso baf4750d92 netfilter: nft_redir: fix sparse warnings
>> net/netfilter/nft_redir.c:39:26: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_redir.c:39:26:    expected unsigned int [unsigned] [usertype] nla_be32
   net/netfilter/nft_redir.c:39:26:    got restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:40:40: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:46:34: sparse: incorrect type in assignment (different base types)
   net/netfilter/nft_redir.c:46:34:    expected unsigned int [unsigned] [usertype] nla_be32
   net/netfilter/nft_redir.c:46:34:    got restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32
>> net/netfilter/nft_redir.c:47:48: sparse: cast to restricted __be32

Fixes: e9105f1 ("netfilter: nf_tables: add new expression nft_redir")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 12:00:04 +01:00
Pablo Neira Ayuso f6c6339d5e netfilter: fix unmet dependencies in NETFILTER_XT_TARGET_REDIRECT
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV4 which has unmet direct dependencies (NET && INET && NETFILTER && NF_NAT_IPV4)

warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_NAT_IPV6)

Fixes: 8b13edd ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Fixes: 9de920e ("netfilter: refactor NAT redirect IPv6 code to use it from nf_tables")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-12 11:54:12 +01:00
Calvin Owens 50656d9df6 ipvs: Keep skb->sk when allocating headroom on tunnel xmit
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.

This patch properly copies ->sk and increments its reference count.
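
A toy userspace model of that ownership rule, under loud assumptions:
demo_sock, demo_buf and both helpers are hypothetical, the counter is a
plain int rather than an atomic refcount, and no real skb API is used.
It only shows that a replacement buffer should inherit the owner and
take its own reference.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_sock { int refcnt; };                 /* hypothetical owner */
struct demo_buf { struct demo_sock *sk; size_t headroom; };

/* Allocate a replacement buffer with the requested headroom and carry
 * the owner over from the old buffer. */
static struct demo_buf *demo_realloc_headroom(const struct demo_buf *old,
                                              size_t headroom)
{
    struct demo_buf *nb = calloc(1, sizeof(*nb)); /* sk starts as NULL */

    if (nb == NULL)
        return NULL;
    nb->headroom = headroom;
    if (old->sk != NULL) {
        nb->sk = old->sk;
        nb->sk->refcnt++;   /* the new buffer holds its own reference */
    }
    return nb;
}

/* Drop the buffer's reference on its owner, then free the buffer. */
static void demo_free_buf(struct demo_buf *b)
{
    if (b == NULL)
        return;
    if (b->sk != NULL)
        b->sk->refcnt--;
    free(b);
}
```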

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-11-12 11:03:04 +09:00
Dan Carpenter 2196937e12 netfilter: ipset: small potential read beyond the end of buffer
We could be reading 8 bytes from a 4-byte buffer here.  It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.
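
A minimal sketch of this kind of check, assuming a hypothetical 8-byte
request layout (demo_req) and parser (demo_parse_req) that are not the
ipset definitions: the length supplied by the caller is compared with
the size of the structure before any field is read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte request layout. */
struct demo_req {
    uint32_t op;
    uint32_t version;
};

/* Returns 0 and fills *out on success, -1 if the buffer is too short. */
static int demo_parse_req(const void *buf, size_t len, struct demo_req *out)
{
    if (len < sizeof(*out))
        return -1;      /* reject instead of reading past the end */

    memcpy(out, buf, sizeof(*out));
    return 0;
}
```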

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-11 13:46:37 +01:00