Commit Graph

563631 Commits

Author SHA1 Message Date
One Thousand Gnomes 60aa3b080a 6pack: fix free memory scribbles
commit acf673a318 fixed a user triggerable free
memory scribble but in doing so replaced it with a different one that allows
the user to control the data and scribble even more.

sixpack_close is called by the tty layer in tty context. The tty context is
protected by sp_get() and sp_put(). However network layer activity via
sp_xmit() is not protected this way. We must therefore stop the queue
otherwise the user gets to dump a buffer mostly of their choice into freed
kernel pages.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 01:25:01 -05:00
Ido Schimmel f0138e2596 mlxsw: pci: Adjust value of CPU egress traffic class
During initialization, when creating the send descriptor queues (SDQs),
we specify the CPU egress traffic class of each SDQ. The maximum number
of classes of this type is different in the two ASICs supported by this
PCI driver.

New firmware versions check this value is set correctly, which causes
errors on the Spectrum ASIC, as its max exposed egress traffic class is
lower than 7.

Solve this by setting this field to 3, which is an acceptable value for
both ASICs.

Note that we currently do not expose the QoS capabilities of the ASICs,
so setting this to an hardcoded value is OK for now.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:48:50 -05:00
Rabin Vincent 55795ef546 net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.  This was found using american fuzzy
lop.

Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs.  Except for ARM, the
rest have only been compile-tested.

Fixes: 3480593131 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:43:52 -05:00
David S. Miller 56b87180cc brcfmac
* fix IBSS which got broken over time
 * new USB id for bcm43242 dongle
 * arp offload configuration through inet notifier
 
 ath9k
 
 * add random number generator support (CONFIG_ATH9K_HWRNG)
 
 iwlwifi
 
 * Make scan parameters low latency aware
 * Fix in the NL80211_FEATURE_FULL_AP_CLIENT_STATE state case
 * Fix enable injection mode (Chaya Rachel)
 * Various cleanups (Dan / Julia / myself)
 * Allow to stay more time on popular channels (David Spinadel)
 * Bug fixes for D0i3 (Eliad / Luca)
 * Fixes for GO uAPSD (myself)
 * Start of TSO support (myself)
 * Rate control bug fixes (Eyal / Gregory)
 * Start the work on 9000 devices (Johannes / Sara / Oren)
 * Start the work on a new Tx queue allocation model (Liad)
 * Debug infrastructure enhancements (Golan)
 
 mwifiex
 
 * add a debugfs file for chip reset
 * advertise SMS4 cipher suite
 * increase ap and station interface limit to 3
 * enable MSI support on newer pcie devices (8897 onwards)
 
 rtlwifi
 
 * fix lots of module parameter usage
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJWi5RxAAoJEG4XJFUm622bujQH/Rti4fRBnJcRt5EZAv/pvdIx
 AGPcYQQfFbFRjzxjrcY30AYCC6OoID9n7BNtELIWabA6dQHWGJd2aebZGH85O4Eh
 xgAty5CRf9E4+9MNiv+OqkL64/E6/YmSeOYiNSWl6SgRE4VYmexLXttpppImmoER
 zowmtqzLSN/4l4yX5p2BVy3NAXN0R1WjYKNUXAJvJj59814upXf+XrOKCghzHIs4
 2NE/aFoscscKWCSS3+MIIW4q9QWw7f8YvAdrO6k47LLkjm4+C/I5BeAlHUOEQUS7
 xyoB/7KoHXlzRen8LmoL9SsqopTL2tKEjE/1f0QgLKOHXqpPRb4iKBFjsR94HJg=
 =aeCM
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-for-davem-2016-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
brcfmac

* fix IBSS which got broken over time
* new USB id for bcm43242 dongle
* arp offload configuration through inet notifier

ath9k

* add random number generator support (CONFIG_ATH9K_HWRNG)

iwlwifi

* Make scan parameters low latency aware
* Fix in the NL80211_FEATURE_FULL_AP_CLIENT_STATE state case
* Fix enable injection mode (Chaya Rachel)
* Various cleanups (Dan / Julia / myself)
* Allow to stay more time on popular channels (David Spinadel)
* Bug fixes for D0i3 (Eliad / Luca)
* Fixes for GO uAPSD (myself)
* Start of TSO support (myself)
* Rate control bug fixes (Eyal / Gregory)
* Start the work on 9000 devices (Johannes / Sara / Oren)
* Start the work on a new Tx queue allocation model (Liad)
* Debug infrastructure enhancements (Golan)

mwifiex

* add a debugfs file for chip reset
* advertise SMS4 cipher suite
* increase ap and station interface limit to 3
* enable MSI support on newer pcie devices (8897 onwards)

rtlwifi

* fix lots of module parameter usage
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:05:04 -05:00
Arnd Bergmann be78a690f5 net: hns: avoid uninitialized variable warning:
gcc fails to see that the use of the 'last_offset' variable
in hns_nic_reuse_page() is used correctly and issues a bogus
warning:

drivers/net/ethernet/hisilicon/hns/hns_enet.c: In function 'hns_nic_reuse_page':
drivers/net/ethernet/hisilicon/hns/hns_enet.c:541:6: warning: 'last_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]

This simplifies the function to make it more obvious what is
going on to both readers and compilers, which makes the warning
go away.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:00:11 -05:00
Florian Westphal a72a5e2d34 inet: kill unused skb_free op
The only user was removed in commit
029f7f3b87 ("netfilter: ipv6: nf_defrag: avoid/free clone operations").

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 22:25:57 -05:00
Hannes Frederic Sowa ff62198553 bridge: Only call /sbin/bridge-stp for the initial network namespace
[I stole this patch from Eric Biederman. He wrote:]

> There is no defined mechanism to pass network namespace information
> into /sbin/bridge-stp therefore don't even try to invoke it except
> for bridge devices in the initial network namespace.
>
> It is possible for unprivileged users to cause /sbin/bridge-stp to be
> invoked for any network device name which if /sbin/bridge-stp does not
> guard against unreasonable arguments or being invoked twice on the
> same network device could cause problems.

[Hannes: changed patch using netns_eq]

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 16:46:17 -05:00
xypron.glpk@gmx.de 2fbf575867 include/uapi/linux/sockios.h: mark SIOCRTMSG unused
IOCTL SIOCRTMSG does nothing but return EINVAL.

So comment it as unused.

SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h

inet_ioctl calls ip_rt_ioctl.
ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 16:44:06 -05:00
Linus Torvalds ee9a7d2cb0 Two more fixes.
1. The recordmcount change had an output that used sprintf() (incorrectly)
     when it should have been a fprintf() to stderr.
 
  2. The printk_formats file could crash if someone added a trace_printk()
     in the core kernel, and also added one in a module. This does not
     affect production kernels. Only kernels where developers add trace_printk()
     for debugging can crash.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWi+jEAAoJEKKk/i67LK/8at0IAIMkbBMbRVjCM0tWTMM/2rZr
 nD+2UOVZHDI7rSdhOC0yfRL041uiu/wI4DV6FAJZX8D5BumS7Wwv/GItwwNU3+TD
 ZI6OG9f/6OxoC1jFUY8CvpSqAeV6uoro4heSzjprirSUsGwrFlTuHMt2NyEl0FvO
 985HIzfTcb3yVFvsjm7Uyv1SsOdPL+BldDc46mgo8fXv3VYvvbqTP5NMkx7YyMdm
 Dlo90b1nQ8bk3bjG4RvYmlnfK+HfbB2TD+rz3xJ+YaFRoJIov0/BzimeZaI3Aw/R
 9TjLqwBN8ASVxc3A+/AQdUEserzXl7RSJHT/92YIQc8FkaS50cXX80Xk7ez0JVk=
 =nb9G
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Two more fixes:

  1. The recordmcount change had an output that used sprintf()
     (incorrectly) when it should have been a fprintf() to stderr.

  2. The printk_formats file could crash if someone added a
     trace_printk() in the core kernel, and also added one in a module.
     This does not affect production kernels.  Only kernels where
     developers add trace_printk() for debugging can crash"

* tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix setting of start_index in find_next()
  ftrace/scripts: Fix incorrect use of sprintf in recordmcount
2016-01-05 13:32:39 -08:00
Linus Torvalds 3331f99a6f Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile bugfix from Chris Metcalf:
 "This fixes a bug that Sudip's buildbot found for tilepro allmodconfig.

  I've tagged it for stable only back to 3.19, which was when most of
  the other affected architectures added their support for working
  around this issue"

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro
2016-01-05 13:21:19 -08:00
David S. Miller 1633bf118b Merge branch 'mlx5e-tstamp'
Saeed Mahameed says:

====================
Introduce mlx5 ethernet timestamping

This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.

Changes from V2:
net/mlx5_core: Introduce access function to read internal_timer
	- Remove one line function
	- Change function name

net/mlx5e: Add HW timestamping (TS) support:
	- Data path performance optimization (caching tstamp struct in rq,sq)
	- Change read/write_lock_irqsave to read/write_lock
	- Move ioctl functions to en_clock file
	- Changed overflow start algorithm according to comments from Richard
	- Move timestamp init/cleanup to open/close ndos.

In details:

1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.

2nd patch adds the needed low level helpers for:
	- Fetching the hardware clock (hardware internal timer)
	- Parsing CQEs timestamps
	- Device frequency capability

3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
	- Internal clock structure initialization and other helper functions
	- Added the needed ioctl for setting/getting the current timestamping
	  configuration.
	- used this configuration in RX/TX data path to fill the SKB with
	  the timestamp.

4th patch Introduces PTP (PHC) support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 14:11:51 -05:00
Eran Ben Elisha 3d8c38af14 net/mlx5e: Add PTP Hardware Clock (PHC) support
Add a PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 14:11:50 -05:00
Eran Ben Elisha ef9814deaf net/mlx5e: Add HW timestamping (TS) support
Add support for enable/disable HW timestamping for incoming and/or
outgoing packets. To enable/disable HW timestamping appropriate
ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and
HWTSAMP_TX_ON/OFF only are supported. Make all relevant changes in
RX/TX flows to consider TS request and plant HW timestamps into
relevant structures.

Add internal clock for converting hardware timestamp to nanoseconds. In
addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 14:11:50 -05:00
Eran Ben Elisha b084444459 net/mlx5_core: Introduce access function to read internal timer
A preparation step which adds support for reading the hardware
internal timer and the hardware timestamping from the CQE.
In addition, advertize device_frequency_khz HCA capability.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 14:11:50 -05:00
Achiad Shochat 34802a42b3 net/mlx5e: Do not modify the TX SKB
If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 14:11:50 -05:00
David S. Miller 33c152972e Merge branch 'sctp-transport-rhashtable'
Xin Long says:

====================
sctp: use transport hashtable to replace association's with rhashtable

for telecom center, the usual case is that a server is connected by thousands
of clients. but if the server with only one enpoint(udp style) use the same
sport and dport to communicate with every clients, and every assoc in server
will be hashed in the same chain of global assoc hashtable due to currently we
choose dport and sport as the hash key.

when a packet is received, sctp_rcv try to find the assoc with sport and dport,
since that chain is too long to find it fast, it make the performance turn to
very low, some test data is as follow:

in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
       and use one of them to send data to server]

===== test on net-next
-- perf top
server:
  55.73%  [kernel]             [k] sctp_assoc_is_match
   6.80%  [kernel]             [k] sctp_assoc_lookup_paddr
   4.81%  [kernel]             [k] sctp_v4_cmp_addr
   3.12%  [kernel]             [k] _raw_spin_unlock_irqrestore
   1.94%  [kernel]             [k] sctp_cmp_addr_exact

client:
  46.01%  [kernel]                    [k] sctp_endpoint_lookup_assoc
   5.55%  libc-2.17.so                [.] __libc_calloc
   5.39%  libc-2.17.so                [.] _int_free
   3.92%  libc-2.17.so                [.] _int_malloc
   3.23%  [kernel]                    [k] __memset

-- spent time
time is 487s, send pkt is 10000000

we need to change the way to calculate the hash key, to use lport +
rport + paddr as the hash key can avoid this issue.

besides, this patchset will use transport hashtable to replace
association hashtable to lookup with rhashtable api. get transport
first then get association by t->asoc. and also it will make tcp
style work better.

===== test with this patchset:
-- perf top
server:
  15.98%  [kernel]                 [k] _raw_spin_unlock_irqrestore
   9.92%  [kernel]                 [k] __pv_queued_spin_lock_slowpath
   7.22%  [kernel]                 [k] copy_user_generic_string
   2.38%  libpthread-2.17.so       [.] __recvmsg_nocancel
   1.88%  [kernel]                 [k] sctp_recvmsg

client:
  11.90%  [kernel]                   [k] sctp_hash_cmp
   8.52%  [kernel]                   [k] rht_deferred_worker
   4.94%  [kernel]                   [k] __pv_queued_spin_lock_slowpath
   3.95%  [kernel]                   [k] sctp_bind_addr_match
   2.49%  [kernel]                   [k] __memset

-- spent time
time is 22s, send pkt is 10000000
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 12:24:06 -05:00
Xin Long c79c066691 sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc
sctp_endpoint_lookup_assoc is called in the protection of sock lock
there is no need to call local_bh_disable in this function. so remove
them.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 12:24:02 -05:00
Xin Long b5eff71283 sctp: drop the old assoc hashtable of sctp
transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 12:24:01 -05:00
Xin Long 39f66a7dce sctp: apply rhashtable api to sctp procfs
Traversal the transport rhashtable, get the association only once through
the condition assoc->peer.primary_path != transport.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 12:24:01 -05:00
Xin Long 4f00878126 sctp: apply rhashtable api to send/recv path
apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 12:24:01 -05:00
Xin Long d6c0256a60 sctp: add the rhashtable apis for sctp global transport hashtable
tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.

lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.

this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-05 12:24:00 -05:00
Johan Hedberg 78b781ca0d Bluetooth: Add support for Start Limited Discovery command
This patch implements the mgmt Start Limited Discovery command. Most
of existing Start Discovery code is reused since the only difference
is the presence of a 'limited' flag as part of the discovery state.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-01-05 17:02:50 +01:00
Johan Hedberg 0d3b7f64c8 Bluetooth: Change eir_has_data_type() to more generic eir_get_data()
To make the EIR parsing helper more general purpose, make it return
the found data and its length rather than just saying whether the data
was present or not.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-01-05 17:02:49 +01:00
Chris Metcalf c1b27ab5d6 tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro
This allows the build system to know that it can't attempt to
configure the Lustre virtual block device, for example, when tilepro
is using 64KB pages (as it does by default).  The tilegx build
already provided those symbols.

Previously we required that the tilepro hypervisor be rebuilt with
a different hardcoded page size in its headers, and then Linux be
rebuilt using the updated hypervisor header.  Now we allow each of
the hypervisor and Linux to be built independently.  We still check
at boot time to ensure that the page size provided by the hypervisor
matches what Linux expects.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: stable@vger.kernel.org [3.19+]
2016-01-05 08:16:09 -05:00
Rainer Weikusat c845acb324 af_unix: Fix splice-bind deadlock
On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
system call and AF_UNIX sockets,

http://lists.openwall.net/netdev/2015/11/06/24

The situation was analyzed as

(a while ago) A: socketpair()
B: splice() from a pipe to /mnt/regular_file
	does sb_start_write() on /mnt
C: try to freeze /mnt
	wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
	lock our socket, see it not bound yet
	decide that it needs to create something in /mnt
	try to do sb_start_write() on /mnt, block (it's
	waiting for C).
D: splice() from the same pipe to our socket
	lock the pipe, see that socket is connected
	try to lock the socket, block waiting for A
B:	get around to actually feeding a chunk from
	pipe to file, try to lock the pipe.  Deadlock.

on 2015/11/10 by Al Viro,

http://lists.openwall.net/netdev/2015/11/10/4

The patch fixes this by removing the kern_path_create related code from
unix_mknod and executing it as part of unix_bind prior acquiring the
readlock of the socket in question. This means that A (as used above)
will sb_start_write on /mnt before it acquires the readlock, hence, it
won't indirectly block B which first did a sb_start_write and then
waited for a thread trying to acquire the readlock. Consequently, A
being blocked by C waiting for B won't cause a deadlock anymore
(effectively, both A and B acquire two locks in opposite order in the
situation described above).

Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 23:22:49 -05:00
David Ahern b5bdacf3bb net: Propagate lookup failure in l3mdev_get_saddr to caller
Commands run in a vrf context are not failing as expected on a route lookup:
    root@kenny:~# ip ro ls table vrf-red
    unreachable default

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    ping: Warning: source address might be selected on device other than vrf-red.
    PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.

    --- 10.100.1.254 ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 999ms

Since the vrf table does not have a route for 10.100.1.254 the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propogating a lookup failure to the user allows the command to fail as
expected:

    root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
    connect: No route to host

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:58:30 -05:00
David S. Miller 6a5ef90c58 Merge branch 'faster-soreuseport'
Craig Gallek says:

====================
Faster SO_REUSEPORT

This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.

This series only includes the UDP path.  I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.

Changes in v4:
- pskb_may_pull is unnecessary with pskb_pull (per Alexei Starovoitov)

Changes in v3:
- skb_pull_inline -> pskb_pull (per Alexei Starovoitov)
- reuseport_attach* -> sk_reuseport_attach* and simple return statement
  syntax change (per Daniel Borkmann)

Changes in v2:
- Fix ARM build; remove unnecessary include.
- Handle case where protocol header is not in linear section (per
  Alexei Starovoitov).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:59 -05:00
Craig Gallek 3ca8e40299 soreuseport: BPF selection functional test
This program will build classic and extended BPF programs and
validate the socket selection logic when used with
SO_ATTACH_REUSEPORT_CBPF and SO_ATTACH_REUSEPORT_EBPF.

It also validates the re-programing flow and several edge cases.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:59 -05:00
Craig Gallek 538950a1b7 soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:59 -05:00
Craig Gallek e32ea7e747 soreuseport: fast reuseport UDP socket selection
Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:58 -05:00
Craig Gallek ef456144da soreuseport: define reuseport groups
struct sock_reuseport is an optional shared structure referenced by each
socket belonging to a reuseport group.  When a socket is bound to an
address/port not yet in use and the reuseport flag has been set, the
structure will be allocated and attached to the newly bound socket.
When subsequent calls to bind are made for the same address/port, the
shared structure will be updated to include the new socket and the
newly bound socket will reference the group structure.

Usually, when an incoming packet was destined for a reuseport group,
all sockets in the same group needed to be considered before a
dispatching decision was made.  With this structure, an appropriate
socket can be found after looking up just one socket in the group.

This shared structure will also allow for more complicated decisions to
be made when selecting a socket (eg a BPF filter).

This work is based off a similar implementation written by
Ying Cai <ycai@google.com> for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:58 -05:00
David S. Miller ebb3cf41c1 Merge branch 'mlxsw-fixes'
Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of fixes from Ido.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel 6c72a3d0d3 mlxsw: spectrum: Change bridge port attributes only when bridged
Bridge port attributes are offloaded to hardware when invoked with SELF
flag set, but it really makes no sense to reflect them when port is not
bridged.

Allow a user to change these attribute only when port is bridged and
initialize them correctly when joining or leaving a bridge.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel 5a8f45258e mlxsw: spectrum: Set bridge status in appropriate functions
Set the bridge status of physical ports in the appropriate functions, to
be consistent with LAG join/leave and vPorts joining/leaving bridge.

Also, remove the error messages in these two functions, as we already
emit errors in both the single functions they call.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel 78124078c4 mlxsw: spectrum: Return NOTIFY_BAD on bridge failure
It is possible for us to fail when joining or leaving a bridge, so let
the user know about that by returning NOTIFY_BAD, as already done for
LAG join/leave and 802.1D bridges.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel 7b31abe70b mlxsw: spectrum: Initialize PVID only once
We set PVID to 1 in mlxsw_sp_port_vlan_init(), so we can remove this
statement.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
hayeswang 7ec2541aa7 r8152: add reset_resume function
When the reset_resume() is called, the flag of SELECTIVE_SUSPEND should be
cleared and reinitialize the device, whether the SELECTIVE_SUSPEND is set
or not. If reset_resume() is called, it means the power supply is cut or the
device is reset. That is, the device wouldn't be in runtime suspend state and
the reinitialization is necessary.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:02:08 -05:00
Julia Lawall 46f85a9215 chelsio: constify cphy_ops structures
The cphy_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 21:58:28 -05:00
Arnd Bergmann 46678612e1 fsl/fman: allow modular build
ARM allmodconfig fails because of the addition of the FMAN driver:

drivers/built-in.o: In function `dtsec_restart_autoneg':
binder.c:(.text+0x173328): undefined reference to `mdiobus_read'
binder.c:(.text+0x173348): undefined reference to `mdiobus_write'
drivers/built-in.o: In function `dtsec_config':
binder.c:(.text+0x173d24): undefined reference to `of_phy_find_device'
drivers/built-in.o: In function `init_phy':
binder.c:(.text+0x1763b0): undefined reference to `of_phy_connect'
drivers/built-in.o: In function `stop':
binder.c:(.text+0x176014): undefined reference to `phy_stop'
drivers/built-in.o: In function `start':
binder.c:(.text+0x176078): undefined reference to `phy_start'

The reason is that the driver uses PHYLIB, but that is a loadable
module here, and fman itself is built-in.

This patch makes it possible to configure fman as a module as well
so we don't change the status of PHYLIB in an allmodconfig kernel,
and it adds a 'select PHYLIB' statement to ensure that phylib is
always built-in when fman is.

The driver uses "builtin_platform_driver(fman_driver);", which means
it cannot be unloaded, but it's still possible to have it as a loadable
module that gets loaded once and never removed.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5adae51a64 ("fsl/fman: Add FMan MURAM support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 21:53:42 -05:00
Arnd Bergmann 0efeff2905 net: make ip6tunnel_xmit definition conditional
Moving the caller of iptunnel_xmit_stats causes a build error in
randconfig builds that disable CONFIG_INET:

In file included from ../net/xfrm/xfrm_input.c:17:0:
../include/net/ip6_tunnel.h: In function 'ip6tunnel_xmit':
../include/net/ip6_tunnel.h:93:2: error: implicit declaration of function 'iptunnel_xmit_stats' [-Werror=implicit-function-declaration]
  iptunnel_xmit_stats(dev, pkt_len);

The reason is that the iptunnel_xmit_stats definition is hidden
inside #ifdef CONFIG_INET but the caller is not. We can change
one or the other to fix it, and this patch adds a second #ifdef
around ip6tunnel_xmit() to avoid seeing the invalid call.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 039f50629b ("ip_tunnel: Move stats update to iptunnel_xmit()")
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 21:50:51 -05:00
David S. Miller 15ab90f400 NFC 4.5 pull request
This is the first NFC pull request for 4.5 and it brings:
 
 - A new driver for the STMicroelectronics ST95HF NFC chipset.
   The ST95HF is an NFC digital transceiver with an embedded analog
   front-end and as such relies on the Linux NFC digital
   implementation. This is the 3rd user of the NFC digital stack.
 
 - ACPI support for the ST st-nci and st21nfca drivers.
 
 - A small improvement for the nfcsim driver, as we can now tune
   the Rx delay through sysfs.
 
 - A bunch of minor cleanups and small fixes from Christophe Ricard,
   for a few drivers and the NFC core code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWgsv0AAoJEIqAPN1PVmxKvukP/3eJwA+chAUF89/fqwqFTaJN
 fffLoOxx2OBIbTXD2VV36yw4bAo9tbKDAYBZiot3Ig7Kg0SeCJ5oPA9xCVnWPrEY
 hxFAldvl+lWQs8nrUOgUZItiFBeUPdfW9YX/yKhUZVc+602nUG/e/+6x8B5MhIce
 SAfgCyd0c16DApltP7sw1muyZMvsO6Ow6dyNzDUVYZuabvEhe3SLSj9KFJi7Thsp
 h41Iv+bwPLhwF4RXGA6rei/gdEDSMRohprdj3uTDiTarGW+OpcAO0zWACS5m4eR0
 zF19+HjGPUk/LpFRaU31xX0ZQQjOTmmfsOt4FBb3P7oJx47egycsadLYPexeG7nj
 ruyS6ezlRX1I/tZsnyLNJK92mK5TXYLz2uJ8r2ii/BgPNE+AErB3zKCC+EjXzWhh
 AvClGu5b88WJLxoq3I3l5evPwGhebGZ8N/1uiFsHOxvzKVLgxwOmNLRGN4XXxB2i
 UbIHgBb6smsu/l+3q9R83kfoMaoWnr+OUIi2QPQVDt/K7t1LfsCuIhzcGSgo1VuW
 fGlA1iu+CNDknofeCl4JDo2UXAETO4gdKWw87GXeUcbbraLUczZeO7FFLZqxbMYc
 OCaPYshmVFeZRypYdRWDHw67ivj0/h+9iq4PP1XOROkRFH746dD/p4yamJwVi20B
 samZ8VPwzgH3/ohQJyX3
 =VFUH
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.5 pull request

This is the first NFC pull request for 4.5 and it brings:

- A new driver for the STMicroelectronics ST95HF NFC chipset.
  The ST95HF is an NFC digital transceiver with an embedded analog
  front-end and as such relies on the Linux NFC digital
  implementation. This is the 3rd user of the NFC digital stack.

- ACPI support for the ST st-nci and st21nfca drivers.

- A small improvement for the nfcsim driver, as we can now tune
  the Rx delay through sysfs.

- A bunch of minor cleanups and small fixes from Christophe Ricard,
  for a few drivers and the NFC core code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 21:48:15 -05:00
Florian Westphal 55285bf094 connector: bump skb->users before callback invocation
Dmitry reports memleak with syskaller program.
Problem is that connector bumps skb usecount but might not invoke callback.

So move skb_get to where we invoke the callback.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 21:46:45 -05:00
Eric Dumazet 197c949e77 udp: properly support MSG_PEEK with truncated buffers
Backport of this upstream commit into stable kernels :
89c22d8c3b ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 17:23:36 -05:00
Insu Yun 3934aa4c1f cxgb4: correctly handling failed allocation
Since t4_alloc_mem can be failed in memory pressure,
if not properly handled, NULL dereference could be happened.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 17:18:42 -05:00
Insu Yun b77357b692 qlcnic: correctly handle qlcnic_alloc_mbx_args
Since qlcnic_alloc_mbx_args can be failed,
return value should be checked.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 17:14:30 -05:00
David S. Miller 815bc580fe Merge branch 'r8169-hw-programming-typo-fixes'
Chunhao Lin says:

====================
Fix some typos in setting hardware parameter

The typos are in setting RTL8168DP, RTL8168EP and RTL8168H hardware parameters.
This series of patch fix these typos.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 16:50:50 -05:00
Chun-Hao Lin 1016a4a1f4 r8169:Correct the way of setting RTL8168DP ephy
The original way is wrong, it always writes ephy reg 0x03.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 16:50:49 -05:00
Chun-Hao Lin c832c35f5f r8169:Fix typo in setting RTL8168H PHY PFM mode.
The PHY PFM register is in PHY page 0x0a44 register 0x11, not 0x14.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 16:50:49 -05:00
Chun-Hao Lin 69f3dc379b r8169:Fix typo in setting RTL8168EP and RTL8168H D3cold PFM mode
The register for setting D3code PFM mode is  MISC_1, not DLLPR.

Signed-off-by: Chunhao Lin <hau@realtek.com>
Reviewed-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 16:50:48 -05:00
Guillaume Nault 98f40b3e22 l2tp: rely on ppp layer for skb scrubbing
Since 79c441ae50 ("ppp: implement x-netns support"), the PPP layer
calls skb_scrub_packet() whenever the skb is received on the PPP
device. Manually resetting packet meta-data in the L2TP layer is thus
redundant.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 16:45:24 -05:00